#create external table in hive
Explore tagged Tumblr posts
atplblog · 6 months ago
Text
Price: [price_with_discount] (as of [price_update_date] - Details) [ad_1] Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop’s data warehouse infrastructure. You’ll quickly learn how to use Hive’s SQL dialect—HiveQL—to summarize, query, and analyze large datasets stored in Hadoop’s distributed filesystem. This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You’ll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data.
Use Hive to create, alter, and drop databases, tables, views, functions, and indexes
Customize data formats and storage options, from files to external databases
Load and extract data from tables—and use queries, grouping, filtering, joining, and other conventional query methods
Gain best practices for creating user defined functions (UDFs)
Learn Hive patterns you should use and anti-patterns you should avoid
Integrate Hive with other data processing programs
Use storage handlers for NoSQL databases and other datastores
Learn the pros and cons of running Hive on Amazon’s Elastic MapReduce [ad_2]
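As a flavor of the HiveQL the book covers, and of the external tables this tag is about, here is a minimal sketch; the table name, columns, and HDFS path are hypothetical:
CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
  ip  STRING,
  ts  STRING,
  url STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/web_logs';   -- data stays where it is; dropping the table removes only metadata
Because the table is external, Hive leaves the files under /data/web_logs untouched when the table is dropped.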
0 notes
govindhtech · 6 months ago
Text
Dataplex Automatic Discovery & Cataloging For Cloud Storage
Tumblr media
Cloud storage data is made accessible for analytics and governance with Dataplex Automatic Discovery.
In a data-driven and AI-driven world, organizations must manage growing amounts of structured and unstructured data. A lot of enterprise data is unused or unreported, called “dark data.” This expansion makes it harder to find relevant data at the correct time. Indeed, a startling 66% of businesses say that at least half of their data fits into this category.
Google Cloud is announcing today that Dataplex, a component of BigQuery’s unified platform for intelligent data to AI governance, will automatically discover and catalog data from Google Cloud Storage to address this difficulty. This powerful capability enables organizations to:
Find useful data assets stored in Cloud Storage automatically, encompassing both structured and unstructured material, including files, documents, PDFs, photos, and more.
Harvest and catalog metadata for the discovered assets, keeping schema definitions current as the data changes, with integrated compatibility checks and partition detection.
With auto-created BigLake, external, or object tables, you can enable analytics for data science and AI use cases at scale without having to duplicate data or build table definitions by hand.
How Dataplex automatic discovery and cataloging works
The Dataplex automatic discovery and cataloging process carries out the following actions:
Discovery scan configuration: with the BigQuery Studio UI, the CLI, or gcloud, users can customize the discovery scan, which finds and categorizes data assets in a Cloud Storage bucket containing up to millions of files.
Extraction of metadata: From the identified assets, pertinent metadata is taken out, such as partition details and schema definitions.
Database and table creation in BigQuery: BigQuery automatically creates a new dataset with multiple BigLake, external, or object tables (for unstructured data) with precise, current table definitions (a sketch of such a definition follows this list). For scheduled scans, these tables are updated as the data in the Cloud Storage bucket changes.
Preparation for analytics and AI: the published dataset and tables can be analyzed and processed with BigQuery and open-source engines such as Spark, Hive, and Pig for data science and AI use cases.
Integration with the Dataplex catalog: Every BigLake table is linked into the Dataplex catalog, which facilitates easy access and search.
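For reference, the external and BigLake table definitions Dataplex generates are roughly of this shape; this is only an illustrative sketch, and the project, dataset, connection, and bucket names below are made up:
CREATE EXTERNAL TABLE `my_project.discovered_data.events`
WITH CONNECTION `my_project.us.my_biglake_connection`  -- the connection makes this a BigLake table; omit it for a plain external table
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://my-bucket/events/*.parquet']
);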
Principal advantages of Dataplex automatic discovery and cataloging
Organizations can benefit from the Dataplex automatic discovery and cataloging capability in many ways:
Increased data visibility: Get a comprehensive grasp of your data and AI resources throughout Google Cloud, doing away with uncertainty and cutting down on the amount of effort spent looking for pertinent information.
Decreased human work: By allowing Dataplex to scan the bucket and generate several BigLake tables that match your data in Cloud Storage, you can reduce the labor and effort required to build table definitions by hand.
Accelerated AI and analytics: Incorporate the found data into your AI and analytics processes to gain insightful knowledge and make well-informed decisions.
Streamlined data access: While preserving the necessary security and control mechanisms, give authorized users simple access to the data they require.
Please refer to Understand your Cloud Storage footprint with AI-powered queries and insights if you are a storage administrator interested in managing your cloud storage and learning more about your whole storage estate.
Realize the potential of your data
Dataplex’s automatic discovery and cataloging is a big step toward helping businesses realize the full value of their data. Dataplex gives you the confidence to make data-driven decisions by removing the difficulties posed by dark data and offering an extensive, searchable catalog of your Cloud Storage assets.
FAQs
What is “dark data,” and why does it pose a challenge for organizations?
Data that is unused or undetected in an organization’s systems is referred to as “dark data.” It presents a problem since it might impede well-informed decision-making and represents lost chances for insights.
How does Dataplex address the issue of dark data within Google Cloud Storage?
By automatically locating and cataloguing data assets in Google Cloud Storage, Dataplex tackles dark data, making it transparent and available for analysis.
Read more on Govindhtech.com
0 notes
lynnpack · 1 year ago
Text
"Honey Buckets: Sweetening Your Storage Game with Premium Quality Containers"
Honey, the golden elixir of nature, deserves nothing but the best when it comes to storage. In the realm of honey preservation, the unsung heroes are the honey buckets. These unassuming containers play a pivotal role in safeguarding the purity, flavor, and integrity of honey, ensuring it remains a delectable delight from hive to table.
Tumblr media
Why Quality Matters
Not all containers are created equal, especially when it comes to honey. Premium honey buckets, crafted from food-grade materials like high-density polyethylene, offer a superior solution for storing this precious liquid. With their impeccable construction and design, these containers protect honey from external factors and maintain its natural goodness for longer periods.
Preserving Nature's Sweetness
The primary function of honey buckets is to preserve the freshness and flavor of honey. Equipped with airtight seals and UV-resistant properties, these containers shield honey from light, moisture, and air, ensuring it retains its distinct taste and nutritional benefits. Say goodbye to premature crystallization and hello to honey that stays as delicious as the day it was harvested.
A Shield Against External Forces
Honey is delicate and vulnerable to external forces like temperature fluctuations and contaminants. Premium honey buckets act as a fortress, providing a protective barrier against these elements. By maintaining a stable environment and minimizing exposure to harmful agents, these containers prolong the shelf life of honey while safeguarding its purity.
Ease of Handling and Storage
Convenience is key when it comes to handling and storing honey. Honey buckets are designed with practicality in mind, featuring sturdy handles for easy transportation and stackable designs for efficient storage. Their smooth interiors not only make cleaning a breeze but also ensure hygienic conditions, essential for preserving the integrity of honey.
Compliance with Food Safety Standards
Quality is non-negotiable, especially when it comes to food safety. Premium honey buckets adhere to stringent food safety regulations, guaranteeing that the honey stored within remains safe for consumption. By choosing reputable suppliers and certified containers, producers demonstrate their commitment to quality assurance and consumer trust.
Meeting Diverse Packaging Needs
Whether it's for retail shelves or industrial production, honey comes in various quantities and packaging preferences. Honey buckets offer versatility, with options available in different sizes and capacities to suit diverse needs. From small artisanal batches to bulk commercial quantities, there's a honey bucket for every requirement.
Embracing Sustainability
In an age of environmental consciousness, sustainability is paramount. Many premium honey buckets are crafted from recyclable materials, minimizing waste and reducing the environmental footprint. By opting for eco-friendly packaging solutions, producers can align with sustainable practices and meet the growing demand for responsible stewardship of resources.
Conclusion: Elevating Your Honey Experience
In the world of honey storage, quality reigns supreme. Honey buckets serve as the guardians of nature's sweetness, ensuring that every drop of honey retains its pure, unadulterated essence. By investing in premium-quality containers, producers not only protect the integrity of their product but also enhance the honey experience for consumers, one sweet moment at a time.
For more info, visit the LynnPack website. P: 0426 110 671 E: [email protected] Address: 96 Sette Circuit, Pakenham VIC 3810
0 notes
milindjagre · 8 years ago
Text
Post 51 | HDPCD | Set Hadoop or Hive Configuration property
Set Hadoop or Hive Configuration property
Tumblr media
Hello, everyone. Welcome to the last technical tutorial in the HDPCD certification series.
It’s funny! This beautiful journey is coming to an end.
In the last tutorial, we saw how to sort the output of a Hive query across multiple reducers.
In this tutorial, we are going to see how to set a Hadoop or Hive configuration property.
Let us begin, then.
It is one of the easiest tutorials in this…
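As a quick illustration (the property names below are just common examples; the tutorial's own steps may use different ones), a property can be set for the current session straight from the Hive shell or Beeline:
-- Set a property for the current session
SET hive.exec.dynamic.partition.mode=nonstrict;
SET mapreduce.job.reduces=4;
-- Print the current value of a property
SET hive.exec.dynamic.partition.mode;
The same kind of property can also be passed at launch time, for example with hive --hiveconf property=value.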
View On WordPress
0 notes
thessaliah · 4 years ago
Text
Imaginary Scramble: a lazy collection of plot bullet points underneath the shallow harem pandering
Disclaimer: I didn't like Imaginary Scramble, but I tend to dislike forced harem-like events like this one. Admittedly, I expected something better than this after hearing it was like Ooku, an event I disliked and had enormous issues with, but which still had a plot relevance I can't deny with the appearance of Beast III/L. Imaginary Scramble, on the other hand, seems like it doesn't introduce anything new or relevant (other than “evil god” lore, which shows they are about as accurate to Lovecraft as the Greek Pantheon was when Nasu turned them into the parts of an alien Robot Megazord). There are plot hints echoed in the event structure, but none of them are new. It was like reinforcing what we already know in a very obnoxious and fanservice-heavy way. Most of these were covered with more class and effective parallels in Epic of Remnants (ok, fine, Agartha was an exception in the classy aspect) and previous Lostbelts or events. But I was asked about them a while ago, so I'll cover that now, from what I remember. Just keep in mind this is my interpretation, and I haven't re-read it (nor do I plan to, because I disliked the story), so I may make mistakes or overlook other hints:
First plotpoint: We're trapped in a Wolf Game.
Fate/Requiem collab event involved playing the game, where the objective was to find the "wolf" among the flock (that was Marie Alter in the end). This event was written by someone who wrote a VN version of the Wolf game. This event started to push that strongly from intro with Sion's phrase about how opposing foes might become crewmates to later chapter, with Raikou's warning about how the external forces are a distraction to internal enemies. It did that, but was it necessary? Not at all, when Person of Chaldea already blatantly warns you at least of a person you shouldn't trust in the Storm Border. But it reinforces maybe that the idea of an internal enemy is multilayered: 
The most obvious layer is the "traitor/suspicious person plot" brought up by the Chaldea Man, who seems to be, twist pending, Sherlock Holmes. But it doesn't stop there. Like the Wolf Game in Requiem and this event, there could be multiple players that end up being the wolves (Yang Guifei, Clytie=Van Gogh, Hokusai). Sion could very well be another suspicious element, as could Gran Cavallo or Munierre, who aren't the Person of Chaldea's concern (whether they knew of his presence or not). 
The other layer is the general plot: the enemy that was played as an "alien invasion" (an eternal force) seemed to be a mask for Chaldea own resources and members, the Crypters first, and then Olgamarie and possibly her father or their entire family who created Chaldea.
Second plotpoint: The enemy side is not a harmonious hive but rather everyone has their own agenda and will sabotage their peers which gives Chaldea a chance of victory.
In the event, there is an explanation of how some evil god factions are in harmony and others are in conflict, and even those end up backstabbing each other for resources. The Foreigner in question also goes against them for her own goals. I've spoken of this before: unlike part 1, which had a hive that was 99% after the same goal and smoothly carried it out, part 2 is a cluster of shaky alliances between multiple players with agendas who are wary of each other. In part 1, Goetia was supportive of Tiamat's release, but in part 2, the Beasts regard each other as potential competition and rivals even with the professional contract between two of them. Not just among the Beasts, but also between the Apostles (for example, when Rasputin saves Kadoc after Douman injures him), and between Crypters (Beryl being the best example). And Crypters toward the "God" (Kirschtaria, Beryl, potentially Daybit) or the Apostles (Pepe vs Douman). This inner dispute was able to disrupt Beast VII's optimal manifestation and Kirschtaria's Human Order Reorganization far more than Chaldea's actions. It would have been better for this event to have existed before Atlantis and Olympus, because it seems superfluous to remind us the enemy has factions and agendas when the main story chapters already highlight them.
Third plotpoint: the nature of reality, dreams, and fiction, and how it connects to the plot.
The story takes place in the Imaginary Numbers Sea, but also within a dream inside it. Guda never left their bed and was asleep all the time, as was everyone they saw. One of the most important lines is about how, if a real object is in contact with a fictional world, it could absorb it and affect it. And vice-versa. It could also be a hint about the potential connection between Chaldea's computer simulations, the Lostbelts, the Tabula Rasa, Chaldeas, and the Specimen E story, but I think we don't have the full picture to affirm it as a certainty. What we can affirm is how this affected Kirschtaria's body after he 'dreamed' of those simulations where he went through the part 1 Singularities seven times over. He was the 'real' person in the fictional world.
Fourth plotpoint: a hero can't turn away from a pitiful person.
I put "person" instead of girl as the story does, because Guda has tried to help pitiful people regardless of their gender since Goredolf's rescue scene. This has to do with the "hint" of Mash's and Guda's resolution to save Olgamarie, like they tried to save those Foreigners. And in a "direct sequel", of Charlotte Corday in Atlantis and Europa (=Hera) in Olympus. It's possible a similar scene to what happened with Yang (appealing to what lingers of the human self) takes place. This said, I don't think the method of creation of the Foreigners is going to be similar to Beast VII dilemma. Abigail in Salem was a better foreshadowing with her background too of a sheltered miko girl.  The foreshadowing exists, but it was redundant like most of the things the event brought to the table.
Fifth point: Patchwork Servants (Phantoms)
Rather than a foreshadowing, this was a continuation of what was introduced in Shinjuku. The ability to mix or combine Saint Graphs, sometimes resulting in hybrids (Hessian-Lobo and Nemo) and sometimes with a more dominant base (the one providing the body) with sprinkles of others (Moriarty, Clytia = Van Gogh). Could this be related to the Person of Chaldea? Perhaps, but he wasn't called a Servant, and I think the Carter and Raum, or Surtr and Sigurd, cases will serve as a more solid setup. Also because Lev, Goetia, Solomon and Roman should have shared or similar Saint Graphs (like Enkidu and Kingu, or the Oda Sibling connection) to be proper patchwork Servants that look completely unrelated to each other. I don't rule out the possibility; however, I'm looking at this as more connected to Sherlock Holmes' secrets because it was introduced in the chapter where James Moriarty faced him.
I still think he could be a Beast, and the opposite L or R to the Fox because she acknowledged him in a special way. Going by this event ‘three enemies of the same class’ it could work as: Yang (Holmes) who was on your side to sabotage the other two who were cooperating/sabotaging each other (Tamamo Vitch and U-Olga). Something like a lower scale of what happens in the main story with the Beasts.
14 notes · View notes
softnquebd · 4 years ago
Text
Complete Flutter and Dart Roadmap 2020
Mohammad Ali Shuvo
Oct 30, 2020·4 min read
DART ROADMAP
Basics
Arrays, Maps
Classes
Play On Dart Compiler
String Interpolation
VARIABLES
var
dynamic
int
String
double
bool
runes
symbols
FINAL AND CONST
differences
const value and const variable
NUMBERS
hex
exponent
parse methods
num methods
math library
STRINGS
methods
interpolation
multi-line string
raw string
LISTS
List (Fixed and Growable)
methods
MAPS
Map (Fixed and Growable)
methods
SETS
Set (Fixed and Growable)
methods
FUNCTIONS
Function as a variable
optional and required parameters
fat arrow
named parameters
@required keyword
positional parameters
default parameter values
Function as first-class objects
Anonymous functions
lexical scopes
Lexical closures
OPERATORS
unary postfix expr++ expr-- () [] . ?.
unary prefix -expr !expr ~expr ++expr --expr await expr
multiplicative * / % ~/
additive + -
shift << >> >>>
bitwise AND &
bitwise XOR ^
bitwise OR |
relational and type test >= > <= < as is is!
equality == !=
logical AND &&
logical OR ||
if null ??
conditional expr1 ? expr2 : expr3
cascade ..
assignment = *= /= += -= &= ^= etc.
CONTROL FLOW STATEMENTS
if and else
for loops
while and do-while
break and continue
switch and case
assert
EXCEPTIONS (ALL ARE UNCHECKED)
Throw
Catch
on
rethrow
finally
CLASSES
Class members
Constructors
Getting object type
instance variables
getters and setters
Named constructors
Initializer lists
Constant constructors
Redirecting constructors
Factory constructors
instance methods
abstract methods
abstract classes
Inheritance
Overriding
Overriding operators
noSuchMethod()
Extension methods
Enums
Mixins (on keyword in mixins)
Static keyword, static variables and methods
GENERICS
Restricting the parameterized type
Using generic methods
LIBRARIES AND VISIBILITY
import
as
show
hide
deferred
ASYNCHRONY SUPPORT
Futures
await
async
Streams
Stream methods
OTHER TOPICS
Generators
Callable classes
Isolates
Typedefs
Metadata
Custom annotation
Comments, Single-line comments, Multi-line comments, Documentation comments
OTHER KEYWORDS FUNCTIONS
covariant
export
external
part
sync
yield
FLUTTER ROADMAP
Flutter Installation (First App)
Flutter Installation
Basic Structure
Android Directory Structure
iOS Directory Structure
BASICS
MaterialApp
Scaffold
AppBar
Container
Icon
Image
PlaceHolder
RaisedButton
Text
RichText
STATELESS AND STATEFULWIDGETS
Differences
When To Use?
How To Use?
Add Some Functionality
INPUT
Form
Form Field
Text Field
TextEditing Controller
Focus Node
LAYOUTS
Align
Aspect Ratio
Baseline
Center
Constrained Box
Container
Expanded
Fitted Box
FractionallySizedBox
Intrinsic Height
Intrinsic Width
Limited Box
Overflow Box
Padding
Sized Box
SizedOverflowBox
Transform
Column
Flow
Grid View
Indexed Stack
Layout Builder
List Body
List View
Row
Stack
Table
Wrap
Safe Area
MATERIAL COMPONENTS
App bar
Bottom Navigation Bar
Drawer
Material App
Scaffold
SliverAppBar
TabBar
TabBarView
WidgetsApp
NAVIGATOR
pop
Routes
Bottom Navigation
Drawer
Create Multipage App
popUntil
canPop
push
pushNamed
popAndPushNamed
replace
pushAndRemoveUntil
NavigatorObserver
MaterialRouteBuilder
BUTTONS
ButtonBar
DropdownButton
FlatButton
FloatingActionButton
IconButton
OutlineButton
PopupMenuButton
RaisedButton
INPUT AND SELECTIONS
Checkbox
Date & Time Pickers
Radio
Slider
Switch
DIALOGS, ALERTS, AND PANELS
AlertDialog
BottomSheet
ExpansionPanel
SimpleDialog
SnackBar
INFORMATION DISPLAYS
Card
Chip
CircularProgressIndicator
DataTable
LinearProgressIndicator
Tooltip
LAYOUT
Divider
ListTile
Stepper
SCROLLING
CustomScrollView
NestedScrollView
NotificationListener
PageView
RefreshIndicator
ScrollConfiguration
Scrollable
Scrollbar
SingleChildScrollView
Theory …
Flutter -Inside View
Dart
Skia Engine
Performance
Comparison
App Built In Flutter
OTHER USEFUL WIDGETS
MediaQuery
LayoutBuilder
OrientationBuilder
FutureBuilder
StreamBuilder
DraggableScrollableSheet
Learn How to Use Third Party Plugins
CUPERTINO (IOS-STYLE) WIDGETS
CupertinoActionSheet
CupertinoActivityIndicator
CupertinoAlertDialog
CupertinoButton
CupertinoContextMenu
CupertinoDatePicker
CupertinoDialog
CupertinoDialogAction
CupertinoNavigationBar
CupertinoPageScaffold
CupertinoPicker
CupertinoPageTransition
CupertinoScrollbar
CupertinoSegmentedControl
CupertinoSlider
CupertinoSlidingSegmentedControl
CupertinoSwitch
CupertinoTabBar
CupertinoTabScaffold
CupertinoTabView
CupertinoTextField
CupertinoTimerPicker
ANIMATIONS
Ticker
Animation
AnimationController
Tween animation
Physics-based animation
AnimatedWidget
AnimatedBuilder
AnimatedContainer
AnimatedOpacity
AnimatedSize
FadeTransition
Hero
RotationTransition
ScaleTransition
SizeTransition
SlideTransition
NETWORKING
http, dio libraries
json parsing
Local Persistent Storage
SQFLITE
Shared Preferences
Hive
JSON
JSON- PARSING
INTERNATIONALIZING FLUTTER APPS
Locale
AppLocalization
json files
STATE MANAGEMENT
setState
InheritedWidget
ScopedModel
Provider
Redux
BLOC
OTHER IMPORTANT TOPICS
Widget Tree, Element Tree and Render Tree
App Lifecycle
Dynamic Theming
Flare
Overlay widget
Visibility Widget
Spacer Widget
Universal error
Search Layout
CustomPainter
WidgetsBindingObserver
RouteObserver
SystemChrome
Internet connectivity
Http Interceptor
Google Map
Firebase Auth
Cloud FireStore DB
Real time DB
File/Image Upload
Firebase database
Firestore
Semantic versioning
Finding size and position of widget using RenderObject
Building release APK
Publishing APK on Play Store
RxDart
USEFUL TOOLS
Dev Tools
Observatory
Git and GitHub
Basics
Add ,Commit
Push
Pull
Github,Gitlab And Bitbucket
Learn How to Become UI Pro
Recreate Apps
Animations
Dribble -App Ui
Make Custom Widgets
Native Components
Native Share
Permissions
Local Storage
Bluetooth
WIFI
IR Sensor
API -REST/GRAPH
Consume API
Basics of Web Dev
Server
TESTING AND DEBUGGING
Debugging
Unit Testing
UI (Widget) Testing
Integration Testing
WRITING CUSTOM PLATFORM-SPECIFIC CODE
Platform Channel
Conclusion: There are some courses out there, but I believe self-learning is the best. However, you can take help whenever you feel like it. Continue your journey by making apps, and you can also clone existing apps to learn the concepts more clearly, like Ecommerce, Instagram, Expense Manager, Messenger, and so on.
The most important thing to remember is not to depend on others too much; when you face any problem, just google it, and the large Flutter community is always with you.
Best of luck for your Flutter journey
Get Ready and Go………..
1 note · View note
yahoodevelopers · 5 years ago
Text
Data Disposal - Open Source Java-based Big Data Retention Tool
By Sam Groth, Senior Software Engineer, Verizon Media
Do you have data in Apache Hadoop using Apache HDFS that is made available with Apache Hive? Do you spend too much time manually cleaning old data or maintaining multiple scripts? In this post, we will share why we created and open sourced the Data Disposal tool, as well as how you can use it.
Data retention is the process of keeping useful data and deleting data that may no longer be proper to store. Why delete data? It could be too old, consume too much space, or be subject to legal retention requirements to purge data within a certain time period of acquisition.
Retention tools generally handle deleting data entities (such as files, partitions, etc.) based on: duration, granularity, or date format.
Duration: The length of time before the current date. For example, 1 week, 1 month, etc.
Granularity: The frequency that the entity is generated. Some entities like a dataset may generate new content every hour and store this in a directory partitioned by date.
Date Format: Data is generally partitioned by a date so the format of the date needs to be used in order to find all relevant entities.
Introducing Data Disposal
We found many of the existing tools we looked at lacked critical features we needed, such as a configurable date format for parsing the date from the directory path or partition of the data, and an extensible code base for meeting current as well as future requirements. Each tool was also built for retention with a specific system like Apache Hive or Apache HDFS instead of providing a generic tool. This inspired us to create Data Disposal.
The Data Disposal tool currently supports the two main use cases discussed below but the interface is extensible to any other data stores in your use case.
File retention on the Apache HDFS.
Partition retention on Apache Hive tables.
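For the Hive use case above, the net effect of a disposal run is roughly equivalent to the following HiveQL; the table and cut-off date are hypothetical, and the tool itself performs this through the HCatClient API rather than by issuing a query:
-- Drop all partitions older than the retention cut-off (dt is a hypothetical date partition key)
ALTER TABLE sample_db.events DROP IF EXISTS PARTITION (dt < '2019-01-01');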
Disposal Process
Tumblr media
The basic process for disposal is 3 steps:
Read the provided yaml config files.
Run Apache Hive Disposal for all Hive config entries.
Run Apache HDFS Disposal for all HDFS config entries.
The order of the disposals is significant in that if Apache HDFS disposal ran first, it would be possible for queries to Apache Hive to have missing data partitions.
Key Features
The interface and functionality is coded in Java using Apache HDFS Java API and Apache Hive HCatClient API.
Yaml config provides a clean interface to create and maintain your retention process.
Flexible date formatting using Java's SimpleDateFormat when the date is stored in an Apache HDFS file path or in an Apache Hive partition key.
Flexible granularity using Java's ChronoUnit.
Ability to schedule with your preferred scheduler.
The current use cases all use Screwdriver, which is an open source build platform designed for continuous delivery, but using other schedulers like cron, Apache Oozie, Apache Airflow, or a different scheduler would be fine.
Future Enhancements
We look forward to making the following enhancements:
Retention for other data stores based on your requirements.
Support for file retention when configuring Apache Hive retention on external tables.
Any other requirements you may have.
Contributions are welcome! The Data team located in Champaign, Illinois, is always excited to accept external contributions. Please file an issue to discuss your requirements.
2 notes · View notes
bigdataschool-moscow · 2 years ago
Link
0 notes
nivi13 · 2 years ago
Text
What is azure data factory?
What is azure data factory?
Data-driven cloud workflows for orchestrating and automating data movement and transformation can be created with Azure Data Factory, a cloud-based data integration service.
ADF itself does not store any data. Data-driven workflows can be created to coordinate the movement of data between supported data stores and then processed using compute services in other regions or an on-premise environment.
It also lets you use both UI and programmatic mechanisms to monitor and manage workflows.
What is Azure Data Factory's operation?
Data pipelines that move and transform data can be created with the Data Factory service and run on a predetermined schedule (hourly, daily, weekly, etc.).
This indicates that workflows consume and produce time-sliced data, and the pipeline mode can be scheduled (once per day) or one time.
There are typically three steps in data-driven workflows called Azure Data Factory pipelines.
Step 1: Connect and Collect. Connect to all of the necessary processing and data sources, including file shares, FTP, SaaS services, and web services.
Use the Copy Activity in a data pipeline to move data from both on-premise and cloud source data stores to a centralized data store in the cloud for further analysis, then move the data as needed to a centralized location for processing.
Step 2: Transform and Enrich. Once data is stored in a cloud-based centralized data store, compute services like HDInsight Hadoop, Spark, Azure Data Lake Analytics, and Machine Learning are used to transform it.
Step 3: Publish. Deliver the transformed data from the cloud to on-premise sources like SQL Server, or store it in cloud storage for BI and analytics tools and other applications to use.
Important components of Azure Data Factory
Azure Data Factory consists of four key components that work together to define input and output data, processing events, and the schedule and resources required to carry out the desired data flow. Within the data stores, data structures are represented by datasets.
The input for an activity in the pipeline is represented by an input dataset. The activity's output is represented by an output dataset. An Azure Blob dataset, for instance, specifies the Azure Blob Storage folder and blob container from which the pipeline should read data.
Or, the table to which the activity writes the output data is specified in an Azure SQL Table dataset. A collection of tasks is called a pipeline.
They are used to organize activities into a unit that completes a task when used together. There may be one or more pipelines in a data factory.
For instance, a pipeline could contain a group of activities that ingests data from an Azure blob and then runs a Hive query on an HDInsight cluster to partition the data.
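As a purely illustrative sketch (the table and column names are made up), the HiveQL such an activity might run to partition the data could look like this:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- Repartition raw sales data by sale_date using dynamic partitioning
INSERT OVERWRITE TABLE sales_partitioned PARTITION (sale_date)
SELECT id, amount, sale_date
FROM sales_raw;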
The actions you take with your data are referred to as activities. At the moment, two kinds of activities are supported by Azure Data Factory: data transformation and data movement.
The information required for Azure Data Factory to connect to external resources is defined by linked services. To connect to the Azure Storage account, for instance, a connection string is specified by the Azure Storage-linked service.
Integration Runtime
The interface between ADF and the actual data or compute resources you require is provided by an integration runtime. ADF can communicate with native Azure resources like an Azure Data Lake or Databricks if you use it to marshal them.
There is no need to set up or configure anything; all you have to do is make use of the integrated integration runtime.
But suppose you want ADF to work with computers and data on your company's private network or data stored on an Oracle Database server under your desk.
In these situations, you must use a self-hosted integration runtime to set up the gateway.
The integrated integration runtime is depicted in this screenshot. When you access native Azure resources, it is always present and comes pre-installed.
Linked Service
A linked service instructs ADF on how to view the specific computers or data you want to work on. You must create a linked service for each Azure storage account and include access credentials in order to access it. You need to create a second linked service in order to read or write to another storage account. Your linked service will specify the Azure subscription, server name, database name, and credentials to enable ADF to operate on an Azure SQL database.
I hope that my article was beneficial to you. To learn more, click the link here
0 notes
milindjagre · 8 years ago
Text
Post 50 | HDPCD | Order Hive query output across multiple reducers
Order Hive query output across multiple reducers
Hello, everyone. Welcome to one more tutorial in the HDPCD certification series.
In the last tutorial, we saw how to enable vectorization in Hive.
In this tutorial, we are going to see how to order the output of a Hive query across multiple reducers.
Let us begin, then.
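Before the infographic, here is a rough idea of what this operation typically looks like in HiveQL; the table and columns are hypothetical, and the tutorial's own example may differ:
SET mapreduce.job.reduces=4;      -- use more than one reducer
SELECT name, salary
FROM employees
DISTRIBUTE BY department          -- decide which reducer each row goes to
SORT BY salary DESC;              -- each reducer sorts its own slice of the output
By contrast, ORDER BY in Hive funnels everything through a single reducer to produce one totally ordered result, which is why DISTRIBUTE BY with SORT BY is used when multiple reducers are involved.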
The following infographics show the step-by-step process of performing this operation.
Tumblr media
Apache Hive: Ordering output across multiple reducers
From the…
View On WordPress
0 notes
nitendratech · 4 years ago
Text
Types of Table in Apache Hive
Types of Table in Apache Hive. #hive #bigdata #hdfs #data #warehouse
Apache Hive has mainly two types of tables: managed and external. Managed Table: when Hive creates managed (default) tables, it follows the “schema on read” principle and loads the complete file as it is, without any parsing or modification, into the Hive data warehouse directory. Its schema information is saved in the Hive metastore for later operational use. When we drop an internal…
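A minimal HiveQL sketch of the two table types (the table names, columns, and HDFS path below are hypothetical):
-- Managed (internal) table: Hive owns the data; DROP TABLE removes both data and metadata
CREATE TABLE staging_orders (id INT, total DOUBLE);
-- External table: Hive only tracks metadata; DROP TABLE leaves the files in place
CREATE EXTERNAL TABLE raw_orders (id INT, total DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/raw_orders';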
View On WordPress
0 notes
bnhco · 4 years ago
Text
The Creative + The Chaos
FINDING BALANCE IN THE HUSTLE AND BUSTLE OF MODERN LIFE, AND PRIORITIZING ARTISTIC PURSUITS
Creative personalities certainly prefer to spend precious time pursuing passions and mastering our craft, than to be busied with the likes of cleaning and organization. Meanwhile, the supposed real-world tasks may continue to pile up around us, and that’s ok! Finding a good balance in our daily living is significant to the creative process.
I hope to highlight a few general areas of the workflow to explore, where we may be able to tighten up our practice as artists and creators. This is where my process is currently, but even the process itself may change later on. I may have to adapt to my personal needs from day to day. Adjustments are always good to keep your awareness and skillset agile. I accept every step of the journey. I strongly encourage everyone to seek out and tune in to a system that flows best with your personal rhythm.
“A journey of a thousand miles begins with a single step…” ~ Tao Te Ching, Lao Tzu
A starting point, with some semblance of a finish line is most helpful.
We must begin with the intent to win and commit to it.
DESIGNATE : SPACE
Do you have a designated space in your home where you can comfortably create? A craft corner? Garage workshop? Art studio? Is it truly your own or shared space?
There is a red wooden desk in our living room with a computer hooked up to it that mostly my son uses for online school distance learning. It used to be my desk, but I had to relocate. I found a new-to-me refurbished antique roll-top desk and set it up in the large back room of the house, with enough space to share it with the washer, dryer and folding table… and a mountain of laundry piled high. I hung an opaque curtain to divide the room in half and voila, I’ve made a private corner for myself. Works for me!
It’s important to claim a creative space as your own, so you know there is a spot just for you to freely work your magic, like a wizard behind the curtain.
Do you have a lot of clutter to dig through, burying all your supplies, making it difficult to get into a steady rhythm of productivity? How could you create an environment that is most conducive to your style or creativity? What does that look like for you?
“There’s a method to the madness, I swear!”
~  Famous Line by Any/All of Us
Let the clutter eliminate itself. The best way to get rid of the trinkets and nonsense items unnecessarily laying around, is to envision how you want your creation station set up, then arrange it as such! Think of it like staging and propping up a showroom. You are essentially creating it. You will remove the articles and particles that no longer serve a purpose in your creative space. It may be difficult to let go of the knick-knacks and bric-a-brac, I know, because I am totally guilty of possessing so many trinket treasures! Items that can be re-homed would be happily accepted at your local donation centers. It really is a good idea to refresh and tidy up your space from time to time. Reset it. Doing so clears up any dense or stagnant energy, and helps to keep the flow moving. You might even catch a glimpse of inspiration coming in.
Everything in its right place…
When you have your own creation station set up, it provides a sense of ease. It doesn’t have to be spotless and perfect, that’s not the true aim. Lived-in is still a good status.
Having all the conditions to be right or ripe is not necessary to begin creating.
If the sparks of ideas and inspiration are shooting fireworks for you, fly with it!
No question, just go!
Let’s consider that we are creating our inner landscape and mirroring that internal process outwardly to our external spheres. Whatever is going on within ourselves becomes what is projected out to the world. Why not try it the other way around? Meaning we could even try making adjustments or changing our physical surroundings — our home, office, studio space — to see if that has a good influence on our mental clarity and focus. I believe so. Finding this crossing where the mental and physical spaces meet is key in keeping a balance in all our activities. It’s a point of calibration. Be in your center, spruce it up, move things around, Fengh Shui and enjoy creating that designated space!
DEDICATE : TIME
Ask yourself if you are truly committed to honing your craft. Have you allotted the time slots in your schedule to fit your practice? Do you engage in collaborative conversations with peers, other artists? Are you dedicated to investing in yourself? What are the barriers you believe are holding you back?
If you are a dancer, dance hard! If you are a painter, splash paint! A singer, sing your heart out!
As individual artists it is important to take the time to check-in with ourselves and reflect on how we value our own work, which ultimately is most important. If thoughts of doubt or uncertainty come into the frame, it would behoove you to examine why and where that perspective could stem from. We are our personal best worst critics after all. Even so, it is good practice to assess our creations with healthy feedback.
“As iron sharpens iron, so one person sharpens another.” ~ Book of Proverbs
Being within a community of artists would certainly be valuable in gaining more insight on different disciplines, processes, and pure exposure to what wonders we all create for the world. At first it may be intimidating to open up to a new network of people for fear of judgement. Though when you do find that circle that is warm, welcoming, and feels right, the set becomes fertile soil for the artist to be able to root down, grow and eventually blossom into their own. It’s beautiful when the vibes are tuned in harmony and the hive mind arises.
How can we maximize the hours of the day to make the most with our creativity?
So the dirty dishes in the sink begin to rink a stink. The laundry is a mountain to sort through, or you are totally out of undies for today. Way to go, commando!
Of course, we would rather spend our free time doing all the things that light us up, as we damn well please and should. For some, maybe the demanding day job gets in the way. Others, a full family schedule with children, parents and partners to take care of. Or other obligations, what have you. Option D: All of the above…
We each have unique stations in life that call us to duty. It is understandable how this may lead to seemingly less and less time to be able to dig our hands deep in our creative flow. However, it isn’t impossible to accomplish all that we desire to do.
Carve out the time. Look over your calendar, morning, noon, night, anywhere in between, and work in time to practice, even when you feel uninspired or unmotivated. Build the muscle memory needed to advance your skills. This applies in any practice. All the great masters did not attain their levels without putting forth the effort and energy. Forming good habits will carry you and your craft forward and up to the next degree. Here I am stretching my rusty writing muscles to see what my baseline is at this phase. It’s been a while. After long periods of not using muscle groups, they will begin to atrophy and waste. Get the motion going and the circulation flowing.
Start at any point… the point is just start.
DRIVE : FAR OUT
“I AM THE VEHICLE FOR CHANGE.” ~ Me/You
DRIVE!
The open road is calling and ready for new adventures to be created!
This part is entirely up to YOU.
Your Art. …
Seeing beauty in every moment of creation is the essence of why art exists.
To authentically embody and express through art form is the pinnacle.
I desire to capture those moments caught in my perception, so I can feel like I am holding on to life much longer than it takes for it to dissipate through my senses. I aim to turn around and translate it with the tools I have on hand, hoping another being will see what I see. It is certainly worth all the trials.
Keep a journal. Document. Photograph. Record it. Commit it to memory.
There is an infinite supply of good ideas floating in the collective ether. When inspiration lands in our midst, it would be wise to court it with intent to bring the fantastic idea to life.
Connect…
DESIGNATE : SPACE DEDICATE : TIME DRIVE : FAR OUT
Sending us off with good intent, that we find our groove again. May this be a space of inspiration, growth and development, and collective regeneration. Thank You for Being Here. Peace + Love. rjx
0 notes
awsexchage · 5 years ago
Photo
Tumblr media
Got stuck converting a JSON table with many partitions into a Parquet-format table in Amazon Athena https://ift.tt/2TvWDEe
While using Amazon Athena to convert JSON files in an S3 bucket to Parquet format, I hit a HIVE_TOO_MANY_OPEN_PARTITIONS error, so I investigated the cause and worked out a way around it.
What is the Parquet format?
If you are wondering what it is, the following articles are good references.
カラムナフォーマットのきほん 〜データウェアハウスを支える技術〜 – Retty Tech Blog https://engineer.retty.me/entry/columnar-storage-format
Amazon Athena: カラムナフォーマット『Parquet』でクエリを試してみた #reinvent | Developers.IO https://dev.classmethod.jp/cloud/aws/amazon-athena-using-parquet/
Apache Parquet https://parquet.apache.org/documentation/latest/
By storing data in a columnar format, you can keep down the amount of data read at query time and reduce costs (゚д゚)ウマー.
Apparently Parquet is pronounced "par-kay". (I still can't read it orz
Spark Meetup 2015 で SparkR について発表しました #sparkjp – ほくそ笑む https://hoxo-m.hatenablog.com/entry/20150910/p1
Steps to reproduce
These are the steps to reproduce the error and then work around it.
Preparation
Create an S3 bucket and upload the JSON files.
# Create the bucket
> aws s3 mb s3://<S3 bucket name>/ \ --region <YOUR REGION>
make_bucket: <S3 bucket name>
# Create a JSON file
> cat <<EOF > example-001.json
{"hoge1": 1, "hoge2": 11,"hoge3": 111}
EOF
> ls example-001.json
# Copy it to the S3 bucket
> aws s3 cp example-001.json s3://<S3 bucket name>/json/test=001/
upload: ./example-001.json to s3://<S3 bucket name>/json/test=001/example-001.json
# Copy it to the S3 bucket many times
> for i in {002..200} ; do aws s3 cp s3://<S3 bucket name>/json/test=001/example-001.json s3://<S3 bucket name>/json/test=$(printf '%03d' $i)/example-$(printf '%03d' $i).json; done
copy: s3://<S3 bucket name>/json/test=001/example-001.json to s3://<S3 bucket name>/json/test=002/example-002.json
copy: s3://<S3 bucket name>/json/test=001/example-001.json to s3://<S3 bucket name>/json/test=003/example-003.json
(snip)
copy: s3://<S3 bucket name>/json/test=001/example-001.json to s3://<S3 bucket name>/json/test=198/example-198.json
copy: s3://<S3 bucket name>/json/test=001/example-001.json to s3://<S3 bucket name>/json/test=199/example-199.json
copy: s3://<S3 bucket name>/json/test=001/example-001.json to s3://<S3 bucket name>/json/test=200/example-200.json
> aws s3 ls --recursive s3://<S3 bucket name>/json/ | wc -l
200
Create the table in Amazon Athena
Once the JSON files have been uploaded to the S3 bucket, create a table in Amazon Athena. This assumes that an Athena workgroup and an S3 bucket for query results have already been configured.
Since the data is partitioned as json/test=xxx/, specify that with PARTITIONED BY.
CREATE EXTERNAL TABLE IF NOT EXISTS sampledb.hoge_json ( hoge1 int, hoge2 int, hoge3 int ) PARTITIONED BY ( test string ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' LOCATION 's3://<S3バケット名>/json/';
When you run the query in the AWS Management Console and it completes, a message like the following is displayed, so load the partitions.
Query successful. If your table has partitions, you need to load these partitions to be able to query data. You can either load all partitions or load them individually. If you use the load all partitions (MSCK REPAIR TABLE) command, partitions must be in a format understood by Hive. Learn more.
MSCK REPAIR TABLE sampledb.hoge_json;
Now the data can be read from the S3 bucket.
SELECT count(*) FROM sampledb.hoge_json;
Converting to Parquet format
Amazon Athena's CTAS (CREATE TABLE AS) can create a new table together with its data files, so we use it to convert the JSON to Parquet format.
Amazon Athena が待望のCTAS(CREATE TABLE AS)をサポートしました! | Developers.IO https://dev.classmethod.jp/cloud/aws/amazon-athena-support-ctas/
Create a new table, hoge_parquet, with a CREATE TABLE AS SELECT query. In the WITH clause, specify the partitioning, the data format, and the S3 bucket where the data files are saved.
CREATE TABLE sampledb.hoge_parquet WITH ( partitioned_by = ARRAY['test'], format = 'PARQUET', external_location = 's3://<S3バケット名>/parquet' ) AS SELECT * FROM sampledb.hoge_json;
Running this results in an error.
Tumblr media
The error
The error is shown below: the number of partitions that can be open at once is limited to 100. In short, at most 100 partitions can be created in a single query.
Tumblr media
What is the limit on the number of partitions per table?
サービス制限 – Amazon Athena https://docs.aws.amazon.com/ja_jp/athena/latest/ug/service-limits.html
If you haven't migrated to the AWS Glue Data Catalog yet, the number of partitions per table is 20,000. You can request a limit increase.
When you create tables in Amazon Athena it integrates with AWS Glue, and the AWS Glue limits say the number of partitions per table is 10,000,000!!!
AWS Glue との統合 – Amazon Athena https://docs.aws.amazon.com/ja_jp/athena/latest/ug/glue-athena.html
In regions where AWS Glue is supported, Athena uses the AWS Glue Data Catalog as the central location for storing and retrieving table metadata across your AWS account.
AWS サービスの制限 – AWS 全般のリファレンス https://docs.aws.amazon.com/ja_jp/general/latest/gr/aws_service_limits.html#limits_glue
So it seems that the limit of 100 only applies to the number of partitions created in a single query.
Error details
HIVE_TOO_MANY_OPEN_PARTITIONS: Too many open partitions. Maximum number of partitions allowed to write: 100. You may need to manually clean the data at location 's3://<S3バケット名>/athena-results/tables/f15cd9b9-9e96-4f44-9306-a8d9c78895d2' before retrying. Athena will not delete data in your account. This query ran against the "sampledb" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: f15cd9b9-9e96-4f44-9306-a8d9c78895d2.
I searched for information using HIVE_TOO_MANY_OPEN_PARTITIONS as a keyword, and it appears to be a limit of Presto, the query engine that runs behind Amazon Athena.
too many open partitions? – Google グループ https://groups.google.com/forum/#!topic/presto-users/5gFbvUoOF5I
Partition writes are designed to be spread across many machines, and the setting hive.max-partitions-per-writers defaults to 100, apparently.
Partitioning reportedly uses Hive, so that would explain it.
データのパーティション分割 – Amazon Athena https://docs.aws.amazon.com/ja_jp/athena/latest/ug/partitions.html
By partitioning your data, you can restrict the amount of data scanned by each query, improving performance and reducing cost. Athena uses Hive for partitioning data.
Query execution, on the other hand, is handled by Presto.
よくある質問 – Amazon Athena | AWS https://aws.amazon.com/jp/athena/faqs/
Amazon Athena uses Presto with full standard SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, Apache Parquet, and Avro.
HiveとPrestoの違いについて調べてみた – Qiita https://qiita.com/haramiso/items/122d4ea0e5660e0b4e41
Digging a little further, I found that this is clearly stated in the official documentation as well.
CTAS クエリに関する考慮事項と制約事項 – Amazon Athena https://docs.aws.amazon.com/ja_jp/athena/latest/ug/considerations-ctas.html
Athena supports writing to 100 unique partition and bucket combinations. For example, if no buckets are defined on the destination table, you can specify a maximum of 100 partitions. If you specify 5 buckets, 20 partitions (each with 5 buckets) are allowed. Exceeding this count results in an error.
I don't really understand bucketing yet, so I plan to look into it later.
Workaround
When the number of partitions to create exceeds 100, splitting the query into several queries looks like the way to go, so let's try it.
CREATE TABLE AS SELECT cannot be run more than once, so we use the recently supported INSERT INTO SELECT to create the partitions.
Amazon Athena がついにINSERT INTOをサポートしたので実際に試してみました! | Developers.IO https://dev.classmethod.jp/cloud/aws/20190920-amazon-athena-insert-into-support/
In the CREATE TABLE AS SELECT query, use limit 0 so that no data is actually inserted.
CREATE TABLE sampledb.hoge_parquet WITH ( partitioned_by = ARRAY['test'], format = 'PARQUET', external_location = 's3://<S3バケット名>/parquet' ) AS SELECT * FROM sampledb.hoge_json limit 0;
Tumblr media
Narrow down the data to be inserted with the WHERE clause of each INSERT INTO SELECT query.
INSERT INTO sampledb.hoge_parquet SELECT * FROM sampledb.hoge_json WHERE test BETWEEN '001' AND '100'; INSERT INTO sampledb.hoge_parquet SELECT * FROM sampledb.hoge_json WHERE test BETWEEN '101' AND '200';
Tumblr media Tumblr media
SELECT count(*) FROM sampledb.hoge_parquet;
Tumblr media
Now cases where the number of partitions exceeds 100 can be handled as well. Except for the initial load, it seems best to convert and insert the data in small batches.
By the way
Even with an INSERT INTO SELECT query, you get the error when the number of partitions to create exceeds 100.
INSERT INTO sampledb.hoge_parquet SELECT * FROM sampledb.hoge_json;
Tumblr media
Also, note that if you insert the same data again with INSERT INTO SELECT, there is no primary key, so the data is simply duplicated without any error.
The partition limit does not seem to be exactly 100
The error message says 100, so it is enough to just follow that, but when I checked the threshold while reproducing the error, it didn't seem to be exactly 100. I forgot to take screenshots, but sometimes no error occurred even with 150 partitions... Perhaps it scales out behind the scenes and the limit gets adjusted??? It's a managed service and we can't see how it works internally, so it seems best to simply follow the error message.
Tumblr media Tumblr media
References
カラムナフォーマットのきほん 〜データウェアハウスを支える技術〜 – Retty Tech Blog https://engineer.retty.me/entry/columnar-storage-format
Amazon Athena: カラムナフォーマット『Parquet』でクエリを試してみた #reinvent | Developers.IO https://dev.classmethod.jp/cloud/aws/amazon-athena-using-parquet/
Apache Parquet https://parquet.apache.org/documentation/latest/
Spark Meetup 2015 で SparkR について発表しました #sparkjp – ほくそ笑む https://hoxo-m.hatenablog.com/entry/20150910/p1
Amazon Athena が待望のCTAS(CREATE TABLE AS)をサポートしました! | Developers.IO https://dev.classmethod.jp/cloud/aws/amazon-athena-support-ctas/
サービス制限 – Amazon Athena https://docs.aws.amazon.com/ja_jp/athena/latest/ug/service-limits.html
AWS Glue との統合 – Amazon Athena https://docs.aws.amazon.com/ja_jp/athena/latest/ug/glue-athena.html
AWS サービスの制限 – AWS 全般のリファレンス https://docs.aws.amazon.com/ja_jp/general/latest/gr/aws_service_limits.html#limits_glue
too many open partitions? – Google グループ https://groups.google.com/forum/#!topic/presto-users/5gFbvUoOF5I
データのパーティション分割 – Amazon Athena https://docs.aws.amazon.com/ja_jp/athena/latest/ug/partitions.html
よくある質問 – Amazon Athena | AWS https://aws.amazon.com/jp/athena/faqs/
HiveとPrestoの違いについて調べてみた – Qiita https://qiita.com/haramiso/items/122d4ea0e5660e0b4e41
CTAS クエリに関する考慮事項と制約事項 – Amazon Athena https://docs.aws.amazon.com/ja_jp/athena/latest/ug/considerations-ctas.html
Amazon Athena がついにINSERT INTOをサポートしたので実際に試してみました! | Developers.IO https://dev.classmethod.jp/cloud/aws/20190920-amazon-athena-insert-into-support/
Original article:
"Got stuck converting a JSON table with many partitions into a Parquet-format table in Amazon Athena"
January 15, 2020 at 02:00PM
0 notes
siva3155 · 6 years ago
Text
300+ TOP HADOOP Objective Questions and Answers
HADOOP Multiple Choice Questions :-
1. What does commodity Hardware in Hadoop world mean? ( D ) a) Very cheap hardware b) Industry standard hardware c) Discarded hardware d) Low specifications Industry grade hardware 2. Which of the following are NOT big data problem(s)? ( D) a) Parsing 5 MB XML file every 5 minutes b) Processing IPL tweet sentiments c) Processing online bank transactions d) both (a) and (c) 3. What does “Velocity” in Big Data mean? ( D) a) Speed of input data generation b) Speed of individual machine processors c) Speed of ONLY storing data d) Speed of storing and processing data 4. The term Big Data first originated from: ( C ) a) Stock Markets Domain b) Banking and Finance Domain c) Genomics and Astronomy Domain d) Social Media Domain 5. Which of the following Batch Processing instance is NOT an example of ( D) BigData Batch Processing? a) Processing 10 GB sales data every 6 hours b) Processing flights sensor data c) Web crawling app d) Trending topic analysis of tweets for last 15 minutes 6. Which of the following are example(s) of Real Time Big Data Processing? ( D) a) Complex Event Processing (CEP) platforms b) Stock market data analysis c) Bank fraud transactions detection d) both (a) and (c) 7. Sliding window operations typically fall in the category (C ) of__________________. a) OLTP Transactions b) Big Data Batch Processing c) Big Data Real Time Processing d) Small Batch Processing 8. What is HBase used as? (A ) a) Tool for Random and Fast Read/Write operations in Hadoop b) Faster Read only query engine in Hadoop c) MapReduce alternative in Hadoop d) Fast MapReduce layer in Hadoop 9. What is Hive used as? (D ) a) Hadoop query engine b) MapReduce wrapper c) Hadoop SQL interface d) All of the above 10. Which of the following are NOT true for Hadoop? (D) a) It’s a tool for Big Data analysis b) It supports structured and unstructured data analysis c) It aims for vertical scaling out/in scenarios d) Both (a) and (c)
Tumblr media
HADOOP MCQs 11. Which of the following are the core components of Hadoop? ( D) a) HDFS b) Map Reduce c) HBase d) Both (a) and (b) 12. Hadoop is open source. ( B) a) ALWAYS True b) True only for Apache Hadoop c) True only for Apache and Cloudera Hadoop d) ALWAYS False 13. Hive can be used for real time queries. ( B ) a) TRUE b) FALSE c) True if a data set is small d) True for some distributions 14. What is the default HDFS block size? ( D ) a) 32 MB b) 64 KB c) 128 KB d) 64 MB 15. What is the default HDFS replication factor? ( C) a) 4 b) 1 c) 3 d) 2 16. Which of the following is NOT a type of metadata in NameNode? ( C) a) List of files b) Block locations of files c) No. of file records d) File access control information 17. Which of the following is/are correct? (D ) a) NameNode is the SPOF in Hadoop 1.x b) NameNode is the SPOF in Hadoop 2.x c) NameNode keeps the image of the file system also d) Both (a) and (c) 18. The mechanism used to create replica in HDFS is____________. ( C) a) Gossip protocol b) Replicate protocol c) HDFS protocol d) Store and Forward protocol 19. NameNode tries to keep the first copy of data nearest to the client machine. ( C) a) ALWAYS true b) ALWAYS False c) True if the client machine is the part of the cluster d) True if the client machine is not the part of the cluster 20. HDFS data blocks can be read in parallel. ( A ) a) TRUE b) FALSE 21. Where is the HDFS replication factor controlled? ( D) a) mapred-site.xml b) yarn-site.xml c) core-site.xml d) hdfs-site.xml 22. Read the statement and select the correct option: ( B) It is necessary to default all the properties in Hadoop config files. a) True b) False 23. Which of the following Hadoop config files is used to define the heap size? (C ) a) hdfs-site.xml b) core-site.xml c) hadoop-env.sh d) Slaves 24. Which of the following is not a valid Hadoop config file? ( B) a) mapred-site.xml b) hadoop-site.xml c) core-site.xml d) Masters 25. Read the statement: NameNodes are usually high storage machines in the clusters. ( B) a) True b) False c) Depends on cluster size d) True if co-located with Job tracker 26. From the options listed below, select the suitable data sources for the flume. ( D) a) Publicly open web sites b) Local data folders c) Remote web servers d) Both (a) and (c) 27. Read the statement and select the correct options: ( A) distcp command ALWAYS needs fully qualified hdfs paths. a) True b) False c) True, if source and destination are in the same cluster d) False, if source and destination are in the same cluster 28. Which of following statement(s) are true about distcp command? (A) a) It invokes MapReduce in background b) It invokes MapReduce if source and destination are in the same cluster c) It can’t copy data from the local folder to hdfs folder d) You can’t overwrite the files through distcp command 29. Which of the following is NOT the component of Flume? (B) a) Sink b) Database c) Source d) Channel 30. Which of the following is the correct sequence of MapReduce flow? ( C ) f) Map ??Reduce ??Combine a) Combine ??Reduce ??Map b) Map ??Combine ??Reduce c) Reduce ??Combine ??Map 31.Which of the following can be used to control the number of part files ( B) in a map reduce program output directory? a) Number of Mappers b) Number of Reducers c) Counter d) Partitioner 32. Which of the following operations can’t use Reducer as combiner also? (D) a) Group by Minimum b) Group by Maximum c) Group by Count d) Group by Average 33. Which of the following is/are true about combiners? 
(D) a) Combiners can be used for mapper only job b) Combiners can be used for any Map Reduce operation c) Mappers can be used as a combiner class d) Combiners are primarily aimed to improve Map Reduce performance e) Combiners can’t be applied for associative operations 34. Reduce side join is useful for (A) a) Very large datasets b) Very small data sets c) One small and other big data sets d) One big and other small datasets 35. Distributed Cache can be used in (D) a) Mapper phase only b) Reducer phase only c) In either phase, but not on both sides simultaneously d) In either phase 36. Counters persist the data on the hard disk. (B) a) True b) False 37. What is the optimal size of a file for distributed cache? (C) a) =250 MB c) 900 nodes c) > 5000 nodes d) > 3500 nodes 93. Hive managed tables stores the data in (C) a) Local Linux path b) Any HDFS path c) HDFS warehouse path d) None of the above 94. On dropping managed tables, Hive: (C) a) Retains data, but deletes metadata b) Retains metadata, but deletes data c) Drops both, data and metadata d) Retains both, data and metadata 95. Managed tables don’t allow loading data from other tables. (B) a) True b) False 96. External tables can load the data from warehouse Hive directory. (A) a) True b) False 97. On dropping external tables, Hive: (A) a) Retains data, but deletes metadata b) Retains metadata, but deletes data c) Drops both, data and metadata d) Retains both, data and metadata 98. Partitioned tables can’t load the data from normal (partitioned) tables (B) a) True b) False 99. The partitioned columns in Hive tables are (B) a) Physically present and can be accessed b) Physically absent but can be accessed c) Physically present but can’t be accessed d) Physically absent and can’t be accessed 100. Hive data models represent (C) a) Table in Metastore DB b) Table in HDFS c) Directories in HDFS d) None of the above 101. When is the earliest point at which the reduce method of a given Reducer can be called? A. As soon as at least one mapper has finished processing its input split. B. As soon as a mapper has emitted at least one record. C. Not until all mappers have finished processing all records. D. It depends on the InputFormat used for the job. Answer: C 102. Which describes how a client reads a file from HDFS? A. The client queries the NameNode for the block location(s). The NameNode returns the block location(s) to the client. The client reads the data directory off the DataNode(s). B. The client queries all DataNodes in parallel. The DataNode that contains the requested data responds directly to the client. The client reads the data directly off the DataNode. C. The client contacts the NameNode for the block location(s). The NameNode then queries the DataNodes for block locations. The DataNodes respond to the NameNode, and the NameNode redirects the client to the DataNode that holds the requested data block(s). The client then reads the data directly off the DataNode. D. The client contacts the NameNode for the block location(s). The NameNode contacts the DataNode that holds the requested data block. Data is transferred from the DataNode to the NameNode, and then from the NameNode to the client. Answer: C 103. When You are developing a combiner that takes as input Text keys, IntWritable values, and emits Text keys, IntWritable values. Which interface should your class implement? A. Combiner A. Reducer A. Combiner A. Combiner Answer: B 104. 
104. Identify the utility that allows you to create and run MapReduce jobs with any executable or script as the mapper and/or the reducer.
    A. Oozie
    B. Sqoop
    C. Flume
    D. Hadoop Streaming
    E. mapred
    Answer: D
105. How are keys and values presented and passed to the reducers during a standard sort and shuffle phase of MapReduce?
    A. Keys are presented to the reducer in sorted order; values for a given key are not sorted.
    B. Keys are presented to the reducer in sorted order; values for a given key are sorted in ascending order.
    C. Keys are presented to a reducer in random order; values for a given key are not sorted.
    D. Keys are presented to a reducer in random order; values for a given key are sorted in ascending order.
    Answer: A
106. Assuming default settings, which best describes the order of data provided to a reducer's reduce method?
    A. The keys given to a reducer aren't in a predictable order, but the values associated with those keys always are.
    B. Both the keys and values passed to a reducer always appear in sorted order.
    C. Neither keys nor values are in any predictable order.
    D. The keys given to a reducer are in sorted order, but the values associated with each key are in no predictable order.
    Answer: D

HADOOP Questions and Answers PDF Download
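Questions 93–97 hinge on the difference between Hive managed (internal) and external tables: a managed table keeps its data under the HDFS warehouse path and dropping it removes both data and metadata, while an external table only registers metadata over files that live elsewhere, so dropping it removes the metadata and leaves the files in place. Below is a minimal HiveQL sketch of that behaviour; the database name, table names, and HDFS path are illustrative, not taken from the quiz.

```sql
-- Managed (internal) table: Hive owns the data under the warehouse path,
-- e.g. /user/hive/warehouse/demo.db/orders_managed
CREATE DATABASE IF NOT EXISTS demo;

CREATE TABLE demo.orders_managed (
  order_id  BIGINT,
  customer  STRING,
  amount    DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- External table: Hive only records metadata; the files stay at the LOCATION you point to.
CREATE EXTERNAL TABLE demo.orders_external (
  order_id  BIGINT,
  customer  STRING,
  amount    DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/landing/orders';   -- hypothetical HDFS path

-- Dropping the two tables behaves differently (questions 94 and 97):
DROP TABLE demo.orders_managed;    -- deletes metadata AND the warehouse files
DROP TABLE demo.orders_external;   -- deletes metadata only; /data/landing/orders is untouched
```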
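Questions 98–99 concern partitioned tables: the partition column is not stored inside the data files (it is "physically absent"), yet it can be referenced in queries because Hive derives its value from the directory layout. The sketch below uses made-up table and column names; the staging table is assumed to exist with a matching schema.

```sql
-- Partitioned table: 'sale_date' never appears in the data files themselves;
-- Hive encodes it in the directory name (.../sale_date=2024-01-31/...).
CREATE TABLE demo.sales (
  sale_id  BIGINT,
  amount   DOUBLE
)
PARTITIONED BY (sale_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Loading from a non-partitioned staging table is allowed (question 98);
-- a static partition is shown here, dynamic partitioning works as well.
INSERT INTO TABLE demo.sales PARTITION (sale_date = '2024-01-31')
SELECT sale_id, amount
FROM demo.sales_staging
WHERE sale_date = '2024-01-31';

-- The partition column is physically absent from the files
-- but can still be queried (question 99).
SELECT sale_date, SUM(amount)
FROM demo.sales
GROUP BY sale_date;
```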
0 notes
udemy-gift-coupon-blog · 6 years ago
Link
Hadoop Spark Hive Big Data Admin Class Bootcamp Course NYC
##FreeCourse ##UdemyDiscount #Admin #Big #Bootcamp #Class #Data #Hadoop #Hive #NYC #Spark

Introduction
- Hadoop Big Data course: introduction to the course
- Top Ubuntu commands
- Understand NameNode, DataNode, YARN and the Hadoop infrastructure

Hadoop Install
- Hadoop installation & HDFS commands
- Java-based MapReduce (Hadoop 2.7 / 2.8.4)
- Learn HDFS commands
- Setting up Java for MapReduce
- Intro to Cloudera Hadoop & studying for the Cloudera certification

SQL and NoSQL
- SQL, Hive and Pig installation (the RDBMS world and the NoSQL world)
- More Hive and Sqoop (Sqoop and Hive on Cloudera; JDBC drivers)
- Pig
- Intro to NoSQL; MongoDB and HBase installation
- Understanding different databases

Hive
- Hive partitions and bucketing (see the HiveQL sketch after this post)
- Hive external and internal tables

Spark, Scala, Python
- Spark installations and commands
- Spark Scala and Scala sheets
- Hadoop Streaming: Python MapReduce
- PySpark (Python basics), RDDs
- Running spark-shell and importing data from CSV files
- PySpark: running RDDs

Mid-term projects
- Pull data from an online CSV and move it to Hive using Hive import
- Pull data from spark-shell and run MapReduce over the Fox News front page
- Create data in MySQL and move it to HDFS using Sqoop
- Using Jupyter (Anaconda) and SparkContext, run a count on a file containing the Fox News front page
- Save raw data with comma, space, tab and pipe delimiters and move it into SparkContext and spark-shell

Broadcasting data (streams of data)
- Kafka message broadcasting

Who this course is for:
- Career changers who would like to move to Big Data and Hadoop
- Learners who want to learn Hadoop installations

👉 Activate Udemy Coupon 👈
https://www.couponudemy.com/blog/hadoop-spark-hive-big-data-admin-class-bootcamp-course-nyc/
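The "Hive partitions and bucketing" module above is easiest to grasp from a concrete table definition: partitioning splits data into directories by a key, while bucketing hashes rows into a fixed number of files within each partition. The sketch below is a generic illustration, with the table, columns, bucket count, and source table all invented rather than taken from the course material.

```sql
-- Partitioned + bucketed table: one directory per country, and within each
-- partition the rows are hashed on user_id into 8 bucket files.
CREATE TABLE web_logs (
  user_id   BIGINT,
  url       STRING,
  hit_time  TIMESTAMP
)
PARTITIONED BY (country STRING)
CLUSTERED BY (user_id) INTO 8 BUCKETS
STORED AS ORC;

-- Older Hive versions need bucketing enforcement switched on before inserting:
SET hive.enforce.bucketing = true;

INSERT INTO TABLE web_logs PARTITION (country = 'US')
SELECT user_id, url, hit_time
FROM raw_logs            -- hypothetical staging table
WHERE country = 'US';

-- Bucketing pays off for sampling and bucket map joins, e.g. a 1-of-8 sample:
SELECT * FROM web_logs TABLESAMPLE (BUCKET 1 OUT OF 8 ON user_id);
```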
0 notes
nox-lathiaen · 6 years ago
Text
Sr. SQL Developer
Title: Sr. SQL Developer with Hive
Location: Phoenix, AZ
Duration: 12+ months
Rate: Market
Hive experience mandatory

Job Description:
- Maintain/create SSRS reports based on business requirements
- Create/alter stored procedures, views, jobs, etc.
- Ensure performance, security, and availability of databases
- Must have a good working knowledge of Hive
- Provide data pulls as needed (a sample HiveQL data-pull query follows this posting)
- Prepare documentation and specifications
- Collaborate with other team members, managers and directors
- Experience in requirement analysis, workflow analysis, design, development & implementation, and testing & deployment across the complete software development life cycle (SDLC)
- Support internal and external customer service by completing help desk tickets

Requirements:
- Excellent communication, analytical and interpersonal skills; ability to learn new concepts and support a 24/7 environment
- Strong proficiency with MS SQL
- Ability to display complex data sets in a user-friendly way
- Experience with report writing
- Knowledge of ETL processes, data warehouses, and Business Intelligence platforms beneficial
- Knowledge of best practices when dealing with relational databases
- Resourcefulness and problem solving required
- Capable of troubleshooting common database issues
- Has been part of an Agile/Scrum team
- Ability to create and alter tables, stored procedures, views, jobs, etc. from scratch
- Ability to read acceptance criteria and perform tasks based on requirements
- Must be a self-starter with a can-do attitude!
- 3 or more years' experience preferred
- Salesforce and Salesforce reporting experience a big plus
- Development experience in a Microsoft environment (.NET) a big plus

Technical Skills:
- Extensive experience using Microsoft products such as SSMS, SSRS, SSIS, BIDS, and Visual Studio
- Languages: T-SQL, XML

Reference: Sr. SQL Developer jobs
Source: http://jobrealtime.com/jobs/technology/sr-sql-developer_i3436
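The posting pairs T-SQL/SSRS work with a "good working knowledge of Hive" and ad-hoc data pulls. A typical pull of that sort is a HiveQL aggregate exported as delimited files that a report or SSRS dataset can consume; the sketch below uses invented table, column, and path names purely to illustrate the shape of such a query.

```sql
-- Ad-hoc data pull: monthly claim totals per region, written out as CSV
-- so it can be fed into an SSRS dataset or a downstream report.
INSERT OVERWRITE DIRECTORY '/tmp/exports/claims_by_region'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT
  region,
  date_format(claim_date, 'yyyy-MM') AS claim_month,
  COUNT(*)                           AS claim_count,
  SUM(claim_amount)                  AS total_amount
FROM claims
WHERE claim_date >= '2019-01-01'
GROUP BY region, date_format(claim_date, 'yyyy-MM');
```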
0 notes