#BigQuerydata
BigQuery Data Products: Create, Use, and Share Your Data

BigQuery Data Products
At Google Cloud Next, Google unveiled BigQuery data products, now available in preview.
By treating data as a product, BigQuery helps you organize, distribute, and exploit your most valuable asset. For example, a ‘Customer Sales’ data product might combine regional sales data with customer order data in a carefully curated BigQuery view. As the data product owner, the Sales Analytics team provides a single point of contact, freshness guarantees, and business context for campaign analysis. With that context and those guarantees, data consumers can use the data product to make well-informed customer sales decisions.
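As a concrete sketch of what backs such a product, the curated view might look like the following. All project, dataset, table, and column names here are hypothetical placeholders, not a real schema.

```sql
-- Minimal sketch of a curated view backing the 'Customer Sales' data
-- product. Every name below is a hypothetical placeholder.
CREATE OR REPLACE VIEW `sales-analytics.products.customer_sales` AS
SELECT
  o.order_id,
  o.customer_id,
  c.customer_name,
  r.region_name,
  o.order_total,
  o.order_date
FROM `sales-prod.orders.customer_orders` AS o
JOIN `sales-prod.crm.customers` AS c
  ON o.customer_id = c.customer_id
JOIN `sales-prod.reference.regions` AS r
  ON c.region_code = r.region_code;
```

The data product then wraps this view with ownership, documentation, and freshness commitments, so consumers never need to know the underlying table layout.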
BigQuery data products simplify producer-consumer transactions by letting data producers bundle one or more BigQuery tables or views that serve a use case and offer them as a logical unit. BigQuery already provides a strong way to share data through datasets listed in data exchanges, but data products add a layer of abstraction and insight into the business problem they address, helping consumers locate relevant information in one consumable unit.
A data product helps data producers manage their data as a product, including:
Build for use cases: Determine the customer and use case, then combine one or more resources into a data product that addresses it.
Establish ownership: Identify the data product’s owner and contact information to maintain accountability and customer confidence.
Democratize context: Document the problems the product addresses, examples of use, and expectations.
Streamline contracts: Let data consumers see data quality and freshness commitments to boost confidence and speed understanding.
Manage assets: Control who can view the data product (a sketch follows this list).
Data discovery: Allow data users to easily search for BigQuery data products.
Data distribution: Distribute data publicly through a data exchange or to commercial consortiums.
Iterate: Enhance products continually to meet customer needs.
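For the asset-management item above, one way to control who can query the dataset behind a data product is BigQuery’s SQL GRANT statement. A minimal sketch, reusing the hypothetical names from the earlier view:

```sql
-- Minimal sketch: grant read access on the dataset that backs the
-- data product. The dataset and the principal are hypothetical.
GRANT `roles/bigquery.dataViewer`
ON SCHEMA `sales-analytics.products`
TO "group:campaign-analysts@example.com";
```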
Data producers that build for use cases and treat data as a product can help data teams work better. Benefits include:
By establishing standardized, reusable BigQuery data products, data teams avoid rebuilding the same datasets or procedures for different customers or objectives, freeing up time and resources.
Data teams may better align their efforts with business goals by prioritising their work based on the impact and value of each data product.
By tracking data product usage, data teams can better assess and communicate their efforts to the organisation.
Future data solutions might incorporate governance standards and compliance procedures to ensure data is handled appropriately and consistently.
For data consumers, these capabilities reduce the effort needed to find the right asset. Because everyone in the company can search, browse, and subscribe to BigQuery data products, consumers reach insights faster. Clear, trustworthy, well-documented data makes it easier to identify the right data for a use case, which increases confidence.
BigQuery data products provide the controls and building blocks for managing data as a product. Businesses benefit from business-outcome-driven data management because data consumers get insights faster.
Ready to explore your data’s untapped potential? Register here for an early look.
#BigQueryDataProducts #BigQuery #BigQuerydata #GoogleCloud #News #Technews #Technology #Technologynews #Technologytrends #govindhtech
AI-assisted BigQuery Data Preparation Is Now In Preview

A preview of BigQuery data preparation is now available.
What is data preparation?
Data preparation is the process of cleaning and converting raw data so that it can be used for further processing and analysis. It is an essential stage in any data-driven endeavor because it ensures the data is correct, consistent, and usable.
The capacity to effectively convert unprocessed data into useful insights is critical in today’s data-driven environment. Data cleaning and preparation, however, can frequently be very difficult.
Reducing this effort and turning raw data into insights quickly is essential to staying competitive. Earlier this month, Google Cloud unveiled BigQuery data preparation as part of Gemini in BigQuery, an AI-first solution that simplifies and speeds up the data preparation process.
BigQuery data preparation now in preview offers several features:
AI-powered recommendations: Gemini in BigQuery analyzes your data and schema to generate intelligent recommendations for data cleansing, transformation, and enrichment, greatly reducing the time and effort needed for manual data preparation tasks.
Cleaning and standardizing data: You may quickly find and fix formatting mistakes, missing values, and discrepancies in your data.
Visual data pipelines: A user-friendly, low-code visual interface makes it simple for both technical and non-technical users to design complex data pipelines on top of BigQuery’s powerful, extensible SQL capabilities.
Data pipeline orchestration: Automate the execution and oversight of your data pipelines. You can use CI/CD to deploy and orchestrate a Dataform data engineering pipeline that includes the SQL produced by BigQuery data preparation, for a collaborative development experience.
By ensuring the accuracy and dependability of your data, BigQuery data preparation helps you make better business decisions. It automates data quality checks and integrates with other Google Cloud services such as Dataform and Cloud Storage, providing a consistent, scalable environment for your data needs.
Data Preparation process
It’s simple to get going. When you sample a BigQuery table in BigQuery data preparation, Gemini in BigQuery employs foundation models to assess the data and schema and generate data preparation recommendations, such as filter and transformation suggestions. For instance, it can speed up the data engineering process by determining which columns can serve as join keys and which country-specific date formats are in use.
In the example above (which uses synthetic data), the Birthdate column of type STRING contains two distinct date formats. The BigQuery data preparation recommendation reads: “Convert column Birthdate from type string to date with the following format(s): ‘%Y-%m-%d’, ‘%m/%d/%Y’.” After applying the suggestion card, the converted preview data can be checked in a DATE-typed column.
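Under the hood, applying that suggestion amounts to a conversion along these lines. This is a hedged sketch with a hypothetical table name; SAFE.PARSE_DATE returns NULL instead of an error when a format doesn’t match, so the two observed formats can be tried in order:

```sql
-- Sketch of the suggested conversion: try each observed format and
-- keep the first one that parses. The table name is hypothetical.
SELECT
  COALESCE(
    SAFE.PARSE_DATE('%Y-%m-%d', Birthdate),
    SAFE.PARSE_DATE('%m/%d/%Y', Birthdate)
  ) AS Birthdate
FROM `my-project.demo.customers`;
```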
BigQuery’s AI-powered data preparation allows you to:
Reduce the amount of time spent identifying and cleaning data quality concerns by using Gemini-assisted recommendation cards.
Use the data grid to create your own personalized suggestion cards by giving an example.
Use incremental data processing in conjunction with data preparation to boost operational efficiency.
Customer feedback on BigQuery
Customers are already using BigQuery data preparation to solve numerous problems.
GAF, a major North American roofing materials manufacturer, is implementing data preparation to build data transformation pipelines on BigQuery.
mCloud Technologies helps companies in industries such as manufacturing, energy, and buildings maximize the sustainability, reliability, and performance of their assets.
Public Value Technologies is a joint venture between two German public broadcasters within ARD.
Starting out
With its robust AI capabilities, user-friendly interface, and close integration with the Google Cloud ecosystem, BigQuery data preparation is poised to transform how businesses handle and prepare their data. By automating time-consuming procedures, enhancing data quality, and empowering users, this solution cuts the time you spend preparing data and increases your productivity.
Read more on Govindhtech.com
#BigQuery #Datapreparation #AI #SQL #CloudStorage #Gemini #BigQuerydata #CloudComputing #News #Technews #Technology #Technologynews #Technologytrends #govindhtech
How BigQuery Data Canvas Makes AI-Powered Insights Easy

The BigQuery Studio data canvas, a Gemini in BigQuery feature, provides a graphical interface for analysis workflows and accepts natural language prompts for finding, transforming, querying, and visualizing data.
BigQuery data canvas represents analysis workflows as a directed acyclic graph (DAG), giving you a graphical view of your workflow. You can work on many branches of inquiry in one place and iterate on query results.
BigQuery data canvas
The BigQuery data canvas is intended to support your path from data to insights; working with your data doesn’t require technical expertise in particular products or technologies. BigQuery data canvas combines natural language prompts with Dataplex metadata to find relevant tables.
BigQuery data canvas uses Gemini in BigQuery to locate your data, generate SQL, build charts, and create data summaries.
Capabilities
BigQuery data canvas lets you do the following:
Use keyword search syntax along with Dataplex metadata to find assets such as tables, views, or materialized views.
Use natural language for basic SQL queries such as the following:
Queries that contain FROM clauses, math functions, arrays, and structs.
JOIN operations for two tables (a sketch follows this list).
Visualize data by using the following chart types:
Bar chart
Heat map
Line graph
Pie chart
Scatter chart
Create custom visualizations by using natural language to describe what you want.
Automate data insights.
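To make the natural-language SQL items above concrete, a prompt such as “total order value per customer for 2024” might yield SQL roughly like the following. This is an illustrative sketch with hypothetical table and column names, not output captured from the product:

```sql
-- Illustrative sketch of canvas-generated SQL for a two-table JOIN.
-- All table and column names are hypothetical.
SELECT
  c.customer_name,
  SUM(o.order_total) AS total_order_value
FROM `my-project.sales.orders` AS o
JOIN `my-project.sales.customers` AS c
  ON o.customer_id = c.customer_id
WHERE EXTRACT(YEAR FROM o.order_date) = 2024
GROUP BY c.customer_name
ORDER BY total_order_value DESC;
```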
Limitations
Natural language commands might not work well with the following:
BigQuery ML
Apache Spark
Object tables
BigLake
INFORMATION_SCHEMA views
JSON
Nested and repeated fields
Complex functions and data types such as DATETIME and TIMEZONE
Data visualizations don’t work with geomap charts.
BigQuery data canvas, a Gemini in BigQuery feature, is a groundbreaking data analytics tool that streamlines the whole data analysis process, from data preparation and discovery to analysis, visualization, and collaboration, all in one place within BigQuery. Using natural language processing, the data canvas lets you ask questions about your data in plain English or a variety of other languages.
Because this approach removes the need to hand-write sophisticated SQL queries, data analysis becomes accessible to technical and non-technical people alike. You can examine, modify, and visualize your BigQuery data with data canvas without ever leaving the environment where the data is stored.
This blog post provides an overview of BigQuery data canvas along with a technical walkthrough of a real-world scenario using the public github_repos dataset, which includes over 3TB of activity from 3M+ open-source repositories. We’ll look at how to answer questions like:
How many commits were made to a particular repository in a year?
Who authored the most repositories in a particular year?
How many non-authored commits were applied over time?
Which users contributed to a certain file, and when?
You’ll see how data canvas manages intricate SQL operations from your natural language prompts, such as joining tables, extracting particular data items, unnesting fields, and converting timestamps. We’ll even show you how to use just one click to create intelligent summaries and visualisations.
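For the first of those questions, the generated SQL would look roughly like the sketch below. It assumes the public schema in which repo_name is a repeated field and committer.date is a TIMESTAMP; the repository name is only an example:

```sql
-- Sketch: commits per year for one repository in the public
-- github_repos dataset. Assumes repo_name is REPEATED and
-- committer.date is a TIMESTAMP; the repo filter is an example.
SELECT
  EXTRACT(YEAR FROM committer.date) AS commit_year,
  COUNT(*) AS commit_count
FROM `bigquery-public-data.github_repos.commits`,
  UNNEST(repo_name) AS repo
WHERE repo = 'GoogleCloudPlatform/google-cloud-python'
GROUP BY commit_year
ORDER BY commit_year;
```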
A quick overview of BigQuery data canvas
BigQuery data canvas is mostly used for three types of tasks: finding data, generating SQL, and generating insights. (Image credit: Google Cloud)
Find Data
Use data canvas to locate data in BigQuery with a quick keyword search or a natural language prompt.
Generate SQL
Additionally, you may use the BigQuery data canvas to have SQL code written for you using natural language prompts powered by Gemini.
Create Insights
Finally, uncover insights hidden within your data with a single click! Gemini automatically creates visualizations for you so you can see the story your data is telling.
Using the BigQuery data canvas
Let’s look at an example to help you better understand the potential impact that BigQuery data canvas can have in your company. Businesses of all kinds, from big corporations to small startups, can benefit from a better grasp of the productivity of their development teams. In this in-depth technical walkthrough, we’ll demonstrate how to use data canvas with the public github_repos dataset to produce insightful results in a shared workspace.
Working through this example, you’ll learn how data canvas simplifies the creation of sophisticated SQL queries: creating joins, unnesting columns, converting timestamps, extracting the month and year from date fields, and more. Gemini’s features make it simple to build these queries and to explore your data in natural language with illuminating visualizations.
Please be aware that, as with many of today’s AI products and services, using any LLM-enabled application successfully requires solid prompt engineering skills. Many people believe that large language models (LLMs) aren’t very good at producing SQL out of the box. In our experience, however, with the right prompting, Gemini in BigQuery via data canvas can produce sophisticated SQL queries grounded in the context of your data corpus. Data canvas uses your natural language queries to decide the ordering, grouping, sorting, record-count limits, and overall SQL structure.
The github_repos dataset, which is 3TB+ in size and available among the BigQuery public datasets, contains numerous tables covering commits, watch counts, and other activity on 3M+ open-source projects. We want to look at the Google Cloud Platform repository for this example. As always, before you begin, make sure you have the necessary IAM permissions, including the rights to access the datasets and data canvas, so you can run nodes properly.
Data canvas makes it simple to explore every table in the github_repos dataset. Here, we compare datasets side by side while evaluating schema, details, and preview data in one panel. (Image credit: Google Cloud)
After choosing your dataset, you can hover over the bottom of the node to branch it to query or join it with another table. The dataset for the following transformation node is shown by arrows. For clarity, you can give each node a name when sharing the canvas. You can delete, debug, duplicate, or run all of the nodes in a series using the options in the upper right corner. Results can be downloaded, and data can be exported to Looker Studio or Sheets. In the navigation panel, you can also inspect the DAG structure, restore previous versions, and rate SQL suggestions.
While exploring the github_repos dataset, we’ll examine four main facets of the data, attempting to ascertain the following:
1) The total number of commits made in a single year
2) The number of repositories authored in a specific year
3) The total number of non-authored commits applied over time
4) The number of user commits to a specific file at a specific time
Utilise BigQuery data canvas to simplify data analysis
It might be challenging to interpret data for a new project or use case when working with large datasets that span multiple disciplines. This procedure can be streamlined by using data canvas. Data canvas helps you work more efficiently and quickly by streamlining data analysis using natural language-based SQL creation and visualisations. It also reduces the need for repetitive queries and lets you plan automatic data refreshes.
Read more on Govindhtech.com
#BigQuery #bigquerydata #BigQuerydatacanvas #Dataplexmetadata #GeminiinBigQuery #generatingSQL #GoogleCloud #LargeLanguageModels #googlecloudplatform #datacanvas #cloudcomputing #news #technews #technology #technologynews #technologytrends #govindhtech
Deutsche Bank’s Serverless Trade Surveillance Data System

Trade Surveillance Regulations
Every bank must meet regulatory standards. Financial regulation is broad, but among other obligations, an investment bank like Deutsche Bank must recognize and prevent market manipulation and abuse. This is trade surveillance.
Deutsche Bank’s Compliance Technology division implements this control function technically. The Compliance Technology team monitors all bank business lines’ transactions by retrieving data from front office operational systems and performing scenario computations. Any worrisome trends trigger an internal alert for a compliance officer to examine and resolve.
Many systems provide input data, but market, trade, and reference data are most important. The team had to duplicate data between, and sometimes inside, several analytical systems to source data for compliance technology applications from front-office systems, which caused data quality and lineage difficulties and architectural complexity. Trade surveillance situations demand a system that can store and interpret massive amounts of data utilizing distributed computation frameworks like Apache Spark.
Innovation in architecture
With its full data analytics ecosystem of products and services, Google Cloud can help big organizations tackle difficult data processing and sharing challenges. BigQuery, Google Cloud’s serverless data warehouse, and Dataproc, a managed Apache Spark service, can power data-heavy corporate use cases like trade surveillance.
The Compliance Technology team built its new trade surveillance architecture on Google Cloud managed services. In the new design, operational front-office systems submit data to BigQuery tables, and BigQuery serves trade, market, and reference data to consumers like Trade Surveillance. Because the team doesn’t need all the front-office data, it can create views that expose only the input data required for trade surveillance scenarios.
Trade surveillance business logic is executed as data transformations in BigQuery, Spark on Dataproc, and other technologies. This business logic detects abnormal trade patterns that indicate market abuse or manipulation. Suspicious cases are written to output BigQuery tables and processed through research and investigation workflows by compliance officers, who weed out false positives and file a Suspicious Activity Report with the regulator if a case indicates a compliance violation.
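As a toy illustration of the shape such a transformation can take (the names, threshold, and logic below are invented for the sketch and are not Deutsche Bank’s actual scenarios), a simple pattern check might be written as:

```sql
-- Toy sketch of a surveillance-style transformation: flag accounts
-- whose daily order-cancellation ratio exceeds a threshold. All
-- names and the threshold are invented for illustration.
INSERT INTO `surveillance-prod.alerts.suspicious_activity`
  (account_id, trade_date, cancel_ratio)
SELECT
  account_id,
  trade_date,
  COUNTIF(status = 'CANCELLED') / COUNT(*) AS cancel_ratio
FROM `surveillance-prod.input.order_events`
GROUP BY account_id, trade_date
HAVING COUNTIF(status = 'CANCELLED') / COUNT(*) > 0.9;
```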
Surveillance alerts are retained to measure detection effectiveness and reduce false positives. These computations run as Spark jobs on Dataproc and as SQL in BigQuery; they are conducted frequently, and the results feed back into trade surveillance scenario execution to enhance the monitoring systems. Cloud Composer, a managed Apache Airflow workflow orchestration service, orchestrates the ETL procedures for trade surveillance scenarios and effectiveness calibrations.
The advantages of serverless data architecture
The architecture above indicates that trade monitoring needs several data sources. Using BigQuery to source this data allows Deutsche Bank data users to use it without copying it. By reducing hops, a simpler design enhances data quality and decreases cost.
Because BigQuery has no instances or clusters, there is no need to duplicate data. Instead, data consumers with the necessary permissions can query any table by its URI (the Google Cloud project ID, dataset name, and table name), accessing the data from their own Google Cloud projects without copying or storing it.
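In practice, this zero-copy access is just a fully qualified query against the producer’s table. A minimal sketch with hypothetical project, dataset, table, and column names:

```sql
-- Sketch: a consumer project queries a producer's table directly by
-- its URI (project.dataset.table), with no copy made. All names are
-- hypothetical.
SELECT trade_id, instrument, executed_at
FROM `front-office-prod.trading.trade_events`
WHERE DATE(executed_at) = CURRENT_DATE();
```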
To run trade surveillance scenarios, the Compliance Technology team needs only to query BigQuery views over the input data and tables holding derived data from compliance-specific ETLs. This avoids data duplication, making the data more trustworthy and the architecture more robust thanks to fewer data hops. Above all, this zero-copy strategy lets data consumers in other bank teams, not just trade surveillance, use the market, trade, and reference data in BigQuery.
BigQuery has another benefit: because it is integrated with Google Cloud services like Dataproc and Cloud Composer, ETL orchestration is easy with Apache Airflow’s BigQuery operators. No data copying is needed to process BigQuery data with Spark; an out-of-the-box connector reads data through the BigQuery Storage API, which streams large amounts of data directly to Dataproc workers in parallel for fast processing.
Finally, BigQuery lets data producers use Google Cloud’s native data quality tools, such as Dataplex automatic data quality. This service lets you define criteria for data freshness, accuracy, uniqueness, completeness, timeliness, and other dimensions, and apply them to BigQuery data. Rule execution and data quality enforcement are serverless and automated, with no infrastructure to manage. The Compliance Technology team can thus guarantee that front-office data meets its quality criteria, adding value to the new architecture.
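For a sense of what such a rule expresses, a freshness criterion boils down to a check along the following lines, shown here as plain GoogleSQL for illustration; Dataplex defines these rules declaratively rather than as handwritten queries, and the table and column names are hypothetical:

```sql
-- Sketch of what a freshness rule checks: the newest input row is no
-- more than a day old. Dataplex expresses this declaratively; the
-- names here are hypothetical.
SELECT
  MAX(ingested_at) >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
    AS freshness_ok
FROM `compliance-prod.inputs.front_office_trades`;
```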
The new design uses integrated and serverless data analytics tools and managed services from Google Cloud, allowing the Compliance Technology team to concentrate on Trade Surveillance application business logic. Unlike a big, on-premises Hadoop cluster, BigQuery doesn’t need maintenance periods, version updates, upfront sizing, or hardware replacements.
The new architecture’s cost-effectiveness is the last benefit. The architecture uses pay-as-you-go services to let team members concentrate on business-relevant features instead of infrastructure. Compute power is solely used for batch activities like compliance-specific ETLs, trade surveillance scenarios, and effectiveness calibration, rather than 24/7 machine operation. This reduces the cost even more than an always-on, on-prem option.
Read more on Govindhtech.com
#DeutscheBank #DataSystem #BigQuery #GoogleCloud #ComplianceTechnology #BigQuerydata #dataanalyticstools #technews #technology #govindhtech