#bigqueryml
govindhtech · 8 days
Text
BigQuery ML Contribution Analysis Shows Significant Insights
Use contribution analysis in BigQuery ML to discover important insights.
As data volumes expand, organizations struggle to interpret changes in their data. Pinpointing the origin of major trends and swings is difficult, which limits decision-making. A company may ask, "What factors drove revenue growth between Q1 and Q2?" or "Why did an advertisement's click-through rate decrease 5% over the last week?"
Answering such questions requires tools that can analyze many data segments at once to uncover statistically significant drivers. Google Cloud is announcing the public preview of contribution analysis in BigQuery ML to help enterprises surface insights and patterns in their data interactively and at scale.
Contribution analysis
Contribution analysis, sometimes referred to as key driver analysis, helps you understand how key metrics in your multi-dimensional data have changed. For instance, you can use it to compare two sets of training data to understand changes in an ML model's performance, or to examine changes in revenue figures across two quarters. To create a contribution analysis model in BigQuery, use the CREATE MODEL statement.
Contribution analysis is part of augmented analytics: the application of artificial intelligence (AI) to improve and automate data analysis and comprehension. One of the main objectives of augmented analytics is finding trends in users' data, and contribution analysis helps achieve exactly that.
By contrasting a test set of data with a control set, a contribution analysis model finds data points that exhibit statistically significant changes in a metric. This lets you track data changes by time, location, customer segment, or other dimensions. For example, you can compare a table snapshot from 2023 against one from 2022 to see how the data evolved over two years.
The difference between the test and control data is measured and compared using a metric, a numeric value defined for the contribution analysis model. You can provide either a summable metric or a summable ratio metric.
A segment is a portion of the data distinguished by a particular combination of dimension values. For instance, every possible combination of the store_number, customer_id, and day dimensions constitutes a segment in a contribution analysis model.
Contribution analysis lets you examine relevant metrics from your dataset across specified test and control subgroups. It works by determining which combinations of "contributors" result in unexpected changes, and it scales well by reducing the search space through pruning optimizations. Many businesses and use cases can benefit from this kind of analysis. Examples include:
Telemetry monitoring: Examine variations in occurrences that software programs have logged.
Sales and advertising: Investigate user involvement to adjust campaigns and ads according to click-through rates.
Retail: To maximize stock levels, assess the effects of price adjustments and inventory management techniques.
Healthcare: Look at important variables that affect patients' health to help improve prognoses and treatment approaches.
Contribution analysis model in BigQuery ML
How does it operate?
To create a contribution analysis model in BigQuery ML, all you need is a single table containing a control set of baseline data and a test set to compare against it, a metric to analyze (like revenue), and a list of contributors (like product_sku, category, etc.). The model then finds significant data slices, called segments, each defined by a particular combination of contributor values.
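As a concrete sketch, the statement below follows the public-preview syntax; the dataset, table, and column names (including the is_test flag that separates test rows from control rows) are hypothetical:

```sql
-- Build a contribution analysis model over a sales table, comparing
-- test rows (is_test = TRUE) against control rows on total revenue.
CREATE OR REPLACE MODEL `mydataset.revenue_ca_model`
  OPTIONS (
    MODEL_TYPE = 'CONTRIBUTION_ANALYSIS',
    CONTRIBUTION_METRIC = 'SUM(revenue)',      -- a summable metric
    DIMENSION_ID_COLS = ['store_number', 'customer_id', 'day'],
    IS_TEST_COL = 'is_test',
    MIN_APRIORI_SUPPORT = 0.05                 -- prune tiny segments
  )
AS SELECT * FROM `mydataset.sales`;
```

The MIN_APRIORI_SUPPORT option shown here relates to the pruning behavior described below.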
Contribution analysis lets you examine two distinct categories of metrics: summable metrics and summable ratio metrics. Summable metrics aggregate a single measure of interest, such as revenue, to summarize each data segment. Summable ratio metrics examine the relationship between two measures, such as earnings per share.
Additionally, contribution analysis models apply pruning optimizations by default, using the Apriori pruning technique to surface insights faster. The model narrows the search space based on a minimum support value; support indicates the size of a segment relative to the whole population. Excluding segments with low support values lets you focus on the largest segments and also shortens query execution time.
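Once the model is trained, the significant segments are retrieved with a single query. ML.GET_INSIGHTS is the retrieval function named in the preview documentation, and the model name below matches the sketch above:

```sql
-- Returns one row per significant segment, including test and control
-- metric values and the segment's apriori support.
SELECT *
FROM ML.GET_INSIGHTS(MODEL `mydataset.revenue_ca_model`);
```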
BigQuery now offers Contribution Analysis in preview.
Read more on govindhtech.com
phungthaihy · 4 years
Photo
BigQuery ML: Machine Learning with Standard SQL (AI Adventures) In this episode of AI Adventures...
dillten · 6 years
Link
Use #BigQueryML to create a linear #ML regression model in 30 seconds without leaving #BigQuery, allowing you to drive faster decisions at scale → https://t.co/EhVrWrikJt #GoogleNext18 pic.twitter.com/UoSUra7Lt9
— Google Cloud Platform (@GCPcloud) July 27, 2018
govindhtech · 3 months
Text
Introducing Datastream GCP’s new Stream Recovery Features
In the complicated and dynamic world of data replication, pipelines can break. Restarting replication with minimal impact on data integrity requires a number of manual steps, which can begin only after you determine the cause and timing of the failure.
With the help of Datastream GCP‘s new stream recovery capability, you can immediately resume data replication in scenarios like database failover or extended network outages with little to no data loss.
Think of a financial company that uses Datastream to replicate transaction data from its operational database to BigQuery for analytics. When a hardware failure hits the primary database instance, a planned failover to a replica occurs. Because the original source is unavailable, Datastream's replication pipeline breaks. Stream recovery enables replication to continue from the failover database instance, preventing transaction data loss.
Or consider an online shop that uses Datastream to replicate user feedback to BigQuery for sentiment analysis with BigQuery ML. An extended network disruption breaks the source database connection. By the time connectivity is restored, some of the updates are no longer available on the database server. Here, stream recovery lets the user quickly resume replication from the first available log position. Even though some feedback may be lost, the merchant prioritizes getting the most recent data for ongoing sentiment analysis and trend identification.
The advantages of stream recovery
Benefits of stream recovery include the following:
Reduced data loss: Recover data lost through events like unintentional log file deletion and database instance failovers.
Minimized downtime: Get back up and running as soon as possible to resume continuous CDC consumption and swiftly restore your stream.
Simpler recovery: A user-friendly interface makes it easy to recover your stream.
Capabilities of Datastream GCP
Minimize latency in data replication and synchronization
Reduce the impact on source performance while ensuring reliable, low-latency data synchronization across diverse databases, storage systems, and applications.
Adapt to changing needs using a serverless design
Quickly get up and running with a simple, serverless application that scales up and down without any hassle and requires no infrastructure management.
Google Cloud services offer unparalleled flexibility
Use the best Google Cloud services, such as BigQuery, Spanner, Dataflow, and Data Fusion, to connect and integrate data throughout your company.
Important characteristics
The unique strategy of Datastream GCP
Data streaming from relational databases
Datastream reads change events from your MySQL, PostgreSQL, AlloyDB, SQL Server, and Oracle databases and streams them into BigQuery, Cloud SQL, Cloud Storage, and Spanner. It is Google-native and agentless, and it consistently streams every event as it happens. More than 500 trillion events are processed monthly by Datastream.
Robust pipelines with sophisticated recovery
Unexpected disruptions can be expensive. Datastream GCP's robust stream recovery reduces downtime and data loss, helping you preserve vital business activities and make sound decisions based on continuous data pipelines.
Resolution of schema drift
Datastream GCP enables quick and easy resolution of schema drift when source schemas change. Every time a schema changes, Datastream rotates the files, adding a new file to the target bucket. With a current, versioned Schema Registry, original source data types are only an API call away.
Safe by design
To safeguard data in transit, Datastream GCP offers a variety of private, secure connectivity options. Your data is also encrypted both in transit and at rest, so you can relax knowing it is safe while it streams.
The application of stream recovery
Depending on the particular failure scenario and the availability of current log files, stream recovery offers a few options to choose from. For MySQL and Oracle you have three options: retry from the current log position, stream from the most recent position, or skip the current position and stream from the next available position. You can also give the stream a precise log position to resume from, for example the Log Sequence Number (LSN) or Change Sequence Number (CSN), giving you finer control over ensuring that no data is lost or duplicated in the destination.
For PostgreSQL sources, create a new replication slot in your PostgreSQL database, then tell Datastream to resume streaming from the new slot.
Begin a stream from a given position
Apart from stream recovery, there are several situations in which you might need to begin or continue a stream from a particular log position: for instance, when the source database is being upgraded or migrated, or when historical data up to a particular point in time is already present in the destination and you want to merge the two. In these situations, you can use the stream recovery API to set a starting position before initiating the stream.
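As a rough sketch of what such a call can look like, the example below uses the Datastream v1 streams:run REST method with a cdcStrategy payload. The project, location, stream name, and binlog coordinates are placeholders, and the exact field shapes should be verified against the current API reference:

```sh
# Resume a stream from a specific MySQL binlog position (placeholder values).
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://datastream.googleapis.com/v1/projects/my-project/locations/us-central1/streams/my-stream:run" \
  -d '{
    "cdcStrategy": {
      "specificStartPosition": {
        "mysqlLogPosition": {
          "logFile": "mysql-bin.000042",
          "logPosition": 4
        }
      }
    }
  }'
```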
Get going
Stream recovery is now generally available for all Datastream sources across all Google Cloud regions, via the Google Cloud console and API.
Read more on Govindhtech.com
govindhtech · 5 months
Text
Speech to text Model in client reviews using BigQuery ML
Best Speech To Text Models
The integrated speech-to-text feature of BigQuery ML provides an effective means of extracting insightful information from audio data. With this service, audio files such as customer review calls are converted to text so they can be analyzed on BigQuery's powerful data platform. By combining speech-to-text with BigQuery's analytics features, you can uncover consumer sentiment, spot recurring product problems, and learn more about your customers' voices.
By converting audio data into actionable insights, BigQuery ML speech-to-text enables a deeper understanding of consumer interactions across numerous channels, with potential benefits across sectors.
Speech to text feature
Moreover, text retrieved from audio recordings with BigQuery ML's native speech-to-text feature can be passed to Gemini 1.0 Pro for further insights and data formatting, including entity extraction and sentiment analysis. Some use cases and their business value for particular sectors are listed below.

Retail/E-commerce
Use case: Analyzing customer call recordings to identify common pain points, product preferences, and overall sentiment.
Business potential: Improved product development by addressing issues mentioned in feedback; enhanced customer service through personalization and targeted assistance; enhanced marketing campaigns based on insights discovered in customer calls.

Healthcare
Use case: Transcribing patient-doctor interactions to automatically populate medical records, summarize diagnoses, and track treatment progress.
Business potential: More streamlined workflows for healthcare providers, reducing administrative burden; comprehensive patient records for better decision-making; potential identification of trends in patient concerns for research and improved care.

Finance
Use case: Analyzing earnings calls and shareholder meetings to gauge market sentiment, identify potential risks, and extract key insights.
Business potential: Support for more informed investment decisions; prompt identification of emerging trends or potential issues; proactive risk-management strategies.

Media & Entertainment
Use case: Transcribing podcasts, interviews, and focus groups for content analysis and audience insights.
Business potential: Earlier identification of trending topics and themes for new content creation; understanding audience preferences for program development or advertising; accessibility improvements through automated closed-captioning.
Because these AI capabilities run inside BigQuery ML, you retain all of BigQuery's built-in governance features, including access control passthrough. This lets you restrict insights from client audio files according to row-level security settings on your BigQuery object table.
Are you prepared to extract insights from your audio data?
Let’s explore BigQuery’s speech-to-text capabilities:
Speech To Text models
Suppose you have a number of audio recordings of customer calls stored in a Google Cloud Storage bucket. These recordings can be automatically converted into readable text within BigQuery using the ML.TRANSCRIBE function, which is linked to a pre-trained speech-to-text model hosted on Google's Vertex AI platform. Think of it as a specialist translator for audio content.
You point the ML.TRANSCRIBE function at the location of your audio files (via your object table) and the speech-to-text model you want to use. It then manages the transcription process, making use of machine learning capabilities, and writes the text results straight to BigQuery. This makes it easy to analyze corporate data alongside customer conversations.
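As a minimal sketch, assuming a cloud resource connection and a Speech-to-Text recognizer already exist, the two statements below create the remote model and run the transcription; all resource names are placeholders, and output column names follow the documentation as of this writing:

```sql
-- Remote model wrapping the Cloud Speech-to-Text V2 service.
CREATE OR REPLACE MODEL `mydataset.transcriber`
  REMOTE WITH CONNECTION `myproject.us.my_connection`
  OPTIONS (
    REMOTE_SERVICE_TYPE = 'CLOUD_AI_SPEECH_TO_TEXT_V2',
    SPEECH_RECOGNIZER = 'projects/myproject/locations/us-central1/recognizers/my_recognizer'
  );

-- Transcribe every audio file referenced by the object table.
SELECT uri, transcripts
FROM ML.TRANSCRIBE(
  MODEL `mydataset.transcriber`,
  TABLE `mydataset.audio_reviews`  -- object table over the GCS bucket
);
```

Because the transcripts land in an ordinary result set, they can be joined against your other BigQuery tables immediately.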
Advantages of BigQuery ML’s speech-to-text capabilities:
Efficiency: Compared to manual transcription, automated transcription saves a great deal of time and money.
Scalability: BigQuery can manage massive amounts of audio data, making it ideal for companies that receive many customer reviews.
Cost-effectiveness: Uses Google's pre-trained speech-to-text models, avoiding costly third-party solutions.
Actionable insights: Offers insightful information that can be applied to improve customer satisfaction and spur company growth.
Together, let’s navigate the BigQuery procedure:
How to set up:
Before you begin, choose your Google Cloud project, connect a billing account, and enable the required APIs (all instructions available here).
Optionally, create a recognizer; a recognizer stores the speech recognition configuration.
Create a service account for the cloud resource connection; complete instructions are available here.
Follow these instructions to grant access to the service account.
Using the instructions provided, create a dataset containing the object table and the model.
Take a listen and save the audio files to Google Cloud Storage.
Here are five audio tracks that you can download.
Open Google Cloud Storage and create a bucket with a folder inside it.
Place the downloaded audio tracks in the "Follow-up Ideas" folder.
Using Gemini 1.0 Pro and BigQuery ML's ML.GENERATE_TEXT function, take the text retrieved from the audio files, extract desired entity data (e.g., product names, stock prices), and format it as JSON.
Measure sentiment in the extracted text using Gemini 1.0 Pro with BigQuery ML, then organize the positive and negative sentiments into JSON.
Integrate verbatims and sentiment scores from customer feedback with the Customer Lifetime Total Value score or other pertinent customer data to observe the relationship between quantitative and qualitative data.
Create embeddings over the extracted text, then use vector search to look for specific information in the audio recordings, as sketched below.
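A minimal sketch of that last step, assuming a remote embedding model (here called `mydataset.embedder`) has already been created over a Vertex AI text-embedding endpoint; the table names and query string are hypothetical:

```sql
-- Embed each transcript once and store the vectors.
CREATE OR REPLACE TABLE `mydataset.review_embeddings` AS
SELECT *
FROM ML.GENERATE_EMBEDDING(
  MODEL `mydataset.embedder`,
  (SELECT transcript AS content FROM `mydataset.transcribed_reviews`)
);

-- Find the ten transcripts semantically closest to an ad hoc query string.
SELECT base.content, distance
FROM VECTOR_SEARCH(
  TABLE `mydataset.review_embeddings`,
  'ml_generate_embedding_result',
  (SELECT ml_generate_embedding_result
   FROM ML.GENERATE_EMBEDDING(
     MODEL `mydataset.embedder`,
     (SELECT 'late delivery complaints' AS content))),
  top_k => 10
);
```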
Obtaining client feedback is now simpler than ever in the digital era, but sorting through a ton of audio reviews can be a daunting undertaking. This is where BigQuery ML's speech-to-text functionality shines, providing an effective means of revealing the insights hidden in your customers' audio data.
This is how speech-to-text in BigQuery ML can change customer feedback:
Effortless transcription: Upload your audio review files to Google Cloud Storage. With BigQuery ML's speech-to-text feature, the audio is automatically transcribed into text, saving many hours of manual labor.
Unlocking hidden gems: Once the audio has been converted to text, users can take advantage of BigQuery's robust analytics features.
Assess feedback sentiment, look for recurring themes, and group feedback into categories to learn more about customer satisfaction levels and potential areas for improvement.
Practical insights: By examining the transcribed text, anyone can spot trends and patterns in client feedback. This helps prioritize problems and resolve concerns effectively, ultimately enhancing the client experience.
Data preparation: Upload the audio review files (such as .mp3 or .wav) to a Cloud Storage bucket.
Speech-to-text transcription: Use the ML.TRANSCRIBE function in a BigQuery script to connect to a pre-trained speech-to-text model hosted on Vertex AI. This function automatically converts the audio into text and saves it in a BigQuery table.
Data analysis: Use BigQuery's built-in analysis tools to explore the transcribed text.
You can:
Examine sentiment: Determine whether the reviews are positive, negative, or neutral.
Extract key terms and phrases: Find out which subjects and themes clients frequently bring up.
Sort and classify comments: Group the reviews by a particular product, feature, or problem for more focused study.
Action and improvement: Based on your analysis, take appropriate measures to resolve client complaints, improve the quality of your offering, and raise overall client satisfaction.
In summary
The speech-to-text function of BigQuery ML provides a powerful and affordable means of extracting insights from the audio reviews left by your customers. By turning audio into text and using BigQuery's analytics features, you can learn more about your consumers and take steps to enhance their experience.
Read more on govindhtech.com
govindhtech · 6 months
Text
How BigQuery ML Sorted Palo Alto Networks’ Data
BigQuery ML and Its Applications
Palo Alto Networks' goal is to make secure digital transformation possible for everyone. The company's expansion through mergers and acquisitions has resulted in a sizable, distributed organization with numerous engineering teams that work together to produce its well-known products. Those teams are working on over 170,000 projects on Google Cloud, each with its own resource hierarchy and naming scheme.
The Cloud Center of Excellence team is in charge of the company's central cloud operations. It assumed responsibility for this intricate brownfield landscape, which has been expanding rapidly. Its responsibility is to ensure that this growth is secure, compliant with cloud hygiene, and economical, all while enabling the Palo Alto Networks product engineering teams to perform at the highest level.
The first step in the team's work is determining which project belongs to which team, cost center, and environment, and this proved difficult. Three years ago, the team started a significant automated labeling project that achieved over 95% coverage for team, owner, cost center, and environment tagging. The final 5% proved more challenging, and at that point the team concluded that machine learning could make its operations more efficient. This is the story of how it did that with BigQuery ML, BigQuery's built-in machine learning feature.
Cutting the two-week turnaround time for ML prototyping to two hours
The sheer volume of cloud projects and their disparate naming conventions made it difficult to determine the owner, environment, and cost center for each one. Mislabeled projects that were assigned to the wrong teams or to no teams at all were frequently discovered. Because of this, it was challenging to ascertain how much money teams were spending on cloud resources.
To accurately assign team owners on dashboards and reports, a member of the finance team had to manually sort hundreds of projects and contact potential owners, a process that took weeks. If the results of an investigation were inconclusive, the projects were labeled "undecided." As this list grew longer, only expensive projects were investigated, leaving low-cost projects without an owner label.
Palo Alto Networks
When concerns about project ownership arose, the team searched for terms in a project's name or path that provided hints about which team was involved. This amounted to a gut feeling based on keywords, and the team realized the same behavior could be replicated with machine learning. The time had come to automate this tedious procedure.
At first, it took nearly two weeks to create a functional model for training end-to-end prediction algorithms. The team wrote the code from scratch using Python libraries and scikit-learn for machine learning. Although the results were good, the small-scale prototype was unable to process the large volumes of data involved.
BigQuery was already widely used at Palo Alto Networks, so accessing the data for this project was simple. It made sense to follow the Google Cloud team's suggestion to prototype the project using BigQuery ML instead. Prototyping the entire project took a few hours with BigQuery ML; the team was up and running that same afternoon with 99.9% accuracy. It tested the model on hundreds of projects and consistently obtained accurate label predictions.
Increasing developer output and democratizing AI
After deployment, the team was able to use and test a number of models from BigQuery ML's library right away, ultimately deciding that the boosted trees model was the best fit for the project. In the past, each time an algorithm proved insufficiently accurate, the team had to spend up to three hours training alternatives for testing with Python and scikit-learn. That trial-and-error loop is significantly shorter with BigQuery ML: to test a new model, they just swap out the keyword and perform an hour of training.
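As an illustrative sketch of this kind of workflow (the dataset, tables, and features below are hypothetical, not Palo Alto Networks' actual schema):

```sql
-- Train a boosted tree classifier that predicts a project's owning team.
CREATE OR REPLACE MODEL `ops.project_owner_model`
  OPTIONS (
    MODEL_TYPE = 'BOOSTED_TREE_CLASSIFIER',
    INPUT_LABEL_COLS = ['team']
  )
AS SELECT project_name, folder_path, team
FROM `ops.labeled_projects`;

-- Predict owners for projects that are still unlabeled.
SELECT project_name, predicted_team
FROM ML.PREDICT(
  MODEL `ops.project_owner_model`,
  (SELECT project_name, folder_path FROM `ops.unlabeled_projects`));
```

Swapping BOOSTED_TREE_CLASSIFIER for another model type, such as RANDOM_FOREST_CLASSIFIER or DNN_CLASSIFIER, is exactly the one-keyword change described above.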
Similarly, this project's demands on developer time have dropped drastically. The previous iteration was over 300 lines of Python code. Now that it is reduced to ten lines of SQL in BigQuery, it is much simpler to read, comprehend, and maintain.
And that brings us to the democratization of AI. A project such as this one previously required an extensive background in Python and machine learning, so the prototype was initially handed to an experienced colleague. No one else on the team could have maintained it, because reading 300 lines of ML Python code and explaining it would have taken a long time.
With BigQuery ML, however, the team can examine the code sequence and explain it in five minutes. Anyone on the team who knows even a little about how each algorithm works in theory can understand and modify it. This work becomes much more accessible with BigQuery ML, even for those without years of experience in machine learning.
Maximizing visibility while maintaining 99.9% accuracy
Palo Alto Networks' label prediction project now supports the backend infrastructure for every cloud operations team. It helps sort mislabeled projects and determine which team each project belongs to, giving finance teams visibility into cloud costs. With minimal manual labor, the new labeling system provides precise, trustworthy information about the company's cloud projects.
As of right now, this solution can identify the cost center, environment, and team that a particular project belongs to with 99.9% accuracy. And this feels like just the beginning: having seen BigQuery ML's value and how quickly it produces results, the team has been discussing how to extend its advantages to more teams and use cases.
The team plans to implement this model, for example, as a service for information security and finance teams that may need more information about a particular project. If a project has not been mapped yet and is being used suspiciously, the model can quickly determine who owns it and who may have been compromised. Mapping exists for 95-98% of projects, but the last bit of unexplored ground is the riskiest: if something happens somewhere and no one knows who is responsible, how can it be fixed? BigQuery ML will ultimately help solve that.
Looking forward to fascinating advancements in generative AI
The team is also excited about a project that uses BigQuery and generative AI to give non-technical users natural-language answers to business questions. The goal is to develop a financial operations companion that can provide all the necessary cost, asset, and optimization information from the BigQuery-stored data lake. It will do this by learning about each employee, their team, the projects they own, and the cloud resources they use.
Finding this kind of data used to require knowing where to direct a BigQuery query and how to write it. Now anyone unfamiliar with SQL, from directors to interns, can ask questions in plain language and get an appropriate response. By using a natural-language prompt to write queries and combining data from several BigQuery tables to surface a contextualized response, generative AI democratizes access to information. The alpha version of this project is now available and delivering positive outcomes; the team is eager to incorporate it into every one of its financial operations tools.
Read more on govindhtech.com
govindhtech · 6 months
Text
Apache Spark Stored Procedures Arrive in BigQuery
Apache Spark tutorial
BigQuery's highly scalable and powerful SQL engine can handle large data volumes with standard SQL, and it provides advanced features like BigQuery ML, remote functions, vector search, and more. To extend BigQuery data processing beyond SQL, you might occasionally need to make use of pre-existing Spark-based business logic or open-source Apache Spark expertise. For intricate JSON or graph data processing, for instance, you might want to use community packages or legacy Spark code written before migrating to BigQuery. In the past, this meant leaving BigQuery: you had to pay for non-BigQuery SKUs, enable a different API, use a different user interface (UI), and manage inconsistent permissions.
To address these issues, Google created an integrated experience that extends BigQuery's data processing capabilities to Apache Spark, and today it is pleased to announce the general availability (GA) of Apache Spark stored procedures in BigQuery. BigQuery users can now create and run Spark stored procedures using BigQuery APIs, extending their queries with Spark-based data processing. This unifies Spark and BigQuery into a single experience spanning billing, security, and management. Spark procedures support code written in PySpark, Scala, and Java.
Here is the comment from DeNA, a BigQuery customer and provider of internet and artificial intelligence technologies: "BigQuery Spark stored procedures provide a seamless experience with unified API, governance, and billing across Spark and BigQuery. With BigQuery, we can now easily leverage our community packages and Spark expertise for sophisticated data processing."
Create, test, and deploy PySpark code within BigQuery Studio
To create, test, and deploy your PySpark code, BigQuery Studio offers a Python editor as part of its unified interface for all data practitioners. Procedures can be configured with IN/OUT parameters, among other options. Once a Spark connection has been established, you can iteratively test the code within the UI. For debugging and troubleshooting, the BigQuery console displays log messages from the underlying Spark jobs in the same context. Spark experts can further fine-tune Spark execution by passing Spark parameters to the procedure.
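A minimal sketch of what such a procedure looks like, assuming a Spark-enabled connection already exists; connection, dataset, and table names, as well as the runtime version, are placeholders:

```sql
CREATE OR REPLACE PROCEDURE `mydataset.spark_wordcount`()
WITH CONNECTION `myproject.us.my_spark_connection`
OPTIONS (engine = 'SPARK', runtime_version = '1.1')
LANGUAGE PYTHON AS R"""
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount").getOrCreate()

# Read a BigQuery table through the Spark-BigQuery connector.
df = spark.read.format("bigquery").option("table", "mydataset.comments").load()

# Any Spark logic can run here; this just counts rows per author.
df.groupBy("author").count().show()
""";

-- Invoke it like any other stored procedure.
CALL `mydataset.spark_wordcount`();
```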
After testing, the procedure is stored in a BigQuery dataset, where it can be accessed and managed just like your SQL procedures.
One of Apache Spark's many advantages is its large selection of community and third-party packages. BigQuery Spark stored procedures can be configured to install the packages required for code execution.
For more complex use cases, you can import your code from Google Cloud Storage buckets, or a custom container image from Container Registry or Artifact Registry. Advanced security and authentication options are supported, such as customer-managed encryption keys (CMEK) and the use of an existing service account.
BigQuery billing combined with serverless execution
With this release, you benefit from Spark within the BigQuery APIs while seeing only BigQuery fees. Behind the scenes, this is made possible by the industry-leading serverless Spark engine, which enables serverless, autoscaling Spark. When you use this new feature, you do not have to activate Dataproc APIs or pay for Dataproc. Your usage of Spark procedures is billed at Enterprise edition (EE) pay-as-you-go (PAYG) pricing. The feature is available in all BigQuery editions, including the on-demand model; regardless of edition, Spark procedures are charged with an EE PAYG SKU. See BigQuery pricing for further information.
What is Apache Spark?
Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.
Easy, Quick, Scalable, and Unified
Batch and streaming data: Combine batch and real-time streaming data processing, using your choice of Python, SQL, Scala, Java, or R.
SQL analytics: Run fast, distributed ANSI SQL queries for ad hoc reporting and dashboarding; this outperforms most data warehouses.
Data science at scale: Perform exploratory data analysis (EDA) on petabyte-scale data without resorting to downsampling.
Machine learning: Train machine learning algorithms on a laptop, then use the same code to scale to fault-tolerant clusters of thousands of machines.
The most popular scalable computing engine
Thousands of companies use Apache Spark, including 80% of the Fortune 500. More than 2,000 contributors from academia and industry work on the open source project.
Ecosystem: Apache Spark integrates with your preferred frameworks, helping to scale them to thousands of machines.
Spark SQL engine: internal components
An advanced distributed SQL engine for large-scale data is the foundation of Apache Spark.
Adaptive query execution: Spark SQL adapts the execution plan at runtime, automatically determining the number of reducers and the join algorithms to use.
ANSI SQL support: Use the same SQL you are already familiar with.
Structured and unstructured data: Spark SQL works with structured data as well as unstructured data such as JSON and images.
Read more on Govindhtech.com
govindhtech · 7 months
Text
Examine Gemini 1.0 Pro with BigQuery and Vertex AI
BigQuery and Vertex AI to explore Gemini 1.0 Pro
Conventional partitions separating data and AI teams can stifle innovation. These disciplines frequently work independently and with different tools, which can result in data silos, redundant copies of data, data governance overhead, and budget issues. From the standpoint of AI implementation, this raises security risks, causes ML deployments to fail, and reduces the number of ML models that make it into production.
To maximize the value of data and AI investments, particularly around generative AI, it helps to have a single platform that removes these obstacles and accelerates data-to-AI workflows: from data ingestion and preparation, through analysis, exploration, and visualization, all the way to ML training and inference.
Google recently announced innovations that use BigQuery and Vertex AI to further connect data and AI and help you achieve this. This blog post explores some of these innovations in more detail, along with instructions on how to use Gemini 1.0 Pro in BigQuery.
What is BigQuery ML?
BigQuery ML is a capability of Google BigQuery that lets you develop and apply machine learning models from within your data warehouse. It makes use of BigQuery's data storage and processing power as well as its machine learning capabilities, all of which are available through familiar SQL queries or Python code.
Utilize BigQuery ML to integrate AI into your data
BigQuery ML enables data analysts and engineers to create, train, and execute machine learning models directly in BigQuery using familiar SQL, with built-in support for linear regression, logistic regression, and deep neural networks; Vertex AI-hosted models like PaLM 2 or Gemini 1.0 Pro; and imported custom models based on TensorFlow, TensorFlow Lite, and XGBoost. This helps practitioners transcend traditional roles and leverage advanced ML models directly in BigQuery. Furthermore, BigQuery allows ML engineers and data scientists to share their trained models, ensuring that data is used responsibly and that datasets are easily accessible.
When every element of the data pipeline employs distinct tools and technologies, the resulting complexity slows development and experimentation and places more work on specialized teams. With BigQuery ML, users can create and deploy machine learning models using the same SQL syntax inside BigQuery. Google took this a step further and used Vertex AI to integrate Gemini 1.0 Pro into BigQuery to further streamline generative AI. The Gemini 1.0 Pro model is designed for higher input/output scale and improved result quality across a wide variety of tasks, such as sentiment analysis and text summarization.
BigQuery ML allows you to integrate generative models directly into your data workflow, which helps you scale and optimize them. By doing this, bottlenecks in data movement are removed, promoting smooth team collaboration and improving security and governance. BigQuery’s tested infrastructure will help you achieve higher efficiency and scale.
There are many advantages to applying generative AI directly to your data:
Reduces the need to create and maintain data pipelines connecting BigQuery to generative AI model APIs
Simplifies governance and, by preventing data movement, helps lower the risk of data loss
Reduces the need to write and manage custom Python code to call AI models
Allows petabyte-scale data analysis without sacrificing performance
Can reduce your overall cost of ownership through a more straightforward architecture
Faraday, a well-known customer prediction platform, previously had to create data pipelines and join multiple datasets to perform sentiment analysis on its data. With direct access to LLMs, it streamlined the process: the data is combined with additional first-party customer information and fed back into the model to produce hyper-personalized content, all inside BigQuery. To find out more, view this sample video.
Gemini 1.0 Pro and BigQuery ML
To use Gemini 1.0 Pro in BigQuery, first create a remote model that represents a hosted Vertex AI large language model; this usually takes only a few seconds. Once the model is built, use it to generate text by combining it with data straight from your BigQuery tables. The ML.GENERATE_TEXT construct calls Gemini 1.0 Pro via Vertex AI and carries out text-generation tasks. CONCAT appends your PROMPT statement to each database record. The temperature prompt parameter controls response randomness: the lower the temperature, the more relevant the response. The boolean flatten_json_output, when set to true, returns flat, comprehensible text extracted from the JSON response.
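Putting those pieces together, a minimal sketch looks like this; the connection, dataset, and table names are placeholders, and the endpoint string should be checked against currently available model versions:

```sql
-- Remote model that represents Gemini 1.0 Pro on Vertex AI.
CREATE OR REPLACE MODEL `mydataset.gemini_pro`
  REMOTE WITH CONNECTION `myproject.us.my_connection`
  OPTIONS (ENDPOINT = 'gemini-1.0-pro');

-- Classify the sentiment of each stored review.
SELECT ml_generate_text_llm_result AS sentiment
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.gemini_pro`,
  (SELECT CONCAT(
     'Classify the sentiment of this review as positive, negative, or neutral: ',
     review_text) AS prompt
   FROM `mydataset.reviews`),
  STRUCT(0.2 AS temperature, TRUE AS flatten_json_output));
```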
What your data can achieve with generative AI
Google believes the potential of AI for your business data is still largely unrealized. With generative AI, data analysts' responsibilities are growing beyond gathering, processing, and analyzing massive datasets to proactively shaping data-driven business impact.
For instance, data analysts can use generative models to summarize past email marketing data (open rates, click-through rates, conversion rates, and so on) and determine whether personalized offers outperform generic promotions, or which types of subject lines consistently result in higher open rates. Analysts can then use these insights to direct the model to generate a list of compelling subject-line options tailored to the identified preferences. With a single platform, they can also use the generative AI model to create engaging email content.
Early adopters have shown a great deal of interest in resolving a variety of use cases from different industries. For example, the following advanced data processing tasks can be made simpler by using ML.GENERATE_TEXT:
Content generation
Analyze user feedback to create customized email content directly within BigQuery, without the need for sophisticated tools. Example prompt: "Create a marketing email using customer sentiment from [table name]."
Summarize
Summarize text stored in BigQuery columns, such as chat transcripts or online reviews. Example prompt: "Summarize customer reviews in [table name]."
Enhance data
Retrieve related attributes for existing values, such as the country for a given city name. Example prompt: "Give me the name of the country for each city in column X."
Rephrasing
Correct spelling and grammar in written material, such as voice-to-text transcriptions. Example prompt: "Rephrase column X and add the results to column Y."
Feature extraction
Extract important details or terms from lengthy text, such as call transcripts and online reviews. Example prompt: "Extract city names from column X."
Sentiment analysis
Recognize how people feel about particular topics in a text. Example prompt: “Incorporate findings into column Y by extracting sentiment from column X.”
Retrieval-augmented generation (RAG)
Using BigQuery vector search, retrieve data relevant to a task or question and supply it to a model as context. For instance, use a support ticket to locate ten similar prior cases, then pass them to a model as context so it can summarize the issue and offer a solution.
BigQuery's expanded support for cutting-edge foundation models like Vertex AI's Gemini 1.0 Pro makes integrating unstructured data into your Data Cloud simpler and more affordable.
Come explore the future of generative AI and data with Google
Refer to the documentation to find out more about these new features. This tutorial shows how to operationalize ML workflows, deploy models, and apply Google's best-in-class AI models to your data without moving any data out of BigQuery. You can also view a demonstration of building an end-to-end data analytics and AI application that fully utilizes the power of sophisticated models like Gemini, along with a behind-the-scenes look at the development process. To learn about the newest features and how BigQuery ML can be used to create and use models with simple SQL, watch Google's latest product innovation webcast.
Read more on govindhtech.com