#GoogleKubernetesEngine
Explore tagged Tumblr posts
govindhtech · 7 months ago
Text
Hyperdisk ML: Integration To Speed Up Loading AI/ML Data
Hyperdisk ML can speed up the loading of AI/ML data. This tutorial explains how to use it to streamline and speed up the loading of AI/ML model weights on Google Kubernetes Engine (GKE). The main method for accessing Hyperdisk ML storage with GKE clusters is through the Compute Engine Persistent Disk CSI driver.
What is Hyperdisk ML?
You can scale up your applications with Hyperdisk ML, a high-performance storage solution. It is perfect for running AI/ML tasks that require access to a lot of data since it offers high aggregate throughput to several virtual machines at once.
Overview
It can speed up model weight loading by up to 11.9X when activated in read-only-many mode, as opposed to loading straight from a model registry. The Google Cloud Hyperdisk design, which enables scalability to 2,500 concurrent nodes at 1.2 TB/s, is responsible for this acceleration. This enables you to decrease pod over-provisioning and improve load times for your AI/ML inference workloads.
The following are the high-level procedures for creating and utilizing Hyperdisk ML:
Pre-cache or hydrate data in a persistent disk image: Fill Hyperdisk ML volumes with serving-ready data from an external data source (for example, Gemma weights fetched from Cloud Storage). The persistent disk backing the disk image must be compatible with Google Cloud Hyperdisk.
Create a Hyperdisk ML volume from an existing Google Cloud Hyperdisk: Create a Kubernetes volume that points to the data-loaded Hyperdisk ML volume. To make sure your data is accessible in every zone where your Pods will run, you can optionally create multi-zone storage classes.
Create a Kubernetes deployment that consumes the volume: Reference the Hyperdisk ML volume from your workloads so they benefit from rapid data loading (a sketch of the Kubernetes objects follows below).
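As an illustrative sketch only (the project, zone, disk name, and size below are placeholders, and the exact fields should be checked against the GKE Hyperdisk ML documentation), steps 2 and 3 boil down to a PersistentVolume that points at the pre-loaded Hyperdisk and a PersistentVolumeClaim that the deployment mounts read-only:
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hdml-weights-pv                  # placeholder name
spec:
  capacity:
    storage: 300Gi                       # placeholder; match the underlying disk size
  accessModes:
  - ReadOnlyMany                         # read-only-many mode enables the accelerated load path
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: pd.csi.storage.gke.io        # Compute Engine Persistent Disk CSI driver
    readOnly: true
    fsType: ext4
    volumeHandle: projects/MY_PROJECT/zones/us-central1-a/disks/MY_HYPERDISK_ML  # pre-loaded disk
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hdml-weights-pvc                 # placeholder name
spec:
  accessModes:
  - ReadOnlyMany
  storageClassName: ""                   # bind directly to the pre-created PV
  volumeName: hdml-weights-pv
  resources:
    requests:
      storage: 300Gi
EOF
A deployment would then mount hdml-weights-pvc as a read-only volume so every replica loads model weights straight from Hyperdisk ML.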
Multi-zone Hyperdisk ML volumes
Hyperdisk ML disks are zonal: each disk is accessible in only one zone. Alternatively, by using the Hyperdisk ML multi-zone capability, you can dynamically join multiple zonal disks with identical content under a single logical PersistentVolumeClaim and PersistentVolume. The zonal disks referenced by the multi-zone feature must be in the same region. For instance, if your regional cluster is created in us-central1, the multi-zone disks (such as those in us-central1-a and us-central1-b) must be located in that same region.
Running Pods across zones for increased accelerator availability and cost effectiveness with Spot VMs is a popular use case for AI/ML inference. Because Hyperdisk ML disks are zonal, if your inference server runs Pods across several zones, GKE automatically clones the disks across those zones to make sure your data follows your application. (Image credit: Google Cloud)
The limitations of multi-zone Hyperdisk ML volumes are as follows:
There is no support for volume resizing or volume snapshots.
Only read-only mode is available for multi-zone Hyperdisk ML volumes.
GKE does not verify that the disk content is consistent across zones when you use pre-existing disks with a multi-zone Hyperdisk ML volume. If any of the disks have divergent content, make sure your application accounts for the possibility of inconsistencies between zones.
Requirements
Your clusters must meet the following requirements to use Hyperdisk ML volumes in GKE:
Use Linux clusters running GKE version 1.30.2-gke.1394000 or later. If you use release channels, make sure the channel contains the GKE version required for this driver (a quick way to check an existing cluster's versions is sketched after this list).
The Compute Engine Persistent Disk CSI driver must be installed. It is enabled by default on new Autopilot and Standard clusters, and it cannot be disabled or modified when using Autopilot. If you need to enable the driver on an existing cluster, see Enabling the Compute Engine Persistent Disk CSI driver on an existing cluster.
You should use GKE version 1.29.2-gke.1217000 or later if you wish to adjust the readahead value.
You must use GKE version 1.30.2-gke.1394000 or later in order to utilize the multi-zone dynamically provisioned capability.
Hyperdisk ML is supported only on specific node types and in specific zones.
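For example, you can check an existing cluster's control-plane and node versions against the requirements above with a command like the following (a sketch; the cluster name and region are placeholders):
gcloud container clusters describe my-cluster \
    --region=us-central1 \
    --format="value(currentMasterVersion,currentNodeVersion)"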
Conclusion
This article offers a thorough tutorial on using Hyperdisk ML to speed up AI/ML data loading on Google Kubernetes Engine (GKE). It explains how to pre-cache data in a disk image, create a Hyperdisk ML volume that your GKE workload can read, and create a deployment that uses the volume. It also discusses how to fix problems such as a low Hyperdisk ML throughput quota and provides advice on adjusting readahead values for best results.
Read more on Govindhtech.com
gkcloudsolutions · 4 years ago
Text
GKE Getting Started | Google Kubernetes Engine | GKCS
Course Description
This GKE Getting Started course will teach you how to containerize workloads in Docker containers, deploy them to Kubernetes clusters provided by Google Kubernetes Engine, and scale those workloads to handle increased traffic. You'll also learn how to continuously deploy new code in a Kubernetes cluster to provide application updates.
Objectives
At the end of the course, you will be able to:
Understand how software containers work.
Understand the architecture of Kubernetes.
Understand the architecture of Google Cloud.
Understand how pod networking works in Google Kubernetes Engine.
Create and manage Kubernetes Engine clusters using the Google Cloud Console and gcloud/kubectl commands.
Audience
This class is intended for the following participants:
Application developers, Cloud Solutions Architects, DevOps Engineers, IT managers
Individuals using Google Cloud Platform to create new solutions or to integrate existing systems, application environments, and infrastructure with the Google Cloud Platform
Prerequisites
To get the most out of this course, participants should have:
Basic proficiency with command-line tools and Linux operating system environments, as well as Web servers
Systems Operations experience, including deploying and managing applications, either on-premises or in a public cloud environment
Content
Module 1: Introduction to Google Cloud
Module​ ​​2:​​ Containers and Kubernetes in Google Cloud
Module​ ​​3:​​ ​Kubernetes Architecture
Module​ ​​4:​​ ​Continuous​ ​Deployment with​ ​Jenkins
For more IT Training on Google Cloud Courses visit GK Cloud Solutions.
akshay-09 · 5 years ago
Link
youtube
friedpenguinwolf-blog1 · 5 years ago
Text
How To Set Up #kubernetes Cluster On Google Cloud Platform. #GCP #K8S #DevOps #Linux #Docker #GoogleKubernetesEngine #GKE
https://youtu.be/yBOjWr24C6o
macronimous · 6 years ago
Text
Deploying a #NodeJS app to the #GoogleKubernetesEngine (GKE) https://t.co/MvJAl5UomC https://t.co/qJZOcXM1WG
Deploying a #NodeJS app to the #GoogleKubernetesEngine (GKE) https://t.co/MvJAl5UomC pic.twitter.com/qJZOcXM1WG
— Macronimous.com (@macronimous) July 8, 2019
from Twitter https://twitter.com/macronimous July 08, 2019 at 09:10PM via IFTTT
govindhtech · 8 months ago
Text
Google VPC Flow Logs: Vital Network Traffic Analysis Tool
GCP VPC Flow Logs
VPC Flow Logs samples packets sent and received by virtual machine (VM) instances (including instances used as Google Kubernetes Engine nodes), as well as packets transported across VLAN attachments for Cloud Interconnect and Cloud VPN tunnels (Preview).
Flow logs are aggregated by IP connection (5-tuple). You can use this data for network monitoring, forensics, security analysis, and cost optimization.
Flow logs are viewable via Cloud Logging, and logs can be exported to any location supported by Cloud Logging export.
Use cases
Network monitoring
VPC Flow Logs give you insight into network performance and throughput. You could:
Observe the VPC network.
Diagnose the network.
To comprehend traffic changes, filter the flow records by virtual machines, VLAN attachments, and cloud VPN tunnels.
Recognize traffic increase in order to estimate capacity.
Recognizing network utilization and minimizing network traffic costs
VPC Flow Logs can be used to optimize network traffic costs by analyzing network utilization. The network flows, for instance, can be examined for the following:
Movement between zones and regions
Internet traffic to particular nations
Traffic to other cloud networks and on-premises
Top network talkers, such as cloud VPN tunnels, VLAN attachments, and virtual machines
Network forensics
VPC Flow Logs are useful for network forensics. For instance, in the event of an occurrence, you can look at the following:
Which IP addresses communicated with each other, and when?
Analyze all incoming and outgoing network flows to identify any compromised IP addresses.
Specifications
VPC Flow Logs is built into Andromeda, the software that powers VPC networks, so enabling it adds no delay or performance penalty.
Legacy networks are not compatible with VPC Flow Logs. You can enable or disable VPC Flow Logs per subnet, per VLAN attachment for Cloud Interconnect (Preview), and per Cloud VPN tunnel (Preview). When enabled for a subnet, VPC Flow Logs collects data from all VM instances in that subnet, including GKE nodes.
TCP, UDP, ICMP, ESP, and GRE traffic are sampled by VPC Flow Logs. Samples are taken of both inbound and outgoing flows. These flows may occur within Google Cloud or between other networks and Google Cloud. VPC Flow Logs creates a log for a flow if it is sampled and collected. The details outlined in the Record format section are included in every flow record.
The following are some ways that VPC Flow Logs and firewall rules interact:
Prior to egress firewall rules, egress packets are sampled. VPC Flow Logs can sample outgoing packets even if an egress firewall rule blocks them.
Following ingress firewall rules, ingress packets are sampled. VPC Flow Logs do not sample inbound packets that are denied by an ingress firewall rule.
In VPC Flow Logs, you can create only specific logs by using filters.
VPC Flow Logs supports VMs with multiple network interfaces. To log all interfaces, you must enable VPC Flow Logs on the subnet in each VPC that contains a network interface.
Intranode visibility for the cluster must be enabled in order to log flows across pods on the same Google Kubernetes Engine (GKE) node.
Cloud Run resources do not report VPC Flow Logs.
Logs collection
Within an aggregation interval, packets are sampled. A single flow log entry contains all of the packets gathered for a specific IP connection during the aggregation interval. After that, this data is routed to logging.
By default, logs are kept in Logging for 30 days. Logs can be exported to a supported destination or a custom retention time can be defined if you wish to keep them longer.
Log sampling and processing
Packets leaving and entering a virtual machine (VM) or passing via a gateway, like a VLAN attachment or Cloud VPN tunnel, are sampled by VPC Flow Logs in order to produce flow logs. Following the steps outlined in this section, VPC Flow Logs processes the flow logs after they are generated.
A primary sampling rate is used by VPC Flow Logs to sample packets. The load on the physical host that is executing the virtual machine or gateway at the moment of sampling determines the primary sampling rate, which is dynamic. As the number of packets increases, so does the likelihood of sampling any one IP connection. Neither the primary sampling rate nor the primary flow log sampling procedure are under your control.
Following their generation, the flow logs are processed by VPC Flow Logs using the steps listed below:
Filtering: You can make sure that only logs that meet predetermined standards are produced. You can filter, for instance, such that only logs for a specific virtual machine (VM) or logs with a specific metadata value are generated, while the rest are ignored. See Log filtering for further details.
Aggregation: To create a flow log entry, data from sampling packets is combined over a defined aggregation interval.
Secondary sampling of flow logs: This is a second method of sampling. Flow log entries are further sampled based on a secondary sampling rate parameter that can be adjusted. The flow logs produced by the first flow log sampling procedure are used for the secondary sample. For instance, VPC Flow Logs will sample all flow logs produced by the primary flow log sampling if the secondary sampling rate is set to 1.0, or 100%.
Metadata: All metadata annotations are removed if this option is turned off. You can indicate that all fields or a specific group of fields are kept if you wish to preserve metadata. See Metadata annotations for further details.
Write to Logging: the resulting log entries are written to Cloud Logging.
Note: The way that VPC Flow Logs gathers samples cannot be altered. However, as explained in Enable VPC Flow Logs, you can use the Secondary sampling rate parameter to adjust the secondary flow log sampling. Packet mirroring and third-party software-run collector instances are options if you need to examine every packet.
Because VPC Flow Logs does not capture every packet, it interpolates from the captured packets to compensate for packets missed due to the primary and user-configurable sampling settings.
Even though Google Cloud does not capture every packet, log record captures can still be quite large. You can balance your traffic-visibility needs against storage costs by adjusting the following log collection parameters, which you set when enabling VPC Flow Logs on a subnet (an example command follows this list):
Aggregation interval: A single log entry is created by combining sampled packets over a given time period. Five seconds (the default), thirty seconds, one minute, five minutes, ten minutes, or fifteen minutes can be used for this time interval.
Secondary sampling rate:
By default, 50% of log items are retained for virtual machines. This value can be set between 1.0 (100 percent, all log entries are kept) and 0.0 (zero percent, no logs are kept).
By default, all log entries are retained for Cloud VPN tunnels and VLAN attachments. This parameter can be set to any value greater than 0.0 and up to 1.0.
Metadata annotations: Flow log entries are automatically annotated with metadata such as the names of the source and destination within Google Cloud or the geographic location of external sources and destinations. To conserve storage, you can disable metadata annotations or keep only specific annotations.
Filtering: Logs are automatically created for each flow that is sampled. Filters can be set to generate logs that only meet specific criteria.
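For example, here is a sketch of enabling VPC Flow Logs on a subnet with a 30-second aggregation interval, 50% secondary sampling, and all metadata retained (the subnet and region are placeholders, and flag values should be verified against the current gcloud reference):
gcloud compute networks subnets update my-subnet \
    --region=us-central1 \
    --enable-flow-logs \
    --logging-aggregation-interval=interval-30-sec \
    --logging-flow-sampling=0.5 \
    --logging-metadata=include-all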
Read more on Govindhtech.com
govindhtech · 8 months ago
Text
VPC Flow Analyzer: Your Key to Network Traffic Intelligence
Overview of the Flow Analyzer
Without writing intricate SQL queries to analyze VPC Flow Logs, you can quickly and effectively comprehend your VPC traffic flows with Flow Analyzer. With a 5-tuple granularity (source IP, destination IP, source port, destination port, and protocol), Flow Analyzer enables you to conduct opinionated network traffic analysis.
Flow Analyzer, created with Log Analytics and driven by BigQuery, allows you to examine your virtual machine instances’ inbound and outgoing traffic in great detail. It enables you to keep an eye on, troubleshoot, and optimize your networking configuration for improved security and performance, which helps to guarantee compliance and reduce expenses.
Flow Analyzer examines VPC Flow Logs data stored in a log bucket (record format). To use Flow Analyzer, you must select a project that has a log bucket containing VPC Flow Logs. Network monitoring, forensics, real-time security analysis, and cost optimization are all possible with VPC Flow Logs.
Flow Analyzer runs its searches against the fields contained in VPC Flow Logs.
The following tasks can be completed with Flow Analyzer:
Create and execute a basic VPC Flow Logs query.
Create a SQL filter for the VPC Flow Logs query (using a WHERE statement).
Sort the query results based on aggregate packets and total traffic, then arrange the results using the chosen attributes.
Examine the traffic at specific intervals.
See a graphical representation of the top five traffic flows over time in relation to the overall traffic.
See a tabular representation of the resources with the most traffic combined over the chosen period.
View the query results to see the specifics of the traffic between a given source and destination pair.
Utilizing the remaining fields in the VPC Flow Logs, drill down the query results.
How it operates
A sample of network flows sent from and received by VPC resources, including Google Kubernetes Engine nodes and virtual machine instances, are recorded in VPC Flow Logs.
The flow logs can be exported to any location supported by Logging export and examined in Cloud Logging. Log analytics can be used to perform queries that examine log data, and the results of those queries can subsequently be shown as tables and charts.
By using Log Analytics, Flow Analyzer enables you to execute queries on VPC Flow Logs and obtain additional information about the traffic flows. This includes a table that offers details about every data flow and a graphic that shows the largest data flows.
Components of a query
You must execute a query on VPC Flow Logs in order to examine and comprehend your traffic flows. In order to view and track your traffic flows, Flow Analyzer assists you in creating the query, adjusting the display settings, and drilling down.
Traffic Aggregation
You must choose an aggregation strategy to filter the flows between the resources in order to examine VPC traffic flows. The following is how Flow Analyzer arranges the flow logs for aggregation:
Source and destination: this option makes use of the VPC Flow Logs’ SRC and DEST data. The traffic is aggregated from source to destination in this view.
Client and server: this option identifies which side initiated the connection. The resource with the lower port number is treated as the server; resources with the gke_service specification are also treated as servers, because services don't initiate requests. Traffic in both directions is combined in this view.
Time-range selector
The time-range picker allows you to center the time range on a certain timestamp, choose from preset time options, or define a custom start and finish time. By default, the time range is one hour. For instance, choose Last 1 week from the time-range selector if you wish to display the data for the previous week.
Additionally, you can use the time-range slider to set your preferred time zone.
Basic filters
You construct the query by arranging the flows in both directions based on the resources.
Choose the fields from the list and enter values for them to use the filters.
You can add more than one filter expression to match flows against the chosen key-value combinations. An OR operator is used when you select multiple filters for the same field; an AND operator is used when you select filters for different fields.
For instance, the following filter logic is applied to the query if you choose two IP address values (1.2.3.4 and 10.20.10.30) and two country values (US and France):
(Country=US OR Country=France) AND (IP=1.2.3.4 OR IP=10.20.10.30)
The results may differ if you alter the traffic options or endpoint filters. To see the revised results, you must execute the query again.
SQL filters
SQL filters can be used to create sophisticated queries. You can carry out operations like the following by using sophisticated queries:
Comparing the values of the fields
Using AND/OR and layered OR operations to construct intricate boolean logic
Utilizing BigQuery capabilities to carry out intricate operations on IP addresses
BigQuery SQL syntax is used in the SQL filter queries.
Query result
The following elements are included in the query results:
The highest data flows chart shows the remaining traffic as well as the top five largest traffic flows throughout time. This graphic can be used to identify trends, such as increases in traffic.
The All Data Flows table displays the top traffic flows (up to 10,000 rows) aggregated over the chosen period. It shows the fields you chose for organizing the flows when defining the query's filters.
Read more on Govindhtech.com
govindhtech · 8 months ago
Text
Application Performance Benchmarking Focused On Users
How to benchmark application performance from the user's point of view
How can you know what kind of performance your application has? More importantly, how well does your application function in the eyes of your end users?
Understanding how scalable your application is goes beyond a technical question; it is a strategic necessity for success in this age of exponential growth and erratic traffic spikes. Naturally, giving end users the best possible performance is a must, and benchmarking it is a crucial step toward living up to their expectations.
To get a comprehensive picture of how well your application performs in real-world scenarios, you should benchmark full key user journeys (CUJs) as seen by the user, not just the individual components. Component-by-component benchmarking may miss certain bottlenecks and performance problems brought on by network latency, external dependencies, and the interaction of multiple components. You can learn more about the real user experience and find and fix performance problems that affect user engagement and satisfaction by simulating entire user flows.
This blog will discuss the significance of integrating end-user-perceived performance benchmarking into contemporary application development and how to foster an organizational culture that assesses apps immediately and keeps benchmarking over time. Google Kubernetes Engine (GKE) also demonstrates how to replicate complicated user behavior using the open-source Locust tool for use in your end-to-end benchmarking exercises.
The importance of benchmarking
You should incorporate strong benchmarking procedures into your application development process for a number of reasons:
Proactive performance management: By identifying and addressing performance bottlenecks early in the development cycle, early and frequent benchmarking can help developers save money, speed up time to market, and create more seamless product launches. Furthermore, by quickly identifying and resolving performance regressions, benchmarking can be incorporated into testing procedures to provide a vital safety net that protects code quality and user experience.
Continuous performance optimization: Because applications are dynamic, they are always changing due to user behavior, scaling, and evolution. Frequent benchmarking makes it easier to track performance trends over time, enabling developers to assess the effects of updates, new features, and system changes. This keeps the application responsive and consistently performant even as things change.
Bridging the gap between development and production: A realistic evaluation of application performance in a production setting can be obtained as part of a development process by benchmarking real-world workloads, images, and scaling patterns. This facilitates seamless transitions from development to deployment and helps developers proactively address possible problems.
Benchmarking scenarios to replicate load patterns in the real world
Benchmarking your apps under conditions that closely resemble real-world situations, such as deployment, scalability, and load patterns, should be your aim as a developer. This method evaluates how well apps manage unforeseen spikes in traffic without sacrificing user experience or performance.
To test and improve cluster and workload autoscalers, the GKE engineering team conducts comprehensive benchmarking across a range of scenarios. This helps it understand how autoscaling systems adapt to changing demands while optimizing resource use and preserving peak application performance. (Image credit: Google Cloud)
Application Performance tools
Locust for performance benchmarking and realistic load testing
Locust is an advanced yet user-friendly load-testing tool that gives developers a thorough grasp of how well an application performs in real-world scenarios by simulating complex user behavior through scripting. Locust makes it possible to create different load scenarios by defining and instantiating “users” that carry out particular tasks.
In one example benchmark, Google used Locust to mimic consumers requesting the 30th Fibonacci number from a web server. Each connection was closed and reestablished to keep the load balanced across many pods, producing a steady load of about 200 ms per request.
from locust import HttpUser, task
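The line above is just the import; a minimal, self-contained locustfile built around it might look like the following sketch. The /fib endpoint, service address, and user counts are assumptions for illustration, not part of the original benchmark:
cat > locustfile.py <<'EOF'
from locust import HttpUser, task, between

class FibonacciUser(HttpUser):
    # Short pause between requests so each simulated user behaves realistically
    wait_time = between(0.5, 1.5)

    @task
    def fetch_fib(self):
        # Assumed endpoint: the demo web server computes fib(30) at /fib?n=30.
        # The Connection: close header forces a fresh connection per request,
        # spreading load across the Service's backend pods instead of one pod.
        self.client.get("/fib?n=30", headers={"Connection": "close"})
EOF
# Run headless with 200 simulated users, spawning 20 per second,
# against a placeholder Service address.
locust -f locustfile.py --headless -u 200 -r 20 --host http://FIB_SERVICE_IP:8080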
Simulating complex user interactions like these is comparatively simple with Locust. It can produce up to 10,000 requests per second on a single machine, and it can scale higher through distributed deployment. It gives you fine-grained control over the number of users and the spawn rate through custom load shapes, letting you replicate real-world load patterns with users that exhibit a variety of load profiles. It natively supports HTTP/HTTPS for web and REST requests and can be extended to other systems, such as XML-RPC, gRPC, and various request-based libraries/SDKs.
A GitHub repository included with this blog post provides an end-to-end benchmark of a pre-release autoscaling cluster setup. You are encouraged to modify it to meet your specific needs. (Image credit: Google Cloud)
Delivering outstanding user experiences requires benchmarking end users’ perceived performance, which goes beyond simply being a best practice. Developers may determine whether their apps are still responsive, performant, and able to satisfy changing user demands by proactively incorporating benchmarking into the development process.
You can learn more about how well your application performs in a variety of settings by using tools like Locust, which replicate real-world situations. Performance is a continuous endeavor. Use benchmarking as a roadmap to create outstanding user experiences.
Read more on Govindhtech.com
govindhtech · 8 months ago
Text
GKE IP_SPACE_EXHAUSTED error: Find Cause & Fix Solution
GKE IP_SPACE_EXHAUSTED
You’ve probably run across the debilitating “IP_SPACE_EXHAUSTED” problem if you use Google Kubernetes Engine (GKE) in your Google Cloud setup.
It is a typical situation: you are certain that your subnet architecture is future-proof and your IP address planning is perfect, and then all of a sudden, your GKE cluster encounters an unforeseen scaling obstacle. You begin to doubt your ability to subnet. How is it possible for a /24 subnet with 252 nodes to be used up with just 64 nodes in your cluster? The explanation is found in the subtle method that GKE distributes IP addresses, which frequently goes above the number of nodes.
Actually, node capacity in GKE is influenced by three main aspects. By learning about them, you may significantly reduce your risk of encountering the infamous IP_SPACE_EXHAUSTED error.
Cluster primary subnet: Gives your cluster’s internal load balancers and nodes IP addresses. The maximum scalability of your cluster is theoretically determined by the size of the subnet, but there is more to it than that.
Pod IPv4 range: a secondary (alias) IP range on the cluster's subnet that assigns IP addresses to the Pods in your cluster.
Maximum pods per node: This indicates the most pods that GKE is able to schedule on one node. It can be overridden at the node-pool level even though it is set at the cluster level.
GKE’s approach to IP allocation
GKE cleverly reserves IP addresses for Pods. Based on the "Maximum pods per node" setting, it assigns each node the smallest subnet that can hold twice that many IP addresses. Providing more than twice as many available IP addresses as the maximum number of Pods that can run on a node lets Kubernetes minimize IP address reuse as Pods are added to and removed from a node. With the default maximum of 110 for GKE Standard clusters, GKE determines the smallest subnet mask that can hold 220 (2 x 110) IP addresses, which is /24. It then carves the pod IPv4 range into /24 slices and assigns one to each node.
The “aha!” moment
The main lesson is that your cluster's scalability is determined by the number of /24 slices your pod IPv4 range can provide, not just by the number of IP addresses in your primary subnet. Even if your primary subnet has plenty of addresses left, you will see the "IP_SPACE_EXHAUSTED" error once every slice has been consumed by a node.
An example to illustrate
Let’s say you set up a GKE cluster with these parameters:
Cluster Primary Subnet: 10.128.0.0/22
Pod IPv4 Range: 10.0.0.0/18
Maximum pods per node: 110
You boldly declared that you could grow your cluster to 1020 nodes. However, the “IP_SPACE_EXHAUSTED” warning occurred when it reached 64 nodes. Why?
The pod IPv4 range is the source of the problem. GKE reserves a /24 subnet for every node, because each node can run up to 110 pods (2 x 110 = 220 IPs, which requires a /24). Only 64 /24 subnets can be carved out of a /18 range, so despite plenty of capacity remaining in your primary subnet, you ran out of pod IP addresses at 64 nodes.
There are two methods to determine how many nodes can fit into your pod IPv4 range (both are scripted in the short example after this list):
Subnet bit difference: compare the subnet masks. Subtracting the pod IPv4 range's mask from the per-node subnet mask (24 - 18 = 6) gives the number of "subnet bits", so 2^6 = 64 slices, or nodes, fit within the pod IPv4 range.
Total pod capacity: another way to find the maximum number of nodes is to consider the pod IPv4 range's total capacity. A /18 range contains 2^(32-18) = 16,384 IP addresses. Since each node, with its /24 subnet, needs 256 addresses, dividing the total capacity by the addresses per node gives 16,384 / 256 = 64 nodes.
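The same arithmetic can be scripted. Here is a quick bash sketch of both methods for the /18 pod range and /24 per-node slice used in this example:
# Method 1: subnet-bit difference (per-node mask /24, pod range mask /18)
NODE_SLICE_BITS=$((24 - 18))
echo "Max nodes: $((2 ** NODE_SLICE_BITS))"                     # 2^6 = 64
# Method 2: total pod capacity divided by addresses per node
echo "Max nodes: $(( (2 ** (32 - 18)) / (2 ** (32 - 24)) ))"    # 16384 / 256 = 64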
Finding the problem
Network Analyzer tool
Network Analyzer is a useful Google Cloud service. Among its many capabilities, it can identify IP exhaustion problems and summarize your pod IP subnet capacity and how close you are to reaching that limit. If your setup uses a service project for the cluster and a separate host project for the VPC network, you will find the relevant Network Analyzer insights in the host project.
A node pool with a /23 pod subnet and a maximum of 110 pods per node (effectively employing a /24 subnet per node) is depicted in the Network Analyzer insight for a GKE cluster. A medium priority warning is displayed when you try to scale this node pool to two nodes, indicating that you have exceeded the maximum number of nodes that the designated pod IP range can support.
Resolving the problem
If you have reached the limit of the cluster's primary subnet, you can simply expand it to add more nodes. If the bottleneck is in the pod IPv4 range, however, your options are more limited:
Make a new cluster with a wider range of pod addresses: This is my least favorite approach and it’s easier said than done, but sometimes it’s essential.
Adding pod IPv4 address ranges: Adding a second pod IPv4 subnet will solve this problem. Consider it as bringing a new cake to the gathering; additional slices allow you to serve more people, or in this case, more nodes. The combined capacity of the old and new pod IPv4 ranges is then the cluster’s total node capacity.
Maximum pods per node: this setting cannot be changed on an existing cluster or node pool. However, you can improve IP address utilization by creating a new node pool with a different maximum-pods-per-node value (example commands follow below).
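A rough sketch of the second and third options combined: add a secondary range to the cluster's subnet, then create a node pool that draws pod IPs from it with a lower maximum-pods-per-node value. All names, ranges, and locations are placeholders, and the flag names should be verified against the current gcloud reference:
# Add an extra pod range to the subnet
gcloud compute networks subnets update my-subnet \
    --region=us-central1 \
    --add-secondary-ranges=pod-range-2=10.4.0.0/18
# Create a node pool that uses the new range and a smaller per-node pod limit
gcloud container node-pools create pool-2 \
    --cluster=my-cluster \
    --region=us-central1 \
    --pod-ipv4-range=pod-range-2 \
    --max-pods-per-node=64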
GKE Autopilot clusters
If Autopilot clusters are not planned properly, they are also susceptible to Pod IP address exhaustion. Just as with Standard clusters, you can add more Pod IPv4 subnets to provide extra addresses; GKE then uses these extra ranges for Pods on nodes created in subsequent node pools.
With Autopilot clusters, the less obvious challenge is how to trigger the creation of a new node pool that uses those additional Pod IP ranges, because you cannot create node pools directly in Autopilot mode. By deploying a workload that uses workload separation, you can compel GKE to create a new node pool, and that new pool then draws from the additional Pod IPv4 range. A sketch follows below.
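Here is a sketch of that workload-separation trick; the label/toleration key-value pair and the sample image are arbitrary choices for illustration:
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: separated-workload            # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: separated-workload
  template:
    metadata:
      labels:
        app: separated-workload
    spec:
      # The matching nodeSelector and toleration ask Autopilot to place these Pods
      # on dedicated nodes, which forces creation of a new node pool that can draw
      # from the additional pod IPv4 range.
      nodeSelector:
        group: new-pod-range          # arbitrary separation key/value
      tolerations:
      - key: group
        operator: Equal
        value: new-pod-range
        effect: NoSchedule
      containers:
      - name: app
        image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
EOF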
Multiple node pools and varying maximum pods per node
The last jigsaw piece deals with situations in which several node pools have varying “maximum pods per node” values but share the same pod IPv4 range. It is somewhat more difficult to determine the maximum number of nodes in such a configuration.
The number of nodes in each node pool determines the calculation when several node pools share the same pod range. Let’s explain this with an example.
An example to illustrate
The following characteristics of your Standard GKE cluster are present:
Cluster Primary Subnet: 10.128.0.0/22
Pod IPv4 Range: 10.0.0.0/23
Maximum pods per node: 110
The settings of the default node pool are as follows:
Name: default-pool
Pod IPv4 Range: 10.0.0.0/23
Maximum pods per node: 110
Next, you add pool-1, a second node pool with a smaller maximum number of pods per node:
Name: pool-1
Pod IPv4 Range: 10.0.0.0/23
Maximum pods per node: 60
GKE will reserve a /24 subnet per node in the default-pool and a /25 subnet per node in pool-1 based on our understanding. Given that they share the /23 pod IPv4 range, the following combinations are possible:
Maximum of two nodes in default-pool and zero in pool-1 (Total = 2)
Maximum of one node in default-pool and two in pool-1 (Total = 3)
Maximum of zero nodes in default-pool and four in pool-1 (Total = 4)
As you can see, it’s more difficult to figure out how many nodes this cluster can have than it is when node pools have different pod ranges.
Conclusion
Avoiding the annoying “IP_SPACE_EXHAUSTED” warning requires an understanding of the subtleties of GKE’s IP allocation. With the maximum number of pods per node and future scaling in mind, carefully plan your subnets and pod ranges. Your GKE clusters can have the IP address space they require to expand and prosper if you plan ahead. To learn how to use the class E IPv4 address space to lessen IPv4 exhaustion problems in GKE, make sure to read this blog post as well.
Read more on Govindhtech.com
govindhtech · 9 months ago
Text
Google Cloud Parallelstore Powering AI And HPC Workloads
Parallelstore
Businesses process enormous datasets, execute intricate simulations, and train generative models with billions of parameters using artificial intelligence (AI) and high-performance computing (HPC) applications for a variety of use cases, including LLMs, genomic analysis, quantitative analysis, and real-time sports analytics. Their storage systems are under a lot of performance pressure from these workloads, which necessitate high throughput and scalable I/O performance that keeps latencies under a millisecond even when thousands of clients are reading and writing the same shared file at the same time.
Google Cloud is thrilled to share that Parallelstore, which was unveiled at Google Cloud Next 2024, is now generally available to power these next-generation AI and HPC workloads. Parallelstore, which is based on the Distributed Asynchronous Object Storage (DAOS) architecture, combines a key-value architecture with fully distributed metadata to provide high throughput and IOPS.
Continue reading to find out how Google Parallelstore meets the demands of demanding AI and HPC workloads by enabling you to provision Google Kubernetes Engine and Compute Engine resources, optimize goodput and GPU/TPU utilization, and move data in and out of Parallelstore programmatically.
Optimize throughput and GPU/TPU use
Parallelstore uses a key-value store architecture along with a distributed metadata management system to get past the performance constraints of conventional parallel file systems. Its high-throughput parallel data access can saturate each compute client's network bandwidth while minimizing latency and I/O bottlenecks. Maximizing goodput to GPUs and TPUs through efficient data transfer is key to optimizing the cost of AI workloads. Google Cloud can also meet the needs of modest-to-massive AI and HPC workloads by continuously granting read/write access to thousands of VMs, GPUs, and TPUs.
The largest Parallelstore deployment of 100 TiB yields throughput scaling to around 115 GiB/s, with a low latency of ~0.3 ms, 3 million read IOPS, and 1 million write IOPS. This indicates that a large number of clients can benefit from random, dispersed access to small files on Parallelstore. According to Google Cloud benchmarks, Parallelstore’s performance with tiny files and metadata operations allows for up to 3.7x higher training throughput and 3.9x faster training timeframes for AI use cases when compared to native ML framework data loaders.
Move data into and out of Parallelstore programmatically
For data preparation or preservation, cloud storage is used by many AI and HPC applications. You may automate the transfer of the data you want to import into Parallelstore for processing by using the integrated import/export API. With the help of the API, you may ingest enormous datasets into Parallelstore from Cloud Storage at a rate of about 20 GB per second for files bigger than 32 MB and about 5,000 files per second for smaller files.
gcloud alpha parallelstore instances import-data $INSTANCE_ID --location=$LOCATION --source-gcs-bucket-uri=gs://$BUCKET_NAME [--destination-parallelstore-path="/"] --project=$PROJECT_ID
You can programmatically export results from an AI training task or HPC workload to Cloud Storage for additional evaluation or longer-term storage. Moreover, data pipelines can be streamlined and manual involvement reduced by using the API to automate data transfers.
gcloud alpha parallelstore instances export-data $INSTANCE_ID --location=$LOCATION --destination-gcs-bucket-uri=gs://$BUCKET_NAME [--source-parallelstore-path="/"]
GKE resources are programmatically provisioned via the CSI driver
The Parallelstore GKE CSI driver makes it simple to manage high-performance storage for containerized workloads efficiently. Using familiar Kubernetes APIs, you can access pre-existing Parallelstore instances in Kubernetes workloads or dynamically provision and manage Parallelstore file systems as persistent volumes within your GKE clusters. This frees you to focus on optimizing resources and reducing TCO rather than learning and maintaining a separate storage system.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: parallelstore-class
provisioner: parallelstore.csi.storage.gke.io
volumeBindingMode: Immediate
reclaimPolicy: Delete
allowedTopologies:
- matchLabelExpressions:
  - key: topology.gke.io/zone
    values:
    - us-central1-a
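A workload can then request a volume from this class with an ordinary PersistentVolumeClaim, for example (a sketch; the name, access mode, and size are placeholders to verify against the Parallelstore CSI documentation):
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: parallelstore-pvc
spec:
  accessModes:
  - ReadWriteMany               # assumed access mode for shared read/write across Pods
  storageClassName: parallelstore-class
  resources:
    requests:
      storage: 12000Gi          # placeholder capacity
EOF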
In the coming months, the fully managed GKE Volume Populator will automate preloading data from Cloud Storage directly into Parallelstore during the PersistentVolumeClaim provisioning process. This helps ensure that your training data is readily accessible, so you can maximize GPU and TPU utilization and minimize idle compute time.
Provision Compute Engine resources programmatically with the Cluster Toolkit
With the Cluster Toolkit’s assistance, deploying Parallelstore instances for Compute Engine is simple. Cluster Toolkit is open-source software for delivering AI and HPC workloads; it was formerly known as Cloud HPC Toolkit. Using best practices, Cluster Toolkit allocates computing, network, and storage resources for your cluster or workload. With just four lines of code, you can integrate the Parallelstore module into your blueprint and begin using Cluster Toolkit right away. For your convenience, we’ve also included starter blueprints. Apart from the Cluster Toolkit, Parallelstore may also be deployed using Terraform templates, which minimize human overhead and support operations and provisioning processes through code.
Respo.vision
Respo.Vision, a leading sports video analytics company, is using Parallelstore to speed up its real-time system's transition from 4K to 8K video. With Parallelstore as the transport layer, Respo.Vision collects and labels granular data markers, delivering relevant insights to coaches, scouts, and fans. Parallelstore let Respo.Vision absorb spikes in high-performance video processing while keeping compute latency low, without costly infrastructure upgrades.
The use of AI and HPC is expanding quickly. With its novel architecture, performance, and integration with Cloud Storage, GKE, and Compute Engine, Parallelstore is the storage solution you need to keep demanding GPU/TPU workloads fed.
Read more on govindhtech.com
govindhtech · 9 months ago
Text
Latest Updates For Confidential VMs Google In Compute Engine
Confidential virtual machine overview
Confidential VMs Google
A Confidential VM is a type of Compute Engine virtual machine. Confidential VMs use hardware-based memory encryption to help ensure that your data and applications cannot be read or altered while in use.
Below are some advantages of confidential virtual machine instances:
Isolation: encryption keys are generated and held by dedicated hardware that is inaccessible to the hypervisor.
Attestation: To ensure that important parts haven’t been tampered with, you can confirm the identity and condition of the virtual machine.
This kind of hardware-based isolation and attestation is called a Trusted Execution Environment (TEE).
When you create a new virtual machine instance, you have the option to activate the Confidential VM service.
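For example, here is a sketch of enabling Confidential VM at instance creation time with AMD SEV. The instance name, zone, machine type, and image are placeholders, and the flags and supported image/machine-type combinations should be verified against the current documentation:
gcloud compute instances create my-confidential-vm \
    --zone=us-central1-a \
    --machine-type=n2d-standard-2 \
    --confidential-compute-type=SEV \
    --image-family=ubuntu-2204-lts \
    --image-project=ubuntu-os-cloud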
Confidential computing technology
Depending on the machine type and CPU platform you select, different Confidential Computing technologies can be employed while setting up a Confidential VM instance. Make sure the technology you select for Confidential Computing meets your budget and performance requirements.
AMD SEV
AMD Secure Encrypted Virtualization (SEV) on Confidential VM provides boot-time attestation using Google’s vTPM and hardware-based memory encryption via the AMD Secure Processor.
AMD SEV provides excellent performance for computationally intensive activities. Depending on the workload, the performance difference between a normal Compute Engine VM and a SEV Confidential VM can be negligible or nonexistent.
AMD SEV systems using the N2D machine type offer live migration, in contrast to other Confidential Computing technologies using Confidential VM.
AMD SEV-SNP
Adding hardware-based security to assist thwart malicious hypervisor-based attacks like data replay and memory remapping, AMD Secure Encrypted Virtualization-Secure Nested Paging (SEV-SNP) builds on SEV. Direct attestation results from the AMD Secure Processor are available upon request at any time.
Because of its additional security measures, AMD SEV-SNP requires more resources than SEV. In particular, you may see higher network latency and reduced network bandwidth, depending on the workload.
Intel TDX
Intel Trust Domain Extensions (Intel TDX) is a hardware-based TEE. TDX uses hardware extensions to establish an isolated trust domain (TD) inside a virtual machine (VM) and to manage and encrypt its memory.
Intel TDX strengthens the TD's defense against certain attacks that require physical access to platform memory, such as active attacks on DRAM interfaces that involve splicing, aliasing, capturing, altering, relocating, or modifying memory contents.
Confidential VM service
Confidential VM is used by the following Google Cloud services in addition to Compute Engine:
Confidential GKE Nodes requires all of your Google Kubernetes Engine nodes to use Confidential VM.
With a mutually agreed-upon workload, Confidential Space employs Confidential VM to allow parties to share sensitive data while maintaining ownership and confidentiality of that data.
Dataproc Confidential Compute includes Dataproc clusters that use Confidential VM.
Dataflow Confidential Compute runs Dataflow worker VMs as Confidential VMs.
Google Cloud is dedicated to keeping your data under your complete control and safe and secure. To start, use Confidential Computing to strengthen the Compute Engine virtual machines (VMs), which are the cornerstone of your compute architecture.
Using a hardware-based Trusted Execution Environment (TEE), Confidential Computing safeguards data throughout use and processing. TEEs are safe, segregated spaces that guard against illegal access to or alteration of data and applications while they’re in use.
Google has been an early adopter of, and investor in, Confidential Computing technologies and solutions. Google Cloud customers have been using its expanding Confidential Computing products and new capabilities for over four years, applying them in creative ways to improve the security and confidentiality of their workloads. Google is excited to announce the general availability of updates to the Google Cloud attestation service as well as several new Confidential Computing options.
Now generally available: Confidential VM with AMD SEV on the C3D machine series
Google is pleased to announce that Confidential VMs with AMD Secure Encrypted Virtualization (AMD SEV) technology are now generally available on the general-purpose C3D machine series. Using hardware-based memory encryption, Confidential VMs with AMD SEV technology help guarantee that your data and apps cannot be read or changed while in use. With Google's Titanium hardware, the C3D machine series is built to provide optimal, dependable, and consistent performance, and is powered by the 4th generation AMD EPYC (Genoa) processor.
Prior to this, only the general-purpose N2D and C2D machine series offered Confidential VMs. The latest general purpose hardware with enhanced performance and data secrecy is now available to security-conscious customers with the expansion to the C3D machine line. Better performance comes from using the newest gear. Read more about the performance of the C3D machine series and confidential virtual machines here.
In any region and zone where C3D machines are available, confidential virtual machines featuring AMD SEV are accessible.
Now widely accessible: Intel TDX-powered confidential virtual machine on the C3 machine series
Confidential VMs with Intel Trust Domain Extensions (Intel TDX) technology are now generally available on the general-purpose C3 machine series. Using hardware-based memory encryption, Confidential VMs with Intel TDX technology help guarantee that your data and apps cannot be read or changed while in use.
There are no code changes needed to enable confidential computing on a C3 virtual machine. You can use Intel Trust Authority’s remote attestation service or your own attestation provider to confirm that your hardened virtual machine (VM) is operating in a TEE. The 4th generation Intel Xeon Scalable CPUs (code-named Sapphire Rapids), DDR5 memory, and Google Titanium power the C3 machine line.
Intel AMX integrated CPU acceleration
By default, all C3 virtual machines (VMs), including Confidential VMs, have Intel Advanced Matrix Extensions (Intel AMX) enabled. To speed up workloads related to machine learning and artificial intelligence, Intel AMX is a novel expansion to the instruction set architecture (ISA). Two of the most popular processes in AI and ML are matrix multiplication and convolution, which may be carried out with the new instructions that AMX offers. You can execute AI/ML applications with an extra degree of protection by combining Intel AMX with Confidential VMs.
Confidential VM with Intel TDX on the C3 machine series is available in the asia-southeast1, us-central1, and europe-west4 regions.
Confidential VM with AMD SEV-SNP on the N2D machine series is now widely accessible
With the release of AMD Secure Encrypted Virtualization-Secure Nested Paging (AMD SEV-SNP) on the general-purpose N2D machine series this past June, customers now have access to Confidential VMs with hardware-rooted attestation, data integrity, and data confidentiality. Prior to this, users could only access Confidential VMs with AMD Secure Encrypted Virtualization (SEV), a Confidential Computing technology focused on data confidentiality.
All Confidential VMs give users an extra line of defense and data protection against cloud administrators, operators, and insiders while also enabling them to retain control over their data in the public cloud and achieve cryptographic isolation in a multi-tenant environment. Confidential VMs with AMD SEV-SNP, on the other hand, come with further security measures that guard against harmful hypervisor-based assaults such memory remapping and data replay.
Creating Confidential VMs with AMD SEV-SNP on the N2D machine series is simple and requires no code changes, and you get the added security benefits with minimal impact on performance.
Confidential VMs with AMD SEV-SNP on the N2D machine series are available in the asia-southeast1, us-central1, europe-west3, and europe-west4 regions.
Signed Intel TDX and AMD SEV-SNP UEFI binaries for Confidential Virtual Machines
Google is thrilled to announce a major security improvement: signed launch measurements (UEFI binaries and initial state) for its Confidential VMs running AMD SEV-SNP and Intel TDX technologies. Signing these files provides an additional layer of protection against unauthorized changes to, or tampering with, UEFI, the firmware that manages a computer's startup process.
Gaining further transparency and confidence that the firmware operating on your Confidential VMs is authentic and uncompromised can be achieved by signing the UEFI and enabling you to validate the signatures. Your authenticated devices are operating in a secure and reliable environment if you can confirm the validity and integrity of the firmware.
Google intends to take other actions to create a system that is more verifiably reliable and secure.
AMD SEV Confidential VM is now supported by Google Cloud attestation
If your trust model permits it, you can use the Google Cloud attestation service instead of creating and running an attestation verifier yourself. Use the go-tpm tools to obtain an attestation quote from the vTPM of an AMD SEV Confidential VM instance, then send it to the Google Cloud attestation service for verification using the ./go-tpm token command.
You can verify whether or not the virtual machine (VM) can be trusted by comparing its details with your own policy once the Google Cloud Attestation has verified the attestation quote. Only AMD SEV is currently supported by Google’s attestation service.
Confidential VM costs
In addition to the Compute Engine price, there are additional expenses for Confidential VM. The cost of a Confidential VM instance is determined by several factors, including the type of Confidential Computing technology (such as AMD SEV, Intel TDX, or AMD SEV-SNP) and whether the instance is preemptible or on demand. The fees are flat rate per vCPU and per GB for Confidential VM.
See here for the price of the Confidential VM. See this page for Compute Engine’s price list.
Read more on Govindhtech.com
govindhtech · 9 months ago
Text
Class E IP Address Space Helps GKE Manage IPv4 Depletion
Using the Class E IPv4 address space can help GKE users cope with IPv4 depletion. The need for private IPv4 (RFC 1918) addresses is growing along with the number of services and apps hosted on Google Kubernetes Engine (GKE). For many large businesses, RFC 1918 address space is becoming scarce, making IP address depletion a problem that limits application scalability.
IPv6, which offers a vast number of addresses, resolves this exact address depletion problem, but not every business or application is ready for IPv6 yet. In the meantime, you can keep growing by adopting the Class E IPv4 address space (240.0.0.0/4) to handle these problems.
Class E addresses (240.0.0.0/4) are set aside for future usage, as indicated in RFC 5735 and RFC 1112, as stated in Google VPC network acceptable IPv4 ranges; nevertheless, this does not preclude you from using them in certain situations today. Google will also provide tips for organizing and using GKE clusters with Class E.
Recognizing Class E addresses
IPv4 addresses
Some typical criticisms or misunderstandings about the use of Class E addresses are as follows:
Other Google services do not function with class E addresses. This is untrue. Class E addresses are included in the acceptable address ranges for IPV4 that Google Cloud VPC offers. Furthermore, private connection techniques using Class E addresses provide access to a large number of Google controlled services.
Communicating with services outside of Google (internet/on-premises/other clouds) is limited when using Class E addresses. False. You may use NAT or IP masquerading to convert Class E addresses to public or private IPv4 addresses in order to access destinations outside of Google Cloud, since Class E addresses are not routable and are not published over the internet or outside of Google Cloud. Furthermore,
a. Nowadays, a large number of operating systems support Class E addresses, with Microsoft Windows being the prominent exception.
b. Routing the addresses for usage in private DCs is supported by several on-premises suppliers (Cisco, Juniper, Arista).
There are scale and performance restrictions on Class E addresses. This is untrue. Regarding performance, there is no difference between the addresses and other address ranges used by Google Cloud. Agents can grow to accommodate a high number of connections without sacrificing speed, even with NAT/IP Masquerade.
Therefore, you may utilize Class E addresses for private usage inside Google Cloud VPCs, for both Compute Engine instances and Kubernetes pods/services in GKE, even though they are reserved for future use, not routable over the internet, and shouldn’t be publicized over the public internet.
Advantages
Class E IP Addresses
Despite these limitations, Class E addresses provide some benefits:
Large address space: Class E provides a much bigger pool of IP addresses than the standard RFC 1918 private ranges (about 268.4 million addresses for Class E vs. about 17.9 million for RFC 1918). This abundance helps organizations facing IP address depletion expand their services and applications without being constrained by a finite address space.
Growth and scalability: It addressing’s wide reach facilitates the simple scalability of services and apps on Google Cloud and GKE. IP address restrictions do not prevent you from deploying and growing your infrastructure, which promotes innovation and development even during times of high consumption.
Effective resource utilization: By using Class E addresses to enhance your IP address allocation procedures, you may reduce the possibility of address conflicts and contribute to the efficient use of IP resources. This results in reduced expenses and more efficient operations.
Future-proofing: Although it is not supported by all operating systems, its use is anticipated to rise in response to the growing need for IP addresses. You can future-proof your infrastructure scalability to enable company development for many years to come by adopting Class E early on.
Class E IP addresses
Things to be mindful of
Even though Class E IP addresses provide many advantages, there are a few crucial things to remember:
Compatibility with operating systems: not all operating systems currently support Class E addressing. Make sure your chosen operating systems and tools are compatible before putting Class E into practice.
Networking software and hardware: check whether your routers and firewalls (or any third-party virtual appliance solutions running on Google Compute Engine) can handle these addresses, and make sure any software that works with IP addresses is updated to support Class E as well.
Migration and transition: moving from RFC 1918 private addresses to Class E requires careful planning and execution to avoid interruptions.
How Snap implemented Class E
Network IP management is becoming more difficult due to the growing use of microservices and containerization systems such as GKE, particularly by major clients like Snap. Snap’s finite supply of RFC1918 private IPv4 addresses was rapidly depleted with hundreds of thousands of pods deployed, impeding cluster scalability and necessitating a large amount of human work to release addresses.
Originally contemplating an IPv6 migration, Snap ultimately opted to deploy dual-stack GKE nodes and GKE pods (IPv6 + Class E IPv4) due to concerns over application readiness and compatibility. In addition to preventing IP fatigue, this approach gave Snap the scale of IP addresses it required for many years to accommodate future expansion and cut down on overhead. Furthermore, this technique was in line with Snap’s long-term plan to switch to IPv6.
Fresh clusters
Requirement
Create VPC-native clusters.
Steps
Create a subnetwork with secondary ranges for pods and services. The Class E range (240.0.0.0/4) provides CIDRs that can be used for these secondary ranges.
When creating the cluster, use the previously created secondary ranges for the pod and services CIDR ranges; this is the user-managed secondary range assignment method.
Set up IP masquerading so that source network address translation (SNAT) maps pod traffic leaving the cluster to the underlying node's IP address (a hedged example follows below).
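To make these steps concrete, here is a minimal, hedged sketch using the gcloud CLI. The network, subnet, region, and CIDR values below are illustrative placeholders, not values from this article, and should be adapted to your own environment.

# Subnet whose secondary ranges for pods and services come from the Class E block (240.0.0.0/4)
gcloud compute networks subnets create class-e-subnet \
  --network=my-vpc \
  --region=us-central1 \
  --range=10.10.0.0/24 \
  --secondary-range=pods=240.10.0.0/16,services=240.20.0.0/20

# VPC-native cluster that assigns pod and service IPs from those user-managed secondary ranges
gcloud container clusters create class-e-cluster \
  --region=us-central1 \
  --network=my-vpc \
  --subnetwork=class-e-subnet \
  --enable-ip-alias \
  --cluster-secondary-range-name=pods \
  --services-secondary-range-name=services

Because the Class E pod addresses are not routable outside the VPC, the cluster's ip-masq-agent configuration should then be set so that traffic leaving Google Cloud is SNAT'd to the node's address, as the last step describes.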
Migrating clusters
Requirement
The clusters must be VPC-native.
Steps
The cluster's default pod IPv4 range cannot be modified. However, you can add additional pod ranges from the Class E space and use them for newer node pools (see the sketch below).
Workloads from the older node pools can then be migrated to the newer node pools.
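A hedged sketch of what this can look like with the gcloud CLI follows. The subnet, range, and node pool names are illustrative assumptions, and the exact flags (particularly --pod-ipv4-range) should be verified against the current gcloud documentation for your GKE version.

# Add an extra Class E secondary range to the cluster's subnet
gcloud compute networks subnets update my-subnet \
  --region=us-central1 \
  --add-secondary-ranges=pods-class-e=240.30.0.0/16

# Create a new node pool whose pods draw addresses from that range
gcloud container node-pools create class-e-pool \
  --cluster=my-cluster \
  --region=us-central1 \
  --pod-ipv4-range=pods-class-e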
IPv4 vs. IPv6: making the switch from Class E IPv4 to IPv6
For enterprises experiencing IP exhaustion, switching to dual-stack clusters with Class E IPv4 and IPv6 addresses now is a wise strategic step. Increasing the pool of available IP addresses offers immediate relief and permits expansion and scalability inside Google Cloud and GKE. Furthermore, adopting dual-stack clusters is an essential first step toward a smoother eventual IPv6-only transition.
Read more on Govindhtech.com
govindhtech · 9 months ago
Text
New GKE Ray Operator on Kubernetes Engine Boost Ray Output
GKE Ray Operator
The field of AI is always changing. Larger and more complicated models are the result of recent advances in generative AI in particular, which forces businesses to efficiently divide work among more machines. Utilizing Google Kubernetes Engine (GKE), Google Cloud’s managed container orchestration service, in conjunction with ray.io, an open-source platform for distributed AI/ML workloads, is one effective strategy. You can now enable declarative APIs to manage Ray clusters on GKE with a single configuration option, making that pattern incredibly simple to implement!
Ray offers a straightforward API for smoothly distributing and parallelizing machine learning activities, while GKE offers an adaptable and scalable infrastructure platform that streamlines resource management and application management. For creating, implementing, and maintaining Ray applications, GKE and Ray work together to provide scalability, fault tolerance, and user-friendliness. Moreover, the integrated Ray Operator on GKE streamlines the initial configuration and directs customers toward optimal procedures for utilizing Ray in a production setting. Its integrated support for cloud logging and cloud monitoring improves the observability of your Ray applications on GKE, and it is designed with day-2 operations in mind.
Getting started
When creating a new GKE cluster in the Google Cloud console, check the "Enable Ray Operator" option. On a GKE Autopilot cluster, it is located under "AI and Machine Learning" within "Advanced Settings".
The Enable Ray Operator feature checkbox is located under “AI and Machine Learning” in the “Features” menu of a Standard Cluster.
To do the same with the gcloud CLI, set the addons flag as follows:
gcloud container clusters create CLUSTER_NAME \
  --cluster-version=VERSION \
  --addons=RayOperator
Once enabled, GKE hosts and manages the Ray Operator on your behalf. After the cluster is created, it is ready to run Ray applications and to create additional Ray clusters; a minimal example follows below.
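As an illustration of what such a Ray cluster can look like, here is a minimal, hypothetical RayCluster manifest applied with kubectl. The API version, Ray image tag, resource sizes, and names are assumptions that may need adjusting to match your KubeRay/operator release.

kubectl apply -f - <<'EOF'
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: example-raycluster
spec:
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.9.0
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
  workerGroupSpecs:
  - groupName: workers
    replicas: 1
    minReplicas: 1
    maxReplicas: 3
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:2.9.0
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
EOF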
Logging and monitoring
When implementing Ray in a production environment, efficient logging and metrics are crucial. Optional capabilities of the GKE Ray Operator allow for the automated gathering of logs and data, which are then seamlessly stored in Cloud Logging and Cloud Monitoring for convenient access and analysis.
When log collection is enabled, all logs from the Ray cluster Head node and Worker nodes are automatically collected and saved in Cloud Logging. The generated logs are kept safe and easily accessible even in the event of an unintentional or intentional shutdown of the Ray cluster thanks to this functionality, which centralizes log aggregation across all of your Ray clusters.
By using Managed Service for Prometheus, GKE may enable metrics collection and capture all system metrics exported by Ray. System metrics are essential for tracking the effectiveness of your resources and promptly finding problems. This thorough visibility is especially important when working with costly hardware like GPUs. You can easily construct dashboards and set up alerts with Cloud Monitoring, which will keep you updated on the condition of your Ray resources.
TPU support
Tensor Processing Units (TPUs) are custom-built hardware accelerators that significantly speed up training and inference for large machine learning models. With Google's AI Hypercomputer architecture, Ray and TPUs can be used together to scale your high-performance ML applications with ease.
By adding the required TPU environment variables for frameworks like JAX and controlling admission webhooks for TPU Pod scheduling, the GKE Ray Operator simplifies TPU integration. Additionally, autoscaling for Ray clusters with one host or many hosts is supported.
Reduce the delay at startup
When operating AI workloads in production, it is imperative to minimize start-up delay in order to maximize the utilization of expensive hardware accelerators and ensure availability. When used with other GKE functions, the GKE Ray Operator can significantly shorten this startup time.
You can achieve significant speed gains in pulling images for your Ray clusters by hosting your Ray images on Artifact Registry and turning on image streaming. Huge dependencies, which are frequently required for machine learning, can lead to large, cumbersome container images that take a long time to pull. For additional information, see Use Image streaming to pull container images. Image streaming can drastically reduce this image pull time.
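As a hedged example, image streaming can be switched on when a node pool (or cluster) is created, provided the Ray images are hosted in Artifact Registry; the cluster, pool, and region names below are placeholders, not values from the article.

# Node pool with image streaming enabled (container images should be hosted in Artifact Registry)
gcloud container node-pools create ray-pool \
  --cluster=my-cluster \
  --region=us-central1 \
  --image-type=COS_CONTAINERD \
  --enable-image-streaming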
Moreover, model weights or container images can be preloaded onto new nodes using GKE secondary boot disks. When paired with image streaming, this feature can let your Ray apps launch up to 29 times faster, making better use of your hardware accelerators.
Scale Ray in production today
To keep up with the rapid advances in AI, you need a platform that grows with your workloads and offers the streamlined Pythonic experience your AI developers are used to. Ray on GKE delivers this potent combination of usability, scalability, and dependability, and with the GKE Ray Operator it is now simpler than ever to get started and put best practices for scaling Ray in production into action.
Read more on govindhtech.com
govindhtech · 11 months ago
Text
How Visual Scout & Vertex AI Vector Search Engage Shoppers
At Lowe's, the team is always working to give customers a more convenient and pleasant shopping experience. A recurring issue they have noticed is that many customers arrive at the mobile application or e-commerce site empty-handed, figuring they will know the right item when they see it.
To solve this problem and improve the shopping experience, Lowe's built Visual Scout on Google Cloud: an interactive tool for browsing the product catalog and quickly locating products of interest on lowes.com. It is an example of how artificial intelligence recommendations are transforming modern shopping experiences across a variety of channels, including text, speech, video, and images.
Visual Scout is intended for consumers who consider products’ aesthetic qualities when making specific selections. It provides an interactive experience that allows buyers to learn about different styles within a product category. First, ten items are displayed on a panel by Visual Scout. Following that, users express their choices by “liking” or “disliking” certain display items. Visual Scout dynamically changes the panel with elements that reflect client style and design preferences based on this feedback.
This is an illustration of how a discovery panel refresh is influenced by user feedback from a customer who is shopping for hanging lamps. (Image credit: Google Cloud)
In this post, we will dive into the technical details and examine the key MLOps procedures and technologies that make this experience possible.
How Visual Scout Works
Customers usually know roughly what “product group” they are looking for when they visit a product detail page on lowes.com, although there may be a wide variety of product options available. Customers can quickly identify a subset of interesting products by using Visual Scout to sort across visually comparable items, saving them from having to open numerous browser windows or examine a predetermined comparison table.
The item on a particular product page will be considered the “anchor item” for that page, and it will serve as the seed for the first recommendation panel. Customers then iteratively improve the product set that is on show by giving each individual item in the display a “like” or “dislike” rating:
"Like" feedback: when a customer clicks the "more like this" button, Visual Scout replaces the two least visually similar items on the panel with products that closely resemble the item the customer just liked.
"Dislike" feedback: conversely, when a customer votes an item down with an 'X', Visual Scout replaces it with a product that is visually similar to the anchor item.
Visual Scout offers a fun and gamified shopping experience that promotes consumer engagement and, eventually, conversion because the service refreshes in real time.
Would you like to give it a try?
Go to this product page and look for the “Discover Similar Items” section to see Visual Scout in action. It’s not necessary to have an account, but make sure you choose a store from the menu in the top left corner of the website. This aids Visual Scout in suggesting products that are close to you.
The technology underlying Visual Scout
Many Google Cloud services support Visual Scout, including:
Dataproc: runs batch processing jobs that send each item's image to a computer vision model as a prediction request in order to compute embeddings for new items; the predicted values are the image's embedding representation.
Vertex AI Model Registry: a central location for overseeing the computer vision model’s lifecycle
Vertex  AI Feature Store: Low latency online serving and feature management for product image embeddings
For low latency online retrieval, Vertex AI Vector Search uses a serving index and vector similarity search.
BigQuery: Stores an unchangeable, enterprise-wide record of item metadata, including price, availability in the user’s chosen store, ratings, inventories, and restrictions.
Google Kubernetes Engine: Coordinates the Visual Scout application’s deployment and operation with the remainder of the online buying process.
Let's go over a few of the most important activities in the reference architecture below to gain a better understanding of how these components are operationalized in production (image credit: Google Cloud):
For a given item, the Visual Scout API generates a vector match request.
To obtain the most recent image embedding vector for an item, the request first makes a call to Vertex AI Feature Store.
Visual Scout then uses the item embedding to search a Vertex AI Vector Search index for the most similar embedding vectors, returning the corresponding item IDs.
Product-related metadata, such as inventory availability, is utilised to filter each visually comparable item so that only goods that are accessible at the user’s chosen store location are shown.
The Visual Scout API receives the available goods together with their metadata so that lowes.com can serve them.
A trigger starts a daily update job to calculate image embeddings for any new items.
Once activated, Dataproc processes any new item images and embeds them using the registered machine vision model.
Streaming updates push the new image embeddings into the Vertex AI Vector Search serving index.
The Vertex AI Feature Store online serving nodes receive new image embedding vectors, which are indexed by the item ID and the ingestion timestamp.
Vertex AI low latency serving
Visual Scout uses Vector Search and Feature Store, two Vertex AI services, to replace items in the recommendation panel in real time.
To keep track of an item’s most recent embedding representation, utilise the Vertex AI Feature Store. This covers any newly available photos for an item as well as any net new additions to the product catalogue. In the latter scenario, the most recent embedding of an item is retained in online storage while the prior embedding representation is transferred to offline storage. The most recent embedding representation of the query item is retrieved by the Feature Store look-up from the online serving nodes at serving time, and it is then passed to the downstream retrieval job.
Visual Scout then has to identify the products most similar to the query item from among many candidates in the database by comparing their embedding vectors. This kind of nearest neighbor search requires calculating the similarity between the query and candidate item vectors, and at this scale the computation can easily become a retrieval bottleneck, particularly if an exhaustive (i.e., brute-force) search is conducted. Vertex AI Vector Search gets around this barrier and meets the low-latency serving requirements for vector retrieval by using an approximate search.
These two services let Visual Scout handle a large number of queries with little latency. The 99th-percentile response times come in at about 180 milliseconds, meeting performance objectives and guaranteeing a snappy, seamless user experience.
Why is Vertex AI Vector Search so fast?
From a billion-scale vector database, Vertex AI Vector Search is a managed service that offers effective vector similarity search and retrieval. This offering is the culmination of years of internal study and development because these features are essential to numerous Google Cloud initiatives. It’s important to note that ScaNN, an open-source vector search toolkit from Google Research, also makes a number of core methods and techniques openly available. The ultimate goal of ScaNN is to create reliable and repeatable benchmarking, which will further the field’s research. Offering a scalable vector search solution for applications that are ready for production is the goal of Vertex  AI Vector Search.
ScaNN overview
The 2020 ICML work “Accelerating Large-Scale Inference with Anisotropic Vector Quantization” by Google Research is implemented by ScaNN. The research uses a unique compression approach to achieve state-of-the-art performance on nearest neighbour search benchmarks. Four stages comprise the high-level process of ScaNN for vector similarity search:
Partitioning: ScaNN partitions the index using hierarchical clustering to minimise the search space. The index’s contents are then represented as a search tree, with the centroids of each partition serving as a representation for that partition. Typically, but not always, this is a k-means tree.
Vector quantization: this stage compresses each vector into a series of 4-bit codes using the asymmetric hashing (AH) technique, leading to the eventual learning of a codebook. Because only the database vectors not the query vectors are compressed, it is “asymmetric.”
Approximate scoring: at query time, AH generates partial-dot-product lookup tables and then uses these tables to approximate dot products.
Rescoring: given the top-k items from the approximate scoring, distances are recomputed with greater precision (e.g., lower distortion or even the raw datapoint).
Constructing a serving-optimized index
The tree-AH technique from ScaNN is used by Vertex AI Vector Search to create an index that is optimized for low-latency serving. A tree-X hybrid model known as “tree-AH” is made up of two components: (1) a partitioning “tree” and (2) a leaf searcher, in this instance “AH” or asymmetric hashing. In essence, it blends two complimentary algorithms together:
Tree-X, a k-means tree, is a hierarchical clustering technique that divides the index into search trees, each of which is represented by the centroid of the data points that correspond to that division. This decreases the search space.
A highly optimised approximate distance computing procedure called Asymmetric Hashing (AH) is utilised to score how similar a query vector is to the partition centroids at each level of the search tree.
With tree-AH, Vertex AI Vector Search learns an optimized indexing model that effectively specifies the quantization codebook and the partition centroids of the serving index. This is further improved by training with an anisotropic loss function. The rationale is that anisotropic loss emphasizes minimizing the quantization error for vector pairs with high dot products. This makes sense: if the dot product of a vector pair is low, the pair is unlikely to be in the top-k, so its quantization error matters little. But because we want to preserve the relative ranking of high-dot-product pairs, we must be much more careful about their quantization error. The sketch after the list below makes this concrete.
To encapsulate the final point:
Between a vector’s quantized form and its original form, there will be quantization error.
Higher recall during inference is achieved by maintaining the relative ranking of the vectors.
At the cost of being less accurate in maintaining the relative ranking of another subset of vectors, Google can be more exact in maintaining the relative ranking of one subset of vectors.
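As a rough sketch of the idea, paraphrasing the score-aware quantization loss from the ScaNN paper (the notation here is an assumption, not something defined in this article): the residual between a datapoint x and its quantized form x̃ is split into components parallel and orthogonal to x, and the two are weighted unequally,

\ell(x, \tilde{x}) = h_{\parallel}\,\lVert r_{\parallel}(x, \tilde{x}) \rVert^{2} + h_{\perp}\,\lVert r_{\perp}(x, \tilde{x}) \rVert^{2}, \qquad h_{\parallel} \ge h_{\perp},

where r_parallel and r_perp are the parallel and orthogonal components of x minus x̃. Penalizing the parallel component more heavily preserves the relative ranking of high-dot-product pairs, at the cost of larger error on pairs that were unlikely to reach the top-k anyway.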
Supporting production-ready applications
Vertex AI Vector Search is a managed service that enables users to benefit from ScaNN performance while providing other features to reduce overhead and create value for the business. These features include:
Updates to the indexes and metadata in real time allow for quick queries.
Multi-index deployments, often known as “namespacing,” involve deploying several indexes to a single endpoint.
By automatically scaling serving nodes in response to QPS traffic, autoscaling guarantees constant performance at scale.
Dynamic rebuilds: periodic index compaction to account for new updates, which improves query performance and reliability without interrupting the service.
Complete metadata filtering and diversity: restrict query results using strings, numeric values, allow lists, and deny lists, and enforce diversity with crowding tags.
Read more on Govindhtech.com
govindhtech · 11 months ago
Text
With AI Agents, LiveX AI Reduces Customer Care Expenses
With  AI agents trained and supported by GKE and NVIDIA  AI, LiveX AI can cut customer care expenses by up to 85%.
For consumer companies, offering a positive customer experience is a crucial competitive advantage, but doing so presents a number of difficulties. Even if a website draws visitors, failing to personalize it can make it difficult to turn those visitors into paying clients. Call centers are expensive to run, and long wait times irritate consumers during peak call volumes. Though they are more scalable, traditional chatbots cannot replace a genuine human-to-human interaction.
LiveX AI agents
At the forefront of generative AI technology, LiveX AI creates personalized, multimodal AI agents with vision, hearing, conversation, and show capabilities to provide customers with experiences that are genuinely human-like. LiveX AI, a company founded by a group of seasoned business owners and eminent IT executives, offers companies dependable AI agents that generate robust consumer engagement on a range of platforms.
Real-time, immersive, human-like customer service is offered by LiveX AI generative AI agents, who respond to queries and concerns from clients in a friendly, conversational style. Additionally, agents must be quick and reliable in order to provide users with a positive experience. A highly efficient and scalable platform that can do away with the response latency that many AI agents have is necessary to create that user experience, especially on busy days like Black Friday.
GKE offers a strong basis for sophisticated generative AI applications
Utilising Google Kubernetes Engine (GKE) and the NVIDIA AI platform, Google  Cloud and LiveX AI worked together from the beginning to accelerate LiveX AI’s development. Within three weeks, LiveX AI was able to provide a customized solution for its client thanks to the assistance of Google Cloud. Furthermore, LiveX AI was able to access extra commercial and technical tools as well as have their cloud costs covered while they were getting started by taking part in the Google for Startups  Cloud Programme and the NVIDIA Inception programme.
The LiveX AI team selected GKE because it is a reliable solution that lets them ramp up quickly, enabling them to deploy and run containerized apps at scale on a secure, efficient global infrastructure. GKE's platform orchestration capabilities, combined with its flexible integration with distributed computing and data processing frameworks, make it simple to train and serve optimized AI workloads on NVIDIA GPUs.
GKE Autopilot
Developing multimodal AI agents for companies with enormous quantities of real-time consumer interactions is made easier with GKE Autopilot in particular, since it facilitates the easy scalability of applications to multiple clients. LiveX AI does not need to configure or monitor a Kubernetes cluster’s underlying compute when GKE Autopilot takes care of it.
LiveX AI has achieved over 50% reduced TCO, 25% faster time-to-market, and 66% lower operational costs with the use of GKE Autopilot. This has allowed them to concentrate on providing value to clients rather than setting up or maintaining the system.
Over 50% reduced TCO, 25% quicker time to market, and 66% lower operating costs were all made possible with GKE Autopilot for LiveX AI.
Zepp Health
Zepp Health, a direct-to-consumer (D2C) wellness product maker, is one of these clients. Zepp Health worked with LiveX AI to develop an AI customer agent for their Amazfit wristwatch and smart ring e-commerce website in the United States. In order to provide clients with individualized experiences in real time, the agent had to efficiently handle large numbers of customer interactions.
GKE was coupled with A2 Ultra virtual machines (VMs) running NVIDIA A100 80GB Tensor Core GPUs and NVIDIA NIM inference microservices for the Amazfit project. NIM, which is a component of the NVIDIA AI Enterprise software platform, offers a collection of user-friendly microservices intended for the safe and dependable implementation of high-performance AI model inference.
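For context, provisioning that kind of GPU capacity on GKE typically looks something like the hedged sketch below; the cluster name, node count, and region are assumptions rather than details of the Zepp Health deployment.

# A2 Ultra node pool with one NVIDIA A100 80GB GPU per node (illustrative sizes)
gcloud container node-pools create nim-a100-pool \
  --cluster=my-cluster \
  --region=us-central1 \
  --machine-type=a2-ultragpu-1g \
  --accelerator=type=nvidia-a100-80gb,count=1 \
  --num-nodes=1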
Applications were updated more quickly once in production thanks to the use of Infrastructure as Code (IaC) techniques for deploying the NVIDIA NIM Docker containers on GKE. The development and deployment processes benefited greatly from NVIDIA hardware acceleration technologies, which maximized the effect of hardware optimization.
Amazfit  AI
Overall, compared to running the Amazfit  AI agent on another well-known inference platform, LiveX AI was able to achieve an astounding 6.1x acceleration in average answer/response generation speed by utilising GKE with NVIDIA NIM and NVIDIA A100 GPUs. Even better, it took only three weeks to complete the project.
Running on GKE with NVIDIA NIM and GPUs produced 6.1x acceleration in average answer/response generation speed for the Amazfit AI agent as compared to another inference platform.
For users of LiveX AI, this implies:
If effective AI-driven solutions are implemented, customer assistance expenses might be reduced by up to 85%.
First reaction times have significantly improved, going from hours to only seconds, compared to industry standards.
Increased customer satisfaction and a 15% decrease in returns as a result of quicker and more accurate remedies
Five times more lead conversion thanks to a smart, useful AI agent.
Wayne Huang, CEO of Zepp Health, says the company believes in delivering a personal touch in every customer interaction, and that LiveX AI makes that philosophy a reality by giving its customers shopping for Amazfit a smooth and enjoyable experience.
Working together fosters AI innovation
Ultimately, GKE has made it possible for LiveX AI to quickly scale and provide clients with cutting-edge generative AI solutions that yield instant benefits. GKE offers a strong platform for the creation and implementation of cutting-edge generative AI applications since it is a safe, scalable, and affordable solution for managing containerized apps.
It speeds up developer productivity, increases application dependability with automated scaling, load balancing, and self-healing features, and streamlines the development process by making cluster construction and management easy.
Read more on govindhtech.com
govindhtech · 11 months ago
Text
Deckmatch uses Cloud SQL for PostgreSQL for VC insights
Utilising  Cloud SQL for PostgreSQL, Deckmatch is able to provide venture capitalists with valuable insights.
The flow of investment deals for venture capitalists is being revolutionised by Deckmatch through the utilisation of artificial intelligence (AI) and smart analytics. Deckmatch is able to provide investors with full information on startups and their competitors in a short amount of time since it stores extensive and detailed company data in Cloud SQL for PostgreSQL.
When the human labour that is required to generate this data is eliminated, investors are freed up to concentrate on efforts that are more strategic, and they are given the ability to make confident decisions regarding investments in a timely manner.
Prior to founding Deckmatch, the team had a crystal clear vision: they wanted to develop a leading solution that gives venture capitalists deep insight into potential portfolio companies and the competitive contexts in which they operate, while requiring only minimal manual effort. To accomplish this, they needed a database that was not only powerful but also versatile, capable of managing massive amounts of data, letting them apply artificial intelligence (AI), and supporting advanced vector-based search.
The managed PostgreSQL database option that Google Cloud provides, known as  Cloud SQL, was the obvious choice given that they have relied on PostgreSQL for a considerable amount of time for key data operations due to its high performance, flexibility, and extensive support system.
Betting on a cloud architecture
At the core of their cloud architecture is a Cloud SQL database equipped with pgvector that interfaces with a broad range of Google Cloud services. The pgvector extension for PostgreSQL gives their platform fast, efficient, and relevant similarity search.
From there, they can easily carry out the intricate vector operations that underpin their embedding-based algorithms for competitive mapping and semantic search, while efficient embedding storage and querying power speedy, contextual search that quickly surfaces relevant insights. A hedged sketch of what this can look like follows below.
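The sketch below illustrates typical pgvector usage; the table, column names, vector dimension, and connection string are illustrative assumptions, not Deckmatch's actual schema (the dimension is kept tiny so the example runs as written; real embeddings would be far larger, e.g. 768-dimensional).

# Run against the Cloud SQL for PostgreSQL instance; $DATABASE_URL is a placeholder connection string
psql "$DATABASE_URL" <<'SQL'
CREATE EXTENSION IF NOT EXISTS vector;

-- Company records with an embedding column (dimension shortened for readability)
CREATE TABLE IF NOT EXISTS companies (
  id        bigserial PRIMARY KEY,
  name      text NOT NULL,
  embedding vector(3)
);

-- Approximate nearest-neighbour index for cosine distance
CREATE INDEX IF NOT EXISTS companies_embedding_idx
  ON companies USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- The ten companies most similar to a query embedding
SELECT id, name
FROM companies
ORDER BY embedding <=> '[0.12, -0.03, 0.08]'::vector
LIMIT 10;
SQL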
Through the utilisation of FastAPI, they developed a high-performance REST API in order to provide their customers with direct access to the Deckmatch platform through an intuitive user experience. In addition, they host the complete application on Google Kubernetes Engine (GKE), which provides us with a platform that is scalable, dependable, and secure for the deployment of their respective service.
Finally, the natural integration between Cloud SQL, GKE, and Vertex AI enables Deckmatch to give investors seeking in-depth startup knowledge an experience that is both robust and responsive. Vertex AI powers search accuracy and advanced analytics, helping customers keep pace with the intricacies of the market while reducing the resources required by traditional search methods.
Cloud SQL offers a wealth of benefits
By taking advantage of the capabilities Cloud SQL makes available, Deckmatch's customers have reaped significant benefits. Through Deckmatch's competitor landscape mapping service, venture capitalists can quickly acquire the essential information they need to understand the competitive climate and position themselves strategically. In addition, the semantic search capabilities offered by Cloud SQL's pgvector support make it possible to run sophisticated searches for businesses based on a variety of parameters and obtain accurate, context-aware results.
The most significant benefit Cloud SQL brings to their organisation is that, as a fully managed PostgreSQL service, it eliminates operational overhead such as backups and updates, freeing up their small staff to concentrate on building core application features. In addition, Cloud SQL's user-friendly interface and seamless connection with other Google Cloud products significantly improve the developer experience with streamlined provisioning and management tools, making setup and ongoing operations straightforward. On top of that, Cloud SQL scales in tandem with their application, enabling them to flexibly scale database resources on demand.
Google Cloud as seed capital for the future
Their objective is to make Deckmatch the resource of choice for thorough information about firms in the pre-seed and seed stages. As they continue to develop their application, they will keep relying on the Google Cloud environment to make it the most efficient and user-friendly resource on the market.
Google Cloud's knowledge and resources have proven extremely beneficial to the company as it has expanded, and will continue to be as it develops. Through their participation in the AI startup track of the Google for Startups Cloud Programme, they were awarded substantial credits that helped greatly offset the initial costs of their infrastructure.
Because of this, Deckmatch was able to free up vital cash, allowing the team to invest more substantially in development and speed up growth. Google Cloud has been an invaluable resource throughout their journey, providing a wealth of documentation and experienced technical assistance that has helped them optimise their cloud architecture.
The team is confident that their strong vision, combined with the technical capabilities and assistance offered by Google Cloud, has given them credibility within the startup ecosystem and laid the foundation for success.
Read more on govindhtech.com