Hyperdisk ML: Integration To Speed Up Loading AI/ML Data
Hyperdisk ML can speed up the loading of AI/ML data. This tutorial explains how to use it to streamline and accelerate the loading of AI/ML model weights on Google Kubernetes Engine (GKE). The Compute Engine Persistent Disk CSI driver is the primary way to access Hyperdisk ML storage from GKE clusters.
What is Hyperdisk ML?
Hyperdisk ML is a high-performance storage solution that you can use to scale up your applications. It offers high aggregate throughput to many virtual machines simultaneously, which makes it ideal for AI/ML workloads that need access to large amounts of data.
Overview
When enabled in read-only-many mode, Hyperdisk ML can accelerate model weight loading by up to 11.9X compared with loading directly from a model registry. This acceleration comes from the Google Cloud Hyperdisk architecture, which scales to 2,500 concurrent nodes at 1.2 TB/s. It lets you reduce pod over-provisioning and improve load times for your AI/ML inference workloads.
The high-level steps for creating and using Hyperdisk ML are:
Pre-cache (hydrate) data in a persistent disk image: fill Hyperdisk ML volumes with serving-ready data from an external data source (for example, Gemma weights fetched from Cloud Storage). The persistent disk behind the disk image must be compatible with Google Cloud Hyperdisk.
Create a Hyperdisk ML volume from an existing Google Cloud Hyperdisk: create a Kubernetes volume that points to the data-loaded Hyperdisk ML volume. Optionally, create multi-zone storage classes to make sure your data is accessible in every zone where your pods will run.
Create a Kubernetes Deployment to consume the Hyperdisk ML volume: reference the Hyperdisk ML volume from your applications for rapid data loading, as in the sketch after this list.
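The following is a minimal sketch of steps 2 and 3, assuming a zonal Hyperdisk ML disk named gemma-weights in us-central1-a that has already been hydrated with model data. The resource names, capacity, and busybox stand-in image are illustrative placeholders, not values from the original tutorial.

```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hdml-static-pv
spec:
  # storageClassName only needs to match between PV and PVC for static binding
  storageClassName: hyperdisk-ml
  capacity:
    storage: 300Gi
  accessModes: ["ReadOnlyMany"]
  claimRef:
    namespace: default
    name: hdml-static-pvc
  csi:
    driver: pd.csi.storage.gke.io
    # points at the pre-hydrated Hyperdisk ML disk
    volumeHandle: projects/PROJECT_ID/zones/us-central1-a/disks/gemma-weights
    fsType: ext4
    readOnly: true
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.gke.io/zone
          operator: In
          values: ["us-central1-a"]
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hdml-static-pvc
  namespace: default
spec:
  storageClassName: hyperdisk-ml
  accessModes: ["ReadOnlyMany"]
  resources:
    requests:
      storage: 300Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-server
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: inference-server
  template:
    metadata:
      labels:
        app: inference-server
    spec:
      containers:
      - name: server
        image: busybox            # stand-in for your inference server image
        command: ["sleep", "infinity"]
        volumeMounts:
        - name: model-weights
          mountPath: /models     # weights appear here, read-only
          readOnly: true
      volumes:
      - name: model-weights
        persistentVolumeClaim:
          claimName: hdml-static-pvc
          readOnly: true
EOF
```

Binding the PersistentVolume to the claim by name via claimRef is what lets many read-only pods share the same pre-hydrated disk.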
Multi-zone Hyperdisk ML volumes
Hyperdisk ML disks are zonal: each disk is accessible in a single zone. Alternatively, you can use the Hyperdisk ML multi-zone feature to dynamically join several zonal disks with identical content under a single logical PersistentVolumeClaim and PersistentVolume. The zonal disks referenced by the multi-zone feature must be in the same region. For example, if your regional cluster is created in us-central1, the multi-zone disks (such as those in us-central1-a and us-central1-b) must be located in that region.
Running Pods across zones for increased accelerator availability and cost-effectiveness with Spot VMs is a popular use case for AI/ML inference. Because Hyperdisk ML is zonal, if your inference server runs several pods across zones, GKE automatically clones the disks across zones to make sure your data follows your application. (Image credit: Google Cloud)
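A sketch of what a multi-zone storage class might look like, based on the GKE documentation for this feature; the zone names and throughput figure are placeholders, and the parameter names should be checked against your CSI driver version.

```bash
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hyperdisk-ml-multi-zone
provisioner: pd.csi.storage.gke.io
parameters:
  type: hyperdisk-ml
  provisioned-throughput-on-create: "2400Mi"   # illustrative value
  enable-multi-zone-provisioning: "true"       # the multi-zone switch
volumeBindingMode: Immediate
reclaimPolicy: Delete
allowedTopologies:
- matchLabelExpressions:
  - key: topology.gke.io/zone
    values: ["us-central1-a", "us-central1-b"]  # must be in the same region
EOF
```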
The limitations of multi-zone Hyperdisk ML volumes are as follows:
There is no support for volume resizing or volume snapshots.
Only read-only mode is available for multi-zone Hyperdisk ML volumes.
GKE does not verify that disk content is consistent across zones when you use pre-existing disks with a multi-zone Hyperdisk ML volume. If any of the disks hold divergent content, make sure your application accounts for possible inconsistencies between zones.
Requirements
To use Hyperdisk ML volumes in GKE, your clusters must meet the following requirements:
Use Linux clusters running GKE version 1.30.2-gke.1394000 or later. If you use a release channel, make sure it contains the minimum GKE version required for this driver.
The Compute Engine Persistent Disk CSI driver must be installed. It is enabled by default on new Autopilot and Standard clusters, and cannot be disabled or modified when using Autopilot. If you need to enable the driver on an existing cluster, see Enabling the Compute Engine Persistent Disk CSI Driver on an existing cluster.
To adjust the readahead value, use GKE version 1.29.2-gke.1217000 or later; a sketch follows this list.
To use the multi-zone dynamic provisioning capability, use GKE version 1.30.2-gke.1394000 or later.
Hyperdisk ML is supported only on specific node types and in specific zones.
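As a hedged illustration of the readahead item above: readahead is tuned with a mount option on the PersistentVolume. The resource names and disk path are placeholders, and the 4096 KB value is only an example, not a recommendation from the original article.

```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hdml-readahead-pv
spec:
  storageClassName: hyperdisk-ml
  capacity:
    storage: 300Gi
  accessModes: ["ReadOnlyMany"]
  mountOptions:
  - read_ahead_kb=4096   # larger readahead favors big sequential weight reads
  csi:
    driver: pd.csi.storage.gke.io
    volumeHandle: projects/PROJECT_ID/zones/us-central1-a/disks/gemma-weights
    fsType: ext4
    readOnly: true
EOF
```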
Conclusion
This guide offers a thorough tutorial on using Hyperdisk ML to accelerate AI/ML data loading on Google Kubernetes Engine (GKE). It explains how to pre-cache data in a disk image, create a Hyperdisk ML volume that your GKE workload can read, and create a Deployment that consumes the volume. It also discusses how to troubleshoot problems such as a low Hyperdisk ML throughput quota, and offers advice on tuning readahead values for best results.
Read more on Govindhtech.com
Google Cloud Storage Fuse Speeds Model + Weight Load Times
Cloud Storage Fuse and Hyperdisk ML can accelerate model + weight load times from Google Cloud Storage.
The amount of model data required to support more complex AI models keeps growing. Loading the models, weights, and frameworks required to serve them for inference can add seconds or even minutes of scaling delay, affecting both costs and the end-user experience.
Inference servers such as Triton, Text Generation Inference (TGI), or vLLM, for instance, are packaged as containers that are often larger than 10GB; this can make them slow to download and prolong pod startup times in Kubernetes. The data loading problem is compounded once the inference pod starts, because it must then load model weights, which can be hundreds of gigabytes in size.
To reduce the total time required to load your AI/ML inference workload on Google Kubernetes Engine (GKE), this article examines methods for speeding up data loading for both inference serving containers and downloading models + weights:
Using secondary boot drives to cache container images with your inference engine and relevant libraries directly on the GKE node can speed up container load times.
Using Cloud Storage Fuse or Hyperdisk ML to speed up model + weight load times from Google Cloud Storage.
In this architecture, a secondary boot disk (1) stores the container image in advance so that the image download step is skipped during pod/container startup. In addition, Cloud Storage Fuse (2) and Hyperdisk ML (3) are options for connecting the pod to model + weight data stored in Cloud Storage or on a network-attached disk, for AI/ML inference applications with demanding performance and scalability requirements. Each of these strategies is examined in more depth below. (Image credit: Google Cloud)
Accelerating container load times with secondary boot disks
At node pool creation time, GKE lets you pre-cache your container image onto a secondary boot disk that is attached to your node. Loading containers this way skips the image download step and lets them start launching immediately, which significantly reduces startup time. Download times for container images grow linearly with image size, while a cached copy of the container image already loaded on the node is available at near-constant cost.
When a 16GB container image is cached on a secondary boot disk in advance, load times can be up to 29x faster than downloading the image from a container registry. Moreover, the acceleration applies regardless of container size, so even large container images load consistently quickly!
To use secondary boot disks, first construct a disk containing all of your images, then create a disk image from that disk, and supply the disk image when you create secondary boot disks for your GKE node pools, as in the sketch below. See the documentation for more details.
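A minimal command sketch, assuming a disk image named inference-images has already been built (for example with the gke-disk-image-builder tool); the cluster, pool, project, and zone names are placeholders.

```bash
# Create a node pool whose nodes mount the pre-built image cache
# as a secondary boot disk.
gcloud container node-pools create inference-pool \
  --cluster=my-cluster \
  --zone=us-central1-a \
  --secondary-boot-disk=disk-image=projects/PROJECT_ID/global/images/inference-images,mode=CONTAINER_IMAGE_CACHE
```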
Accelerating model weights load times
Many machine learning frameworks produce checkpoints (snapshots of model weights) and store them in object storage such as Google Cloud Storage, a popular option for long-term storage. With Cloud Storage as the source of truth, there are two primary solutions for retrieving your data at the GKE pod level: Hyperdisk ML (HdML) and Cloud Storage Fuse.
There are two primary factors to consider when choosing between the products:
Performance: how quickly can the GKE node load the data?
Operational simplicity: how easy is it to update the data?
Cloud Storage Fuse offers a direct connection to Cloud Storage for model weights stored in object storage buckets. There is also a caching option for files that must be read more than once, so repeated downloads from the source bucket, which add latency, can be avoided.
Cloud Storage Fuse is an attractive option because a pod doesn't need to perform any pre-hydration operational tasks: new files downloaded into the designated bucket become available to the pod without extra work. Note that if you change which buckets the pod is attached to, you will need to restart the pod with an updated Cloud Storage Fuse configuration. You can increase speed even further by turning on parallel downloads, which have multiple workers fetch a model at once, greatly improving model pull performance; a sketch follows.
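A minimal sketch of a pod mounting a bucket through the Cloud Storage Fuse CSI driver with parallel downloads turned on. The pod, bucket, and image names are placeholders, the pod's Kubernetes service account is assumed to have access to the bucket (for example via Workload Identity Federation), and the exact mountOptions keys should be verified against your driver version.

```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: model-server
  annotations:
    gke-gcsfuse/volumes: "true"   # inject the Cloud Storage Fuse sidecar
spec:
  containers:
  - name: server
    image: busybox                # stand-in for your inference server image
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: model-bucket
      mountPath: /models          # bucket contents appear here
      readOnly: true
  volumes:
  - name: model-bucket
    csi:
      driver: gcsfuse.csi.storage.gke.io
      readOnly: true
      volumeAttributes:
        bucketName: my-model-weights   # placeholder bucket name
        # parallel downloads require the file cache to be enabled
        mountOptions: "implicit-dirs,file-cache:max-size-mb:-1,file-cache:enable-parallel-downloads:true"
EOF
```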
Compared with downloading files straight to the pod from Cloud Storage or another online source, Hyperdisk ML offers superior speed and scalability. A single Hyperdisk ML instance can support up to 2,500 nodes with an aggregate bandwidth of up to 1.2 TiB/sec, which makes it a good option for inference tasks involving many nodes and read-only downloads of the same data. To use Hyperdisk ML, load your data onto the disk before first use and again after each update; keep in mind that if your data changes often, this adds operational overhead. A sketch of the disk workflow follows.
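A hedged sketch of that workflow with placeholder names, sizes, and zone: create the disk, hydrate it from a temporary VM, then switch it to read-only-many so many nodes can attach it.

```bash
# Create the Hyperdisk ML disk (size is illustrative).
gcloud compute disks create gemma-weights \
  --type=hyperdisk-ml \
  --size=300GB \
  --zone=us-central1-a

# ...attach the disk to a temporary VM, copy the weights onto it, detach...

# Switch to read-only-many so the disk can be attached by many nodes at once.
gcloud compute disks update gemma-weights \
  --zone=us-central1-a \
  --access-mode=READ_ONLY_MANY
```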
As you can see, designing a successful model loading approach involves more factors than throughput alone.
In conclusion
Loading large AI models, weights, and container images into GKE-based workloads can prolong startup times. A mix of the three techniques described above (a secondary boot disk for container images, and Cloud Storage Fuse or Hyperdisk ML for models + weights) can speed up data load times for your AI/ML inference apps.
Read more on Govindhtech.com