#AIWorkloads
Explore tagged Tumblr posts
govindhtech · 9 months ago
Text
Dell PowerEdge XE9680L Cools and Powers Dell AI Factory
When it comes to cooling and powering your AI factory, think Dell. As part of the Dell AI Factory initiative, the company is thrilled to introduce a variety of new server power and cooling capabilities.
Dell PowerEdge XE9680L Server
As part of the Dell AI Factory, Dell is showcasing new server capabilities following a fantastic Dell Technologies World event. These developments offer a thorough, scalable, and integrated method of implementing AI solutions, and they have the potential to transform the way businesses use artificial intelligence.
These new capabilities begin with the PowerEdge XE9680L, with support for NVIDIA B200 HGX 8-way NVLink GPUs (graphics processing units), and promise unmatched AI performance, power management, and cooling. The platform doubles I/O throughput and supports up to 72 GPUs per rack at 107 kW, pushing the envelope of what is feasible for AI-driven operations.
Integrating AI with Your Data
To fully utilise AI, customers must integrate it with their data. But how can they do this more sustainably? The answer is state-of-the-art infrastructure tailored to meet the demands of AI workloads as efficiently as possible. Dell PowerEdge servers and software are built with Smart Power and Cooling to help IT operations make the most of their power and thermal budgets.
Smart Cooling
Effective power management is only one part of the problem; cooling capability is also essential. At the highest workloads, Dell’s rack-scale system, eight XE9680 H100 servers in a rack with an integrated rear-door heat exchanger, runs at 70 kW or less, as disclosed at Dell Technologies World 2024. In addition to ensuring that component thermal and reliability standards are met, Dell innovates to reduce the amount of power required to keep systems cool.
Together, these significant hardware advancements, including taller server chassis, rack-level integrated cooling, and the growth of liquid cooling (including liquid-assisted air cooling, or LAAC), improve heat dissipation, maximise airflow, and enable greater compute densities. One example is an effective fan power management technology that uses an AI-based fuzzy logic controller for closed-loop thermal management, which directly lowers operating costs.
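To make the closed-loop idea concrete, here is a minimal sketch of a temperature-driven fan controller in Python. It is purely illustrative: the membership thresholds, duty-cycle rules, and sample readings are invented for this example and are not Dell’s firmware logic.

# Illustrative closed-loop fan control with a tiny fuzzy-logic stage.
# All thresholds and rules are made up for this sketch.

def fuzzy_membership(temp_c: float) -> dict:
    """Map an inlet temperature to fuzzy cool/warm/hot degrees in [0, 1]."""
    cool = max(0.0, min(1.0, (35.0 - temp_c) / 10.0))
    hot = max(0.0, min(1.0, (temp_c - 45.0) / 10.0))
    warm = max(0.0, 1.0 - cool - hot)
    return {"cool": cool, "warm": warm, "hot": hot}

def fan_duty(temp_c: float) -> float:
    """Defuzzify: blend per-state duty cycles by membership weight."""
    rules = {"cool": 0.25, "warm": 0.55, "hot": 1.00}
    m = fuzzy_membership(temp_c)
    return sum(m[state] * rules[state] for state in rules)

# Closed loop: read a sensor, set fan speed, repeat.
for reading in (28.0, 41.0, 52.0):  # sample inlet temperatures in C
    print(f"{reading:.0f} C -> fan duty {fan_duty(reading):.0%}")

The point of the fuzzy stage is that fan speed ramps smoothly between thermal states instead of oscillating around hard thresholds, which is where the power savings come from.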
Constructed to Be Reliable
Dependability and the data centre are clearly at the forefront of Dell’s solution development, as shown by thorough testing and validation procedures that ensure its systems can endure the most demanding conditions.
A recent study drew attention to problems with data centre overheating, highlighting how crucial reliability is to data centre operations. In high-temperature test conditions, a Supermicro SYS-621C-TN12R server failed, while a Dell PowerEdge HS5620 server continued to run an intense workload without any component warnings or failures.
Announcing AI Factory Rack-Scale Architecture on the Dell PowerEdge XE9680L
Dell announced a factory integrated rack-scale design as well as the liquid-cooled replacement for the Dell PowerEdge XE9680.
Since the launch of the PowerEdge product line thirty years ago, the GPU-powered PowerEdge XE9680 has been one of Dell’s fastest-growing products. Dell also announced an intriguing new addition to the PowerEdge XE product family as part of its next announcement for cloud service providers and near-edge deployments.
 AI computing has advanced significantly with the Direct Liquid Cooled (DLC) Dell PowerEdge XE9680L with NVIDIA Blackwell Tensor Core GPUs. This server, shown at Dell Technologies World 2024 as part of the Dell AI Factory with NVIDIA, pushes the limits of performance, GPU density per rack, and scalability for AI workloads.
The XE9680L’s smart cooling system and cutting-edge rack-scale architecture are its key components. Here is why they matter:
GPU Density per Rack, Low Power Consumption, and Outstanding Efficiency
The XE9680L is intended for the most rigorous large language model (LLM) training and large-scale AI inferencing environments, where GPU density per rack is crucial. In a compact 4U form factor, it provides one of the highest-density x86 server solutions in the industry for the next-generation NVIDIA HGX B200.
The XE9680L uses efficient DLC smart cooling for both CPUs and GPUs. This innovative technique maximises compute power while retaining thermal efficiency, enabling the rack-dense 4U architecture and delivering remarkable performance for LLM training and other AI tasks.
More Capability for PCIe 5 Expansion
With twelve PCIe 5.0 full-height, half-length slots as standard, the XE9680L offers clients 20% more FHHL PCIe 5.0 density. This translates to twice the high-speed I/O capability for the north-south AI fabric, direct storage connectivity for GPUs from Dell PowerScale, and smooth accelerator integration.
The XE9680L’s PCIe capacity enables smooth data flow whether you’re managing data-intensive jobs, implementing deep learning models, or running simulations.
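For a sense of scale, here is a back-of-the-envelope PCIe 5.0 bandwidth calculation in Python. The per-slot lane widths are assumptions for illustration, since the post does not state them.

# Rough PCIe 5.0 bandwidth math, per direction.
# 32 GT/s per lane with 128b/130b encoding is about 3.94 GB/s per lane.
TRANSFERS_PER_SEC = 32e9        # PCIe 5.0 signalling rate per lane
ENCODING = 128 / 130            # 128b/130b line-coding efficiency
bytes_per_lane = TRANSFERS_PER_SEC * ENCODING / 8

for lanes in (8, 16):           # assumed slot widths, for illustration
    print(f"x{lanes} slot: ~{bytes_per_lane * lanes / 1e9:.0f} GB/s per direction")

# Twelve slots at x16 would total roughly 756 GB/s per direction,
# which is why slot count matters for the north-south AI fabric.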
Rack-scale factory integration and a turn-key solution
Dell is dedicated to quality across the XE9680L’s whole lifecycle. Partner components are seamlessly integrated through rack-scale factory integration, guaranteeing a dependable and efficient deployment process.
Bid farewell to deployment difficulties and say hello to faster time-to-value for accelerated AI workloads. From PDU sizing to rack, stack, and cabling, the XE9680L is a turn-key solution.
- With the Dell PowerEdge XE9680L, you can scale up to 72 Blackwell GPUs per 52 RU rack or 64 GPUs per 48 RU rack (see the sketch after this list for the arithmetic).
- With pre-validated rack infrastructure solutions, scaling power, cooling, and AI fabric can be done without guesswork.
- AI factory solutions arrive at rack scale, factory integrated, and provided with “one call” support and professional deployment services for your data centre or colocation facility floor.
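As a sanity check on those rack figures, the short sketch below works through the density arithmetic. The GPUs-per-server and chassis-height figures come from the post itself; the per-GPU power average is a rough illustration derived from the quoted 107 kW, not a Dell specification.

# Rack-density arithmetic for the XE9680L configurations above.
GPUS_PER_SERVER = 8             # HGX B200 8-way, per the post
SERVER_HEIGHT_RU = 4            # 4U chassis, per the post

for rack_ru, gpus in ((52, 72), (48, 64)):
    servers = gpus // GPUS_PER_SERVER
    compute_ru = servers * SERVER_HEIGHT_RU
    spare_ru = rack_ru - compute_ru
    print(f"{rack_ru} RU rack: {servers} servers, {compute_ru} RU of compute, "
          f"{spare_ru} RU for switching, power, and cooling")

print(f"~{107 / 72:.2f} kW per GPU at rack level, host power included")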
Dell PowerEdge XE9680L
The PowerEdge XE9680L epitomises high-performance computing innovation and efficiency. This server delivers unmatched performance, scalability, and dependability for modern data centres and companies. Let’s explore the PowerEdge XE9680L’s many advantages for computing.
Superior performance and scalability
Enhanced Processing: Advanced processing powers the PowerEdge XE9680L. This server performs well for many applications thanks to the latest Intel Xeon Scalable CPUs. The XE9680L can handle complicated simulations, big databases, and high-volume transactional applications.
Flexibility in Memory and Storage: Flexible memory and storage options make the PowerEdge XE9680L stand out. This server may be customised for your organisation with up to 6TB of DDR4 memory and NVMe, SSD, and HDD storage. This versatility lets you optimise your server’s performance for any demand, from fast data access to enormous storage.
Strong Security and Management
Complete Security: Today’s digital world demands security. The PowerEdge XE9680L protects data and system integrity with extensive security features. Secure Boot, BIOS Recovery, and TPM 2.0 help prevent cyberattacks, and the server’s built-in encryption safeguards your data at rest and in transit, following industry standards.
Advanced Management Tools
Maintaining performance and minimising downtime requires efficient IT infrastructure management. Advanced management features ease administration and boost operating efficiency on the PowerEdge XE9680L. Dell EMC OpenManage offers simple server monitoring, management, and optimisation solutions. With iDRAC9 and Quick Sync 2, you can install, update, and troubleshoot servers remotely, decreasing on-site intervention and speeding response times.
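As a concrete example of that remote management, iDRAC9 exposes a standards-based Redfish REST API. The sketch below assumes the default iDRAC resource path and uses placeholder host and credentials; substitute your own.

# Query a server's power state and model over iDRAC9's Redfish API
# (illustrative host and credentials).
curl -sk -u admin:password \
  https://idrac-host/redfish/v1/Systems/System.Embedded.1 \
  | jq '{PowerState, Model, BiosVersion}'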
Excellent Reliability and Support
More efficient cooling and power
For optimal performance, high-performance servers need cooling and power control. The PowerEdge XE9680L’s improved cooling solutions dissipate heat efficiently even under intense loads. Airflow is directed precisely to prevent hotspots and maintain stable temperatures with multi-vector cooling. Redundant power supplies and sophisticated power management optimise the server’s power efficiency, minimising energy consumption and running expenses.
A proactive support service
The PowerEdge XE9680L has proactive support from Dell to maximise uptime and assure continued operation. Expert technicians, automatic issue identification, and predictive analytics are available 24/7 in ProSupport Plus to prevent and resolve issues before they affect your operations. This proactive assistance reduces disruptions and improves IT infrastructure stability, letting you focus on your core business.
Innovation in Modern Data Centre Design
Scalable Architecture
The PowerEdge XE9680L’s scalable architecture meets modern data centre needs. You can extend your infrastructure as your business grows with its modular architecture and easy extension and customisation. Whether you need more storage, processing power, or new technologies, the XE9680L can adapt easily.
Ideal for virtualisation and clouds
Cloud computing and virtualisation are essential to modern IT strategies. Virtualisation support and cloud platform integration make the PowerEdge XE9680L ideal for these environments. VMware, Microsoft Hyper-V, and OpenStack interoperability lets you maximise resource utilisation and operational efficiency with your virtualised infrastructure.
Conclusion
In short, the PowerEdge XE9680L is a powerful server with flexible memory and storage, strong security, and easy management. Modern data centres and organisations looking to improve their IT infrastructure will love its innovative design, high reliability, and proactive support. The PowerEdge XE9680L gives your company the tools to develop, innovate, and succeed in a digital environment.
Read more on govindhtech.com
2 notes
ai-network · 3 months ago
Text
Enhancing Cloud Storage for AI Workloads - Lightbits Labs
How Lightbits Certification on Oracle Cloud Boosts AI-driven Efficiency
In today’s rapidly evolving digital landscape, the demand for high-performance, cost-effective, and reliable storage solutions is paramount, especially for AI-driven workloads. Lightbits Labs’ recent certification on Oracle Cloud Infrastructure (OCI) marks a significant milestone in cloud storage, bringing optimized storage capabilities that cater to AI’s unique needs for low latency and high-speed data access. Here’s how this development is set to reshape the handling of AI workloads, particularly for enterprises managing mission-critical applications and real-time analytics.

The Need for Optimized Cloud Storage in AI Workloads

AI workloads demand a robust infrastructure that can handle large volumes of data, process complex algorithms, and deliver insights with minimal delay. For companies operating in AI-heavy sectors such as finance, healthcare, and real-time analytics, latency can be a barrier, impacting the speed and accuracy of results. Traditional storage solutions may struggle to keep up with the sub-millisecond latencies required by high-intensity applications, which can result in inefficiencies, delays, and even operational risks.

Lightbits Labs, a pioneer in NVMe® over TCP technology, recognized this gap and took steps to address it. With the recent certification of Lightbits on OCI, enterprises now have access to a cloud storage solution tailored to support AI and other data-intensive applications seamlessly. This certification opens new doors for enterprises needing high-speed, resilient, and scalable storage that can meet their AI demands without the high costs typically associated with high-performance storage options.

Lightbits and Oracle Cloud Infrastructure: A Strategic Partnership

Oracle Cloud Infrastructure (OCI) is renowned for its commitment to innovation, performance, and scalability. By certifying Lightbits Labs, OCI ensures that its clients gain access to Lightbits’ advanced storage capabilities, specifically optimized for AI workloads. This partnership aims to enable organizations to run latency-sensitive, input/output (I/O)-intensive applications on a platform that prioritizes speed, resilience, and scalability, all while maintaining operational efficiency. Key benefits include:

- Cost-Effective Scaling: Lightbits on OCI allows organizations to scale dynamically based on workload demands. This elasticity is crucial for companies facing fluctuating data volumes in AI applications.
- Superior Latency Management: With sub-millisecond tail latencies, Lightbits addresses one of AI’s most pressing challenges: reducing the delay between data retrieval and processing.
- Seamless Integration: Lightbits’ compatibility with Kubernetes, OpenStack, and VMware environments enables companies to integrate the storage solution smoothly into their existing workflows.

Benchmarks that Set New Standards

One of the standout features of Lightbits’ certification on OCI is the impressive benchmark results, setting a new standard for cloud storage performance:

- Random Read and Write Performance: In recent FIO benchmarks, Lightbits demonstrated 3 million 4K random read IOPS (input/output operations per second) and 830K 4K random write IOPS per client on OCI, fully utilizing OCI’s 100GbE network card. This performance level is instrumental in supporting real-time analytics, machine learning model training, and other AI-intensive tasks that rely on fast data retrieval. (A reproduction sketch follows this list.)
- Sub-300 Microsecond Latency: For both 4K random read and write operations, Lightbits achieved sub-300 microsecond latencies, a feat that reduces operational delays, allowing AI models to retrieve and analyze data faster than ever before.
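Results like these are typically measured with the open-source FIO tool named above. The job below is a plausible sketch of a 4K random-read run, not Lightbits’ published test configuration; the device path, queue depth, and job count are assumptions.

# Illustrative 4K random-read job; target device and parallelism are assumed.
fio --name=randread-4k --rw=randread --bs=4k --direct=1 \
    --ioengine=libaio --iodepth=64 --numjobs=16 \
    --time_based --runtime=60 --group_reporting \
    --filename=/dev/nvme0n1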
These benchmarks highlight Lightbits’ efficiency and power in a way that few other storage solutions can match, making it an attractive choice for enterprises that rely on AI-driven insights to make critical decisions.

Real-world Applications and Benefits

The real-world applications of this optimized cloud storage solution are expansive. For instance, financial institutions running risk analysis models or healthcare companies conducting diagnostic imaging require storage solutions that offer both speed and reliability. By implementing Lightbits on OCI, these organizations can expect faster processing times and more reliable storage solutions, empowering them to make timely and data-driven decisions.

Another critical application is in e-commerce, where real-time customer behavior analytics play a role in targeted marketing and inventory management. With Lightbits on OCI, e-commerce businesses can harness fast data processing to drive their marketing campaigns and ensure stock availability, even during high-demand periods like holidays or flash sales.

The Future of AI-driven Storage Solutions

The partnership between Lightbits Labs and Oracle Cloud Infrastructure signals a transformative shift in cloud storage, one that places AI and high-performance computing at the forefront. As AI applications become more pervasive, the demand for ultra-fast, scalable, and resilient storage solutions will only grow. Lightbits’ innovation in NVMe® over TCP, combined with OCI’s robust infrastructure, sets a strong precedent for future developments in the field, driving more efficient, accessible, and powerful storage options for businesses worldwide.

With Lightbits Labs and OCI at the helm, organizations can now deploy and scale AI workloads with a higher degree of efficiency, cost-effectiveness, and operational speed. This collaboration offers a clear advantage for companies eager to harness the full potential of AI without compromising on storage performance, creating a promising outlook for AI-enabled business operations in the cloud.

Read the full article
0 notes
galaxydigitalstore · 5 years ago
Photo
NVIDIA Jetson AGX Xavier Developer Kit (32GB)
The NVIDIA Jetson AGX Xavier Developer Kit is capable of running modern AI workloads and solving problems in optical inspection, manufacturing, robotics, logistics, retail, service, agriculture, smart cities, and healthcare.
1 note
govindhtech · 7 months ago
Text
Google Cloud Parallelstore Powering AI And HPC Workloads
Parallelstore
Businesses use artificial intelligence (AI) and high-performance computing (HPC) applications to process enormous datasets, execute intricate simulations, and train generative models with billions of parameters, across use cases such as LLMs, genomic analysis, quantitative analysis, and real-time sports analytics. These workloads put storage systems under intense performance pressure: they require high throughput and scalable I/O that keeps latencies under a millisecond even when thousands of clients read and write the same shared file at the same time.
Google Cloud is thrilled to share that Parallelstore, unveiled at Google Cloud Next 2024, is now generally available to power these next-generation AI and HPC workloads. Built on the Distributed Asynchronous Object Storage (DAOS) architecture, Parallelstore combines a key-value architecture with fully distributed metadata to provide high throughput and IOPS.
Continue reading to find out how Parallelstore meets the needs of demanding AI and HPC workloads by enabling you to provision Google Kubernetes Engine and Compute Engine resources, optimize goodput and GPU/TPU utilization, and move data in and out of Parallelstore programmatically.
Optimize throughput and GPU/TPU use
Parallelstore employs a key-value store architecture with distributed metadata management to overcome the performance constraints of conventional parallel file systems. Its high-throughput parallel data access can saturate each compute client’s network capacity while keeping latency and I/O contention low. Maximizing goodput to GPUs and TPUs through efficient data transfer is key to optimizing the cost of AI workloads, and Parallelstore can serve modest-to-massive AI and HPC workloads by providing continuous read/write access to thousands of virtual machines, GPUs, and TPUs.
The largest Parallelstore deployment of 100 TiB delivers throughput scaling to around 115 GiB/s, with latency as low as ~0.3 ms, 3 million read IOPS, and 1 million write IOPS. This means a large number of clients can get random, distributed access to small files on Parallelstore. According to Google Cloud benchmarks, Parallelstore’s small-file and metadata performance enables up to 3.7x higher training throughput and 3.9x faster training times for AI use cases compared with native ML framework data loaders.
Move data in and out of Parallelstore programmatically
Many AI and HPC applications use Cloud Storage for data preparation or preservation. Using the integrated import/export API, you can automate the transfer of the data you want to import into Parallelstore for processing. The API can ingest enormous datasets into Parallelstore from Cloud Storage at roughly 20 GB/s for files larger than 32 MB and at about 5,000 files per second for smaller files.
gcloud alpha parallelstore instances import-data $INSTANCE_ID --location=$LOCATION --source-gcs-bucket-uri=gs://$BUCKET_NAME [--destination-parallelstore-path="/"] --project=$PROJECT_ID
You can programmatically export results from an AI training job or HPC workload to Cloud Storage for further evaluation or longer-term storage. Data pipelines can also be streamlined, and manual intervention reduced, by using the API to automate data transfers.
gcloud alpha parallelstore instances export-data $INSTANCE_ID --location=$LOCATION --destination-gcs-bucket-uri=gs://$BUCKET_NAME [--source-parallelstore-path="/"]
Provision GKE resources programmatically via the CSI driver
The Parallelstore GKE CSI driver makes it simple to manage high-performance storage for containerized workloads. Using familiar Kubernetes APIs, you can access pre-existing Parallelstore instances in Kubernetes workloads or dynamically provision and manage Parallelstore file systems as persistent volumes within your GKE clusters. This frees you to focus on resource optimization and TCO reduction rather than learning and maintaining a separate storage system.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: parallelstore-class
provisioner: parallelstore.csi.storage.gke.io
volumeBindingMode: Immediate
reclaimPolicy: Delete
allowedTopologies:
- matchLabelExpressions:
  - key: topology.gke.io/zone
    values:
    - us-central1-a
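A workload then requests storage from that class through a PersistentVolumeClaim. The fragment below is a minimal sketch: the claim name and requested capacity are illustrative, so adjust them to your instance sizing.

# Minimal PVC against the StorageClass above (name and size illustrative).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: parallelstore-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 12Ti
  storageClassName: parallelstore-class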
In the coming months, the fully managed GKE Volume Populator will automate preloading data from Cloud Storage directly into Parallelstore during the PersistentVolumeClaim provisioning process. This helps ensure that your training data is readily accessible, allowing you to maximize GPU and TPU utilization and reduce idle compute time.
Provision Compute Engine resources programmatically with the Cluster Toolkit
The Cluster Toolkit makes deploying Parallelstore instances for Compute Engine simple. Cluster Toolkit, formerly known as Cloud HPC Toolkit, is open-source software for deploying AI and HPC workloads; it provisions compute, network, and storage resources for your cluster or workload following best practices. With just four lines of code, you can integrate the Parallelstore module into your blueprint, and starter blueprints are included for convenience. Beyond the Cluster Toolkit, Parallelstore can also be deployed using Terraform templates, which minimize manual overhead and support provisioning and operations as code.
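For orientation, those four lines sit inside a blueprint’s deployment group and look roughly like the fragment below. The module source path and IDs here are assumptions based on Cluster Toolkit conventions, so check the toolkit’s own starter blueprints for the exact values.

# Illustrative blueprint fragment; module path and ids are assumed.
- id: parallelstore
  source: modules/file-system/parallelstore
  use: [network]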
ReSpo.Vision
Leading sports video analytics company ReSpo.Vision is using Parallelstore to speed up its real-time system’s transition from 4K to 8K video. ReSpo.Vision uses Parallelstore as the transport layer to help capture and label granular data markers, giving coaches, scouts, and fans relevant information. Thanks to Parallelstore, ReSpo.Vision kept computation latency low while managing spikes in high-performance video processing, without costly infrastructure upgrades.
The use of AI and HPC is expanding quickly. With its novel architecture, its performance, and its integration with Cloud Storage, GKE, and Compute Engine, Parallelstore is the storage solution you need to keep demanding GPUs, TPUs, and workloads fed.
Read more on govindhtech.com
0 notes
govindhtech · 7 months ago
Text
ProLiant DL145: HPE ProLiant Gen11 Servers On The Edge
Providing up-to-date information for shops on the edge: the HPE ProLiant DL145 Gen11 server
HPE ProLiant Servers
HPE ProLiant servers are designed to function well in a hybrid environment. Increase the value of your data and accelerate the AI results that produce new ideas and insights.
Boost the results of your AI
New HPE ProLiant Compute servers are geared for enterprise AI workloads like computer vision inference, generative visual AI, and end-to-end natural language processing as part of the NVIDIA AI Computing by HPE portfolio.
HPE ProLiant Gen11 Servers
HPE ProLiant DL145 Gen11
Accelerate your business globally with computation that is optimized for the edge. Driven by a 4th Gen AMD EPYC 8004 series processor, the server is designed to support virtualization, AI workloads at the edge, and important business applications.
Made with the Edge in mind
The HPE ProLiant DL145 Gen11 is designed to perform well in edge settings. It operates efficiently in temperatures ranging from -5°C to 55°C and features built-in air filtration for dusty spaces and vibration tolerance. Its excellent performance, enterprise-grade security, sturdy design, and easy maintenance make it an industry game-changer.
With the introduction of the HPE ProLiant DL145 Gen11 server, HPE is helping businesses enhance performance for their most demanding workloads and applications at the edge. This server offers real-time services and smooth deployment to distributed businesses such as industrial and retail clients.
The HPE ProLiant DL145 Gen11’s compact size, engineered for a variety of edge locations, makes it ideal for high-performance environments including retail outlets, clinics, banks, and manufacturing lines.
Other engineering features include built-in air filtration for dusty spaces, a high level of energy efficiency, vibration-tolerant operation, and quieter performance than data center servers. The HPE ProLiant DL145 Gen11’s location flexibility now enables distributed organizations to execute their edge initiatives without the complexities of deploying a server designed for the data center into an edge location.
In addition to supporting a variety of industry apps like inventory management, pricing, and point of sale, the HPE ProLiant DL145 Gen11 expands the HPE ProLiant Gen11 edge server portfolio. It also supports edge-specific analytics solutions, business intelligence, content delivery, and workloads related to artificial intelligence (AI) and machine learning (ML). A growing ecosystem of ISV partners is in place to deliver industry-specific solutions optimized for edge scenarios, such as loss prevention and video analytics for retail, or manufacturing supply chain, predictive maintenance, and process automation.
The cloud-native management solution HPE GreenLake for Compute Ops Management makes it simple and secure to deploy the HPE ProLiant DL145 Gen11.
Organizations can ship servers to remote locations with zero-touch deployment capabilities, making it easier for non-IT staff to onboard securely. At the same time, automated management capabilities allow centralized IT staff to access, monitor, and manage servers from any location where the compute environment is located.
Any hybrid strategy starts with the HPE ProLiant DL145 Gen11, which helps move services closer to the edge, where data is created and security is crucial. By lowering their dependency on remote data centers or cloud resources, organizations of all sizes can enable real-time insights with onsite data processing for quicker decision-making, resulting in lower latency, reduced bandwidth use, and reduced connectivity costs. With up to 64 cores in a 4th Gen AMD EPYC 8004 series processor, the HPE ProLiant DL145 Gen11 is a powerful server that can run enterprise applications quickly and accommodate up to 128 virtual machines.
Security remains a priority for organizations that are using data from users, devices, and the Internet of Things to create new innovations at the edge, and integrating security into infrastructure is sound practice. True to its founding principles, HPE carries on its heritage of protecting computing workloads with exclusive security innovations that originate in the silicon and are enhanced in the firmware.
Read more on govindhtech.com
0 notes
govindhtech · 8 months ago
Text
NVIDIA DGX SuperPOD In Dell PowerScale Storage offer Gen AI
Boost Productivity: NVIDIA DGX SuperPOD Certified PowerScale Storage.
Dell PowerScale storage
NVIDIA DGX SuperPOD with Dell PowerScale storage provides groundbreaking generative AI. With fast technological growth, AI is impacting many businesses. Generative AI lets computers synthesize and construct meaning. At the NVIDIA GTC global AI conference, Dell PowerScale file storage became the first Ethernet-based storage certified for NVIDIA DGX SuperPOD. With this technology, organizations can maximize the potential of AI-enabled apps.
DGX SuperPODs
The Development of Generative AI
Generative AI, which has transformed technology, allows machines to generate, copy, and learn from patterns in enormous datasets without human input. It could transform healthcare, industry, banking, and entertainment, and it creates an extraordinary need for advanced storage solutions that can handle AI applications’ huge datasets.
Accreditation
Dell has historically led innovation, providing cutting-edge solutions to address commercial enterprise needs. According to Dell, Dell PowerScale is the first Ethernet-based storage solution certified for NVIDIA DGX SuperPOD, improving storage interoperability. By simplifying AI infrastructure, this technology helps enterprises maximize their AI initiatives.
“The world’s first Ethernet storage certification for NVIDIA DGX SuperPOD with Dell PowerScale combines Dell’s industry-leading storage and NVIDIA’s AI supercomputing systems, empowering organizations to unlock AI’s full potential, drive breakthroughs, and achieve the seemingly impossible,” says Martin Glynn, senior director of product management at Dell Technologies. “With Dell PowerScale’s certification as the first Ethernet storage to work with NVIDIA DGX SuperPOD, enterprises can create scalable AI infrastructure with greater flexibility.”
PowerScale Storage
Exceptional Performance for Next-Generation Tasks with PowerScale
Due to its remarkable scalability, performance, and security, which are based on more than ten years of expertise, Dell PowerScale has earned respect and acclaim. With its NVIDIA DGX SuperPOD certification, Dell’s storage offering is even more robust for businesses looking to use generative AI.
This is how PowerScale differs:
Improved network access: Technologies natively integrated with NVIDIA ConnectX NICs and NVIDIA Spectrum switches, such as NVIDIA Magnum IO, GPUDirect Storage, and NFS over RDMA, speed up network access to storage. By significantly reducing data transfer times to and from PowerScale storage, these cutting-edge capabilities guarantee higher storage throughput for workloads like AI training, checkpointing, and inferencing.
Achieving peak performance: A new Multipath Client Driver from Dell PowerScale increases data throughput and enables businesses to meet DGX SuperPOD’s high performance requirements. This capability lets companies quickly train and infer AI models, integrating smoothly with the powerful NVIDIA DGX platform.
Flexibility: Because of PowerScale’s innovative architecture, organizations can easily scale by only adding more nodes, giving them unmatched agility. Because of this flexibility, businesses can develop and adjust their storage infrastructure in tandem with their increasing AI workloads, preventing bottlenecks for even the most demanding AI use cases.
Federal-grade security: PowerScale has earned a place on the U.S. Department of Defense Approved Products List thanks to its outstanding security capabilities. Completing this demanding process strengthens the safety of vital data assets and underscores PowerScale’s suitability for mission-critical applications.
Efficiency: PowerScale’s innovative architecture is designed to maximize efficiency. It uses cutting-edge technology to optimize performance and minimize power consumption, reducing operating expenses and environmental impact.
NVIDIA DGX SuperPOD
To accelerate PowerScale with NVIDIA DGX SuperPOD deployments, Dell created a reference architecture. The document provides examples of how PowerScale and NVIDIA DGX SuperPOD work together seamlessly.
The certification of Dell PowerScale as the first Ethernet-based storage solution approved for NVIDIA DGX SuperPOD worldwide marks an important turning point in the rapidly changing field of artificial intelligence. Thanks to Dell and NVIDIA’s partnership, organizations can now use generative AI with scalability and performance. By adopting these cutting-edge technologies, businesses can fully use AI and revolutionize their operations, ushering in a period of unprecedented development and rapid innovation.
Read more on govindhtech.com
0 notes
govindhtech · 8 months ago
Text
IBM And Intel Introduce Gaudi 3 AI Accelerators On IBM Cloud
Cloud-based enterprise AI from Intel and IBM: to help businesses scale AI, Intel and IBM will deploy Gaudi 3 AI accelerators on IBM Cloud.
Gaudi 3 AI Accelerator
IBM and Intel have announced the worldwide deployment of Intel Gaudi 3 AI accelerators as a service on IBM Cloud. Anticipated for release in early 2025, this offering aims to support corporate AI scalability more economically and to foster innovation backed by security and resilience.
This partnership will also bring Gaudi 3 support to IBM’s Watsonx AI and data platform. IBM Cloud is the first cloud service provider (CSP) to adopt Gaudi 3, and the product will be offered for on-premises and hybrid setups.
Intel and IBM
“AI’s true potential requires an open, cooperative environment that gives customers alternatives and accessible solutions. By fusing Xeon CPUs and Gaudi 3 AI accelerators with IBM Cloud, we are generating new AI capabilities and satisfying the need for reasonably priced, safe, and cutting-edge AI computing solutions.”
Why This Is Important: Although generative AI can accelerate transformation, the amount of computational power it requires highlights how important it is for businesses to prioritize availability, performance, cost, energy efficiency, and security. By working together, Intel and IBM aim to improve performance while reducing the total cost of ownership for using and scaling AI.
Gaudi 3
Gaudi 3’s integration with 5th Gen Xeon simplifies workload and application management by supporting corporate AI workloads in data centers and the cloud. It also gives clients insight into and control over their software stack. With the aid of IBM Cloud and Gaudi 3, clients can expand corporate AI workloads more affordably while keeping performance, security, and resilience front and center.
IBM’s Watsonx AI and data platform will support Gaudi 3 to improve model inferencing price/performance. This will give Watsonx clients access to extra AI infrastructure resources for scaling their AI workloads across hybrid cloud environments.
“IBM is dedicated to supporting our customers in driving innovation in AI and hybrid cloud by providing solutions that address their business demands,” according to Alan Peacock, general manager of IBM Cloud. “Our commitment to security and resilience with IBM Cloud has helped fuel IBM’s hybrid cloud and AI strategy for our enterprise clients.”
Intel Gaudi 3 AI Accelerator
“Clients will have access to a flexible enterprise AI solution that aims to optimize cost performance by utilizing IBM Cloud and Intel’s Gaudi 3 accelerators. We are making new AI business prospects available to customers so they can test, develop, and deploy AI inferencing solutions more affordably.”
IBM and Intel
How It Works: IBM and Intel are working together to provide a Gaudi 3 service capability for customers using AI. They intend to use IBM Cloud’s security and compliance features to assist customers in a variety of sectors, including highly regulated ones.
Scalability and Flexibility: Clients may modify computing resources as required with the help of scalable and flexible solutions from IBM Cloud and Intel, which may result in cost savings and improved operational effectiveness.
Improved Security and Performance: By integrating Gaudi 3 with IBM Cloud Virtual Servers for VPC, x86-based businesses will be able to execute applications more securely and quickly than they could have before, which will improve user experiences.
What’s Next: Intel and IBM have a long history of working together, starting with the IBM PC and continuing with Gaudi 3 for corporate AI solutions. General availability of Gaudi 3 on IBM Cloud is scheduled for early 2025. Stay tuned for additional developments from IBM and Intel in the coming months.
Intel Gaudi 3: The Distinguishing AI
Introducing your new, high-performing choice for every kind of workplace AI task.
An Improved Method for Using Enterprise AI
The Intel Gaudi 3 AI accelerators are designed to withstand rigorous training and inference tasks. They are based on the high-efficiency Intel Gaudi platform, which has proven MLPerf benchmark performance.
Support AI workloads from a single node to a mega cluster, in your data center or in the cloud, all running on Ethernet equipment you probably already own. Whether you require one accelerator or hundreds, Intel Gaudi 3 can be crucial to the success of any AI project.
Developed to Meet AI’s Real-World Needs
With the help of industry-standard Ethernet networking and open, community-based software, you can grow systems more flexibly thanks to the Intel Gaudi 3 AI accelerators.
Adopt Easily
Whether you are beginning from scratch, optimizing pre-made models, or switching from a GPU-based method, using Intel Gaudi 3 AI accelerators is easy.
Designed with developers in mind: To get up to speed quickly, make use of developer resources and software tools.
Encouragement of Both New and Old Models: Use open source tools, such as Hugging Face resources, to modify reference models, create new ones, or migrate old ones.
Included PyTorch: Continue using the library that your team is already familiar with (a minimal sketch follows this list).
Simple Translation of GPU-Based Models: With the help of purpose-built software tools, quickly migrate your existing solutions.
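In practice, running an existing PyTorch model on Gaudi means targeting the “hpu” device through the Gaudi PyTorch bridge. The sketch below assumes the habana_frameworks package that ships with Intel’s Gaudi software is installed; the model and data are toy placeholders.

# Minimal PyTorch-on-Gaudi sketch; assumes the Gaudi PyTorch bridge
# is installed. Model and data are placeholders for illustration.
import torch
import habana_frameworks.torch.core as htcore  # Gaudi device integration

device = torch.device("hpu")
model = torch.nn.Linear(512, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(32, 512, device=device)
target = torch.randint(0, 10, (32,), device=device)

optimizer.zero_grad()
loss = torch.nn.functional.cross_entropy(model(x), target)
loss.backward()
optimizer.step()
htcore.mark_step()  # flush queued ops to the accelerator in lazy mode
print(loss.item())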
Ease Development from Start to Finish
Spend less time getting from proof of concept to production. Intel Gaudi 3 AI accelerators are backed by a robust suite of software tools, resources, and training, from migration to implementation. Find out what resources are available to simplify your AI endeavors.
Scale Without Effort: Integrate AI into everyday life. The goal of the Intel Gaudi 3 AI Accelerators is to provide even the biggest and most complicated installations with straightforward, affordable AI scaling.
Increased I/O: Benefit from 33 percent greater I/O connectivity per accelerator than the H100, allowing massive scale-up and scale-out while maintaining optimal cost effectiveness.
Constructed for Ethernet: Utilize the networking infrastructure you currently have and use conventional Ethernet gear to accommodate growing demands.
Open: Steer clear of hazardous investments in proprietary, locked technologies like NVSwitch, InfiniBand, and NVLink.
Boost Your AI Use Case: Realize the extraordinary at any scale. Modern generative AI and LLMs are supported by Intel Gaudi 3 AI accelerators in the data center. These accelerators work in tandem with Intel Xeon processors, the preferred host CPU for cutting-edge AI systems, to provide enterprise performance and dependability.
Read more on govindhtech.com
0 notes
groovy-computers · 10 days ago
Photo
🔍 Intel isn't stepping back from the GPU game! New Battlemage SKUs are on the horizon, with potential AI-oriented models in development 🤯 Are we witnessing a major shift in Intel's strategy? The Arc Battlemage B580 "Limited Edition" caught major attention last year when NVIDIA and AMD had yet to enter the market. Despite speculations that Team Blue lacked ambition, recent shipping manifests suggest otherwise. Intel is ramping up efforts on new Battlemage models, including the "BMG-G31" and "BMG-C32" variants. These models are speculated to have high-power capabilities with 24-32 Xe2 cores and a 256-bit memory bus. 🖥️ However, shipping details suggest these might be designed for AI and professional workloads, not just gaming. As Intel shifts its focus toward AI, could these GPUs redefine the desktop segment? Share your thoughts with us! 💭 #Intel #Battlemage #GPUs #TechNews #AI #Gaming What's your take on Intel's next move in the GPU market? Are they setting a new trend? Let's chat in the comments! 🛡️ #IntelGPU #TechInnovation #GroovyComputers #AIWorkloads #FutureOfTech #TeamBlue #Innovation #IntelArc #GraphicsCards #DigitalTrend
0 notes
groovy-computers · 14 days ago
Photo
🚀 Transform your systems with HighPoint's cutting-edge PCIe adapters! Ready to maximize bandwidth and redefine performance? Harness the latest in connectivity with HighPoint's new PCIe Gen 5 and Gen 4 x16 adapters. These powerful devices expand I/O capabilities, boost GPU scalability, and enhance NVMe storage efficiency. With modern MCIO and SlimSAS connectors, your options are limitless for immersive AI workloads and high-performance setups. 🔧 These adapters are essential for enthusiasts and enterprise users alike. They split single PCIe x16 slots into multiple high-speed channels—no performance loss! Thriving in data centers or compact systems? HighPoint's innovations cater to all. ➡️ Curious how these adapters redefine the future of connected systems? Let's dive deeper. Which upgrade excites you the most? #TechInnovation #HighPoint #PCIe #AIWorkloads #Connectivity #GPUs #Workstation #DataCenter #NVMe #Storage #Technology #Enthusiasts #Enterprise #TechTrends
0 notes