#IntelXeonScalableCPU | Explore Tumblr posts and blogs

govindhtech · 9 months ago

Text

Dell PowerEdge XE9680L Cools and Powers Dell AI Factory

When It Comes to Cooling and Powering Your AI Factory, Think Dell. As part of the Dell AI Factory initiative, the company is thrilled to introduce a variety of new server power and cooling capabilities.

Dell PowerEdge XE9680L Server

As part of the Dell AI Factory, they’re showcasing new server capabilities after a fantastic Dell Technologies World event. These developments, which offer a thorough, scalable, and integrated method of imaplementing AI solutions, have the potential to completely transform the way businesses use artificial intelligence.

These new capabilities, which begin with the PowerEdge XE9680L with support for NVIDIA B200 HGX 8-way NVLink GPUs (graphics processing units), promise unmatched AI performance, power management, and cooling. This offer doubles I/O throughput and supports up to 72 GPUs per rack 107 kW, pushing the envelope of what’s feasible for AI-driven operations.

Integrating AI with Your Data

In order to fully utilise AI, customers must integrate it with their data. However, how can they do this in a more sustainable way? Putting in place state-of-the-art infrastructure that is tailored to meet the demands of AI workloads as effectively as feasible is the solution. Dell PowerEdge servers and software are built with Smart Power and Cooling to assist IT operations make the most of their power and thermal budgets.

Astute Cooling

Effective power management is but one aspect of the problem. Recall that cooling ability is also essential. At the highest workloads, Dell’s rack-scale system, which consists of eight XE9680 H100 servers in a rack with an integrated rear door heat exchanged, runs at 70 kW or less, as we disclosed at Dell Technologies World 2024. In addition to ensuring that component thermal and reliability standards are satisfied, Dell innovates to reduce the amount of power required to maintain cool systems.

Together, these significant hardware advancements including taller server chassis, rack-level integrated cooling, and the growth of liquid cooling, which includes liquid-assisted air cooling, or LAAC improve heat dissipation, maximise airflow, and enable larger compute densities. An effective fan power management technology is one example of how to maximise airflow. It uses an AI-based fuzzy logic controller for closed-loop thermal management, which immediately lowers operating costs.

Constructed to Be Reliable

Dependability and the data centre are clearly at the forefront of Dell’s solution development. All thorough testing and validation procedures, which guarantee that their systems can endure the most demanding situations, are clear examples of this.

A recent study brought attention to problems with data centre overheating, highlighting how crucial reliability is to data centre operations. A Supermicro SYS‑621C-TN12R server failed in high-temperature test situations, however a Dell PowerEdge HS5620 server continued to perform an intense workload without any component warnings or failures.

Announcing AI Factory Rack-Scale Architecture on the Dell PowerEdge XE9680L

Dell announced a factory integrated rack-scale design as well as the liquid-cooled replacement for the Dell PowerEdge XE9680.

The GPU-powered Since the launch of the PowerEdge product line thirty years ago, one of Dell’s fastest-growing products is the PowerEdge XE9680. immediately following the Dell PowerEdge. Dell announced an intriguing new addition to the PowerEdge XE product family as part of their next announcement for cloud service providers and near-edge deployments.

AI computing has advanced significantly with the Direct Liquid Cooled (DLC) Dell PowerEdge XE9680L with NVIDIA Blackwell Tensor Core GPUs. This server, shown at Dell Technologies World 2024 as part of the Dell AI Factory with NVIDIA, pushes the limits of performance, GPU density per rack, and scalability for AI workloads.

The XE9680L’s clever cooling system and cutting-edge rack-scale architecture are its key components. Why it matters is as follows:

GPU Density per Rack, Low Power Consumption, and Outstanding Efficiency

The most rigorous large language model (LLM) training and large-scale AI inferencing environments where GPU density per rack is crucial are intended for the XE9680L. It provides one of the greatest density x86 server solutions available in the industry for the next-generation NVIDIA HGX B200 with a small 4U form factor.

Efficient DLC smart cooling is utilised by the XE9680L for both CPUs and GPUs. This innovative technique maximises compute power while retaining thermal efficiency, enabling a more rack-dense 4U architecture. The XE9680L offers remarkable performance for training large language models (LLMs) and other AI tasks because it is tailored for the upcoming NVIDIA HGX B200.

More Capability for PCIe 5 Expansion

With its standard 12 x PCIe 5.0 full-height, half-length slots, the XE9680L offers 20% more FHHL PCIe 5.0 density to its clients. This translates to two times the capability for high-speed input/output for the North/South AI fabric, direct storage connectivity for GPUs from Dell PowerScale, and smooth accelerator integration.

The XE9680L’s PCIe capacity enables smooth data flow whether you’re managing data-intensive jobs, implementing deep learning models, or running simulations.

Rack-scale factory integration and a turn-key solution

Dell is dedicated to quality over the XE9680L’s whole lifecycle. Partner components are seamlessly linked with rack-scale factory integration, guaranteeing a dependable and effective deployment procedure.

Bid farewell to deployment difficulties and welcome to faster time-to-value for accelerated AI workloads. From PDU sizing to rack, stack, and cabling, the XE9680L offers a turn-key solution.

With the Dell PowerEdge XE9680L, you can scale up to 72 Blackwell GPUs per 52 RU rack or 64 GPUs per 48 RU rack.

With pre-validated rack infrastructure solutions, increasing power, cooling, and AI fabric can be done without guesswork.

AI factory solutions on a rack size, factory integrated, and provided with “one call” support and professional deployment services for your data centre or colocation facility floor.

Dell PowerEdge XE9680L

The PowerEdge XE9680L epitomises high-performance computing innovation and efficiency. This server delivers unmatched performance, scalability, and dependability for modern data centres and companies. Let’s explore the PowerEdge XE9680L’s many advantages for computing.

Superior performance and scalability

Enhanced Processing: Advanced processing powers the PowerEdge XE9680L. This server performs well for many applications thanks to the latest Intel Xeon Scalable CPUs. The XE9680L can handle complicated simulations, big databases, and high-volume transactional applications.

Flexibility in Memory and Storage: Flexible memory and storage options make the PowerEdge XE9680L stand out. This server may be customised for your organisation with up to 6TB of DDR4 memory and NVMe, SSD, and HDD storage. This versatility lets you optimise your server’s performance for any demand, from fast data access to enormous storage.

Strong Security and Management

Complete Security: Today’s digital world demands security. The PowerEdge XE9680L protects data and system integrity with extensive security features. Secure Boot, BIOS Recovery, and TPM 2.0 prevent cyberattacks. Our server’s built-in encryption safeguards your data at rest and in transit, following industry standards.

Advanced Management Tools

Maintaining performance and minimising downtime requires efficient IT infrastructure management. Advanced management features ease administration and boost operating efficiency on the PowerEdge XE9680L. Dell EMC OpenManage offers simple server monitoring, management, and optimisation solutions. With iDRAC9 and Quick Sync 2, you can install, update, and troubleshoot servers remotely, decreasing on-site intervention and speeding response times.

Excellent Reliability and Support

More efficient cooling and power

For optimal performance, high-performance servers need cooling and power control. The PowerEdge XE9680L’s improved cooling solutions dissipate heat efficiently even under intense loads. Airflow is directed precisely to prevent hotspots and maintain stable temperatures with multi-vector cooling. Redundant power supply and sophisticated power management optimise the server’s power efficiency, minimising energy consumption and running expenses.

A proactive support service

The PowerEdge XE9680L has proactive support from Dell to maximise uptime and assure continued operation. Expert technicians, automatic issue identification, and predictive analytics are available 24/7 in ProSupport Plus to prevent and resolve issues before they affect your operations. This proactive assistance reduces disruptions and improves IT infrastructure stability, letting you focus on your core business.

Innovation in Modern Data Centre Design Scalable Architecture

The PowerEdge XE9680L’s scalable architecture meets modern data centre needs. You can extend your infrastructure as your business grows with its modular architecture and easy extension and customisation. Whether you need more storage, processing power, or new technologies, the XE9680L can adapt easily.

Ideal for virtualisation and clouds

Cloud computing and virtualisation are essential to modern IT strategies. Virtualisation support and cloud platform integration make the PowerEdge XE9680L ideal for these environments. VMware, Microsoft Hyper-V, and OpenStack interoperability lets you maximise resource utilisation and operational efficiency with your visualised infrastructure.

Conclusion

Finally, the PowerEdge XE9680L is a powerful server with flexible memory and storage, strong security, and easy management. Modern data centres and organisations looking to improve their IT infrastructure will love its innovative design, high reliability, and proactive support. The PowerEdge XE9680L gives your company the tools to develop, innovate, and succeed in a digital environment.

Read more on govindhtech.com

#EC2I4iInstances #IncreasingEfficiency #SavingMoney #databases #AmazonEC2 #OpenSearch #IntelXeonScalableprocessors #cloudstorage #memorystorage #AWSinstance #IntelXeonScalableCPU #I4iinstances #aws #intel #BoostingThroughput #technology #technews #news #govindhtech

0 notes

govindhtech · 9 months ago

Text

WiNGPT: Intel Xeon Processor Powered Winning Health LLM

Healthcare organisations’ performance requirements are met by WiNGPT, Winning Health LLM, which uses Intel technology.

Summary

Large language models (LLMs) are a novel technology with great promise for use in medical settings. This is well acknowledged in the current state of smart hospital improvement. Medical report generation, AI-assisted imaging diagnosis, pathology analysis, medical literature analysis, healthcare Q&A, medical record sorting, and chronic disease monitoring and management are just a few of the applications that are powered by LLMs and help to improve patient experience while lowering costs for medical institutions in terms of personnel and other resources.

The absence of high-performance and reasonably priced computing systems, however, is a significant barrier to the wider adoption of LLMs in healthcare facilities. Consider the inference process for models: the sheer magnitude and complexity of LLMs considerably outweigh those of typical AI applications, making it difficult for conventional computing platforms to satisfy their needs.

WiNGPT, a medical LLM leader, has been enhanced by Winning Health with the release of its 5th generation Intel Xeon Scalable processor-based WiNGPT system. The method efficiently makes use of the processors’ integrated accelerators, such as Intel Advanced Matrix Extensions (Intel AMX), for model inference. In comparison with the platform based on the 3rd Gen Intel Xeon Scalable processors, the inference performance has been boosted by nearly three times through collaboration with Intel in areas such as graph optimisation and weight-only quantization. Accelerating the implementation of LLM applications in healthcare organisations, the upgrade satisfies the performance demand for scenarios like automated medical report creation.

Challenge: Medical LLM Inference’s Compute Dilemma

The broad implementation of LLMs across multiple industries, including healthcare, is seen as a significant achievement for the practical usage of this technology. Healthcare organisations are investing more and have advanced significantly in LLMs for medical services, management, and diagnosis. According to research, the healthcare industry will embrace LLMs at a rapid pace between 2023 and 2027. By that time, the market is predicted to grow to a size of over 7 billion yuan.

Typically, LLMs are compute-intensive applications with high computing costs because of the significant computing resources required for training, fine-tuning, and inference. Model inference is one of the most important phases in the deployment of LLM among them. Healthcare facilities frequently face the following difficulties when developing model inference solutions:

There is a great need for real-time accuracy in these intricate settings. For this to work, the computing platform must have sufficient inference capacity. Furthermore, healthcare organisations typically prefer that the platform be deployed locally rather than on the cloud due to the strict security requirements for medical data.

While GPU improvements may be necessary in response to LLM updates, hardware upgrades are rarely common. Updated models might therefore not be compatible with outdated hardware.

The inference of Transformer-based LLMs now requires significantly more hardware than it did previously. It is challenging to completely use prior computational resources since memory and time complexity both rise exponentially with the length of the input sequence. As such, hardware utilisation is still below maximum potential.

From an economic standpoint, it would be more expensive to deploy computers specifically for model inference, and their utilisation would be restricted. Because of this, many healthcare organisations choose to employ CPU-based server systems for inference in order to reduce hardware costs while maintaining the ability to handle a range of workloads.

WiNGPT based on Intel 5th Generation Xeon Scalable Processors

An LLM created especially for the healthcare industry is called WiNGPT by Winning Health. WiNGPT, which is based on the general-purpose LLM, combines high-quality medical data and is tailored and optimised for medical scenarios, enabling it to offer intelligent knowledge services for various healthcare scenarios. The three unique characteristics of WiNGPT are as follows:

Perfected and specialised: WiNGPT delivers remarkable data accuracy that satisfies a variety of business needs. It has been trained and refined for medical scenarios and on high-quality data.

Cheap: By means of algorithm optimisation, the CPU-based deployment has achieved generation efficiency that is comparable to that of the GPU through testing.

Assist with personalised private deployment: In addition to improving system stability and dependability, private deployment guarantees that medical data stays inside healthcare facilities and prevents data leaks. It also enables tailored alternatives to suit different financial plans for organisations with different needs.

Winning Health has teamed with Intel and selected the 5th Gen Intel Xeon Scalable processors in order to speed up WiNGPT’s inference speed. These processors have a lower total cost of ownership (TCO), provide remarkable performance gains per watt across a range of workloads, and are more reliable and energy-efficient. They also perform exceptionally well in data centres, networks, AI, and HPC. In the same power consumption range, the 5th Gen Intel Xeon Scalable processors provide faster memory and more processing capability than their predecessors. Furthermore, when deploying new systems, they greatly reduce the amount of testing and validation that must be done because they are interoperable with platforms and software from previous generations.

AI performance is elevated by the integrated Intel AMX and other AI-optimized technologies found in the 5th generation Intel Xeon Scalable CPUs. Intel AMX introduces a novel instruction set and circuit design that allows matrix operations, resulting in a large increase in instructions per cycle (IPC) for AI applications. For training and inference workloads in AI, the breakthrough results in a significant performance boost.

The Intel Xeon Scalable 5th generation processor enables:

Up to 21% increase in overall performance

A 42 percent increase in inference performance

Memory speed up to 16 percent quicker

Larger L3 cache by up to 2.7 times

Greater performance per watt by up to ten times

Apart from the upcoming 5th generation Intel Xeon Scalable processors, Winning Health and Intel are investigating methods to tackle the memory access limitation in LLM inference using the existing hardware platform. LLMs are typically thought of being memory-bound because of their large number of parameters, which frequently necessitate loading billions or even tens of billions of model weights into memory before calculating. Large amounts of data must be momentarily stored in memory during computation and accessed for further processing. Thus, the main factor preventing inference speed from increasing has shifted from processing power to memory access speed.

In order to maximise memory access and beyond, Winning Health and Intel have implemented the following measures:

Graph optimisation is the technique of combining several operators in order to minimise the overhead associated with operator/core calls. By combining many operators into one operation, performance is increased because fewer memory resources are used than when reading in and reading out separate operators. Winning Health has optimised the algorithms in these procedures using Intel Extension for PyTorch, which has effectively increased performance. Intel employs the Intel Extension for PyTorch plug-in, which is built on Intel Xeon Scalable processors and Intel Iris Xe graphics, to enhance PyTorch performance on servers by utilising acceleration libraries like oneDNN and oneCCL.

Weight-only quantization: This kind of optimisation is specifically designed for LLMs. The parameter weights are transformed to an INT8 data type as long as the computing accuracy is ensured, but they are then returned to half-precision. This helps to speed up the computing process overall by reducing the amount of memory used for model inference.

WiNGPT’s inference performance has been enhanced by Winning Health and Intel working together to improve memory utilisation. In order to further accelerate inference for the deep learning framework, the two have also worked together to optimise PyTorch’s main operator algorithms on CPU platforms.

The LLaMA2 model’s inference performance reached 52 ms/token in a test-based validation environment. An automated medical report is generated in less than three seconds for a single output.

Winning Health also contrasted the 3rd Gen and 5th Gen Intel Xeon Scalable processor-based solutions’ performance during the test. According to the results, current generation CPUs outperform third generation processors by nearly three times. Image credit to Intel

The robust performance of the 5th Gen Intel Xeon Scalable CPU is sufficient to suit customer expectations because the business situations in which WiNGPT is employed are relatively tolerant of LLM latency. In the meantime, the CPU-based approach may be readily scaled to handle inference instances and altered to run inference on several platforms.

Advantages

Healthcare facilities have benefited from WiNGPT’s solution, which is built on 5th generation Intel Xeon Scalable processors, in the following ways:

Enhanced application experience and optimised LLM performance: Thanks to technical improvements on both ends, the solution has fully benefited from AI’s performance benefits when using 5th generation Intel Xeon Scalable processors. It can ensure a guaranteed user experience while reducing generation time by meeting the performance requirements for model inference in scenarios like medical report generation.

Increased economy while controlling platform construction costs: By avoiding the requirement to add specialised inference servers, the solution can use the general-purpose servers that are currently in use in healthcare facilities for inference. This lowers the cost of procurement, deployment, operation, maintenance, and energy usage.

LLMs and other IT applications are ideally balanced: Due to the solution’s ability to employ CPU for inference, computing power allocation for healthcare organisations can be more agile and flexible as needed, as CPU can be divided between LLM inference and other IT operations.

Observing Up Front

When combined with WiNGPT, the 5th generation Intel Xeon Scalable CPUs offer exceptional inferencing speed, which facilitates and lowers the cost of LLM implementation. To increase user accessibility and benefit from Winning Health’s most recent AI technologies, both parties will keep refining their work on LLMs.

Read more on govindhtech.com

#wingpt #intelxeonprocessor #IntelCore #healthllm #LLM #LargeLanguageModels #AI #AIApplications #CPU #GPU #deeplearning #Llama2 #intelxeonscalablecpu #news #technews #technology #technologynews #technologytrends #govindhtech

0 notes

govindhtech · 6 months ago

Text

ASUS ESC4000-E11 Server Advances Federated AI Capabilities

Introduction

The all-purpose With its 4th generation Intel Xeon Scalable CPUs and XPUs for the Intel Data Center GPU Flex Series 170, the ASUS ESC4000-E11 server is essential for improving federated AI capabilities across a range of sectors. Because of its architecture, which maximizes distributed AI workloads, this server is perfect for industries like healthcare and finance that value privacy, scalability, and speed.

The Test Setup

Three ASUS ESC4000-E11 server computers were used to construct the testing environment. The third ASUS ESC4000-E11 was the federated server in charge of aggregating models based on the data that each federated client had, while the other two servers functioned as federated clients. Different aggregation techniques and their possible effects on the final model in federated learning settings have been the subject of many research. By default, this test combined the gradients from the federated clients using the averaging approach.

Performance insights on the federated client hardware, especially the ASUS ESC4000-E11 with the Intel Data Center GPU Flex Series 170 for acceleration, are the main goal of this test.

The following are the main metrics assessed in this setup:

Time spent training the model

Model precision

Loss of training

A thorough understanding of which hardware is most appropriate for implementing federated learning in a real-world medical setting was then provided by comparing these measures to the outcomes from Intel Xeon CPUs.

Model Inference and RA Erosion Detection

It evaluated the model’s capacity to accurately identify varying degrees of RA erosion by testing its inference performance after the federated learning procedure and the final model aggregation. Three different degrees of RA erosion severity were determined using the developed mTSS model:

Level 0: No erosion

Level 1: Light erosion

Level 2: Significant erosion

Effective treatment planning is facilitated by this categorization approach, which enables precise detection of RA development based on medical imaging.

Principal Advantages of Federated AI

Improved Data Security: By supporting federated learning, the ASUS ESC4000-E11 keeps private information dispersed. Maintaining data privacy requires this decentralization, particularly in industries with a high regulatory burden. By processing data locally, the server’s architecture protects privacy, lowers the possibility of data breaches, and guarantees compliance with strict data protection laws.

Flexibility and Scalability: As federated learning networks grow, the server’s design enables it to scale effectively. Larger datasets and more complicated models are supported by this scalability, which allows businesses to expand their AI capabilities while preserving peak performance across several edge devices or institutions.

Decreased Latency: The ASUS ESC4000-E11 reduces latency during model training and updates because to its strong processing capabilities. In real-time applications like medical diagnostics, where prompt decision-making may have a big influence on results, this latency reduction is very important.

Energy Efficiency: High performance and optimal power consumption are guaranteed by the incorporation of XPUs for the Intel Data Center GPU Flex Series 170. It is a sustainable option for extensive AI deployments because of its energy efficiency, which lowers costs and improves the environment.

Organizations may create federated learning settings that are quicker, safer, and more effective by using the ASUS ESC4000-E11, which will spur innovation in AI-driven industries.

Case Study: Federated AI in Medical Imaging Diagnostics

Multiple hospitals work together to increase the precision of AI models that identify illnesses using medical images like MRIs, CT scans, and X-rays in a real-world federated AI scenario involving medical imaging diagnostics. While developing a common AI model, each institution keeps its data locally to ensure compliance with privacy laws.

Infrastructure Setup

In order to manage demanding AI workloads and support federated learning, each hospital implements the ASUS ESC4000-E11, which is outfitted with 4th generation Intel Xeon Scalable processors and XPUs for Intel Data Center GPU Flex Series 170. The hospitals may work together with this configuration without exchanging raw data.

The Federated Learning Process

Data Preparation: To guarantee that local medical imaging data never leaves a safe environment, each institution preprocesses it internally.

Local Model Training: Hospitals train AI models on local datasets utilizing ASUS ESC4000-E11 servers, which feature XPUs for Intel Data Center GPU Flex Series 170 for faster training. In order to maintain anonymity, the training procedure stays within each hospital’s infrastructure.

Model Aggregation: A central server receives the locally trained models and aggregates them to create a global model. No raw data is shared during this aggregation procedure; only model parameters are used.

Updates to the Global Model: Each hospital receives a redistribution of the global model, which now incorporates the pooled wisdom of all hospitals. Iterations and further local training are part of the cycle.

Performance and Efficiency Gains

Faster Training Times: Hospitals can rapidly converge on a highly accurate global model thanks to the ASUS ESC4000-E11’s potent hardware, which drastically cuts down on training times.

Energy-Efficient Training: By using XPUs for Intel Data Center GPU Flex Series 170, training is carried out in an energy-efficient manner, lowering operating expenses and the environmental impact.

Improved Data Security: The sophisticated security features of fourth-generation Intel Xeon processors guarantee that patient data is safe throughout the federated learning process.

Outcome and Benefits of Medical Diagnostics

With the help of the ASUS ESC4000-E11 servers, the federated AI system produces:

An very reliable and accurate AI model that can diagnose illnesses from medical pictures A cooperative architecture that allows hospitals to get access to a variety of datasets, enhancing the generalizability of the model without jeopardizing data privacy Quicker model iterations result in the rapid deployment of diagnostic tools, which improves patient care by enabling more precise and quicker diagnoses.

Medical organizations may cooperatively improve their AI skills by using ASUS ESC4000-E11 servers in federated learning, which will improve healthcare results while maintaining data confidentiality and privacy.

In conclusion

This white paper has shown how the ASUS ESC4000-E11 AI server’s Intel Data Center GPU Flex Series 170 may greatly speed up federated learning for practical medical applications. The technology is a powerful way to advance AI in healthcare since it improves model training durations, lowers latency, and guarantees adherence to data privacy laws. By using this gear, healthcare facilities may enhance their diagnostic skills and patient care results.