AMD ROCm 6.2.3 Brings Llama 3 And SD 2.1 To Radeon GPUs

AMD ROCm 6.2.3
AMD recently published AMD ROCm 6.2.3, the latest version of its open compute software supporting Radeon GPUs on native Ubuntu Linux systems. Most significantly, this release lets developers use Stable Diffusion (SD) 2.1 text-to-image capabilities in their AI applications and delivers strong inference performance with Llama 3 70B Q4.
Following its previous release, AMD ROCm 6.1, AMD focused on features that accelerate generative AI development. Using vLLM and Flash Attention 2, AMD ROCm 6.2 delivers professional-grade performance for Large Language Model inference, and it adds beta support for the Triton framework so that more users can build AI functionality on AMD hardware.
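To make the SD 2.1 capability concrete, here is a minimal text-to-image sketch assuming the Hugging Face diffusers package on a ROCm build of PyTorch (on ROCm, PyTorch exposes the Radeon GPU through the "cuda" device name); the model ID and prompt are standard diffusers usage rather than anything specific to this release.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the SD 2.1 text-to-image model; on ROCm, PyTorch's "cuda"
# device name maps to the Radeon GPU through HIP.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```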
The four main feature highlights of AMD ROCm 6.2.3 for Radeon GPUs are:
Official vLLM support: the most recent version of Llama is officially supported, and AMD ROCm on Radeon delivers strong inference performance with Llama 3 70B Q4.
Official Flash Attention 2 “forward enablement” support: intended to speed up inference and lower memory requirements.
Official Stable Diffusion (SD) support: the SD text-to-image model can be integrated into your own AI development.
Triton beta support: use the Triton framework to develop high-performance AI applications quickly and simply, even with little kernel-programming experience.
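For the Triton item above, here is a minimal sketch of Triton development: the classic introductory vector-add kernel, written in Python and launched by Triton. This is standard Triton usage, not code shipped with the ROCm release; on ROCm builds of PyTorch, the "cuda" device name maps to the Radeon GPU.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    offsets = tl.program_id(0) * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n  # guard the tail of the vector
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)  # one program per 1024-element block
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```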
Since its first 5.7 release barely a year ago, AMD ROCm support for Radeon GPUs has advanced significantly.
With version 6.0, AMD formally qualified additional Radeon GPUs, such as the 32GB Radeon PRO W7800, and greatly expanded ROCm’s capabilities by adding support for the widely used ONNX Runtime.
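As context for the ONNX Runtime support mentioned above, selecting the ROCm backend is a one-line change in a standard ONNX Runtime session; the model file name below is hypothetical.

```python
import onnxruntime as ort

# Ask ONNX Runtime to run the model on the Radeon GPU via ROCm,
# falling back to the CPU if the provider is unavailable.
session = ort.InferenceSession(
    "model.onnx",  # hypothetical model file
    providers=["ROCMExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # confirms which providers were activated
```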
AMD ROCm 6.1 marked another significant milestone, declaring official support for the TensorFlow framework and multi-GPU configurations. It also introduced beta support for Windows Subsystem for Linux (WSL 2), which has since been officially qualified for use with 6.1.
Although native Linux was the primary focus of the AMD ROCm 6.2.3 solution stack for Radeon GPUs, WSL 2 support will be released shortly.
ROCm on Radeon has had a fantastic year for AI and machine learning development, and AMD says it is eager to keep collaborating closely with the community to improve the product stack and to support system builders in creating attractive on-premises, client-based solutions.
Evolution of AMD ROCm from version 5.7 to 6.2.3
From version 5.7 to 6.2.3, AMD ROCm (Radeon Open Compute) has made substantial improvements to performance, hardware support, developer tools, and deep learning frameworks. Each release’s main improvements are listed below:
AMD ROCm 5.7
Support for New Architectures: ROCm 5.7 included support for AMD’s RDNA 3 family. This release expanded the GPUs that can utilize ROCm for deep learning and HPC.
HIP Improvements: AMD’s HIP framework, which lets developers port CUDA code to run on AMD GPUs, was optimized to improve interoperability between ROCm-supported systems and CUDA-based workflows.
Deep Learning Framework Updates: TensorFlow and PyTorch were made more compatible and performant. These upgrades optimized AI workloads in multi-GPU setups.
Performance Optimizations: This version improved HPC task performance, including memory management and multi-GPU scaling.
AMD ROCm 6.0
Unified Memory Support: ROCm 6.0 added full support for unified memory, smoothing CPU-GPU data transfers. This improved memory management, especially for applications in which the CPU and GPU frequently access the same data.
New Compiler Infrastructure: AMD enhanced the LLVM-based ROCm compiler for greater performance and broader workload support, aiming to boost deep learning, HPC, and AI efficiency.
Improved scalability and compatibility with the RDNA and CDNA architectures let ROCm 6.0 target more GPUs, especially in HPC.
This release also added new CUDA compatibility features to the HIP API, making it easier for developers to port CUDA applications to ROCm.
AMD ROCm 6.1
Optimized AI/ML Framework Compatibility: ROCm 6.1 improved PyTorch and TensorFlow performance, including better mixed precision training, which maximizes GPU utilization in deep learning (see the sketch after this list).
Experimental HIP support for tensor-core-style matrix acceleration allowed AI models to use hardware-accelerated matrix operations, greatly speeding up the matrix multiplications at the heart of deep learning.
Expanded Container Support: AMD shipped pre-built Docker containers in ROCm 6.1 that integrate more easily with Kubernetes, simplifying cloud and cluster deployment.
Improved memory and I/O operations delivered more efficient data transfer in multi-GPU systems.
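As a sketch of the mixed precision training mentioned in the first item of this list, standard PyTorch automatic mixed precision runs unchanged on ROCm builds of PyTorch; the tiny model and batch below are placeholders.

```python
import torch
import torch.nn as nn

device = "cuda"  # maps to the Radeon GPU on ROCm builds of PyTorch
model = nn.Linear(512, 10).to(device)          # placeholder model
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()           # scales losses to avoid FP16 underflow

x = torch.rand(64, 512, device=device)         # placeholder batch
target = torch.randint(0, 10, (64,), device=device)

# Forward pass runs selected ops in FP16 while keeping sensitive ops in FP32.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = nn.functional.cross_entropy(model(x), target)

scaler.scale(loss).backward()                  # backward on the scaled loss
scaler.step(opt)
scaler.update()
```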
AMD ROCm 6.1.3
Multi-GPU support makes it possible to build scalable AI desktops for multi-user serving applications.
Beta-level support for Windows Subsystem for Linux lets these solutions run with ROCm on Windows-based systems.
TensorFlow framework support provides more options for AI development.
AMD ROCm 6.2
New Kernel and Driver Features: ROCm 6.2 improved low-level driver and kernel support, boosting the stability and performance of advanced computing workloads and significantly strengthening ROCm in enterprise environments.
AMD Infinity Architecture Integration: ROCm 6.2 enhanced support for AMD’s Infinity Architecture, which connects GPUs over high-speed links. Multi-GPU configurations performed better, especially for large-scale HPC and AI applications.
HIP API Expansion: ROCm 6.2 extended the HIP API, making it easier to port CUDA-based applications, and added advanced features such as asynchronous data transfers to boost computational performance.
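To illustrate the kind of asynchronous data transfer this enables, here is a PyTorch-level sketch; on ROCm, PyTorch's torch.cuda stream API is backed by HIP streams, so this shows the pattern rather than the HIP C++ API itself.

```python
import torch

stream = torch.cuda.Stream()               # a HIP stream under ROCm
host = torch.rand(1 << 20).pin_memory()    # pinned memory enables async copies

with torch.cuda.stream(stream):
    # The copy is queued on the stream and can overlap with other GPU work.
    device_tensor = host.to("cuda", non_blocking=True)
    result = device_tensor * 2.0

# Make the default stream wait for our transfer stream before using the result.
torch.cuda.current_stream().wait_stream(stream)
```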
AMD ROCm 6.2.3
As detailed in the feature highlights above, this release adds official vLLM support with strong Llama 3 70B Q4 inference performance (sketched below), official Flash Attention 2 forward enablement, official Stable Diffusion support, and beta support for the Triton framework.
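A minimal sketch of the vLLM workflow for the Llama 3 item, assuming a 4-bit quantized Llama 3 70B checkpoint ("70B Q4" refers to 4-bit quantization; the model ID and quantization method below are assumptions, so substitute the checkpoint you actually use):

```python
from vllm import LLM, SamplingParams

# Hypothetical 4-bit (AWQ) Llama 3 70B checkpoint; replace with the
# quantized model you have access to.
llm = LLM(model="casperhansen/llama-3-70b-instruct-awq", quantization="awq")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain what ROCm is in one paragraph."], params)
print(outputs[0].outputs[0].text)
```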
Key Milestone Summary
These releases expanded ROCm’s hardware support to new AMD architectures (RDNA, CDNA).
HIP updates made porting CUDA applications easier, while broader AI/ML framework support became more useful to data scientists and researchers.
Multi-GPU Optimizations: Unified memory support, RDMA, and AMD Infinity Architecture improved multi-GPU deployments, which are essential for HPC and large-scale AI training.
Each release improved ROCm’s stability and scalability by fixing bugs, optimizing memory management, and improving speed.
AMD now prioritizes an open, high-performance computing platform for AI, machine learning, and HPC applications.
Read more on Govindhtech.com
How AMD ROCm 6.1 Advances AI and HPC Development

AMD ROCm 6.1
With the AMD ROCm 6 open-source software platform, AMD aims to maintain its commitment to open-source, device-independent solutions while creating an environment that maximizes the performance and potential of AMD Instinct accelerators. Think of ROCm 6 as the link that turns your most ambitious AI concepts into working deployments: it gives developers the freedom to build at their own pace, testing and deploying applications across a wide range of GPU architectures, and it delivers outstanding interoperability with key industry frameworks.
AMD’s most recent platform upgrade, ROCm 6.1, adds a host of new features for researchers and developers alike. To keep pace with rapid advances in AI frameworks, ROCm 6.1 builds on the foundational strengths of ROCm 6 by supporting the latest AMD Instinct and Radeon GPUs, improving optimizations across a wide range of computational domains, and extending ecosystem support. Its new features and updates aim to improve application performance and reliability so that AI and HPC developers can push the boundaries of what is feasible.
Presenting rocDecode, a video processing tool
This new ROCm library gives AMD GPUs high-performance video decoding directly on the GPU through the Video Core Next (VCN) specialized media engines. These hardware decoders handle video streams efficiently.
By decoding compressed video directly into GPU memory, rocDecode reduces the data transferred over the PCIe bus and removes typical bottlenecks in video processing. This enables rapid post-processing with the ROCm HIP framework for real-time operations such as video scaling, color conversion, and augmentation, all of which are critical for advanced analytics, inferencing, and machine learning training.
rocDecode maximizes the efficiency and scalability of video decoding. The API permits the creation of multiple decoder instances that run concurrently, fully utilizing all of the VCNs on a GPU device, so even high-volume video streams can be decoded and processed in parallel. In short, rocDecode strengthens the video processing pipeline, delivering the power efficiency and performance gains that modern AI and HPC applications demand.
MIGraphX adds Flash Attention and a PyTorch backend
MIGraphX, AMD’s graph inference engine, accelerates deep learning neural networks. It is available through C++ and Python APIs as well as a command-line program, migraphx-driver. This flexibility lets developers easily incorporate sophisticated model inference into their applications.
With support for Flash Attention, which improves the memory efficiency of well-known models such as BERT, GPT, and Stable Diffusion, ROCm 6.1 boosts performance for transformer-based models and contributes to faster, more power-efficient processing of complex neural networks.
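To show the operation being accelerated, here is a sketch using PyTorch's built-in scaled_dot_product_attention, which dispatches to memory-efficient, Flash-Attention-style kernels where available; it illustrates the computation itself, not MIGraphX's own API.

```python
import torch
import torch.nn.functional as F

# (batch, heads, sequence, head_dim) -- typical transformer shapes
q = torch.rand(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.rand(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
v = torch.rand(2, 8, 1024, 64, device="cuda", dtype=torch.float16)

# Fused attention avoids materializing the full 1024x1024 score matrix,
# which is where the memory savings for BERT/GPT/SD-style models come from.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```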
ROCm 6.1 also includes a new Torch-MIGraphX library that brings MIGraphX capabilities directly into PyTorch workflows. It defines a ready-to-use “migraphx” backend for the torch.compile API and supports a variety of data types, including FP32, FP16, and INT8, to meet different computing requirements.
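Given that Torch-MIGraphX registers a “migraphx” backend for torch.compile, usage would look like the sketch below; the placeholder model is hypothetical, and the import name for the library is an assumption.

```python
import torch
import torch.nn as nn
import torch_migraphx  # registers the "migraphx" backend (assumed import name)

# Placeholder FP16 model for illustration.
model = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)
).to("cuda").half()

# Route compilation through MIGraphX for inference.
compiled = torch.compile(model, backend="migraphx")
out = compiled(torch.rand(32, 128, device="cuda", dtype=torch.float16))
```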
Better MIOpen library performance
AMD’s open-source MIOpen deep learning primitives library is built specifically to improve GPU performance. It offers a full suite of tools to minimize GPU launch overheads and maximize memory bandwidth, using techniques such as kernel fusion and an auto-tuning infrastructure. That infrastructure adapts algorithms to optimize convolutions for different filter and input sizes, handling a wide range of problem configurations efficiently.
MIOpen’s most recent upgrades target performance, especially for convolutions and inference. ROCm 6.1 introduces Find 2.0 fusion plans, designed to maximize system resource utilization and improve the library’s ability to carry out inference jobs efficiently. AMD has also enhanced the convolution kernels for the NHWC layout (Number of samples, Height, Width, Channels), which orders height and width before channels; new heuristics optimize efficiency for this format, allowing better handling and processing of convolution operations across applications.
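Since MIOpen is the backend that PyTorch convolutions use on ROCm, a PyTorch-level way to exercise the NHWC kernels is the channels_last memory format, which stores tensors in NHWC order; this sketch assumes a ROCm build of PyTorch.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 64, kernel_size=3, padding=1).to("cuda").half()
x = torch.rand(8, 3, 224, 224, device="cuda", dtype=torch.float16)

# channels_last keeps the logical NCHW shape but lays memory out as NHWC,
# the format targeted by the improved convolution kernels.
model = model.to(memory_format=torch.channels_last)
x = x.contiguous(memory_format=torch.channels_last)

out = model(x)
```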
New Composable Kernel library architecture support
ROCm 6.1 expands architecture support in the Composable Kernel (CK) library, providing highly efficient capabilities on a wider variety of AMD GPUs. A major update in this version is the addition of stochastic rounding to the FP8 rounding mechanism. By modeling more realistic data behavior, this rounding technique improves model convergence and offers a more accurate, dependable way to handle data in machine learning models.
Expanded hipSPARSELt computations
To speed up deep learning tasks, ROCm 6.1 extends hipSPARSELt with support for structured sparsity matrices. Notably, this release supports Sparse Matrix-Matrix Multiplication (SPMM) configurations in which ‘B’ denotes the sparse matrix and ‘A’ the dense matrix; previously, the library was restricted to multiplications where the sparse matrix was ‘A’ and the dense matrix was ‘B’. Supporting both matrix configurations improves the performance and versatility of SPMM operations, further optimizing deep learning computations.
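To make the structured-sparsity pattern concrete, here is a plain-PyTorch sketch of the dense-‘A’ × sparse-‘B’ SPMM configuration described above, using a 2:4 pattern (two zeros in every four consecutive values); it shows the data layout with ordinary dense operations, while hipSPARSELt performs the same multiplication on a compressed representation.

```python
import torch

# 2:4 structured sparsity: two of every four consecutive values are zero.
mask = torch.tensor([1.0, 1.0, 0.0, 0.0]).tile(128, 32)   # (128, 128) pattern

A = torch.rand(64, 128)           # dense matrix 'A'
B = torch.rand(128, 128) * mask   # structured-sparse matrix 'B'

# The dense-A x sparse-B SPMM configuration newly supported by hipSPARSELt;
# shown here with dense ops, compressed and accelerated by the library.
D = A @ B
```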
Higher-level tensor functions with hipTensor
hipTensor is an AMD C++ library that uses the Composable Kernel library’s primitives to accelerate tensor operations. AMD created hipTensor to take advantage of general-purpose kernel languages such as HIP C++, and it optimizes how tensor primitives are executed wherever complex tensor computations are needed.
The most recent version of hipTensor adds support for 4D tensor contraction and permutation. With ROCm 6.1, users can efficiently perform permutations on 4D tensors, a critical operation in many tensor-based computations, and the library now supports 4D contractions for F16, BF16, and complex F32/F64 data formats. With this added functionality, hipTensor can optimize a wider range of operations, enabling more complex and varied manipulations of tensor data, many of which are essential for demanding tasks such as training neural networks and running intricate simulations.
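To illustrate what a 4D contraction and permutation are, here is an einsum-based sketch of the underlying math; hipTensor is a C++ library, so this Python sketch only shows the operations it accelerates.

```python
import torch

a = torch.rand(8, 16, 32, 4)   # 4D tensor with modes (m, n, k, l)
b = torch.rand(32, 4, 8, 10)   # 4D tensor with modes (k, l, o, p)

# 4D contraction: sum over the two shared modes (k, l), keeping the rest.
c = torch.einsum("mnkl,klop->mnop", a, b)   # shape (8, 16, 8, 10)

# 4D permutation: reorder the modes of a tensor.
p = a.permute(3, 0, 2, 1).contiguous()      # shape (4, 8, 32, 16)
```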
AMD aims to deliver the latest in high-performance computing through the ROCm platform. Every upgrade in ROCm 6.1 is designed to increase productivity, streamline workflows, and help you reach your goals faster by offering practical, powerful tools that unleash your creative potential.
Read more on Govindhtech.com