StableHLO & OpenXLA: Enhancing Hardware Portability for ML
JAX and OpenXLA: Methods and Theory
JAX is a Python numerical computing library that provides just-in-time (JIT) compilation to XLA and automatic differentiation, and it uses OpenXLA to optimise computations for CPUs, GPUs, and TPUs.
Although the Intel articles on JAX and OpenXLA do not define StableHLO, OpenXLA's role in the ecosystem, which the Intel Extension for OpenXLA exposes to Intel hardware through a PJRT plug-in, suggests that it relates to the portability and stability of the hardware abstraction layer (HAL).
StableHLO likely fits the scenario the sources describe as follows:
OpenXLA abstracts low-level hardware backends from high-level machine learning frameworks like JAX. This abstraction lets models operate on different hardware without code changes.
OpenXLA uses an intermediate representation (IR) to connect the backend (XLA compilers for specific hardware) and frontend (JAX).
This abstraction requires a stable IR to work properly and to enable reliable deployment across devices; if the IR changes incompatibly, it can break both backend compilers and frontend frameworks.
We believe StableHLO is OpenXLA's versioned and standardised HLO (High-Level Optimiser) IR. With this standardisation and versioning, a model compiled against a given StableHLO version should work on any compatible hardware backend that supports that version.
Although the sources do not define StableHLO, OpenXLA's role as an abstraction layer built around an intermediate representation implies that StableHLO is essential to the JAX and OpenXLA ecosystem for keeping computations stable and portable across hardware targets. It gives hardware backends and software (JAX via OpenXLA) a solid contract.
To better understand StableHLO, read the OpenXLA project and component documentation.
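As a concrete illustration of this contract, JAX can print the IR it hands to OpenXLA for a given function. This is a minimal sketch using JAX's ahead-of-time lowering API; the function and shapes are illustrative, and the exact textual form of the output depends on the JAX version (recent releases emit a StableHLO/MLIR module).

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sum(jnp.tanh(x) ** 2)

x = jnp.ones((4, 8))

# Lower the jitted function for this input signature without running it.
lowered = jax.jit(f).lower(x)

# Print the lowered computation; in recent JAX releases this is the
# StableHLO (MLIR) module text that OpenXLA consumes.
print(lowered.as_text())
```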
Understanding how JAX and OpenXLA interact, especially the compilation and execution cycle, helps when optimising performance on Intel and other hardware. The key points are OpenXLA's role in backend-agnostic optimisation, JAX's staged compilation, and cross-device execution.
Important topics
Core Functionality and Transformation System of JAX
JAX adds JIT compilation, vectorisation, parallelisation, and automatic differentiation (jax.grad) to a NumPy-style API.
These transformations make JAX functions more efficient.
jax.jit converts JAX functions into XLA computations, improving efficiency. “The jax.jit transformation in JAX optimises numerical computations by compiling Python functions that operate on JAX arrays into efficient, hardware-accelerated code using XLA.”
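A minimal sketch of these transformations composed on a toy function (the function, shapes, and names here are illustrative, not taken from the article):

```python
import jax
import jax.numpy as jnp

# A plain NumPy-style function.
def loss(w, x):
    return jnp.mean((x @ w) ** 2)

w = jnp.ones((3,))
x = jnp.ones((8, 3))

grad_loss = jax.grad(loss)                       # automatic differentiation w.r.t. w
fast_grad = jax.jit(grad_loss)                   # JIT compilation via XLA
per_example = jax.vmap(loss, in_axes=(None, 0))  # vectorisation over rows of x

print(fast_grad(w, x))          # compiled on first call, cached afterwards
print(per_example(w, x).shape)  # (8,)
```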
OpenXLA as a Backend-Agnostic Compiler
OpenXLA bridges JAX to the hardware backends, combining an intermediate representation with an optimisation pipeline.
jax.jit lowers JAX code to OpenXLA's HLO IR.
OpenXLA optimises this HLO IR and generates machine code for the backend.
“OpenXLA serves as a unifying compiler infrastructure that produces optimised machine code for CPUs, GPUs, and TPUs from JAX's computation graph in HLO.”
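This frontend/backend split can be made explicit with JAX's staged lowering and compilation API; the function below is illustrative, and the API names reflect current JAX releases.

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.dot(x, x.T).sum()

x = jnp.ones((16, 16))

lowered = jax.jit(f).lower(x)   # frontend: trace JAX code and lower it to HLO/StableHLO
compiled = lowered.compile()    # backend: OpenXLA optimises and emits device code
print(compiled(x))              # run the resulting executable
```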
Staged Compilation in JAX
Functions decorated with jax.jit use staged compilation. Invoking the jitted function requires a specific input shape and data type (the abstract signature).
JAX traces the Python function's execution with abstract values that describe the computation.
The traced computation is then lowered to OpenXLA's HLO IR.
OpenXLA optimises the HLO and generates target backend code.
Subsequent calls with the same abstract signature reuse the compiled code, which boosts performance. “When a JAX-jitted function is called for the first time with a specific shape and dtype of inputs, JAX traces the sequence of operations, and OpenXLA compiles this computation graph into optimised machine code for the target device.”
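The tracing step can be seen directly with jax.make_jaxpr, which records the computation with abstract values before it is lowered to HLO (illustrative example):

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sum(jnp.sin(x) ** 2)

# Tracing with abstract values records the computation as a jaxpr;
# this is the graph that jax.jit subsequently lowers to OpenXLA's HLO IR.
print(jax.make_jaxpr(f)(jnp.ones((4,))))
```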
CPU and GPU Execution Flow
OpenXLA lets JAX orchestrate computations on each device.
OpenXLA optimises CPU machine code using SIMD and other architectural features.
OpenXLA manages data movement and kernel execution while the GPU performs the calculations.
On GPUs, OpenXLA generates kernels for the GPU's parallel processing units.
This includes initiating and coordinating GPU kernels and managing CPU-GPU memory transfers.
JAX manages data across devices through device buffers (historically exposed as DeviceArray objects, unified as jax.Array in current JAX releases).
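A small sketch of device placement and asynchronous dispatch using JAX's public API (which devices appear depends on the backend that is actually installed):

```python
import jax
import jax.numpy as jnp

print(jax.devices())                      # e.g. [CpuDevice(id=0)] or GPU/TPU devices

x = jnp.ones((1024, 1024))
y = jax.device_put(x, jax.devices()[0])   # place the buffer on an explicit device

@jax.jit
def step(a):
    return a @ a

out = step(y)
out.block_until_ready()                   # dispatch is asynchronous; wait for the device
print(out.devices())                      # the jax.Array's buffers live on that device
```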
Understanding Abstract Signatures and Recompilation
The shape and data type of its input arguments determine a jax.jit-decorated function's abstract signature.
When a jitted function is called with inputs that have a different abstract signature, JAX recompiles it. Use consistent input shapes and data types to avoid repeated compilation costs.
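A minimal sketch of how a changed shape triggers retracing and recompilation (the print statement runs only while JAX traces the function):

```python
import jax
import jax.numpy as jnp

@jax.jit
def f(x):
    print("tracing for shape", x.shape)   # executes only during tracing
    return x.sum()

f(jnp.ones((4,)))   # traces and compiles for float32[4]
f(jnp.ones((4,)))   # same abstract signature: cached, no print
f(jnp.ones((5,)))   # new shape -> new abstract signature -> recompilation
```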
Intel Hardware/Software Optimisation Integration
Since the resources are hosted on the Intel developer website, they likely demonstrate how JAX and OpenXLA can be used to optimise workloads on Intel CPUs and GPUs.
This includes optimised kernels, vectorisation with Intel instruction-set extensions such as AVX-512, and integration with Intel-specific libraries and tools.
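As a hedged sketch: with the Intel Extension for OpenXLA installed as a PJRT plug-in (the package name intel-extension-for-openxla and the XPU platform name are taken from Intel's documentation and may differ by version), the same JAX code should run unchanged on Intel devices.

```python
# Assumes the Intel PJRT plug-in has been installed beforehand, e.g.:
#   pip install intel-extension-for-openxla
import jax
import jax.numpy as jnp

print(jax.devices())        # lists the devices exposed by the installed PJRT plug-ins;
                            # Intel's documentation describes its devices as "xpu"

@jax.jit
def f(x):
    return jnp.tanh(x @ x.T)

x = jnp.ones((256, 256))
print(f(x).devices())       # the unchanged JAX code runs on whichever backend was selected
```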
The jax.jit transformation in JAX employs XLA to turn Python functions that operate with JAX arrays into hardware-accelerated code, optimising numerical operations.
OpenXLA serves as a unified compiler infrastructure, converting JAX's compute graph (HLO) into optimised machine code for CPUs, GPUs, and TPUs.
When a JAX-jitted function is first called with a specific shape and dtype of inputs, JAX traces the sequence of operations. OpenXLA then compiles this computation graph into device-optimised machine code.
When OpenXLA targets GPUs, it generates kernels for the GPU's parallel processing units; this involves launching and synchronising GPU kernels and managing CPU-GPU data transfers.