#MLPerf benchmarks
Text
Intel MLPerf: Benchmarking Hardware for Machine Learning (ML)

Overview
This briefing, based on an Intel article, describes MLPerf, a popular and rapidly growing benchmark suite for machine learning (ML) hardware, software, and services. MLPerf, formed by a wide coalition of academic, scientific, and industry organisations (now stewarded by the MLCommons consortium), compares ML systems impartially to accelerate innovation. MLPerf's definition, operation, aims, and relevance to artificial intelligence are discussed below.
What's MLPerf?
The name “MLPerf” combines “ML” for machine learning and “Perf” for performance. MLPerf is a set of benchmarks that evaluates ML systems across a range of tasks and conditions.
As an industry-standard benchmark, MLPerf measures ML hardware and software performance and standardises how machine learning systems are evaluated and how progress is tracked.
MLPerf emphasises real-world application settings rather than vendor-specific criteria, levelling the playing field for machine learning performance assessment and helping developers, researchers, and customers pick the hardware and software best suited to their machine learning needs.
How MLPerf Works
MLPerf's rigorous and transparent process involves several key elements:
Benchmark Suites: MLPerf comprises several benchmark suites targeting specific ML problems, such as training, inference, and edge computing, and these suites evolve with the field. They cover machine learning tasks including recommendation systems, object detection, image classification, and natural language processing (NLP).
Open Participation: The MLPerf consortium welcomes cloud service providers, software developers, hardware manufacturers, and academic organisations. This collaborative approach keeps the benchmarks relevant and credible.
Standardised Rules and Metrics: MLPerf sets strict benchmarking rules and performance metrics to ensure fair system comparisons. The rules cover allowed optimisations, model accuracy targets, and data preparation.
Public Results and Leaderboards: After participants submit their performance results, the MLPerf website publishes complete software stacks and system specifications for public review. This transparency encourages healthy competition and clear comparisons, and because the findings are publicly available, users can see how different systems perform on different machine learning workloads.
Focus on Practical Tasks: MLPerf benchmarks simulate genuine ML applications using representative or public datasets, ensuring that performance figures apply to real-world use cases.
The Value of MLPerf
The Intel article emphasises several aims and highlights MLPerf's role in the AI ecosystem:
Objective Comparisons: MLPerf simplifies machine learning system comparisons by standardising methods and metrics. This lets customers make data-driven choices.
Driving Innovation: MLPerf sets defined performance targets and makes hardware and software innovations public, motivating vendors to improve their results; competition accelerates progress.
Transparency: Open submission and comprehensive reporting standards make ML performance claims verifiable; users can inspect the software stacks and settings used to achieve each result.
Influencing Purchase Decisions: MLPerf findings help organisations adopt ML solutions by revealing the performance capabilities of different hardware and software options for specific workloads.
Monitoring Progress in the Field: MLPerf results indicate how new algorithms, software optimisations, and architectural upgrades affect ML system performance over time, tracking the advancement of ML technology.
Full-Stack Coverage: MLPerf benchmarks both training and inference across many levels of ML, providing a complete view of system performance.
The Evolution and Impact of MLPerf
MLPerf is a dynamic project whose influence extends beyond its definition and operation.
Evolving Benchmarks: New ML tasks, models, and application areas are added to the benchmark suites regularly to keep them current; this adaptability underpins MLPerf's long-term impact.
Shaping System Design: The pursuit of strong MLPerf results influences hardware and software design, including CPUs, GPUs, memory systems, interconnects, and software frameworks, as companies actively optimise their products against the benchmarks.
Community-Driven Development: MLPerf's strength is its community participation. The consortium's transparent and cooperative structure ensures that benchmarks reflect machine learning community concerns.
Addressing Emerging Trends: MLPerf is adding assessments for edge computing, personalised recommendation systems, and large language models to keep pace with changes in AI applications.
In conclusion
MLPerf is the primary benchmark for machine learning system performance. Its standardised, transparent, and community-driven evaluation approach empowers users, stimulates innovation, and enables informed decision-making in the fast-growing field of artificial intelligence. MLPerf's continued development and adoption are crucial for tracking progress and understanding the potential of AI technology.
#technology#technews#govindhtech#news#technologynews#MLPerf#Intel MLPerf#machine learning#MLPerf benchmarks#MLPerf Intel
Link
#AIchips#Blackwellarchitecture#cloudcomputing#enterpriseAI#GenerativeAI#MLPerf#NVIDIA#semiconductorindustry
Photo

CoreWeave Achieves Record MLPerf Benchmark with NVIDIA GB200 Superchips
Text
CoreWeave, NVIDIA, And IBM Submit Record-Breaking MLPerf Results Using NVIDIA GB200 Grace Blackwell Superchips
Software cloud platform CoreWeave announced that it has collaborated with NVIDIA and IBM to complete the largest MLPerf Training v5.0 submission to date using NVIDIA Blackwell technology. The effort utilized 2,496 NVIDIA Blackwell GPUs operating on CoreWeave’s cloud infrastructure, which is optimized for AI. This benchmark represents the most extensive NVIDIA GB200 NVL72 cluster evaluated
Text
7 Essential Research Platforms for Unbiased AI Product Reviews You Need to Explore in 2025
Looking for trustworthy platforms to evaluate deep tech AI products? We just published a data-backed guide to the Top 7 Verified Review Platforms every CTO, AI strategist, and enterprise buyer should know.
Packed with research metrics, industry benchmarking data (MLPerf, Hugging Face, Stanford AI Index), and verified user insights — this is your go-to resource for making confident AI product decisions in 2025.
Read the full blog here: https://www.fraoula.co/post/7-essential-research-platforms-for-unbiased-ai-product-reviews-you-need-to-explore-in-2025
#AIReviewPlatforms#DeepTech#Fraoula#EnterpriseAI#MLPerf#PapersWithCode#G2#Gartner#HuggingFace#StanfordAIIndex
Photo
🚀 Ready to revolutionize your AI infrastructure? Super Micro's NVIDIA HGX B200 systems are leading the charge with stunning AI performance leaps. 🔍 Context: Our latest 8-GPU systems offer a groundbreaking 3.1x throughput boost for Llama2-70B, outpacing H200 setups in MLPerf v5.0 benchmarks. Imagine generating 98,443 tokens per second compared to just 33,072 – that's the power of our 4U liquid-cooled and 10U air-cooled systems. 💡 Dive into the Details: With consistently verified, production-ready hardware, our rack configurations allow up to 96 Blackwell GPUs, promising a **15x performance gain.** Perfect for those seeking high-density, efficient AI processing solutions. 🔗 Insight: Staying ahead in AI means embracing innovation today. Our systems offer accessibility, scalability, and power, reshaping the industry's pace. What's your take on the evolving AI landscape? Let us know below! 👇 #AI #SuperMicro #GPU #TechInnovation #MachineLearning #LLAMA2 #BlackwellGPUs #HighPerformanceComputing 🌐
Text
New MLCommons benchmarks to test AI infrastructure performance
Industry consortium MLCommons has released new versions of its MLPerf Inference benchmarks, offering a closer look at how current-generation data center and edge hardware performs under increasingly demanding AI workloads. The updated MLPerf Inference V5.0 comes as infrastructure teams grapple with surging demand from generative AI applications like chatbots and code assistants, which require…
Text
MLCommons Releases MLPerf Inference v5.0 Benchmark Results
Today, MLCommons announced new results for its MLPerf Inference v5.0 benchmark suite, which delivers machine learning (ML) system performance benchmarking. The organization said the results highlight that the AI community is focusing on generative AI, and that the combination of recent hardware and software advances optimized for generative AI has led to performance improvements over the past year.
@tonyshan #techinnovation https://bit.ly/tonyshan https://bit.ly/tonyshan_X
Text
Singapore-based Firmus wins recognition for AI data centre design
Singapore-based Firmus Technologies has been recognised with the Asia Pacific Data Centre Project of the Year award for its AI Factory facility.
The facility stands out for its advanced infrastructure and focus on energy efficiency, reflecting broader efforts to meet the rising demands of AI computing sustainably.
The AI Factory is part of Firmus’s ongoing initiative to transform existing ST Telemedia Global Data Centres (STT GDC) into GPU-powered AI computing platforms. The redesigned centres are equipped with state-of-the-art hardware and efficient cooling systems, enabling them to meet both enterprise and research needs with improved energy performance metrics.
As artificial intelligence continues to need more power, energy efficiency has become a major issue. Firmus has addressed the issue for nearly a decade with its AI Factory platform, which combines advanced immersion cooling technology with dependable design, build, and operation services. The company states its platform has several significant advantages, including:
Energy efficiency: 45% more FLOP per utility picoJoule than traditional data centres,
Cost-effectiveness: Up to 30% cheaper total cost of ownership (TCO) than direct-to-chip cooling platforms,
Scalability and sustainability: Supports high-density AI workloads while reducing environmental effects,
Global expertise: A track record in building and operating immersion-cooled data centres in Singapore and Australia.
The deployment of the AI Factory in Singapore shows how innovative approaches to data centre infrastructure can address the energy demands of AI. The project highlights a potential pathway for sustainable AI development by achieving a pPUE of 1.02 and a reduction in energy consumption of 45%. The achievement aligns with Singapore’s National AI Strategy 2.0, which emphasises sustainable growth in AI and data centre innovation.
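For context, pPUE (partial power usage effectiveness) divides the IT load plus the measured infrastructure overhead (such as cooling) by the IT load alone, so 1.02 means roughly 2% overhead. Below is a minimal sketch of the arithmetic, with hypothetical values rather than Firmus's actual measurements:

```python
def partial_pue(it_energy_kwh: float, overhead_energy_kwh: float) -> float:
    """pPUE: (IT energy + infrastructure overhead) / IT energy."""
    return (it_energy_kwh + overhead_energy_kwh) / it_energy_kwh

# Hypothetical: 2 kWh of cooling overhead per 100 kWh of IT load gives pPUE 1.02.
print(partial_pue(100.0, 2.0))  # 1.02
```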
Tim Rosenfield, co-CEO of Firmus Technologies, explained the broader vision behind the project, noting that it’s about balancing AI growth with sustainability. “By rethinking data centre design, we have created a platform that supports the growth of AI while promoting environmental sustainability. If we can do it in Singapore, where space is constrained and the humid climate is against us, we can do it anywhere,” he said.
Firmus has recently changed its leadership team, adding Dr. Daniel Kearney as chief technology officer. Previously AWS’s Head of Technology for the ASEAN Enterprise business, Kearney leads the engineering team at Firmus. He pointed out how sustainable AI infrastructure is becoming essential as AI technologies expand. “This win against established data centre players recognises the importance of technology like ours in meeting the growth of AI and the energy challenges it brings,” he said.
The company has been advancing its work through the Sustainable Metal Cloud (SMC), an initiative aimed at improving the efficiency and sustainability of AI infrastructure. Recent updates from Firmus include:
Power efficiency benchmarks: Firmus became the first to publish comprehensive power consumption data alongside performance results for the MLPerf Training benchmark,
Policy contributions: Insights from Tim Rosenfield contributed to the Tony Blair Institute for Global Change’s policy agenda on managing the energy demands of the AI sector,
Industry discussions: At ATxSG24, Firmus’s Chairman, Edward Pretty, joined a panel featuring organisations like NVIDIA, the World Bank, and Alibaba Cloud to explore the balance between sustainability and the computational needs of AI,
Hypercube expansion: Firmus’s team of 700 is installing the first fleet of Sustainable AI Factories, known as HyperCubes, in multiple regions,
Engagement at NVIDIA GTC 2024: The company participated in two panels at NVIDIA’s GTC event, discussing sustainable AI infrastructure alongside partners like NVIDIA, Deloitte, and WEKA.
Tags: artificial intelligence, data centre
#2024#ai#ai & big data expo#AI development#AI Infrastructure#AI strategy#Alibaba#alibaba cloud#amp#Art#artificial#Artificial Intelligence#Asia#Australia#automation#AWS#background#bank#benchmark#benchmarks#Big Data#Building#Business#california#CEO#change#chip#climate#Cloud#cloud computing
Photo

MLCommons Releases Latest MLPerf Tiny Benchmark Results for On-Device TinyML
Text
Google Trillium’s Cost-Effective Breakthrough In MLPerf 4.1
MLPerf 4.1 Benchmarks: Google Trillium Boosts AI Training with 1.8x Performance-Per-Dollar
The performance and efficiency of hardware accelerators are under unprecedented pressure from rapidly evolving generative AI models. To meet the needs of next-generation models, Google introduced Trillium, its sixth-generation Tensor Processing Unit (TPU). From the chip to the system to its Google data center deployments, Trillium is designed for performance at scale, supporting training at extremely large scale.
Google’s first MLPerf training benchmark results for Trillium are presented today. According to the MLPerf 4.1 training benchmarks, Google Trillium offers an astounding 99% scaling efficiency (throughput) and up to 1.8x greater performance-per-dollar than previous-generation Cloud TPU v5p.
This blog provides a succinct performance study of Trillium, showing why it is the most effective and economical TPU training solution to date. It first briefly reviews traditional scaling efficiency as a system-comparison metric, then introduces convergence scaling efficiency as an additional parameter to take into account. It compares Google Trillium to Cloud TPU v5p on both of these criteria as well as on performance per dollar, and wraps up with advice to help you choose your cloud accelerators wisely.
Traditional performance metrics
Accelerator systems can be assessed and contrasted in a number of ways, including throughput scaling efficiency, effective throughput, and peak throughput. Although they are useful indicators, none of these metrics account for convergence time.
Hardware details and optimal performance
Hardware characteristics like peak throughput, memory bandwidth, and network connectivity were the main focus of comparisons in the past. Although these peak values set theoretical limits, they are poor predictors of real-world performance, which depends mostly on software implementation and architectural design. Because contemporary machine learning workloads usually involve hundreds or thousands of accelerators, the effective throughput of a system sized appropriately for a given workload is the most important parameter.
Performance of utilization
System performance can be measured with utilization metrics that compare achieved throughput to peak capacity, such as memory bandwidth utilization (MBU) and effective model FLOPS utilization (EMFU). However, these hardware efficiency indicators do not correlate directly with business-value metrics like training time or model quality.
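As a rough sketch, these two utilization metrics are simple ratios of achieved to peak capacity. The function names and numbers below are illustrative assumptions, not values from Google's submission:

```python
def memory_bandwidth_utilization(achieved_gb_per_s: float, peak_gb_per_s: float) -> float:
    """MBU: fraction of peak memory bandwidth actually achieved."""
    return achieved_gb_per_s / peak_gb_per_s

def effective_model_flops_utilization(model_flops_per_step: float,
                                      step_time_s: float,
                                      peak_flops_per_s: float) -> float:
    """EMFU: useful model FLOPS per second divided by the hardware peak."""
    achieved_flops_per_s = model_flops_per_step / step_time_s
    return achieved_flops_per_s / peak_flops_per_s

# Hypothetical values for illustration only:
print(f"MBU:  {memory_bandwidth_utilization(1200.0, 1600.0):.0%}")             # 75%
print(f"EMFU: {effective_model_flops_utilization(6.0e15, 10.0, 9.0e14):.0%}")  # 67%
```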
Efficiency scaling and trade-offs
Both weak scaling (efficiency when increasing workload and system size proportionately) and strong scaling (performance improvement with system size for fixed workloads) are used to assess a system’s scalability. Although both metrics are useful indicators, the ultimate objective is to produce high-quality models as soon as possible, which occasionally justifies sacrificing scaling efficiency in favor of quicker training times or improved model convergence.
Convergence scaling efficiency is necessary
Although hardware usage and scaling indicators offer valuable system insights, convergence scaling efficiency concentrates on the core objective of training: efficiently achieving model convergence. The point at which a model’s output ceases to improve and the error rate stabilizes is known as convergence. Convergence scaling efficiency measures how effectively extra computing resources speed up the training process to completion.
Two key measurements determine convergence scaling efficiency: the base case, in which a cluster of N₀ accelerators converges in time T₀, and the scaled case, in which N₁ accelerators take time T₁ to converge. It is the ratio of the speedup in convergence time to the increase in cluster size:
convergence scaling efficiency = (T₀ / T₁) / (N₁ / N₀)
When the speedup in time-to-solution matches the increase in cluster size, the convergence scaling efficiency is 1. Therefore, a convergence scaling efficiency as near to 1 as feasible is preferred.
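A minimal sketch of this calculation in Python, using the illustrative 3x-cluster, 2.4x-speedup ratio discussed below:

```python
def convergence_scaling_efficiency(t0: float, n0: int, t1: float, n1: int) -> float:
    """CSE = (speedup in time-to-convergence) / (increase in cluster size)."""
    return (t0 / t1) / (n1 / n0)

# Relative units: a cluster 3x the base size converges 2.4x faster.
print(convergence_scaling_efficiency(t0=2.4, n0=1, t1=1.0, n1=3))  # 0.8
```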
Let’s now use these ideas to understand Google’s MLPerf submission for GPT3-175b training on Google Trillium and Cloud TPU v5p.
Google Trillium’s performance
Google submitted GPT3-175b training results for three distinct Cloud TPU v5p configurations and four distinct Google Trillium configurations. In the analysis that follows, the results are grouped by cluster sizes with the same total peak FLOPS. For instance, 4x Trillium-256 is compared to the Cloud TPU v5p-4096 configuration, 8x Trillium-256 to the Cloud TPU v5p-8192 configuration, and so forth.
MaxText, Google’s high-performance reference solution for Cloud TPUs and GPUs, provides the foundation for all of the findings in this investigation.
Weak scaling efficiency
Trillium and TPU v5p both provide almost linear scaling efficiency for growing cluster sizes with correspondingly increased batch sizes:
[Figure: Weak scaling comparison for Trillium and Cloud TPU v5p]
Relative throughput scaling from the base arrangement is seen in the above Figure as cluster sizes grow. Even when using Cloud TPU multislice technology to operate across data-center networks, Google Trillium achieves 99% scaling efficiency, surpassing the 94% scaling efficiency of Cloud TPU v5p cluster within a single ICI domain. A base configuration of 1024 chips (4x Trillium-256 pods) was employed for these comparisons, creating a consistent baseline with the smallest v5p submission (v5p-4096; 2048 chips). In comparison to its simplest configuration, which consists of two Trillium-256 pods, Trillium retains a robust 97.6% scaling efficiency.
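To make that figure concrete, throughput scaling efficiency relative to a base configuration can be computed as below; the throughput numbers are hypothetical, chosen only to reproduce a 99% efficiency:

```python
def weak_scaling_efficiency(base_throughput: float, base_chips: int,
                            scaled_throughput: float, scaled_chips: int) -> float:
    """Relative throughput gain divided by the relative increase in chips."""
    return (scaled_throughput / base_throughput) / (scaled_chips / base_chips)

# Hypothetical: tripling the chips from the 1,024-chip base yields ~2.97x throughput.
print(weak_scaling_efficiency(100.0, 1024, 297.0, 3072))  # 0.99
```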
Convergence scaling efficiency
As previously mentioned, weak scaling is helpful but insufficient as a value indicator; convergence scaling efficiency also takes time-to-solution into account.
[Figure: Convergence scaling comparison for Trillium and Cloud TPU v5p]
Google found that Trillium and Cloud TPU v5p have similar convergence scaling efficiency at the maximum cluster size. For the rightmost configuration, the cluster was three times larger than the base configuration and converged 2.4 times faster, giving a CSE of 2.4/3 = 0.8.
Although Google Trillium and TPU v5p have similar convergence scaling efficiency, Trillium excels at delivering convergence at a reduced cost, which leads us to the final criterion.
Cost-to-train
Although weak scaling efficiency and convergence scaling efficiency show the scaling characteristics of systems, they leave out the most important parameter: the cost of training.
[Figure: Comparison of cost-to-train based on wall-clock time and the on-demand list price for Cloud TPU v5p and Trillium]
Google Trillium achieves convergence to the same validation accuracy as TPU v5p while reducing the training cost by up to 1.8x (45%).
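As a hedged sketch of the arithmetic, cost-to-train is wall-clock time multiplied by fleet size and hourly list price. Every number below is a placeholder chosen to reproduce the 1.8x ratio, not a published Google price or time:

```python
def cost_to_train(wallclock_hours: float, num_chips: int,
                  price_per_chip_hour: float) -> float:
    """On-demand cost of a run: time x fleet size x hourly list price."""
    return wallclock_hours * num_chips * price_per_chip_hour

# Placeholder values only; equal-peak-FLOPS clusters converging in similar time.
v5p_cost      = cost_to_train(100.0, 2048, 4.50)
trillium_cost = cost_to_train(100.0, 1024, 5.00)
print(f"v5p cost / Trillium cost = {v5p_cost / trillium_cost:.1f}x")  # 1.8x
```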
Making informed Cloud Accelerator choices
This post examined the complexity of comparing accelerator systems, emphasising the need to look beyond peak metrics to determine actual performance and efficiency. Peak performance figures offer a starting point, but they frequently fail to predict practical utility; metrics such as memory bandwidth utilization (MBU) and effective model FLOPS utilization (EMFU) provide more insight into an accelerator's real performance.
It also highlighted the importance of scaling characteristics, both strong and weak, in assessing how systems perform as workloads and resources grow. Convergence scaling efficiency, however, is the most objective metric of those examined, since it compares systems on their ability to produce the same outcome rather than merely on their speed.
Using these measures in its GPT3-175b benchmark submission, Google Cloud demonstrated that Trillium attains convergence scaling efficiency comparable to Cloud TPU v5p while reducing the cost-to-train, delivering up to 1.8x greater performance per dollar. These findings underscore the importance of assessing accelerator systems with a variety of performance and efficiency criteria.
Read more on Govindhtech.com
#GoogleTrillium#Trillium#MLpref#Google#Googlecloud#CloudTPUv5p#CloudTPU#GPT3#NEWS#technews#TechnologyNews#technologies#technology#technologytrends#govindhtech
Photo

NVIDIA MLPerf v5.0: Reproducing Training Scores for LLM Benchmarks
Link
Intel, Nvidia and Google have made significant strides in recent months that enable faster LLM training results in the MLPerf Training 3.1 benchmarks. #AI #ML #Automation
Photo
🚀 Breaking Records with NVIDIA's Blackwell B200 GPUs! 🚀 The future of AI is here, and it's blazing fast! NVIDIA's **Blackwell B200 GPUs** have set a new standard in MLPerf Inference v5.0 benchmarks, achieving an incredible **30x boost** in throughput on the Llama 3.1 405B benchmark compared to the H200 series. 🔍 **Key Highlights:** - **GB200 NVL72 system**, featuring 72 GPUs interconnected with fifth-gen NVLink, delivers unmatched performance. - For LLM tasks, an **8-GPU DGX B200 system** tripled performance over H200 systems, making it ideal for interactive tasks. 👉 Why does this matter? Enhanced memory bandwidth and unified memory management make Blackwell perfect for handling trillion-parameter models efficiently. Are you ready to unleash AI's true potential? Tell us how you'd use this power! ⚡ #NVIDIA #AI #GPU #BlackwellB200 #Innovation #TechRevolution #MLPerf #FutureOfAI #TechGiant #PerformanceBoost #AIDevelopment