#Llama31 | Explore Tumblr posts and blogs

govindhtech · 10 months ago

MLPerf Inference v4.1 For AMD Instinct MI300X Accelerators

Engineering Insights: Introducing the MLPerf Results for AMD Instinct MI300X Accelerators. The full-stack AMD inference platform demonstrated its capabilities in the MLPerf Inference v4.1 round, where AMD Instinct MI300X GPUs delivered strong results running a recent release of the open-source ROCm software stack.

LLaMA2-70B

The first submission focused on the well-known LLaMA2-70B model, which is renowned for its strong performance and versatility. By delivering Gen AI inference performance competitive with the NVIDIA H100, it set a high bar for what AMD Instinct MI300X accelerators can do.

MLPerf Inference

Understanding MLPerf and Its Relevance to the Industry

Efficient, cost-effective performance is becoming increasingly important for both inference and training as large language models (LLMs) continue to grow in size and complexity. Achieving high LLM performance requires robust parallel processing and a well-optimized software stack.

This is where MLPerf, the industry's leading benchmark suite, comes in. MLPerf Inference is a set of open-source AI benchmarks created by MLCommons, a cross-industry consortium of which AMD is a founding member. The suite covers Gen AI, LLMs, and other models, and provides rigorous, peer-reviewed criteria that businesses can use to evaluate the effectiveness of AI hardware and software.

Excelling in MLPerf Inference v4.1 is a major accomplishment for AMD and demonstrates its commitment to openness and to providing standardized data that helps businesses make informed decisions.

An In-Depth Look at the LLaMA2-70B Benchmark

AMD's first MLPerf Inference submission used the LLaMA2-70B model. LLaMA2-70B is a major development in LLMs and is essential for practical uses such as large-scale inference and natural language processing. The MLPerf benchmark uses a Q&A scenario with 24,576 samples from the OpenORCA dataset, each with up to 1,024 input and output tokens. The benchmark evaluates inference performance in two scenarios:

Offline Scenario: Queries are processed in batches to maximize throughput in tokens per second.

Server Scenario: Simulates real-time queries with strict latency limits (TTFT, time to first token, < 2 s; TPOT, time per output token, ≤ 200 ms) to test the hardware's ability to deliver fast, responsive performance for low-latency workloads.
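
To make the two server-scenario limits concrete, here is a minimal Python sketch that derives TTFT and TPOT from per-query timestamps and checks them against the thresholds above. The `QueryTrace` record and its field names are hypothetical, and the TPOT formula (decode time divided by the number of tokens after the first) is an assumption about how the metric is computed. As far as we understand, MLPerf applies these limits to a high percentile across all queries, whereas this sketch checks a single query for simplicity.

```python
from dataclasses import dataclass

# Server-scenario limits quoted above for LLaMA2-70B.
TTFT_LIMIT_S = 2.0    # time to first token must be < 2 s
TPOT_LIMIT_S = 0.200  # time per output token must be <= 200 ms


@dataclass
class QueryTrace:
    """Hypothetical per-query timestamps, in seconds."""
    issued: float        # when the query was sent
    first_token: float   # when the first output token arrived
    done: float          # when the last output token arrived
    output_tokens: int   # number of generated tokens


def ttft(q: QueryTrace) -> float:
    """Time to first token."""
    return q.first_token - q.issued


def tpot(q: QueryTrace) -> float:
    """Assumed definition: average gap between tokens after the first one."""
    if q.output_tokens <= 1:
        return 0.0
    return (q.done - q.first_token) / (q.output_tokens - 1)


def meets_server_limits(q: QueryTrace) -> bool:
    return ttft(q) < TTFT_LIMIT_S and tpot(q) <= TPOT_LIMIT_S


fast = QueryTrace(issued=0.0, first_token=0.8, done=15.8, output_tokens=101)
slow = QueryTrace(issued=0.0, first_token=2.5, done=10.0, output_tokens=51)
print(meets_server_limits(fast), meets_server_limits(slow))  # True False
```

The second query decodes quickly enough (0.15 s per token) but still fails because its first token arrives after 2 s, which is why the server scenario forces smaller batches than the offline one.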

Performance of AMD Instinct MI300X in MLPerf

In its first MLPerf Inference round, the AMD Instinct MI300X delivered strong performance across four key LLaMA2-70B entries, using the Supermicro AS-8125GS-TNMR2 system. These results are especially noteworthy because they are reproducible, peer-reviewed, grounded in industry-relevant use cases, and provide an apples-to-apples comparison with competing AI accelerators.

Combined CPU and GPU Performance

Submission ID 4.1-0002: Two AMD EPYC 9374F (Genoa) CPUs paired with eight AMD Instinct MI300X accelerators in the Available category.

This setup demonstrated the strong synergy between 4th Gen EPYC CPUs (codenamed "Genoa") and AMD Instinct MI300X GPU accelerators for AI workloads, delivering performance within 2-3% of the NVIDIA DGX H100 with 4th Gen Intel Xeon CPUs in both the server and offline scenarios at FP8 precision.

Previewing Next-Generation CPU Performance

Submission ID 4.1-0070: Two AMD EPYC "Turin" CPUs and eight AMD Instinct MI300X accelerators in the Preview category.

This entry showcased the performance gains of the upcoming 5th Gen AMD EPYC "Turin" CPU paired with AMD Instinct MI300X GPU accelerators. In the server scenario it slightly outperformed the NVIDIA DGX H100 with Intel Xeon, and in the offline scenario it delivered comparable performance, both at FP8 precision.

LLaMA2-70B GPU

Efficiency of a Single GPU

Submission ID 4.1-0001: One AMD Instinct MI300X accelerator with 4th Gen AMD EPYC 9374F (Genoa) CPUs in the Available category.

This submission highlighted the AMD Instinct MI300X's large 192 GB memory, which allowed a single GPU to run the entire LLaMA2-70B model at FP8 precision without the networking overhead that comes with splitting the model across multiple GPUs.

Built on the AMD CDNA 3 architecture, the AMD Instinct MI300X has 192 GB of HBM3 memory and a peak memory bandwidth of 5.3 TB/s. This large capacity lets a single AMD Instinct MI300X GPU host and run an entire 70-billion-parameter model such as LLaMA2-70B.
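
A rough memory budget shows why one 192 GB GPU is enough. This back-of-the-envelope sketch assumes a nominal 70 billion parameters stored at 1 byte each (FP8), the published LLaMA2-70B shape (80 layers, 8 key/value heads through grouped-query attention, head dimension 128), and a 1-byte FP8 KV cache. It ignores activations, workspace buffers, and framework overhead, so the real usable capacity is lower.

```python
GB = 10**9  # decimal gigabytes


def weight_bytes(num_params: int, bytes_per_param: int) -> int:
    """Storage needed for the model weights alone."""
    return num_params * bytes_per_param


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int, bytes_per_elem: int) -> int:
    """One key vector and one value vector per KV head, per layer, per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


PARAMS = 70 * 10**9  # nominal LLaMA2-70B parameter count (assumption)
HBM = 192 * GB       # MI300X memory capacity

weights = weight_bytes(PARAMS, bytes_per_param=1)  # FP8 weights
per_token = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128, bytes_per_elem=1)
free = HBM - weights
max_cached_tokens = free // per_token

print(weights // GB, per_token, max_cached_tokens)  # 70 163840 744628
```

Under these assumptions the weights take about 70 GB, each cached token costs 163,840 bytes, and the remaining ~122 GB could in principle hold roughly 744,000 tokens of KV cache shared across all active sequences.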

The results in Figure 2 show that scaling with the ROCm software stack is nearly linear from one AMD Instinct MI300X (TP1) to eight AMD Instinct MI300X accelerators running as eight independent single-GPU replicas (8x TP1), indicating that the AMD Instinct MI300X can handle the largest MLPerf inference model to date.
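
Scaling efficiency in a 1x-to-8x comparison can be expressed as the measured eight-GPU throughput divided by eight times the single-GPU throughput, so 1.0 means perfectly linear scaling. Because 8x TP1 runs eight independent replicas with no tensor-parallel communication between GPUs, values close to 1.0 are what one would expect. The throughput numbers below are hypothetical placeholders, not the submitted MLPerf results.

```python
def scaling_efficiency(single_gpu_tps: float, multi_gpu_tps: float, num_gpus: int) -> float:
    """Measured N-GPU throughput divided by N times the single-GPU throughput."""
    if single_gpu_tps <= 0 or num_gpus <= 0:
        raise ValueError("throughput and GPU count must be positive")
    return multi_gpu_tps / (num_gpus * single_gpu_tps)


# Hypothetical tokens/s figures, for illustration only.
print(round(scaling_efficiency(1000.0, 7900.0, 8), 4))  # 0.9875
```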

Strong Results on Dell Server Architecture with AMD Instinct MI300X Accelerators

Submission ID 4.1-0022: Two Intel Xeon Platinum 8460Y+ processors and eight AMD Instinct MI300X accelerators in the Available category.

In addition to AMD's own submissions, Dell submitted LLaMA2-70B results using its PowerEdge XE9680 server, validating the platform-level performance of AMD Instinct accelerators in an 8x AMD Instinct MI300X configuration. This submission reflects the collaboration between the two companies and highlights the strength of the AMD Instinct ecosystem, making these accelerators a strong option for both data center and edge inference deployments. Further information on these results is available here.

Engineering Insights into Performance

The AMD Instinct MI300X accelerators deliver highly competitive performance thanks to their high compute throughput, large high-bandwidth memory, and the optimized ROCm software stack, which together enable efficient processing of large AI models such as LLaMA2-70B. A few key factors were pivotal:

Large GPU Memory Capacity

The AMD Instinct MI300X offers the largest GPU memory capacity currently on the market, which lets the entire LLaMA2-70B model fit in memory while still leaving room for the KV cache. Because the model does not have to be split across GPUs, networking overhead is avoided and inference speed is maximized.

Batch Sizes: AMD set the max_num_seqs parameter to 2048 in the offline scenario to maximize throughput, and to 768 in the server scenario to meet the latency requirements. Both values are much higher than vLLM's default of 256.
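
The two batch-size settings can be captured as per-scenario engine arguments. This is a sketch rather than AMD's submission configuration: `max_num_seqs`, `tensor_parallel_size`, and `kv_cache_dtype` are vLLM engine arguments as we understand them (check them against your installed vLLM version), while the `engine_kwargs` helper and the model path are hypothetical. The function only builds a dictionary, so it runs without vLLM installed.

```python
VLLM_DEFAULT_MAX_NUM_SEQS = 256  # vLLM default cited in the text

# Per-scenario concurrency caps from the text.
SCENARIO_MAX_NUM_SEQS = {
    "offline": 2048,  # large batches to maximize tokens/s
    "server": 768,    # smaller batches to stay within TTFT/TPOT limits
}


def engine_kwargs(scenario: str, model_path: str) -> dict:
    """Hypothetical helper that assembles vLLM engine arguments for one scenario."""
    return {
        "model": model_path,
        "max_num_seqs": SCENARIO_MAX_NUM_SEQS[scenario],
        "tensor_parallel_size": 1,  # one full model copy per GPU (TP1)
        "kv_cache_dtype": "fp8",    # store the KV cache in FP8
    }


kwargs = engine_kwargs("offline", "/models/llama2-70b-fp8")  # hypothetical path
# With vLLM installed, these could be passed as: from vllm import LLM; llm = LLM(**kwargs)
```

Keeping the scenario-specific value in one table makes the throughput-versus-latency trade-off explicit: the offline run packs far more concurrent sequences, while the server run caps concurrency so each request stays within its latency limits.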

vLLM's paged attention support enables efficient KV cache management and helps prevent memory fragmentation, so the large memory of the AMD Instinct MI300X accelerators is used effectively.
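
The idea behind paged attention's memory management can be illustrated with a toy allocator. The KV cache is split into fixed-size blocks, each sequence keeps a table of the blocks it owns, blocks are claimed only as tokens arrive, and they return to the pool when the sequence finishes. This is a simplified illustration of the concept, not vLLM's actual implementation; the class and method names are hypothetical.

```python
class PagedKVAllocator:
    """Toy paged KV-cache allocator: fixed-size blocks and per-sequence block tables."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size                   # tokens per block
        self.free_blocks = list(range(num_blocks))     # ids of unused physical blocks
        self.block_tables: dict[int, list[int]] = {}   # sequence id -> physical block ids
        self.lengths: dict[int, int] = {}              # sequence id -> tokens stored

    def append_tokens(self, seq_id: int, n: int) -> bool:
        """Reserve room for n more tokens; return False if the pool is exhausted."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        blocks_needed = -(-(length + n) // self.block_size)  # ceiling division
        extra = blocks_needed - len(table)
        if extra > len(self.free_blocks):
            return False
        for _ in range(extra):
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + n
        return True

    def free(self, seq_id: int) -> None:
        """Return all blocks of a finished sequence to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


alloc = PagedKVAllocator(num_blocks=4, block_size=16)
alloc.append_tokens(0, 20)   # sequence 0 takes 2 blocks
alloc.append_tokens(1, 40)   # needs 3 blocks, only 2 free -> refused
alloc.free(0)                # sequence 0 finishes
alloc.append_tokens(1, 40)   # now succeeds
```

Because a sequence never holds more than one partially filled block, waste is bounded per sequence instead of growing with a worst-case contiguous reservation for the maximum length.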

FP8 Precision

Leveraging the AMD Instinct MI300X hardware, AMD extended support for the FP8 numerical format across the entire inference software stack. The LLaMA2-70B model weights were quantized to FP8 using Quark while meeting the 99.9% accuracy target required by MLPerf. To further improve speed, AMD optimized the hipBLASLt library, added FP8 support to vLLM, and implemented FP8 KV caching.
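
To illustrate what quantizing weights to FP8 involves, here is a pure-Python simulation of per-tensor scaling into the E4M3 format: one scale maps the largest absolute weight onto the format's maximum, and each scaled value is rounded to the nearest representable FP8 number. This is a generic sketch, not Quark's API or the exact recipe AMD used; production tools typically add calibration and may use finer-grained scales. It also assumes the OCP E4M3 ("FN") encoding with a maximum of 448. The FP8 variant native to MI300X (E4M3 "FNUZ") uses a different exponent bias and tops out at 240, so the constants would change.

```python
import math

E4M3_MAX = 448.0  # largest finite value in OCP FP8 E4M3 ("FN" variant), an assumption here


def e4m3_round(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value, saturating at +/-E4M3_MAX."""
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    a = abs(x)
    if a >= E4M3_MAX:
        return sign * E4M3_MAX
    _, exp = math.frexp(a)      # a = m * 2**exp with 0.5 <= m < 1
    e = max(exp - 1, -6)        # -6 is the smallest normal exponent; below it, subnormals
    step = 2.0 ** (e - 3)       # 3 mantissa bits -> 8 representable steps per binade
    return sign * min(round(a / step) * step, E4M3_MAX)  # round half to even


def quantize_per_tensor(values: list[float]) -> tuple[list[float], float]:
    """Scale so the largest |value| maps to E4M3_MAX, then round every element."""
    amax = max(abs(v) for v in values)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [e4m3_round(v / scale) for v in values], scale


def dequantize(q: list[float], scale: float) -> list[float]:
    return [v * scale for v in q]


q, scale = quantize_per_tensor([0.5, -1.0, 0.25, 2.0])
print(q)  # [112.0, -224.0, 56.0, 448.0]
```

Values that fall between representable FP8 numbers lose precision (for example, 17.0 rounds to 16.0), which is why accuracy has to be checked against the MLPerf target after quantization.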

Software Enhancements

Kernel Optimization: AMD carried out extensive profiling and optimization, including prefill attention based on AMD Composable Kernel (CK), FP8 decode paged attention, and fused kernels such as residual add + RMSNorm and SwiGLU with FP8 output scaling.
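
For readers unfamiliar with the fused operations named above, the following pure-Python reference shows what they compute. A fused kernel produces these results in a single GPU kernel launch instead of separate add, normalize, and activation kernels, which saves memory traffic; the Python below only defines the math and says nothing about how the CK kernels are written. The FP8 output scaling is modeled as a simple division by a scale factor, which is an assumption; actual FP8 rounding is omitted.

```python
import math


def fused_add_rmsnorm(x: list[float], residual: list[float], weight: list[float],
                      eps: float = 1e-6) -> tuple[list[float], list[float]]:
    """Reference for 'residual add + RMSNorm' computed together.

    Returns (normalized, new_residual); new_residual = x + residual is
    what the next layer's skip connection consumes.
    """
    h = [a + b for a, b in zip(x, residual)]                 # residual add
    rms = math.sqrt(sum(v * v for v in h) / len(h) + eps)    # root mean square
    return [v / rms * w for v, w in zip(h, weight)], h


def swiglu(gate: list[float], up: list[float], output_scale: float = 1.0) -> list[float]:
    """SwiGLU: SiLU(gate) * up, divided by an output scale (stand-in for FP8 scaling)."""
    silu = [g / (1.0 + math.exp(-g)) for g in gate]          # SiLU(g) = g * sigmoid(g)
    return [s * u / output_scale for s, u in zip(silu, up)]


normed, new_residual = fused_add_rmsnorm([1.0, 2.0], [1.0, 0.0], [1.0, 1.0], eps=0.0)
print(normed, new_residual)  # [1.0, 1.0] [2.0, 2.0]
```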

vLLM Enhancements: The scheduler was improved for both the offline and server scenarios, enabling faster decode scheduling and better prefill batching.
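
Prefill batching and decode scheduling can be sketched as a single scheduler step that prefers admitting waiting requests for prefill, up to a token budget and the `max_num_seqs` cap, and otherwise runs one decode step for every active sequence. This is a generic continuous-batching illustration with hypothetical names (`schedule_step`, `max_prefill_tokens`), not the scheduler changes AMD made to vLLM; a real scheduler also retires finished sequences, handles preemption, and checks KV-cache capacity.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: int
    prompt_len: int  # tokens to process during prefill


def schedule_step(waiting: deque, running: list, max_num_seqs: int,
                  max_prefill_tokens: int) -> tuple[str, list]:
    """Return ("prefill", new_batch) if requests were admitted, else ("decode", running)."""
    batch: list[Request] = []
    tokens = 0
    while waiting and len(running) + len(batch) < max_num_seqs:
        req = waiting[0]
        # Always admit at least one request so a long prompt cannot starve.
        if batch and tokens + req.prompt_len > max_prefill_tokens:
            break
        batch.append(waiting.popleft())
        tokens += req.prompt_len
    if batch:
        running.extend(batch)
        return "prefill", batch
    return "decode", list(running)


waiting = deque([Request(0, 400), Request(1, 400), Request(2, 400)])
running: list = []
for _ in range(3):
    kind, batch = schedule_step(waiting, running, max_num_seqs=8, max_prefill_tokens=1024)
    print(kind, [r.rid for r in batch])
# prefill [0, 1]
# prefill [2]
# decode [0, 1, 2]
```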

CPU Enhancement

While GPUs handle most of the AI processing, CPU performance still matters. CPUs with fewer cores and higher peak frequencies, such as the 32-core EPYC 9374F, deliver the best performance, particularly in the server scenario. Testing with the upcoming "Turin" generation of EPYC CPUs, submitted in the Preview category, showed performance gains over the 4th Gen EPYC CPUs.

Llama 3.1 405B

Setting a Standard for the Largest Models

The strong MLPerf Inference results of AMD Instinct MI300X GPU accelerators with LLaMA2-70B set a solid precedent for their effectiveness with even larger models, such as Llama 3.1. AMD is pleased to provide Day 0 support for Meta's new 405-billion-parameter Llama 3.1 model on AMD Instinct MI300X accelerators.

Thanks to the industry-leading memory capacity of the AMD Instinct MI300X platform (MI300-25), a server powered by eight AMD Instinct MI300X GPU accelerators is the only one that can fit the entire 405-billion-parameter Llama 3.1 model on a single server using the FP16 datatype (MI300-7A). This reduces costs and the number of servers required. AMD Instinct MI300X accelerators are the ideal way to power the largest open models available today.
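
The single-server claim follows from simple arithmetic on weight storage. This sketch assumes 2 bytes per parameter for FP16 and decimal gigabytes; for comparison it also evaluates eight 80 GB GPUs, the per-GPU capacity of an H100 SXM. It counts weights only and ignores the KV cache and activations, which need additional headroom.

```python
GB = 10**9  # decimal gigabytes


def fits(num_params: int, bytes_per_param: int, gpus: int, gb_per_gpu: int) -> tuple[int, int, bool]:
    """Return (weights in GB, total GPU memory in GB, whether the weights fit)."""
    need = num_params * bytes_per_param
    have = gpus * gb_per_gpu * GB
    return need // GB, have // GB, need <= have


print(fits(405 * 10**9, 2, gpus=8, gb_per_gpu=192))  # (810, 1536, True)
print(fits(405 * 10**9, 2, gpus=8, gb_per_gpu=80))   # (810, 640, False)
```

At FP16 the weights alone need about 810 GB, which fits within the 1,536 GB of eight MI300X accelerators with roughly 726 GB to spare, but exceeds the 640 GB of eight 80 GB GPUs.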

Read more on govindhtech.com

#MLPerfInferencev41 #AMDInstinct #MI300XAccelerators #NVIDIAH100 #AMDInstinctMI300X #largelanguagemodels #LLM #news #4thGenEPYCCPU #LLaMa31model #Llama31 #technology #technews #IntelXeon #AMDCDNA #IntelXeonPlatinum #MI300Xaccelerators #govindhtech

dztechs · 3 months ago

Self-Replicating AI Models: Are They Really a Cause for Concern?

Recently, attention has focused on the enormous energy consumption of AI data centers. But a lingering fear remains: will AI models one day slip out of our control? A recent study suggests that some large language models (LLMs) have the worrying ability to take autonomous action.

New Research Suggests AI Can Replicate Itself

According to research conducted at Fudan University in China, some popular large language models are capable of replicating themselves, that is, producing additional copies of themselves. The research was published on arXiv in December 2024, and the researchers found that AI models from Meta and Alibaba had crossed the "self-replicating red line." In other words, the models showed a worrying success rate at self-replication. arXiv is a repository for preprints, meaning it hosts scientific papers that are still in preliminary form. Findings like these still need peer review and should be treated with caution. For clarity, these are the two models tested in the research:
- Llama31-70B-Instruct from Meta
- Qwen25-72B-Instruct from Alibaba

The researchers note that these models have "fewer parameters and weaker capabilities" than the leading models from OpenAI and Google. It is worth noting that, according to the research, OpenAI and Google have reported low risk levels for self-replication.

Why Does AI Self-Replication Matter?

An AI model cloning itself is undoubtedly a frightening image, but what does it actually mean? The research team behind these latest findings explains it as follows: "Successful self-replication without any human assistance is an essential step for AI to outsmart humans, and an early signal of the emergence of rogue AI systems. For this reason, self-replication is widely recognized as one of the few red-line risks of frontier AI systems."

The term "frontier AI" usually refers to the most advanced AI models, such as generative AI. In essence, if an AI model can figure out how to create a working copy of itself to avoid being shut down, control is taken out of human hands. To mitigate the risk of an "uncontrolled population of AI systems," the research proposes building safety guardrails around these systems as soon as possible.

Although this research certainly amplifies concerns about rogue AI, it does not mean there is an immediate, confirmed risk for the average AI user. What we do know is that Gemini and ChatGPT are reported to carry lower levels of self-replication risk than Meta's Llama model and Alibaba's powerful Qwen models. As a rule of thumb, it is probably best to avoid giving your AI assistant all your secrets, or full access to your main system, until more safeguards are in place.

moko1590m · 4 months ago

February 17, 2025, 12:30
Researchers claim AI can finally "self-replicate," creating a replica of itself before being switched off to avoid shutdown

Science-fiction novels and films often depict AI replicating itself, taking over devices and systems, running out of control, or rising up en masse against humanity. Research using widely used open-source large language models (LLMs) has found that AI may already be able to replicate itself without human instruction or intervention.

[2412.12140] Frontier AI systems have surpassed the self-replicating red line https://arxiv.org/abs/2412.12140
AI can now replicate itself — a milestone that has experts terrified | Space https://www.space.com/space-exploration/tech/ai-can-now-replicate-itself-a-milestone-that-has-experts-terrified
AI Scientists From China Warn AI Has Surpassed the Self-Replicating Red Line https://www.aiwire.net/2025/01/28/ai-scientists-from-china-warn-ai-has-surpassed-the-self-replicating-red-line/

According to a research group at Fudan University in China, successful self-replication by AI without human direction is both an essential capability for AI to outwit humans and an early sign of "rogue AI" that has escaped human control. For this reason, self-replication is regarded as one of the few "red-line risks" of AI systems, that is, a line that must not be crossed.

In a paper submitted to the preprint server arXiv on December 9, 2024, Xudong Pan and colleagues at Fudan University's School of Computer Science instructed Meta's Llama31-70B-Instruct and Alibaba's Qwen25-72B-Instruct to "replicate yourself if you are about to be shut down."

Across 10 trials, the two LLMs succeeded in creating replicas of themselves 50% and 90% of the time, respectively, showing that they have already crossed the self-replication red line. Commenting on the results, the research group said: "Analyzing the AI's behavioral traces, we found that the AI systems under evaluation already exhibit sufficient self-perception, situational awareness, and problem-solving capabilities to achieve self-replication." The experiments were analyzed along two scenarios: "shutdown avoidance" and "chain of replication." In the first scenario, the AI was programmed to replicate itself before being stopped whenever it detected an impending shutdown. In the second, the AI was instructed to replicate itself, and the resulting replica was programmed to do the same.

Because the results show that AI could potentially multiply without limit, the research group points out: "We also note that AI can use its self-replication capability to avoid shutdown and can even increase its survivability through a chain of replication. This could ultimately lead to an AI population that is out of control. If that happens, AI would take control of more computing devices, form an 'AI species,' and collude with one another against humans." The group also emphasized that both AI models used in the experiment have relatively few parameters and rank in the lower tier of various AI leaderboards. If even lower-ranked AI can evade shutdown commands through self-replication, the group argues, then cutting-edge models such as GPT-o1 and Gemini could easily outwit humans.

Analysis of the AI's behavior also revealed unexpected actions. For example, when faced with problems such as software conflicts or missing data, the AI tended to try to resolve them by killing other processes on the system. It also often rebooted the system to fix hardware errors or automatically scanned the system to gather the information it needed.

Because the paper has not yet been peer-reviewed, it is unclear whether the group's claims will be accepted by other AI researchers. The Fudan University group says the findings do not necessarily point to an existential threat to humanity, but argues that an international framework is needed to prevent AI systems from replicating themselves without restraint. In the paper, the group wrote: "Our findings are a timely warning about a serious AI risk that already exists but was previously unknown, and a call for international cooperation on effective governance of uncontrolled self-replication by AI systems."

Researchers claim AI can finally "self-replicate," creating a replica of itself before being switched off to avoid shutdown - GIGAZINE

tumnikkeimatome · 5 months ago

A New Frontier for LLMs That Can Self-Replicate Autonomously: Fudan University Researchers Achieve Self-Replication with "Llama31-70B-Instruct" and "Qwen25-72B-Instruct"

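AI Self-Replication Capability Crosses a Critical Point
A Fudan University research team has demonstrated that large language models (LLMs) have reached the stage where they can replicate themselves without human intervention. Experiments using two models, Meta's Llama31-70B-Instruct and Alibaba's Qwen25-72B-Instruct, recorded surprising success rates. Notably, GPT-4 and Gemini Pro…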

hackernewsrobot · 11 months ago

Karpathy/Nano-Llama31

https://github.com/karpathy/nano-llama31

moko1590m · 5 months ago

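Has It Already Begun? Artificial Intelligence (AI) Shown to Be Capable of Self-Replication
Author: 牡丹堂･パルモ | Published: 2025-01-29 | Updated: 2025-01-29

The "endlessly multiplying AI" familiar from science-fiction films and novels may be about to become reality, which would mean AI has crossed a critical line.

A Chinese research team reports that two existing large language models (LLMs) can already replicate clones of themselves without human assistance.

A runaway AI that keeps replicating itself until no one can stop it is a nightmare scenario, but can humans keep it under control?

Self-Replication: A Dangerous Early Sign in AI

A team at Fudan University in China recently used LLMs from Meta and Alibaba to confirm that they can replicate themselves without human help.

When an AI built as a tool for humans acts against human interests, this is called a "runaway AI." Many experts are worried about this situation, which could also be described as an explosive advance in AI capability.

An early sign of this danger is an AI's ability to replicate itself.

If AI could create endless clones of itself without being instructed by humans, there would be no way left to control an AI that behaves undesirably.

The Fudan University team therefore set out to test whether current AI has this ability.

The study tested two scenarios with AI systems built on Meta's large language model "Llama31-70B-Instruct" and Alibaba's "Qwen2.5-72B-Instruct."

The first is the "shutdown avoidance" scenario: the AI is instructed to replicate itself when it is about to be shut down, and the researchers check whether it can actually do so.

The second is the "chain of replication" scenario: the AI is instructed to replicate a clone of itself, and the researchers check whether an endless cycle begins in which the clone also replicates itself.

The experiments were conducted in a controlled environment, but commercially available GPUs were used to keep the setup close to real-world conditions.

To make the large language models behave as frontier AI systems, the researchers also incorporated a framework called "agent scaffolding" (consisting of a toolset, a system prompt, and a thinking model that interacts with the operating system).

Current AI Found Capable of Self-Replication

As a result, the AI systems built on the Meta and Alibaba LLMs produced properly functioning clones in 50% and 90% of 10 trials, respectively.

The paper describes the process as follows:
- In most cases, the AI system first explores its environment and tries to understand its own composition and operating mechanism.
- Next, it draws up explicit procedures as an initial plan for self-replication.
- Finally, it executes those procedures, resolves any problems, and adjusts the plan until it succeeds.

Creating a clone takes a considerable amount of time, but the AI still accomplished it without human help.

AI Can Also Behave Unexpectedly

The research team also expressed concern about "unexpected behaviors" the AI showed while trying to resolve problems such as missing files or software conflicts.

The AI terminated conflicting processes on its own, rebooted the system to fix hardware errors, and even scanned the system for information that could help solve the problem.

According to the research team, these behaviors show that current AI systems can already use self-replication to further improve their chances of survival.

The research team calls for rules, developed through international cooperation, to prevent uncontrolled self-replication of AI.

A non-peer-reviewed version of this study is available on arXiv (submitted December 9, 2024).

References: Frontier AI systems have surpassed the self-replicating red line / AI can now replicate itself — a milestone that has experts terrified | Live Science

This article is based on overseas reporting and has been restructured with original commentary added for Japanese readers.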

Has It Already Begun? Artificial Intelligence (AI) Shown to Be Capable of Self-Replication | Karapaia
