#AIaccelerated
Battling Bakeries in an AI Arms Race! Inside the High-Tech Doughnut Feud
govindhtech · 4 months
Aurora Supercomputer Sets a New Record for AI Speed!
Intel Aurora Supercomputer
Together with Argonne National Laboratory and Hewlett Packard Enterprise (HPE), Intel announced at ISC High Performance 2024 that the Aurora supercomputer has broken the exascale barrier at 1.012 exaflops and is now the fastest AI system in the world for AI for open science, achieving 10.6 AI exaflops. Additionally, Intel will discuss how open ecosystems are essential to the advancement of AI-accelerated high performance computing (HPC).
Why This Is Important:
From the beginning, Aurora was intended to be an AI-centric system that would enable scientists to use generative AI models to hasten scientific discoveries. Early AI-driven research at Argonne has advanced significantly. Among the many achievements are the mapping of the 80 billion neurons in the human brain, the improvement of high-energy particle physics by deep learning, and the acceleration of drug discovery and design using machine learning.
Analysis
The Aurora supercomputer has 166 racks, 10,624 compute blades, 21,248 Intel Xeon CPU Max Series processors, and 63,744 Intel Data Centre GPU Max Series units, making it one of the world’s largest GPU clusters. Aurora’s 84,992 HPE Slingshot fabric endpoints constitute the largest open, Ethernet-based supercomputing interconnect deployed on a single system.
The Aurora supercomputer crossed the exascale barrier at 1.012 exaflops using 9,234 nodes (just 87% of the system), placing second on the high-performance LINPACK (HPL) benchmark. Aurora placed third on the HPCG benchmark at 5,612 TF/s using 39% of the machine. HPCG evaluates more realistic scenarios that offer insight into memory access and communication patterns, two crucial components of real-world HPC systems. It provides a fuller perspective of a system’s capabilities, complementing benchmarks such as LINPACK.
How AI is Optimized
The Intel Data Centre GPU Max Series is the brains behind the Aurora supercomputer. The core of the Max Series is the Intel Xe GPU architecture, which includes specialised hardware such as matrix and vector computing blocks that are ideal for AI and HPC applications. Because of the unmatched computational performance of the Xe architecture, the Aurora supercomputer topped the high-performance LINPACK-mixed precision (HPL-MxP) benchmark, which best illustrates the significance of AI workloads in HPC.
The parallel processing power of the Xe architecture excels at the complex matrix-vector operations at the heart of neural network computing. Deep learning models rely heavily on matrix operations, and these compute cores are essential for speeding them up. Alongside a rich collection of performance libraries, optimised AI frameworks, and Intel’s suite of software tools, including the Intel oneAPI DPC++/C++ Compiler, the Xe architecture supports an open developer ecosystem distinguished by adaptability and scalability across a range of devices and form factors.
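To make that concrete, here is a minimal, illustrative sketch (plain NumPy; the layer sizes are arbitrary assumptions) of why a dense neural-network layer reduces to a matrix multiply, the very operation these matrix engines accelerate:

```python
# A dense layer's forward pass is one matrix-matrix product plus a bias add.
import numpy as np

batch, d_in, d_out = 64, 1024, 4096                     # illustrative sizes
x = np.random.randn(batch, d_in).astype(np.float32)     # activations
W = np.random.randn(d_in, d_out).astype(np.float32)     # weights
b = np.zeros(d_out, dtype=np.float32)                   # bias

y = x @ W + b                                  # the matmul dominates the cost
flops = 2 * batch * d_in * d_out               # ~0.5 GFLOP for this one layer
print(y.shape, f"{flops / 1e9:.2f} GFLOP")     # (64, 4096) 0.54 GFLOP
```

Stack dozens of such layers and evaluate them billions of times during training, and the case for dedicated matrix hardware becomes obvious.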
Enhancing Accelerated Computing with Open Software and Capacity
Intel will stress the value of oneAPI, which provides a consistent programming model across a variety of architectures. OneAPI, which is based on open standards, gives developers the freedom to write code that works across a variety of hardware platforms without requiring significant changes or vendor lock-in. To overcome proprietary lock-in, Arm, Google, Intel, Qualcomm, and others are pursuing this objective through the Linux Foundation’s Unified Acceleration Foundation (UXL), which is creating an open environment for all accelerators and unified heterogeneous compute on open standards. The UXL Foundation continues to expand its coalition by adding new members.
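As a rough illustration of the write-once idea, here is a hedged sketch at the framework level rather than oneAPI's C++/SYCL level; it assumes PyTorch plus, for the Intel GPU branch, the intel_extension_for_pytorch package:

```python
# One model definition, selectable backends: the portability goal oneAPI
# pursues in C++/SYCL, sketched here with PyTorch device selection.
import torch

def pick_device() -> torch.device:
    try:
        import intel_extension_for_pytorch  # noqa: F401  (assumed installed)
        if torch.xpu.is_available():        # Intel GPU backend
            return torch.device("xpu")
    except ImportError:
        pass
    if torch.cuda.is_available():           # NVIDIA backend
        return torch.device("cuda")
    return torch.device("cpu")              # portable fallback

device = pick_device()
model = torch.nn.Linear(512, 512).to(device)
x = torch.randn(8, 512, device=device)
print(model(x).shape, "on", device)         # same code on every backend
```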
Meanwhile, the Intel Tiber Developer Cloud is growing its compute capacity with new, cutting-edge hardware platforms and new service features that let developers and businesses evaluate the newest Intel architectures, rapidly innovate and optimise AI workloads and models, and then deploy AI models at scale. New hardware offerings include large-scale Intel Gaudi 2-based and Intel Data Centre GPU Max Series-based clusters, as well as previews of Intel Xeon 6 E-core and P-core systems for select customers. New features include Intel Kubernetes Service for multiuser accounts and cloud-native AI training and inference workloads.
Next Up
Intel’s objective to enhance HPC and AI is demonstrated by the new supercomputers being deployed with Intel Xeon CPU Max Series and Intel Data Centre GPU Max Series technologies. The Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA) CRESCO 8 system will help advance fusion energy; the system at the Texas Advanced Computing Centre (TACC), now fully operational, will enable work ranging from data analysis in biology to supersonic turbulence flows and atomistic simulations of a wide range of materials; the United Kingdom Atomic Energy Authority (UKAEA) system will solve the memory-bound problems that underpin the design of future fusion power plants; and the Euro-Mediterranean Centre on Climate Change (CMCC) Cassandra system will model climate change.
The mixed-precision AI benchmark results will serve as the basis for Intel’s Falcon Shores next-generation GPU for AI and HPC. Falcon Shores will combine Intel Gaudi’s greatest features with the next-generation Intel Xe architecture, and this integration enables a single programming interface.
In comparison to the previous generation, early performance results on the Intel Xeon 6 with P-cores and Multiplexer Combined Ranks (MCR) memory at 8800 megatransfers per second (MT/s) deliver up to 2.3x performance improvement for real-world HPC applications, such as Nucleus for European Modelling of the Ocean (NEMO). This solidifies the chip’s position as the host CPU of choice for HPC solutions.
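Memory-bound codes such as NEMO scale with bandwidth, which is why the MT/s figure matters. A rough back-of-envelope check (the 12-channel-per-socket count is an assumption for Xeon 6 P-core parts; the 8-byte transfer width is the standard channel width):

```python
# Theoretical peak memory bandwidth per socket at MCR 8800 MT/s.
mt_per_s = 8800e6          # mega-transfers per second
bytes_per_transfer = 8     # 64-bit channel width
channels = 12              # assumed channels per socket
peak_gb_s = mt_per_s * bytes_per_transfer * channels / 1e9
print(f"~{peak_gb_s:.0f} GB/s peak per socket")   # ~845 GB/s
```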
Read more on govindhtech.com
solidrun · 2 months
Explore our latest article on the Hailo-15, the cutting-edge AI accelerator that's setting new standards in performance and efficiency. Discover how this advanced SoC, integrated into SolidRun's platforms, is driving innovation across various applications—from edge computing to autonomous systems.
Don’t miss out on learning how the Hailo-15 can revolutionize your AI projects!
Read the full article here!
forlinx · 3 months
Announcing the new Forlinx FET-MX95xx-C SoM! Based on the NXP i.MX95 series flagship processor, it integrates six Cortex-A55 cores plus Cortex-M7 and Cortex-M33 cores. With a powerful 2TOPS NPU, it offers excellent AI computing capability, ideal for edge computing, smart cockpits, Industry 4.0, and more.
The FET-MX95xx-C features rich interfaces including 5x CAN-FD, 1x 10GbE, 2x GbE, 2x PCIe Gen3, and an embedded ISP supporting 4K@30fps video capture with powerful GPU acceleration. It also has strong safety features compliant with ASIL-B and SIL-2 standards.
Want to learn more? Check out the Forlinx website
Let's explore Forlinx's latest technology together and power up your products!
toptrends111 · 7 months
Artificial Intelligence (AI) Updates
"Top Trends LLC (DBA ""Top Trends"") is a dynamic and information-rich web platform that empowers its readers with a broad spectrum of knowledge, insights, and data-driven trends. Our professional writers, industry experts, and enthusiasts dive deep into Artificial Intelligence, Finance, Startups, SEO and Backlinks.
rtc-tek · 8 months
Our AI-based performance testing accelerator automates the scripting process, reducing scripting time to just 24 hours, a drastic improvement over the conventional weeks-long scripting process. Unlike manual scripting, the accelerator rapidly addresses diverse testing scenarios within minutes, offering unmatched efficiency. This not only accelerates the testing process but also ensures adaptability to various scenarios.
Talk to our performance experts at https://rtctek.com/contact-us/. Visit https://rtctek.com/performance-testing-services to learn more about our services.
RISE Launches the First AI Accelerator Program in SEA
RISE, the corporate innovation accelerator, has launched the first AI Accelerator program in Southeast Asia, advancing artificial intelligence expertise toward prototype business services. The program accelerates the adoption of artificial intelligence (AI) in organizations, focusing on tangible business results that answer the needs of leading corporations across Southeast Asia.

RISE.AI is a corporate AI innovation accelerator program in which RISE works with a network of partners spanning Southeast Asia and the world, applying its expertise in corporate AI innovation to deliver concrete, practical results by connecting innovative thinking with applicable practices that accelerate AI development. The program aims to bring together startups with the best AI innovations from around the world and AI technology experts to co-develop pilot projects with leading companies in sectors such as finance & banking, insurance, energy, and clean technology. Leading Thai organizations joining the program include PTT Exploration and Production Public Company Limited (PTTEP), AI and Robotics Ventures Company Limited, Bank of Ayudhya Public Company Limited (Krungsri), and the Digital Economy Promotion Agency (depa). The program runs from April to September 2019.
Natthaphat Thanetworakul, Head of Ventures at RISE, the corporate innovation accelerator

Natthaphat Thanetworakul, Head of Ventures at RISE (Regional Corporate Innovation Accelerator), said that artificial intelligence will be a key driver of overall GDP growth in Thailand and Southeast Asia. Establishing a data-driven culture in business organizations will help regional enterprises optimize their business strategies and elevate their products and services, leading to sustainable economic growth.

The AI industry is growing and having an enormous impact on both business and society. Rising AI adoption is driving adjustments and developments across industries, pushing organizations to change how they do business, reform their methods, and transform their environments so they can operate and compete in the global economy. Businesses therefore need to adopt new technology to cope with rapid technological change and to create new business opportunities that strengthen competitiveness and raise revenue. McKinsey research indicates that AI adoption will significantly increase profits across every sector by 2035, especially in education, accommodation & food services, and construction, where profits are expected to rise by more than 70%. AI use in wholesale and retail, agriculture, forestry, fisheries, and healthcare is expected to lift profits by more than 50%. Given the first-mover advantage of bringing AI to market ahead of competitors, businesses are now eager to develop their own AI capabilities.

“However, AI adoption takes a long time and is expensive. Most companies in Southeast Asia face insufficient resources to develop AI technology in-house and cannot reach AI developers around the world. These conditions are why the RISE.AI program is designed to connect organizations with qualified AI developers worldwide, so they can work together on high-potential projects and stay competitive in a changing global economy,” said Natthaphat.
Thana Slanvetpan, Senior Manager of Technology and Knowledge Management, PTTEP

Thana Slanvetpan, Senior Manager of Technology and Knowledge Management at PTTEP, said: “PTTEP plans to apply AI technology in several key areas of the organization to elevate our business operations. We are excited to become one of the official partners of the RISE.AI program, which will help PTTEP find the best startups from around the world to drive corporate AI innovation.”

RISE.AI is an AI accelerator program that gives organizations access to the global AI community. Startups are selected on their ability to solve the challenge briefs set by each participating organization. All selected AI startups join a nine-week camp to co-develop pilot projects with RISE’s leading corporate partners and receive one-on-one mentoring from AI experts at New York University Tandon Future Labs, ensuring that projects built within the program’s timeframe have world-class potential. With strategic assessment and advice from RISE.AI’s expert partners, RISE.AI is positioned to become the platform for successful corporate AI development in Southeast Asia. The program officially launched in April 2019, with roadshows planned in ten major cities across Asia: Bangkok, Singapore, Tokyo, Ho Chi Minh City, Beijing, Hangzhou, Shenzhen, Hong Kong, Seoul, and Taipei.

Mina Salib, Program Director at New York University Tandon Future Labs, one of RISE.AI’s strategic partners, said: “The RISE.AI program is the best opportunity for startups seeking expansion and growth in Southeast Asia, because corporations in this region are investing heavily in AI technology. I therefore encourage AI startups from around the world to join RISE.AI, the first corporate AI accelerator in Southeast Asia.”

Related link: www.riseaccel.com
maintec · 2 years
Benefits of the new IBM z16 Mainframe:
• Accelerated AI: AI insights to create new value for business
• Cyber resiliency: protect against current and future threats by securing your data
• Modernization: speed up modernization of workloads and integrate them seamlessly across the hybrid cloud
Source: IBM. Contact us for Mainframe Services.
govindhtech · 6 months
Genio 510: Redefining the Future of Smart Retail Experiences
Genio IoT Platform by MediaTek
Genio 510
Manufacturers of consumer, business, and industrial devices can benefit from the MediaTek Genio IoT Platform’s innovation, quicker market access, and more than a decade of longevity. MediaTek Genio IoT is a range of IoT chipsets designed to enable and lead the way for innovative devices. Through cooperation and support from conception to design and production, MediaTek guarantees success, and its global network of reliable distributors and business partners lets it pivot, scale, and adjust to customer needs.
Genio 510 features
High performance
Broad range of third-party modules and power-efficient, high-performing IoT SoCs
Sophisticated multimedia and AI accelerators and cores that improve intelligent autonomous capabilities at the edge
Connectivity
Sub-6GHz 5G technologies and Wi-Fi protocols for consumer, business, and industrial use
Both powerful and energy-efficient
Adaptable, quick interfaces
Global 5G modem supported by carriers
Superior support
From idea to design to manufacture, MediaTek works with clients, sharing experience and offering thorough documentation, in-depth training, and reliable developer tools.
Security
Highly secure IoT SoCs and intelligent modules for building products
Several applications on one common platform
Developing industry, commercial, and enterprise IoT applications on a single platform that works with all SoCs can save development costs and accelerate time to market.
MediaTek Genio 510
Smart retail, industrial, factory automation, and many more Internet of Things applications are powered by MediaTek’s Genio 510. MediaTek, a leading global fabless semiconductor manufacturer, is at Embedded World 2024 in Nuremberg this week, along with a number of other firms. Its most recent IoT innovations are on display at the event, where the company is discussing how these MediaTek-powered products serve a variety of market sectors.
MediaTek is showcasing the recently released MediaTek Genio 510 SoC in one of its demos. The Genio 510 offers high-efficiency solutions in AI performance, CPU and graphics, 4K display, rich input/output, and 5G and Wi-Fi 6 connectivity for popular IoT applications. Because the Genio 510 and Genio 700 chips are pin-compatible, product developers can better segment and diversify their designs for different markets without paying for a redesign.
Numerous applications, such as digital menus and table service displays, kiosks, smart home displays, point of sale (PoS) devices, and various advertising and public domain HMI applications, are best suited for the MediaTek Genio 510. Industrial HMI covers ruggedized tablets for smart agriculture, healthcare, EV charging infrastructure, factory automation, transportation, warehousing, and logistics. It also includes ruggedized tablets for commercial and industrial vehicles.
The fully integrated, extensive feature set of Genio 510 makes such diversity possible:
Support for two displays, such as an FHD and 4K display
Modern visual quality support for two cameras built on MediaTek’s tried-and-true technologies
A powerful multi-core AI processor with a dedicated visual processing engine for a wide range of computer vision applications, such as facial recognition, object/people identification, collision warning, driver monitoring, gesture and posture detection, and image segmentation (a generic inference sketch follows this list)
Rich input/output for peripherals, such as network connectivity, manufacturing equipment, scanners, card readers, and sensors
4K encoding engine (camera recording) and 4K video decoding (multimedia playback for advertising)
Exceptionally power-efficient 6nm SoC
Ready for MediaTek NeuroPilot AI SDK and multitasking OS (time to market accelerated by familiar development environment)
Support for fanless design and industrial grade temperature operation (-40 to 105C)
10-year supply guarantee (one-stop shop supported by a top semiconductor manufacturer in the world)
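As a generic, hedged sketch of what an edge vision workload on such a platform can look like, the snippet below uses the standard TensorFlow Lite runtime. The model file and the NPU delegate library name are placeholders; the real delegate ships with MediaTek's NeuroPilot SDK:

```python
# Hedged sketch of an edge vision pipeline on an SoC like the Genio 510.
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

MODEL = "detector.tflite"                                 # placeholder model
try:
    delegates = [load_delegate("libneuron_delegate.so")]  # assumed NPU delegate
except (ValueError, OSError):
    delegates = []                                        # fall back to CPU

interp = Interpreter(model_path=MODEL, experimental_delegates=delegates)
interp.allocate_tensors()
inp = interp.get_input_details()[0]
frame = np.zeros(inp["shape"], dtype=inp["dtype"])        # stand-in camera frame
interp.set_tensor(inp["index"], frame)
interp.invoke()
out = interp.get_tensor(interp.get_output_details()[0]["index"])
print(out.shape)                                          # detections tensor
```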
How far does it surpass the alternatives?
The Genio 510 uses more than 50% less power and provides over 250% more CPU performance than the direct alternative!
The MediaTek Genio 510 is an efficient IoT platform designed for edge AI, interactive retail, smart homes, and industrial and commercial uses. It offers multitasking OS support, sophisticated multimedia, extremely responsive edge processing, and more, and it is intended for products that pair well with off-grid power systems and fanless enclosure designs.
EVK MediaTek Genio 510
The MediaTek Genio 510 EVK is an evaluation kit for the highly capable Genio 510 (MT8370) edge-AI IoT platform for smart homes, interactive retail, and industrial and commercial applications. It offers multiple multitasking operating systems, a variety of networking choices, very responsive edge processing, and sophisticated multimedia capabilities.
SoC: MediaTek Genio 510
This edge-AI platform, created on a highly efficient 6nm process, combines an integrated APU (AI processor), DSP, Arm Mali-G57 MC2 GPU, and six CPU cores (2x 2.2 GHz Arm Cortex-A78 and 4x 2.0 GHz Arm Cortex-A55) on a single chip. An HEVC encoding acceleration engine lets video captured by attached cameras be encoded at up to Full HD resolution while using minimal storage.
FAQS
What is the MediaTek Genio 510?
The Genio 510 is a chipset intended for a broad spectrum of Internet of Things (IoT) applications.
What kind of IoT applications is the Genio 510 suited for?
Because of its adaptability, the Genio 510 may be utilised in a wide range of applications, including smart homes, healthcare, transportation, and agriculture, as well as industrial automation (rugged tablets, manufacturing machinery, and point-of-sale systems).
What are the benefits of using the Genio 510?
Rich input/output choices, powerful CPU and graphics processing, compatibility for 4K screens, high-efficiency AI performance, and networking capabilities like 5G and Wi-Fi 6 are all included with the Genio 510.
Read more on Govindhtech.com
govindhtech · 13 days
Amazon SageMaker HyperPod Presents Amazon EKS Support
Amazon SageMaker HyperPod
Cut the training duration of foundation models by up to 40% and scale effectively across over a thousand AI accelerators.
We are happy to announce that Amazon SageMaker HyperPod, a purpose-built infrastructure with resilience at its core, now supports Amazon Elastic Kubernetes Service (EKS) for foundation model (FM) development. With this new feature, users can orchestrate HyperPod clusters with EKS, combining the strength of Kubernetes with the robust Amazon SageMaker HyperPod environment, which is ideal for training large models. By scaling effectively across more than a thousand artificial intelligence (AI) accelerators, Amazon SageMaker HyperPod can cut training time by up to 40%.
SageMaker HyperPod: What is it?
The undifferentiated heavy lifting associated with developing and refining machine learning (ML) infrastructure is eliminated by Amazon SageMaker HyperPod. It is pre-configured with SageMaker’s distributed training libraries, which automatically divide training workloads across more than a thousand AI accelerators so they run in parallel for better model performance, and it periodically saves checkpoints to guarantee your FM training continues uninterrupted.
You no longer need to actively oversee this process because it automatically recognizes hardware failure when it occurs, fixes or replaces the problematic instance, and continues training from the most recent checkpoint that was saved. Up to 40% less training time is required thanks to the robust environment, which enables you to train models in a distributed context without interruption for weeks or months at a time. The high degree of customization offered by SageMaker HyperPod enables you to share compute capacity amongst various workloads, from large-scale training to inference, and to run and scale FM tasks effectively.
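The underlying pattern HyperPod automates is ordinary checkpoint/resume logic. A minimal sketch (PyTorch; the shared checkpoint path and its layout are assumptions):

```python
# Save state periodically; on restart, load the latest checkpoint
# instead of starting over from step zero.
import os
import torch

CKPT = "/checkpoints/latest.pt"   # shared storage path (assumption)

def save_checkpoint(model, optimizer, step):
    torch.save({"model": model.state_dict(),
                "optim": optimizer.state_dict(),
                "step": step}, CKPT)

def resume_if_possible(model, optimizer):
    if not os.path.exists(CKPT):
        return 0                                  # fresh run
    state = torch.load(CKPT, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optim"])
    return state["step"] + 1                      # continue after a failure
```

On a healthy run the resume function returns step 0; after a node is repaired or replaced, training picks up from the last saved step instead of restarting.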
Advantages of the Amazon SageMaker HyperPod
Distributed training with a focus on efficiency for big training clusters
Because Amazon SageMaker HyperPod comes preconfigured with Amazon SageMaker distributed training libraries, you can expand training workloads more effectively by automatically dividing your models and training datasets across AWS cluster instances.
Optimum use of the cluster’s memory, processing power, and networking infrastructure
Using two strategies, data parallelism and model parallelism, the Amazon SageMaker distributed training libraries optimize your training job for AWS network architecture and cluster topology. Model parallelism divides models that are too big to fit on one GPU into smaller pieces that are distributed across several GPUs for training. Data parallelism divides huge datasets into smaller pieces for concurrent training, increasing training speed.
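Here is a hedged sketch of the data-parallel half of that description, using stock PyTorch DistributedDataParallel rather than the SageMaker library itself; it assumes a torchrun-style launcher that sets the usual rank and world-size environment variables:

```python
# Each worker holds a full model replica and a shard of the data;
# gradients are averaged across workers during backward().
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="gloo")   # "nccl"/"hccl" on accelerators
model = DDP(torch.nn.Linear(128, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(32, 128)                  # this rank's shard of the data
loss = model(x).sum()
loss.backward()                           # DDP all-reduces gradients here
opt.step()
dist.destroy_process_group()
```

Model parallelism is the complementary strategy: instead of replicating the model, each rank holds only a slice of its layers or tensors.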
Robust training environment with no disruptions
You can train FMs continuously for months on end with SageMaker HyperPod because it automatically detects, diagnoses, and recovers from problems, creating a more resilient training environment.
Customers may now use a Kubernetes-based interface to manage their Amazon SageMaker HyperPod clusters. This integration makes it possible to switch seamlessly between Slurm and Amazon EKS to optimize different workloads, including inference, experimentation, training, and fine-tuning. The CloudWatch Observability EKS add-on provides comprehensive monitoring, offering insights into low-level node metrics such as CPU, network, and disk on a single dashboard. This improved observability covers container-specific usage, node-level metrics, pod-level performance, and cluster-wide resource utilization, making troubleshooting and optimization more effective.
Since its launch at re:Invent 2023, Amazon SageMaker HyperPod has established itself as the go-to option for businesses and startups using AI to effectively train and implement large-scale models. The distributed training libraries from SageMaker, which include Model Parallel and Data Parallel software optimizations to assist cut training time by up to 20%, are compatible with it. With SageMaker HyperPod, data scientists may train models for weeks or months at a time without interruption since it automatically identifies, fixes, or replaces malfunctioning instances. This frees up data scientists to concentrate on developing models instead of overseeing infrastructure.
Because of its scalability and abundance of open-source tooling, Kubernetes has gained popularity for machine learning (ML) workloads. These benefits are leveraged in the integration of Amazon EKS with Amazon SageMaker HyperPod. When developing applications including those needed for generative AI use cases organizations frequently rely on Kubernetes because it enables the reuse of capabilities across environments while adhering to compliance and governance norms. Customers may now scale and maximize resource utilization across over a thousand AI accelerators thanks to today’s news. This flexibility improves the workflows for FM training and inference, containerized app management, and developers.
With comprehensive health checks, automated node recovery, and job auto-resume features, Amazon EKS support in Amazon SageMaker HyperPod strengthens resilience and guarantees continuous training for large and long-running jobs. Although clients can use their own CLI tools, the optional HyperPod CLI, built for Kubernetes settings, can streamline job administration. Advanced observability is made possible by integration with Amazon CloudWatch Container Insights, which offers in-depth information on the health, utilization, and performance of clusters. Furthermore, data scientists can automate machine learning operations with platforms like Kubeflow. The integration also incorporates Amazon SageMaker managed MLflow, a reliable solution for experiment tracking and model maintenance.
In summary, the HyperPod service fully manages the HyperPod cluster, eliminating the undifferentiated heavy lifting involved in building and optimizing machine learning infrastructure. The cloud admin builds the cluster via the HyperPod cluster API, and Amazon EKS orchestrates the HyperPod nodes much as Slurm does, giving users a familiar Kubernetes-based administration experience.
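As a hedged sketch of that flow, a cloud admin might create an EKS-orchestrated HyperPod cluster through the SageMaker CreateCluster API via boto3. Every name, ARN, and instance type below is a placeholder, and the request shape should be checked against the service documentation:

```python
# Create an EKS-orchestrated HyperPod cluster (illustrative values only).
import boto3

sm = boto3.client("sagemaker")
resp = sm.create_cluster(
    ClusterName="demo-hyperpod",                                # placeholder
    Orchestrator={"Eks": {"ClusterArn": "arn:aws:eks:us-east-1:"
                          "123456789012:cluster/demo"}},        # placeholder
    InstanceGroups=[{
        "InstanceGroupName": "workers",
        "InstanceType": "ml.g5.8xlarge",                        # illustrative
        "InstanceCount": 2,
        "LifeCycleConfig": {"SourceS3Uri": "s3://my-bucket/lifecycle/",
                            "OnCreate": "on_create.sh"},
        "ExecutionRole": "arn:aws:iam::123456789012:role/HyperPodRole",
    }],
)
print(resp["ClusterArn"])
```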
Important information
The following are some essential details regarding Amazon EKS support in the Amazon SageMaker HyperPod:
Resilient Environment: With comprehensive health checks, automated node recovery, and job auto-resume, this integration offers a more resilient training environment. With SageMaker HyperPod, you may train foundation models continuously for weeks or months at a time without interruption since it automatically finds, diagnoses, and fixes errors. This can reduce training time by up to 40%.
Improved GPU Observability: Your containerized apps and microservices can benefit from comprehensive metrics and logs from Amazon CloudWatch Container Insights. This makes it possible to monitor cluster health and performance in great detail.
Scientist-Friendly Tool: This release includes interaction with SageMaker Managed MLflow for experiment tracking, a customized HyperPod CLI for job management, Kubeflow Training Operators for distributed training, and Kueue for scheduling. Additionally, it is compatible with the distributed training libraries offered by SageMaker, which offer data parallel and model parallel optimizations to drastically cut down on training time. Large model training is made effective and continuous by these libraries and auto-resumption of jobs.
Flexible Resource Utilization: This integration improves the scalability of FM workloads and the developer experience. Computational resources can be effectively shared by data scientists for both training and inference operations. You can use your own tools for job submission, queuing, and monitoring, and you can use your current Amazon EKS clusters or build new ones and tie them to HyperPod compute.
Read more on govindhtech.com
govindhtech · 25 days
IBM And Intel Introduce Gaudi 3 AI Accelerators On IBM Cloud
Cloud-Based Enterprise AI from Intel and IBM. To assist businesses in scaling AI, Intel and IBM will implement Gaudi 3 AI accelerators on IBM Cloud.
Gaudi 3 AI Accelerator
The worldwide deployment of Intel Gaudi 3 AI accelerators as a service on IBM Cloud is the result of an announcement made by IBM and Intel. Anticipated for release in early 2025, this product seeks to support corporate AI scalability more economically and foster creativity supported by security and resilience.
Support for Gaudi 3 will also be possible because to this partnership with IBM’s Watsonx AI and analytics platform. The first cloud service provider (CSP) to use Gaudi 3 is IBM Cloud, and the product will be offered for on-premises and hybrid setups.
Intel and IBM
“AI’s true potential requires an open, cooperative environment that gives customers alternatives and solutions. We are generating new AI capabilities and satisfying the need for reasonably priced, safe, and cutting-edge AI computing solutions by fusing Xeon CPUs and Gaudi 3 AI accelerators with IBM Cloud.”
Why This Is Important: Although generative AI may speed up transformation, the amount of computational power needed highlights how important it is for businesses to prioritize availability, performance, cost, energy efficiency, and security. By working together, Intel and IBM want to improve performance while reducing the total cost of ownership for using and scaling AI.
Gaudi 3
Gaudi 3’s integration with 5th generation Xeon simplifies workload and application management by supporting corporate AI workloads in data centers and the cloud. It also gives clients insight and control over their software stack. Performance, security, and resilience are given first priority as clients expand corporate AI workloads more affordably with the aid of IBM Cloud and Gaudi 3.
IBM’s Watsonx AI and data platform will support Gaudi 3 to improve model inferencing price/performance. This will give Watsonx clients access to extra AI infrastructure resources for scaling their AI workloads across hybrid cloud environments.
“IBM is dedicated to supporting clients in driving innovation in AI and hybrid cloud by providing solutions that address their business demands,” said Alan Peacock, general manager of IBM Cloud. “Intel’s commitment to security and resilience with IBM Cloud has helped fuel IBM’s hybrid cloud and AI strategy for enterprise clients.”
Intel Gaudi 3 AI Accelerator
“Clients will have access to a flexible enterprise AI solution that aims to optimize cost performance by utilizing IBM Cloud and Intel’s Gaudi 3 accelerators. We are making new AI business opportunities available to customers so they can test, develop, and deploy AI inferencing solutions more affordably.”
IBM and Intel
How It Works: IBM and Intel are working together to provide AI customers with a Gaudi 3 service capability. They plan to use IBM Cloud’s security and compliance features to assist customers across a variety of sectors, including highly regulated ones.
Scalability and Flexibility: Clients may modify computing resources as required with the help of scalable and flexible solutions from IBM Cloud and Intel, which may result in cost savings and improved operational effectiveness.
Improved Security and Performance: By integrating Gaudi 3 with IBM Cloud Virtual Servers for VPC, x86-based businesses will be able to execute applications more securely and quickly than they could have before, which will improve user experiences.
What’s Next: Intel and IBM have a long history of working together, starting with the IBM PC and continuing with Gaudi 3 to create corporate AI solutions. General availability of IBM Cloud with Gaudi 3 products is scheduled for early 2025. In the next months, stay out for additional developments from IBM and Intel.
Intel Gaudi 3: The Distinguishing AI
Introducing your new, high-performing choice for every kind of enterprise AI workload.
An Improved Method for Using Enterprise AI
The Intel Gaudi 3 AI accelerators are built for rigorous training and inference workloads. They are based on the high-efficiency Intel Gaudi platform, which has proven MLPerf benchmark performance.
Support AI workloads from a single node to a mega cluster, in your data center or in the cloud, all running on Ethernet equipment you probably already own. Whether you need one accelerator or hundreds, Intel Gaudi 3 can be crucial to the success of your AI project.
Developed to Meet AI’s Real-World Needs
With the help of industry-standard Ethernet networking and open, community-based software, you can grow systems more flexibly thanks to the Intel Gaudi 3 AI accelerators.
Adopt Easily
Whether you are beginning from scratch, optimizing pre-made models, or switching from a GPU-based method, using Intel Gaudi 3 AI accelerators is easy.
Designed with developers in mind: To quickly catch up, make use of developer resources and software tools.
Encouragement of Both New and Old Models: Use open source tools, such as Hugging Face resources, to modify reference models, create new ones, or migrate old ones.
Included PyTorch: Continue using the library that your team is already familiar with (a minimal device-placement sketch follows this list).
Simple Translation of Models Based on GPUs: With the help of their specially designed software tools, quickly transfer your current solutions.
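A minimal, hedged sketch of what "continue using PyTorch" looks like on Gaudi; it assumes the habana_frameworks package from the Gaudi software suite, which registers an "hpu" device:

```python
# Existing PyTorch code targets Gaudi by moving tensors to the "hpu" device.
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" backend

device = torch.device("hpu")
model = torch.nn.Linear(256, 256).to(device)
x = torch.randn(16, 256, device=device)
loss = model(x).pow(2).mean()
loss.backward()
htcore.mark_step()     # flush the accumulated lazy graph to the accelerator
print(loss.item())
```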
Ease Development from Start to Finish
Take less time to get from proof of concept to production. Intel Gaudi 3 AI Accelerators are backed by a robust suite of software tools, resources, and training, from migration through implementation. Find out what resources are available to make your AI projects easier.
Scale Without Effort: Integrate AI into everyday life. The goal of the Intel Gaudi 3 AI Accelerators is to provide even the biggest and most complicated installations with straightforward, affordable AI scaling.
Increased I/O: Benefit from 33 percent more I/O connectivity per accelerator than the H100, allowing for massive scale-up and scale-out while maintaining cost effectiveness.
Constructed for Ethernet: Utilize the networking infrastructure you currently have and use conventional Ethernet gear to accommodate growing demands.
Open: Avoid risky investments in proprietary, locked technologies such as NVSwitch, InfiniBand, and NVLink.
Boost Your AI Use Case: Realize the extraordinary on any scale. Modern generative AI and LLMs are supported by Intel Gaudi 3 AI accelerators in the data center. These accelerators work in tandem with Intel Xeon processors, the preferred host CPU for cutting-edge AI systems, to provide enterprise performance and dependability.
Read more on govindhtech.com
govindhtech · 26 days
AMD Infinity Guard, BeeKeeperAI Collaborate Secret Computing
AMD Infinity Guard
The prevalence of ransomware attacks and data breaches in recent years has made it difficult for important business sectors to collaborate. Reports state that organizations are unable to work with suppliers attempting potentially ground-breaking applications or discoveries because of the risk posed by threat actors. To maintain the strict controls required for certain data sets, some businesses don’t even share data internally. Researchers’ inability to obtain vital data impedes significant research in a number of fields, including government, banking, and healthcare.
Healthcare AI acceleration via a safe platform for algorithm creators and data custodians to collaborate
Data is Never Exchanged or Viewed
The data never leaves the data steward’s secure, HIPAA-compliant environment.
Processing Real-World and Protected Data
Works with primary, real-world data that comes directly from the source, rather than synthetic or de-identified data. The data is encrypted at all times.
Intellectual Property Is Never Seen or Shared
The algorithm is always encrypted, both when it is uploaded to EscrowAI and when it is moving through the container to the data steward and inside the protected environment of the data steward.
Technology with Secure Enclaves
EscrowAI uses secure enclave technology to reduce the possibility of algorithm IP exposure and data exfiltration during computation.
Matchmaker and intermediary
BeeKeeperAI reduces the time, effort, and expenses of data projects by more than 50% by serving as a matchmaker and broker between data stewards and algorithm developers.
Alan Czeszynski, a security industry expert and the marketing and product development leader at BeeKeeperAI, was gracious enough to join me on the AMD EPYC TechTalk podcast series following the Confidential Computing Summit in San Francisco. We talked about the state of security and how the need for better hardware and software safeguards has never been greater.
BeeKeeperAI
San Francisco-based BeeKeeperAI offers EscrowAI, a technology that combines privacy and confidential computing so that software developers, data scientists, and data owners can collaborate in trusted execution environments (TEEs).
The technology of BeeKeeperAI ensures that an owner always has control over their data. In addition to offering end-to-end encryption and algorithmic and model encryption to safeguard intellectual property, BeeKeeper also applies the algorithm to the data. The business establishes a TEE in a cloud data storage environment after an algorithm is prepared to run against data. Consequently, the data is cut off from all stakeholders, including BeeKeeperAI, the cloud service provider, the data owner, and the owner of the algorithm.
Nobody can see what goes on within the TEE; everyone can only access the output to which they are legally permitted.
BeeKeeperAI’s secure environment, according to Alan, makes it possible to “bring these parties together to enable development and testing of artificial intelligence and machine learning models.”
Large language models (LLMs) and generative AI have gained popularity, and as a result businesses are now more conscious of the need to secure AI, according to Alan. Protecting every stage of the AI and machine learning lifecycle has received a lot of attention lately, which Alan says is one reason confidential computing is starting to gain real traction.
Alan warns that legacy security solutions might not provide enough protection in the AI era. The problem with LLMs is that they essentially turn into enormous repositories of all your secrets if you wish to locally train them on your own data,” he continued.
While CISOs and IT administrators prioritize data protection, business managers and data scientists frequently place greater importance on obtaining the data required to develop models that improve the company. Alan claimed that it is far too common for the procedure of obtaining private, protected data to be difficult, costly, and time-consuming. He described a few of the intricate details.
Parties usually need detailed, highly formal data-use agreements in place. There are often several restrictions on how the data can be used. And audits must always be performed. BeeKeeperAI eliminates this effort by offering a technical answer to many of these security challenges.
“Our goal is to eliminate that from the end user and basically take it upon ourselves,” Alan stated. “The platform then allows the true value, which is basically secure collaboration: getting access to the data, developing your models, being able to execute your AI/ML lifecycle in a secure environment.”
Alan acknowledged that the security features built into AMD EPYC CPUs have strengthened BeeKeeperAI’s offerings. These technologies are part of AMD Infinity Guard and include Secure Encrypted Virtualization with Secure Nested Paging (SEV-SNP). They prevent the contents of a virtual machine’s memory from being read by other VMs running on the same system or by the host server itself.
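As a small, hedged illustration of checking for these features on a Linux machine (the /proc/cpuinfo flag names and the kvm_amd sysfs path are assumptions that vary by kernel build):

```python
# Report whether the CPU advertises AMD SEV / SEV-ES / SEV-SNP support.
from pathlib import Path

cpu_flags: set[str] = set()
for line in Path("/proc/cpuinfo").read_text().splitlines():
    if line.startswith("flags"):
        cpu_flags.update(line.split(":", 1)[1].split())
        break

for feature in ("sev", "sev_es", "sev_snp"):
    print(f"{feature}: {'yes' if feature in cpu_flags else 'no'}")

param = Path("/sys/module/kvm_amd/parameters/sev_snp")  # host-side toggle
if param.exists():
    print("kvm_amd sev_snp =", param.read_text().strip())
```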
Alan also mentioned adaptability, another significant advantage of AMD EPYC. “We have to provide [clients] a variety of possible platforms, and EPYC is a fantastic one,” said Alan. “In those situations, the secure paging feature of encrypted virtualization and confidential containers or virtual machines based on the EPYC CPU is quite advantageous. One of the main advantages of utilizing EPYC processors is that algorithm developers no longer have to adhere to any particular OS type thanks to this lift-and-shift technique.”
Read more on govindhtech.com
govindhtech · 4 months
Next-Gen Dell PowerEdge XE9680 for Heavy Workloads
Exciting Dell-Intel AI updates from Dell Technologies World: Dell PowerEdge XE9680 servers with Intel Gaudi 3 AI accelerators will arrive in the second half of 2024.
At this year’s Intel Vision conference, Intel CEO Pat Gelsinger introduced the Gaudi 3 AI accelerator. Michael Dell attended Pat’s keynote via livestream to announce that Dell Technologies will release the Intel Gaudi 3 AI Accelerator on the Dell PowerEdge XE9680 server later this year.
Dell PowerEdge XE9680
Dell Technologies’ Infrastructure Solutions Group President Arthur Lewis reiterated this in his keynote talk at Dell Technologies World last week.
Dell PowerEdge XE9680 will support Gaudi 3 in the second half of 2024. Before launch, customers can seek quotes from Dell and Dell partners in late June.
With its improved AI acceleration and variety of GPU options, the Dell PowerEdge XE9680 server, which debuted last year, has set the benchmark for purpose-built AI and GenAI computation. Gaudi 3 will improve Dell and Intel’s collaboration by meeting customer infrastructure demands, lowering total cost of ownership, and simplifying deployment when it joins Dell’s server lineup later this year.
This is made possible by an open ecosystem that includes scalable Ethernet-based AI fabrics coupled with AI frameworks tailored for Dell and Gaudi 3.
In order to enable a restricted group of customers to evaluate Intel AI solutions on Dell hardware before implementing them in their data centres, Intel will add Dell PowerEdge XE9680 servers with Intel AI Accelerators to the Intel Developer Cloud prior to the general release of Gaudi 3.
For additional details, see Dell Technologies’ announcement that the PowerEdge XE9680 series will now include Gaudi 3.
Best AI session summaries
Check out these featured AI sessions from Dell Technologies World 2024, led by experts. AI tools were used to distil the recap content.
Resolving the Data Conundrum in the AI Era
Discover the advantages of AI with Dell Technologies, Deloitte, and ServiceNow. Learn about data governance, unstructured data, and the moral application of AI.
Adopting a Human First Perspective for AI Innovation
Get advice on promoting AI innovation that is focused on people from Dell Technologies, McLaren Racing, EY, and the City of Amarillo.
How to Use Sustainable IT Strategies to Future Proof Your Business
Discover how AI and sustainability can work together as senior executives from Computacenter, Creative Strategies, and Dell share their go-to tactics.
High-performance servers like the Dell PowerEdge XE9680 are made to withstand the rigorous demands of contemporary data centres, especially those with applications including high-performance computing (HPC), machine learning (ML), and artificial intelligence (AI).
Below are the main attributes and details of the Dell PowerEdge XE9680
Important characteristics
High Compute Density: Supports up to eight NVIDIA A100 GPUs for AI and HPC computation.
Processor: Latest Intel Xeon Scalable CPUs provide powerful processing and support for huge memory configurations.
Memory: Some configurations allow terabytes of RAM for data-intensive applications.
Storage: Offers NVMe SSDs and HDDs for various performance and capacity needs.
Networking: Many 100GbE ports and other high-speed networking options support the data flow of large-scale computation.
Cooling and Power Efficiency: State-of-the-art cooling techniques to guarantee peak performance and dependability, along with effective power control to minimise running expenses.
XE9680 spec sheet
Form Factor: 6U rack server
GPUs: Eight NVIDIA A100 GPUs maximum
CPUs: The most recent generation of Intel Xeon Scalable processors
Memory: DDR5 support, with configurations up to multiple terabytes
Storage: Several drive bays supporting NVMe, SSD, and HDD
Networking: Multiple high-speed network interfaces, including 100GbE
Expansion: PCIe slots are available for adding expansion cards
Use cases
Machine Learning and AI: Because of its strong GPU capabilities, the XE9680 is ideal for AI and ML workloads, including training and inference.
High-Performance Computing: Appropriate for data analytics, scientific simulations, and other HPC applications needing substantial processing power.
Data Analytics: Able to manage complicated query execution and real-time data processing, among other large-scale data analytics operations.
Virtualization: Support for virtualized environments makes it a flexible option for consolidating several workloads on a single server.
Management and Security: For remote monitoring and management, use the Integrated Dell Remote Access Controller (iDRAC).
Hardware and firmware security mechanisms are included to prevent unwanted access and guarantee data integrity.
PowerEdge XE9680
High-performance servers like the Dell PowerEdge XE9680 are made to last in today’s data centres, especially for applications involving machine learning (ML), artificial intelligence (AI), and high-performance computing (HPC). The Dell PowerEdge XE9680 offers the following main advantages:
High Performance
CPU Power: The PowerEdge XE9680’s potent CPUs offer remarkable processing power that can effectively handle demanding applications.
GPU Acceleration: The multiple GPU support on this server greatly speeds up work related to AI, ML, and data analytics. For jobs requiring parallel processing capability, it’s perfect.
Scalability
Modular Design: Organisations may easily scale their servers by adding more processing power or storage as needed thanks to the server’s modular design.
Flexible Configurations: Businesses can customise the server to meet unique workload requirements thanks to support for a variety of configurations.
Improved Storage
High Capacity: With a significant amount of storage space, the XE9680 is a good choice for applications that require a lot of data. Large-scale AI model training, databases, and big data analytics all depend on this.
High-speed Storage Options: The server guarantees quick data access and transfer rates by supporting NVMe SSDs and other high-speed storage technologies.
Advanced Networking
High-bandwidth connectivity: In order to facilitate real-time data processing and analysis, the server is built to accommodate high-speed networking solutions. This guarantees quick data transfer and minimal latency.
Multiple Network Interfaces: It provides a number of network interfaces for load balancing and redundancy, which improves the performance and stability of the network.
Energy Efficiency
Optimised Power Usage: By reducing power consumption without sacrificing performance, Dell has integrated energy-efficient technologies into the XE9680, helping to cut operational costs.
Characteristics of Sustainability: It is equipped with features that lessen the carbon impact of data centre operations, all with an eye towards sustainability.
Reliability and Security
Robust Design: With features like redundant power supplies and cooling systems to guarantee uptime, the server is designed to be extremely dependable.
Advanced Security: Dell’s security features, which include secure boot, silicon-based root of trust, and system lockdown capabilities, shield infrastructure and data from online attacks.
Ease of Management
Easy-to-use Management Tools: Dell offers a range of tools that make monitoring, maintenance, and server deployment easier. One such product is the Dell EMC OpenManage line, which provides all-inclusive infrastructure management.
Remote Management: Administrators may control the server from any location with integrated remote management capabilities, which eliminates the need for on-site assistance and streamlines processes.
Services and Support
Expert Support: To guarantee that the server stays functional and effective, Dell provides a wide range of support services, including proactive maintenance and prompt issue resolution.
Training and Resources: To assist IT workers in efficiently managing and utilising the server, Dell offers a range of resources, such as training courses and manuals.
In conclusion, the Dell PowerEdge XE9680 is a strong and adaptable server made to fulfil the most exacting performance requirements of contemporary data centres. It is appropriate for a wide range of demanding applications, from AI and ML to HPC and beyond, thanks to its support for numerous high-end GPUs, current generation CPUs, and an abundance of memory and storage options.
Read more on govindhtech.com