#intelxeonprocessors
govindhtech · 6 months ago
New Oracle Roving Edge Device Revolutionizes AI At The Edge
Oracle’s second-generation Roving Edge Device (RED), the newest product in the company’s Edge Cloud line, offers exceptional processing power, seamless connectivity, and integrated security at network edges and in remote locations. Numerous workloads, including corporate applications, AI, and select OCI services, can be run at the edge with RED thanks to its easy deployment, strong price-performance, and enhanced security, which includes the option to run isolated or air-gapped.
What is Oracle Roving Edge Device?
The Oracle Roving Edge Device (RED) is a portable hardware platform with cloud-integrated services that brings OCI compute and storage to network edges and remote locations.
The second version of the Roving Edge Device builds on a strong foundation originally created to meet the requirements of security applications. It not only improves the core capabilities but also adds customized configurations to meet business needs across a range of sectors.
Introducing the second generation Oracle RED
With more OCPUs, RAM, storage, and better GPU performance, the second iteration of the Oracle Roving Edge Device offers significant improvements over the first generation.
RED is available in three configurations to suit diverse business needs.
Where can you deploy Oracle Roving Edge Device?
In your data center or at the edge, RED offers the same OCI development and deployment methods, selectable OCI services, and CPU and GPU shapes. This supports development across many industries and technical advancement in today’s fast-paced commercial environment. With its processing power, seamless connectivity, and strong security, the Roving Edge Device is well suited to cutting-edge applications that need speed, dependability, and efficiency.
Improvements in performance with the second-generation RED
Milliseconds matter in the fast-paced world of artificial intelligence. Imagine a future in which your network’s borders are infinite and its edge gets exponentially more intelligent. Customers now have more deployment options with the Oracle Roving Edge Device 2nd Generation (RED), which adds a new GPU-optimized configuration alongside compute- and storage-optimized configurations.
Customers benefit from low-latency processing closer to the point of data production and ingestion by using the Intel Xeon 8480+ processor’s capability at the edge, which leads to more timely insights into their data. Oracle and Intel collaborated on a series of benchmarks comparing it with the first-generation RED in order to test this capability.
The tests ran the Llama 2-7B model, the YOLOv10 model, and the ResNet-50 convolutional neural network (CNN) on Intel Xeon processors alone. The Intel Xeon 6230T-based first-generation Roving Edge Device is compared to the Intel Xeon 8480+-based second generation using the following benchmarks:
Deploying Llama2-7B on RED
The Llama 2 family of pre-trained and fine-tuned text generation models is built on an autoregressive transformer architecture. Llama 2 includes three models with seven billion, thirteen billion, and seventy billion parameters. Oracle benchmarked the 7-billion-parameter model for this test.
Using the Llama 2-7B model, the second-generation Roving Edge Device achieves response rates up to 13.6 times faster than RED Gen 1, enabling fast performance for edge-based large language model (LLM) inferencing on the Intel Xeon 8480+ processor.
Using the Llama 2-7B model, RED Gen 2 achieves up to 12.4 times higher throughput, greatly increasing the edge’s capacity for processing LLM data.
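For readers who want to try something similar on their own Xeon-based hardware, here is a minimal, generic sketch of CPU-only Llama 2-7B inference with Hugging Face Transformers. It is not Oracle’s benchmark harness; the model ID, bfloat16 choice, and prompt are illustrative assumptions, and the gated meta-llama checkpoint requires accepting Meta’s license on Hugging Face.

```python
# Minimal sketch: CPU-only Llama 2-7B text generation with Hugging Face Transformers.
# Model ID, dtype, and prompt are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # assumed checkpoint; requires license acceptance
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16  # bf16 maps well to Xeon AMX-capable CPUs
)
model.eval()

prompt = "Summarize the benefits of running LLM inference at the edge."
inputs = tokenizer(prompt, return_tensors="pt")
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```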
YOLO v10
The YOLO family of models was designed for real-time object detection: precise, low-latency prediction of object categories and locations in images. In this set of benchmarks, Oracle compared the two generations of the Roving Edge Device running the YOLOv10 model.
With YOLOv10, the new RED generation delivers up to 60% higher performance than the previous one and up to 67% higher throughput.
ResNet-50
ResNet-50 is a convolutional neural network (CNN) architecture from the Residual Networks (ResNet) family, a group of models created to tackle the difficulties of training deep neural networks. Renowned for its depth and effectiveness in image classification tasks, ResNet-50 was created by researchers at Microsoft Research. The family includes variants of several depths, such as ResNet-18 and ResNet-34, with ResNet-50 being a mid-sized version.
Using the ResNet 50 CNN, the second generation achieves a response rate that is up to three times higher than the first.
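For comparison, ResNet-50 image classification on a CPU can be reproduced in a few lines with torchvision. The sketch below is a generic illustration rather than the benchmark harness used above, and the input image path is a placeholder.

```python
# Minimal sketch: ResNet-50 image classification on CPU with torchvision.
# "sample.jpg" is a placeholder input image.
import torch
from PIL import Image
from torchvision import models

weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()  # resize/crop/normalize pipeline the weights expect

img = Image.open("sample.jpg").convert("RGB")
batch = preprocess(img).unsqueeze(0)

with torch.inference_mode():
    probs = model(batch).softmax(dim=1)

top = probs.topk(3)
labels = weights.meta["categories"]
for p, idx in zip(top.values[0], top.indices[0]):
    print(f"{labels[idx]}: {p:.3f}")
```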
Why deploy with Oracle Roving Edge Device? 
Oracle Roving Edge Device is the best option if you need to deploy application workloads at the edge and need a scalable, secure, and adaptable platform with the advantages of cloud computing and cost-effectiveness. Built to execute time-sensitive, mission-critical applications at the edge in both connected and unconnected areas, it is a powerful cloud-integrated service.
Getting Started
Oracle Roving Edge Device is the perfect infrastructure for anybody seeking a high-security, low-latency data processing and scalable environment at the edge because of its affordable, adaptable configurations and capacity to serve computing, storage, and GPU-intensive applications.
Read more on govindhtech.com
govindhtech · 8 months ago
5th Gen Intel Xeon Scalable Processors Boost SQL Server 2022
5th Gen Intel Xeon Scalable Processors
While speed and scalability have always been essential to databases, contemporary databases also need to serve AI and ML applications at higher performance levels. Real-time decision-making, now far more widespread, requires databases to deliver increasingly faster queries. Databases and the infrastructure that powers them are usually the first assets that need to be modernized in order to support analytics. This post demonstrates the substantial speed benefits of running SQL Server 2022 on 5th Gen Intel Xeon Scalable Processors.
OLTP/OLAP Performance Improvements with 5th gen Intel Xeon Scalable processors
The HammerDB benchmark uses New Orders per Minute (NOPM) throughput to quantify OLTP. Comparing 5th Gen Intel Xeon processors to 4th Gen Intel Xeon processors shows OLTP gains of up to 48.1% higher NOPM, while Online Analytical Processing (OLAP) queries run up to 50.6% faster.
The improved CPU efficiency of the 5th Gen Intel Xeon processors, demonstrated by 83% CPU utilization for OLTP and 75% for OLAP, is another advantage. Compared with the 5th generation, the prior generation requires 16% more CPU resources for the OLTP workload and 13% more for the OLAP workload.
The Value of Faster Backups
Faster backups improve uptime, simplify data administration, and enhance security, among other things. Backups up to 2.72x and 3.42x faster for idle and peak loads, respectively, are possible when running SQL Server 2022 Enterprise Edition on an Intel Xeon Platinum processor with Intel QAT.
For perspective on these comparisons, the Gold model includes fewer cores available for backup work than the Platinum model, which is why the largest Intel QAT gains on 5th Gen Intel Xeon Scalable Processors appear on the Platinum processor.
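As a rough illustration of how a QAT-accelerated backup is invoked, the hedged sketch below issues a SQL Server 2022 BACKUP statement with the QAT_DEFLATE compression algorithm from Python via pyodbc. The connection string, database name, and file path are placeholder assumptions, and the server must already have Intel QAT hardware offload enabled and have been restarted after that change.

```python
# Hedged sketch: trigger an Intel QAT-compressed backup in SQL Server 2022 via pyodbc.
# Connection details, database name, and backup path are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=myserver;DATABASE=master;"
    "UID=myuser;PWD=mypassword;TrustServerCertificate=yes",
    autocommit=True,  # BACKUP cannot run inside a user transaction
)
cursor = conn.cursor()
cursor.execute(
    "BACKUP DATABASE [MyDatabase] TO DISK = N'D:\\backups\\MyDatabase.bak' "
    "WITH COMPRESSION (ALGORITHM = QAT_DEFLATE)"
)
while cursor.nextset():  # drain informational result sets until the backup finishes
    pass
conn.close()
```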
With an emphasis on attaining near-real-time latencies, optimizing query speed, and delivering the full potential of scalable warehouse systems, SQL Server 2022 offers a number of new features. It’s even better when it runs on 5th gen Intel Xeon Processors.
Solution snapshot: SQL Server 2022 running on 4th Gen Intel Xeon Scalable CPUs, an industry-leading data platform for performance and security.
SQL Server 2022
The well-known performance and dependability of 5th Gen Intel Xeon Scalable Processors can give your SQL Server 2022 database a significant boost.
The following tutorial will examine crucial elements and tactics to maximize your setup:
Hardware Points to Consider
Processor choice: Select an Intel Xeon processor with a high core count and fast clock speeds. Models with Intel Turbo Boost and Intel Hyper-Threading Technology offer additional headroom.
Memory: Have enough RAM for your database size and workload. Sufficient RAM enhances query performance and lowers disk I/O.
Storage: To reduce I/O bottlenecks, choose high-performance storage options like SSDs or fast HDDs with RAID setups.
Modification of Software
Database Design: Make sure your query execution plans, indexes, and database schema are optimized. To guarantee effective data access, evaluate and improve your design on a regular basis.
Configuration Settings: Match SQL Server 2022 configuration options, such as maximum worker threads, max server memory, and I/O priority, to your workload and hardware capabilities (a minimal scripted example follows this list).
Query tuning: To find performance bottlenecks and improve queries, use tools like SQL Server Management Studio or SQL Server Profiler. Consider techniques such as parameterization, indexing, and query hints.
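As referenced above, here is a minimal sketch of applying two common instance-level settings with sp_configure from Python via pyodbc. The connection string and the 64 GB / MAXDOP 8 values are placeholder assumptions, not recommendations; choose values that fit your host.

```python
# Hedged sketch: set "max server memory" and "max degree of parallelism" via sp_configure.
# Connection string and numeric values are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=myserver;DATABASE=master;"
    "UID=myuser;PWD=mypassword;TrustServerCertificate=yes",
    autocommit=True,
)
cursor = conn.cursor()
# Both options are "advanced", so expose them first.
cursor.execute("EXEC sp_configure 'show advanced options', 1; RECONFIGURE;")
# Cap the buffer pool so the OS and other services keep some RAM.
cursor.execute("EXEC sp_configure 'max server memory (MB)', 65536; RECONFIGURE;")
# Limit per-query parallelism; a common starting point is the cores in one NUMA node.
cursor.execute("EXEC sp_configure 'max degree of parallelism', 8; RECONFIGURE;")
conn.close()
```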
Features Exclusive to Intel
Use Intel Turbo Boost Technology to dynamically raise clock speeds for demanding tasks.
With Intel Hyper-Threading Technology, you can run two hardware threads per physical core, which improves throughput for many workloads.
Intel QuickAssist Technology (QAT): Enhance database performance by speeding up encryption and compression/decompression operations.
Optimization of Workload
Workload balancing: To prevent resource congestion, divide workloads among several instances or servers.
Partitioning: To improve efficiency and management, split up huge tables into smaller sections.
Indexing: Create appropriate indexes to speed up data retrieval. Columnstore indexes are a good option for analytical workloads.
Observation and Adjustment
Performance monitoring: Track key performance indicators (KPIs) and pinpoint areas for improvement with tools like SQL Server Performance Monitor.
Frequent Tuning: Keep an eye on and adjust your database on a regular basis to accommodate shifting hardware requirements and workloads.
SQL Server 2022 Pricing
SQL Server 2022 cost depends on edition and licensing model. SQL Server 2022 has three main editions:
SQL Server 2022 Standard
Description: For small to medium organizations with minimal database functions for data and application management.
Licensing
Cost per core: ~$3,586.
Server + CAL (Client Access License): ~$931 per server, ~$209 per CAL.
Features: Basic data management, analytics, reporting, integration, and limited virtualization.
SQL Server 2022 Enterprise
Designed for large companies with significant workloads, extensive features, and scalability and performance needs.
Licensing
Cost per core: ~$13,748.
Features: High availability, in-memory performance, business intelligence, machine learning, and unlimited virtualization.
SQL Server 2022 Express
Use: Free, lightweight edition for tiny applications, learning, and testing.
License: Free.
Features: Basic capability, 10 GB databases, restricted memory and CPU.
Models for licensing
Per Core: Recommended for big, high-demand situations with processor core-based licensing.
Server + CAL (Client Access License): For smaller environments, each server needs a license and each connecting user/device needs a CAL.
In brief
Faster databases can help firms meet their technical and business objectives because they are the main engines for analytics and transactions. Greater business continuity may result from those databases’ faster backups.
Read more on govindhtech.com
govindhtech · 8 months ago
Intel Xeon 6 Server Platforms From MSI And GIGABYTE
Intel Xeon 6 Processors
Leading worldwide server manufacturer MSI unveiled its newest server solutions today, based on Intel Xeon 6 processors with Performance Cores (P-cores). Designed to satisfy the varied needs of data center workloads, these new systems deliver exceptional performance for compute-intensive jobs.
MSI’s newest Intel Xeon server solutions satisfy a variety of performance and efficiency needs, providing excellent performance across a wide range of workloads.
P-core Intel Xeon 6 processors are designed to provide the lowest total cost of ownership (TCO) for general-purpose and high-core-count computational applications that need great performance. With up to 128 cores, high memory bandwidth, and sophisticated I/O, Intel Xeon 6 processors have the power to handle even the most difficult AI, HPC, and data analytics tasks.
Based on the OCP Datacenter Modular Hardware System (DC-MHS) design, MSI has developed new server systems. These new servers are powered by Intel Xeon 6 processors with P-cores and feature DC-SCM hardware management cards for modular server management and streamlined OpenBMC firmware development. They are perfect for modern cloud service providers and data centers to provide management flexibility and cooling efficiency.
Using a single Intel Xeon 6 processor, the D3071 DC-MHS M-DNO Type-2 Host Processor Module serves as the foundation for two SKUs from MSI designed for high-density core-compute servers: the 2U 2-node CD270-S3071-X2 and the 3U 2-node CD370-S3071-X2 series. These systems are built to handle TDP 500W CPUs with air cooling and are optimized for compute-intensive tasks. They also aim to provide optimal thermal performance.
In order to satisfy the needs of organizations both now and in the future, MSI offers a wide choice of server platforms with Intel Xeon 6 processors, from high-performance AI solutions to cloud-scalable, energy-efficient systems.
Intel Xeon 6 price
Intel Xeon 6 prices vary depending on core count, clock speed, cache size, and model. Prices vary with market circumstances and availability.
Prices in general:
Entry-Level: $300–$1,000. These CPUs perform well for general-purpose and light workloads.
Mid-Range: $1,000–$3,000. These CPUs suit data center and high-performance computing workloads.
High-End: Above $3,000, often several thousand dollars. These CPUs are optimized for large-scale data analytics, AI, and scientific simulations.
Performance Optimized Intel Xeon 6900-series Servers with P-core for AI, Cloud Computing, Edge & IOT by GIGABYTE
The first wave of GIGABYTE servers for Intel Xeon 6 Processors with P-cores was unveiled today by Giga Computing, a GIGABYTE company and pioneer in the market for generative AI systems and cutting-edge cooling solutions. The new Intel Xeon platform is designed to maximize per-core performance for general purpose applications and workloads that are heavy on computation and artificial intelligence. For certain workloads, GIGABYTE servers are designed to maximize performance by customizing the server architecture to match the chip design and particular workloads.
Intel Xeon 6900
All new GIGABYTE servers support Intel Xeon 6900-series CPUs with P-cores, which offer up to 128 cores and up to 96 PCIe Gen5 lanes. The 6900-series also offers up to 64 lanes of CXL 2.0 and 12-channel memory for improved performance in memory-intensive tasks. Overall, this modular SoC architecture shows a lot of promise, since a common platform can host designs optimized for either economy or speed.
Single-socket, general-purpose servers
R164 Series: A compact 1U chassis supporting a dual-slot GPU and a single Intel Xeon 6 CPU. The variants in this series differ mainly in their storage options, offering 12x 2.5″ bays or 4x 3.5″/2.5″ bays for SAS, NVMe, and SATA drives.
R264 Series: This series, which now includes a 2U chassis, can accommodate up to 4x dual-slot Gen5 GPUs in addition to a single CPU.
Dual-Socket General Purpose Servers
R184 Series: This series offers comparable storage options to the R164 series, but it has a higher compute density distributed over two CPU sockets. As a result, it places less emphasis on the expansion slots and does not support dual-slot GPUs.
R284 Series: Using a 2U chassis, the increased server height allows for the integration of two CPU sockets and two GPU slots. The R284 series of devices offers up to 24x 2.5″ Gen5 drives or 12x 3.5″/2.5″ mixed storage choices.
Servers on the Edge
E264 Series: This series keeps a 2U form factor to accommodate a single CPU and up to four dual-slot GPUs for customers who want a short chassis depth. Compared to the general-purpose servers, chassis depth drops by over 200 mm, and overall compute performance is maintained at the expense of fewer storage bays.
E284 series: It is a twin CPU socket design that prioritizes CPU computing. It has two OCP NIC 3.0 ports and six FHHL extension slots, providing a multitude of I/O choices.
Modularized NVIDIA MGX servers
XV24 Line: This new twin socket series, an NVIDIA OVX optimized server architecture, offers extra FHFL slots for NICs and DPUs in addition to supporting up to four NVIDIA L40S GPUs.
Multi-node, high-density servers
H374 Series: This series provides up to eight CPU sockets distributed across four nodes for the highest density of CPU computing capabilities. Each node further has two or six 2.5″ bays and four low-profile expansion slots.
Server-grade Motherboards
MA34 Series: The MA34-CP0 is a high-quality server board with extensive I/O options. It contains four Gen5 expansion slots, one OCP NIC 3.0 slot, and a wealth of MCIO 8i and SlimSAS connectors.
In order to fulfill the changing needs of contemporary computing, GIGABYTE keeps pushing the envelope in server innovation. The new servers, which feature unparalleled speed, adaptability, and efficiency, are based on Intel Xeon 6900-series processors and are intended to enable organizations in a variety of industries.
Because AI, cloud computing, and edge technologies are becoming more and more important to sectors, GIGABYTE is dedicated to keeping its server designs up to date with the newest developments in processor technology, so its clients are prepared to face the challenges of the future.
As the Intel Xeon 6 platform’s potential is further explored, GIGABYTE will have systems ready to support a wide range of workloads and vertical markets. Furthermore, GIGABYTE plans to unveil a new server incorporating Intel Gaudi 3 accelerators in the coming months, a scalable and affordable approach targeting AI workloads, especially generative AI inference. Customers can also expect to see it at SC24 in Atlanta.
Read more on govindhtech.com
govindhtech · 9 months ago
IBM And Intel Introduce Gaudi 3 AI Accelerators On IBM Cloud
Cloud-based enterprise AI from Intel and IBM: to help businesses scale AI, Intel and IBM will deploy Gaudi 3 AI accelerators on IBM Cloud.
Gaudi 3 AI Accelerator
IBM and Intel have announced the global deployment of Intel Gaudi 3 AI accelerators as a service on IBM Cloud. Anticipated for release in early 2025, the offering aims to help enterprises scale AI more cost-effectively and to foster innovation backed by security and resilience.
This partnership will also enable Gaudi 3 support in IBM’s Watsonx AI and data platform. IBM Cloud is the first cloud service provider (CSP) to adopt Gaudi 3, and the offering will also be available for on-premises and hybrid setups.
Intel and IBM
“AI’s true potential requires an open, cooperative environment that gives customers alternatives and solutions. By combining Xeon CPUs and Gaudi 3 AI accelerators with IBM Cloud, the companies are generating new AI capabilities and satisfying the need for affordable, secure, cutting-edge AI computing solutions.”
Why This Is Important: Although generative AI may speed up transformation, the amount of computational power needed highlights how important it is for businesses to prioritize availability, performance, cost, energy efficiency, and security. By working together, Intel and IBM want to improve performance while reducing the total cost of ownership for using and scaling AI.
Gaudi 3
Gaudi 3’s integration with 5th generation Xeon simplifies workload and application management by supporting corporate AI workloads in data centers and the cloud. It also gives clients insight and control over their software stack. Performance, security, and resilience are given first priority as clients expand corporate AI workloads more affordably with the aid of IBM Cloud and Gaudi 3.
IBM’s Watsonx AI and data platform will support Gaudi 3 to improve model inferencing price/performance. This will give Watsonx clients access to extra AI infrastructure resources for scaling their AI workloads across hybrid cloud environments.
“IBM is dedicated to supporting customers in driving innovation in AI and hybrid cloud by providing solutions that address their business demands,” according to Alan Peacock, general manager of IBM Cloud. “The commitment to security and resilience on IBM Cloud has helped fuel IBM’s hybrid cloud and AI strategy for enterprise clients.”
Intel Gaudi 3 AI Accelerator
“Clients will have access to a flexible enterprise AI solution that aims to optimize cost performance by utilizing IBM Cloud and Intel’s Gaudi 3 accelerators.” This opens up new AI business opportunities so customers can test, develop, and deploy AI inferencing solutions more affordably.
IBM and Intel
How It Works: IBM and Intel are working together to provide AI customers with a Gaudi 3 service capability. They want to use IBM Cloud’s security and compliance features to assist customers in a variety of sectors, including highly regulated ones.
Scalability and Flexibility: Clients may modify computing resources as required with the help of scalable and flexible solutions from IBM Cloud and Intel, which may result in cost savings and improved operational effectiveness.
Improved Security and Performance: By integrating Gaudi 3 with IBM Cloud Virtual Servers for VPC, businesses running x86-based workloads will be able to execute applications more securely and quickly than before, improving user experiences.
What’s Next: Intel and IBM have a long history of working together, starting with the IBM PC and continuing with Gaudi 3 for corporate AI solutions. General availability of Gaudi 3 products on IBM Cloud is scheduled for early 2025. Stay tuned for additional developments from IBM and Intel in the coming months.
Intel Gaudi 3: The Distinguishing AI
Introducing your new, high-performing choice for every kind of enterprise AI task.
An Improved Method for Using Enterprise AI
The Intel Gaudi 3 AI accelerators are designed to handle rigorous training and inference tasks. They are based on the high-efficiency Intel Gaudi platform, which has proven MLPerf benchmark performance.
Support AI workloads from a single node to a mega cluster, in your data center or in the cloud, all running on Ethernet equipment you probably already own. Intel Gaudi 3 can be crucial to the success of any AI project, whether you need one accelerator or hundreds.
Developed to Meet AI’s Real-World Needs
With the help of industry-standard Ethernet networking and open, community-based software, you can grow systems more flexibly thanks to the Intel Gaudi 3 AI accelerators.
Adopt Easily
Whether you are beginning from scratch, optimizing pre-made models, or switching from a GPU-based method, using Intel Gaudi 3 AI accelerators is easy.
Designed with developers in mind: To quickly catch up, make use of developer resources and software tools.
Encouragement of Both New and Old Models: Use open source tools, such as Hugging Face resources, to modify reference models, create new ones, or migrate old ones.
Included PyTorch: Continue using the library that your team is already familiar with.
Simple Translation of Models Based on GPUs: With the help of their specially designed software tools, quickly transfer your current solutions.
Ease Development from Start to Finish
Take less time to get from proof of concept to production. Intel Gaudi 3 AI accelerators are backed by a robust suite of software tools, resources, and training, from migration to implementation. Find out what resources are available to make your AI endeavors easier.
Scale Without Effort: Integrate AI into everyday life. The goal of the Intel Gaudi 3 AI Accelerators is to provide even the biggest and most complicated installations with straightforward, affordable AI scaling.
Increased I/O: Benefit from 33 percent more I/O connectivity per accelerator than the H100, allowing for massive scale-up and scale-out while maintaining cost effectiveness.
Constructed for Ethernet: Utilize the networking infrastructure you currently have and use conventional Ethernet gear to accommodate growing demands.
Open: Steer clear of hazardous investments in proprietary, locked technologies like NVSwitch, InfiniBand, and NVLink.
Boost Your AI Use Case: Realize the extraordinary on any scale. Modern generative AI and LLMs are supported by Intel Gaudi 3 AI accelerators in the data center. These accelerators work in tandem with Intel Xeon processors, the preferred host CPU for cutting-edge AI systems, to provide enterprise performance and dependability.
Read more on govindhtech.com
govindhtech · 9 months ago
kAI: A Mexican AI Startup, Improves The Everyday Activities
Mexican AI
kAI, a Mexican AI startup, simplifies and improves the convenience of managing daily tasks.
kAI Meaning
“Künstliche Intelligenz” (German for “Artificial Intelligence”) refers to AI technology, techniques, and systems. The word “kAI” may refer to AI-based solutions that use machine learning, data analysis, and other AI methods to improve or automate activities.
The AI startup kAI is based in Mexico’s technology hub and is creating AI-powered organizing software called kAI Tasks. With this software, users can easily arrange their day and focus their efforts on the things that really matter. With kAI, creating an agenda takes less than a minute thanks to artificial intelligence’s intuitive capabilities. kAI Tasks runs on watchOS-based smartwatches, tablets, and smartphones running Android and iOS.
The Problem
In an environment where there are always new assignments and meetings, being productive is crucial. Regrettably, rather than increasing user productivity, existing to-do apps often decrease it: either important functionality is missing, the user experience is not straightforward enough, or the system does not support the users’ regular daily tasks.
The Resolution
The mobile task management software from kAI makes it simple for end users to plan, schedule, and arrange their workdays. Compared to conventional to-do management apps and tools, this can be completed in a fraction of the time because of artificial intelligence.
Daily block planning appears on a single screen when using kAI Tasks.
The following are a few of the benefits and features that make the tool so alluring:
Intelligent task management: kAI provides tailored recommendations and reminders to help you stay on track by learning from end users’ behaviors and preferences.
Easy event planning: Arrange agendas and schedules with ease, freeing you time to concentrate on the important things.
Constant adaptation: The more you use the tool, the more it learns about your requirements and adjusts accordingly, personalizing your everyday experience.
kAI Tasks can be tailored to the end user’s requirements
To optimize everyday objectives, kAI Tasks may be used in conjunction with a smartphone or wristwatch. The end user may easily control his or her productivity and maintain organization with this configuration.
By the end of September 2024, kAI hopes to provide additional features including wearables and the creation of a bot for Telegram and WhatsApp, among other things. With the aid of these connections, the business will be able to expand its user base and make everyday job organization easier without requiring the usage of another software.
“Personal organization is the foundation of an excellent lifestyle. At kAI we are redefining time and task management. Our modern tooling boosts productivity and well-being while reducing stress. With kAI you can easily accomplish your business and personal objectives while keeping your life in balance,” according to Kelvin Perea, CEO of kAI. “Because our company is part of the Intel Liftoff Program, all of us can do even more in less time.”
kAI Tasks, which is compatible with almost all smart devices, makes it simple to organize daily chores. Task management becomes even simpler and more straightforward with the help of AI, as the software gradually learns the end user’s behavior.
Are you prepared to further innovate and grow your startup? Enroll in the Intel Liftoff program right now to become a part of a community that is committed to fostering your ideas and promoting your development.
Intel Liftoff
Intel Liftoff for Startups
Remove code barriers, unleash performance, and turn your startup into a scalable AI company that defines the industry.
Early-stage AI and machine learning businesses are eligible to apply for Intel Liftoff for startups. No matter where you are in your entrepreneurial career, this free virtual curriculum supports you in innovating and scaling.
Benefits of the Program for AI Startups
Startups can get the processing power they need to address their most pressing technological problems with Intel Liftoff. The initiative also acts as a launchpad for collaborations, allowing entrepreneurs to improve customer service and strengthen one another’s offerings.
Superior Technical Knowledge and Instruction
Availability of the program’s Slack channel
Free online seminars and courses
Engineering advice and assistance
Reduced prices for certification and training
Invitations to forums and activities with experts
Advanced Technology and Research Resources
Free cloud credit offers for Intel Developer Cloud
Cloud service provider credits
Availability of Intel developer tools, which provide several technological advantages
Use the Intel software library to access devices with next-generation artificial intelligence
Opportunities for Networking and Comarketing
Boost consumer awareness using Intel’s marketing channels.
Venture exhibitions at trade shows
Introductions at Intel around the ecosystem
Establish a connection with Intel Capital and the worldwide venture capital (VC) network
Intel Tiber Developer Cloud
Take down the obstacles to hardware access, quicken development times, and increase your AI and HPC processes’ return on investment (ROI).
Register to get instant access to the newest Intel software and hardware innovations, enabling you to write, test, and optimize code more quickly, cheaply, and effectively.
AI Pioneers Who Discovered Intel Liftoff for Startups as Their Launchpad
Their companies are breaking new ground in a variety of AI-related fields. Here’s how they sum up their time in the program and the benefits they’ve received in terms of improved performance.
Enabling businesses to develop and implement vision AI solutions more quickly and consistently
By processing crucial machine learning tasks with AI Tools, the Hasty end-to-end vision AI platform opens up new AI use cases and makes application development more approachable.
“Using Intel OneAPI to unlock computationally demanding vision AI tasks will be a stepwise shift for critical industries like disaster recovery, logistics, agriculture, and medical.”
Use particle-based simulation tools to assist engineers in creating amazing things
Using the Intel HPC Toolkit and the Intel Developer Cloud, Dive Solutions improves their cloud-native computational fluid dynamics simulation software for state-of-the-art hardware.
“We used parts of the Intel HPC Toolkit to optimize our solver performance on Intel Xeon processors in an economical manner. The workloads are currently being prepared to run on both CPU and GPU architectures.”
Using a hyperconverged, real-time analytics platform to address the difficulties posed by big data
Using oneAPI, the Isima low-code framework optimizes for cost and performance in the cloud while enabling real-time use cases that drastically shorten time-to-value.
Read more on govindhtech.com
govindhtech · 10 months ago
Precision 7960 Tower & LLMs In Dell Precision Workstations
Precision 7960 Tower
Understanding AI Precision workstations and large language models: businesses can transform themselves by harnessing the potential of large language models on Precision workstations.
If any of the aforementioned circumstances apply to your intended AI application, there are a number of LLMs that can be run on-premises and fine-tuned using deskside workstations, most notably:
Gemma 7B, a condensed variant of Google’s Gemini, works well for compact and effective apps on smartphones with constrained processing power.
Llama 3 is ideal for large organizations and research thanks to its exceptional performance and versatility.
The Mistral series provides flexibility and customization for deployments that are affordable and is completely open source.
The NVIDIA accelerated computing platform built on the NVIDIA Ada Lovelace architecture, which includes NVIDIA RTX GPUs for training LLMs, offers performance increases over earlier architectures across the board, with better energy efficiency, lower costs, and improved scalability when multiple GPUs work in tandem.
Dell precision workstation
Dell Precision workstations, including the Precision 5860 Tower, Precision 7875 Tower, and Precision 7960 Tower, offer a multitude of single or dual processor configurations, memory configurations up to 4 TB, and the ability to configure single, dual, or up to quad NVIDIA RTX Ada Generation GPUs (systems vary on configuration options).
These desktop workstations will give you the capability to optimize LLMs using the model of your choosing while preserving data residency, privacy, and predictable costs.
Dell 5860, 7875, and 7960 Precision Workstations: Precision Power
Precision 5860 Tower, Precision 7875 Tower, and Precision 7960 Tower workstations manage the greatest computational and graphic demands. In addition to spacious single- or dual-processor configurations, they offer memory up to 4 TB and customizable NVIDIA RTX Ada Generation GPU configurations. Explore these cutting-edge workstations’ astonishing features and capabilities.
Precision 5860 Tower
Performance Unlock
The Dell Precision 5860 Tower is powerful and expandable. Specialists with computationally intensive workloads who analyze, render, and simulate complex data will love this workstation.
Principal Elements
Processor Choices: With so many Intel Xeon processors available for the Precision 5860 Tower, you can customize the device to meet your unique requirements.
Graphics Abilities: This workstation offers unmatched graphics performance because it is outfitted with up to four NVIDIA RTX Ada Generation GPUs.
Memory Set Up: The Precision 5860 is capable of managing even the most complex multitasking situations because it supports up to 4 TB of RAM.
Storage Solutions: Ensures rapid data access and plenty of storage space by providing a variety of storage options, such as high-capacity HDDs and NVMe SSDs.
Use cases and applications
The Precision 5860 is ideal for engineering, architecture, and media creation, where reliability and performance are essential. Experts love it because it handles large datasets, complex 3D models, and complex simulations easily.
Dell Precision 7875 Tower
Precision 7875 Tower
Dell Precision 7875 Tower pushes performance with its superior features and cutting-edge technology. This workstation targets power users who need the most processing and graphical power.
Principal Elements
Processor Power: Offers unrivalled multi-threaded performance and supports the newest AMD Ryzen Threadripper PRO CPUs.
Excellence in Graphics: Equipped with up to four NVIDIA RTX Ada Generation GPUs, this configuration guarantees fluid and effective management of jobs requiring a lot of graphics.
Enormous Memory: Up to 4 TB of ECC memory can be installed, offering the dependability and efficiency necessary for vital workloads.
Storage Versatility: Offers a selection of fast and large capacity storage solutions, such as SATA HDDs and PCIe NVMe SSDs.
Use cases and applications
The Precision 7875 is ideal for AI and machine learning applications, high-end video editing, and virtual reality creation. Strong CPUs and top-tier GPUs make complex processes and resource-intensive operations easy.
Dell Precision 7960 Tower
With maximum power and configurability, the Dell Precision 7960 Tower delivers the best workstation performance. This paradigm is designed for mission-critical, high-performance computation and graphics.
Principal Elements
Highest Processing Capacity: The Precision 7960 can handle even the most taxing applications thanks to its dual Intel Xeon Scalable CPUs.
Superb Graphics: Capable of supporting up to four NVIDIA RTX Ada Generation GPUs, offering exceptional graphical capabilities.
Broad Memory Support: Provides reliable multitasking and support for up to 4 TB of DDR4 ECC memory.
Extensive Storage Options: Various configurations, such as NVMe SSDs and SATA HDDs, are available for optimal speed and capacity.
Use cases and applications
The preferred workstation for advanced engineering simulations, financial modelling, and scientific research is the Precision 7960. It can handle the most difficult and demanding computational jobs thanks to its unmatched power and versatility.
Configuring Memory
The Precision 7960 Tower supports memory-intensive programs and multitasking:
Up to 4 TB DDR4 ECC memory, 2933 MHz speed.
Eight-channel memory architecture support.
Advanced ECC memory for data security.
Storage Options
The workstation has flexible storage options for fast data access and enough of space:
Supports 12 drives
Storage options include PCIe NVMe SSDs for fast speeds and SATA HDDs for greater capacities.
RAID for data security and performance
Connectivity and Growth
Dell Precision 7960 Tower has many expansion and connectivity options:
Numerous PCIe 3.0 and 4.0 expansion card slots
Type-A and Type-C USB 3.2 Gen 2×2 ports
Two Gigabit Ethernet ports
Wi-Fi 6 and Bluetooth 5.1 options
Thunderbolt 3 for fast data transfer
Dell Endpoint Security
Optional chassis intrusion detection
Optional smart card reader
Dimensions and Weight
The Precision 7960 Tower’s dimensions make it ideal for professional use:
17.5 x 7.9 x 22.3 inches.
Beginning at 45 pounds
With a powerful power supply, the workstation handles power-hungry applications.
In conclusion
Precision workstations like the 5860 Tower, 7875 Tower, and Precision 7960 Tower offer greater performance, durability, and scalability. For professionals who want the highest possible computational and graphical skills, they are the ideal option. These workstations are capable of handling the most demanding workloads and intricate jobs thanks to their adjustable features, which include memory up to 4 TB and single to quad GPUs.
Read more on govindhtech.com
govindhtech · 10 months ago
New Cloud Sandboxes From IBM Cloud Virtual Servers & Intel
Cloud Sandboxes
Customers are invited to test the performance of 2nd and 4th generation Intel Xeon processors in a nonproduction environment using a new sandbox that uses IBM  Cloud Virtual Servers for VPC.
Intel and IBM have launched a new custom cloud sandbox built on IBM Cloud Virtual Servers for VPC.
What is a cloud sandbox?
A cloud sandbox is a cloud infrastructure security feature that establishes a segregated testing environment. Users can securely launch programmes, run code, and test setups in this environment without affecting the system as a whole.
Resolving issues with performance in a testing setting
To fully understand the effectiveness of sophisticated apps running in your cloud hosting environment, performance testing is essential, even in fully managed business settings such as IBM Cloud. Your compute strategy ultimately defines how performance is spread among your workloads, even though IBM can provide the newest hardware and software across global data centres built for optimum availability.
 Cloud sandboxes are useful since it’s important to understand your system’s performance characteristics before deploying to production. Without endangering vital resources, you can test and watch how a certain workload behaves in a segregated, non-production environment. Like the one IBM is releasing today, some cloud sandboxes concentrate on monitoring the memory, CPU, network, and I/O utilisation of your server. These kinds of cloud sandboxes can assist in figuring out how much money and processing capacity are needed for both short- and long-term execution. Let’s investigate the specifics.
Intel Xeon processor comparison on virtual servers using the new IBM Cloud sandbox
You may rapidly test and evaluate the workload performance characteristics of 2nd Gen Intel Xeon processors versus 4th Gen  Intel Xeon processors on IBM Cloud Virtual Servers for VPC using the new IBM  Cloud VPC sandbox. You have the option of deploying your own application for a fully customised evaluation or testing IBM system performance differences on preconfigured apps with preselected workload benchmarks. Using IBM Cloud Schematics and Terraform as the programming foundation, the IBM Cloud VPC sandbox is automated and operates on your IBM Cloud VPC account.
The virtual server resources you test are invoiced hourly and will belong to you. Therefore, after your testing is finished, it’s crucial to remove your virtual servers and application environment, just as in most customer-managed cloud sandboxes. Benchmark tests can take as little as 30 seconds to a maximum of about 15 minutes, depending on your application environment. To enrol in the reimbursement programme and receive 100% of your expenses back in IBM Cloud credits, get in touch with the IBM sandbox team.
Build, test, and destroy: options for IBM Cloud VPC sandbox applications
Huggingface, Presto, or a Monte Carlo simulation can be used as benchmarks while testing in the new IBM  Cloud VPC sandbox. Alternatively, you might bring your own (BYO) customised app.
Monte Carlo simulation: For financial workloads, the Monte Carlo simulation has become the industry standard for analysing the statistical patterns of huge, random data sets. The simulation’s fundamental modelling strategy is based on chance, just like the casino game roulette (a toy illustration of the technique follows this list). Run within the IBM Cloud VPC sandbox on two distinct virtual servers, the default load settings generate numerical performance results through repeated random sampling. You can view the number of operations per second, memory usage, and CPU usage for every benchmark run on your virtual servers running both 2nd and 4th generation Intel Xeon processors.
HuggingFace inference application: The IBM Cloud VPC sandbox’s Hugging Face benchmark programme (Optimum Intel with PyTorch) lets you run and test natural language processing models using two widely used text classification models: BERT-base-uncased and RoBERTa-base. This application is used for inference. The performance dashboard reports the model inference time for 100 iterations in milliseconds when you run the tests against your 2nd and 4th generation Intel Xeon-based virtual servers, allowing you to readily monitor the performance trend over time.
Presto data lake application: TPC-H query execution on Presto is crucial for decision support tasks in large-scale data lake systems and analytics. Second and fourth generation  Intel Xeon-based virtual servers are tested using IBM PrestoDB and TPC-H, the industry standard for data warehousing benchmarks, inside the IBM  Cloud VPC sandbox. Additionally, each test calculates query times in milliseconds.
Bring your own application (BYO): Would you like to test the vCPU performance of the newest Intel Xeon processors with your own application? Not a problem. Using shell script formatting, you can quickly upload your installer and runner files via the IBM Cloud VPC sandbox dashboard interface. Alternatively, you can configure IBM Cloud LogDNA for your application. Every option uses Ubuntu 22.04 for the performance benchmarks. Once deployed, you can evaluate the average, current, and maximum utilisation metrics of your 2nd and 4th generation Intel Xeon-based virtual servers.
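As the toy illustration promised above (and not the sandbox’s actual benchmark code), the following Python sketch shows what repeated random sampling means in practice by estimating the probability of a large portfolio loss. The mean, volatility, and threshold values are made-up assumptions.

```python
# Toy Monte Carlo illustration: estimate P(annual return < -10%) by repeated random sampling.
# The mean, standard deviation, and threshold are placeholder values, not financial advice.
import random

def simulate(trials: int, mean: float = 0.07, stdev: float = 0.15,
             threshold: float = -0.10) -> float:
    losses = 0
    for _ in range(trials):
        annual_return = random.gauss(mean, stdev)  # one random draw per trial
        if annual_return < threshold:
            losses += 1
    return losses / trials

if __name__ == "__main__":
    for n in (1_000, 100_000, 1_000_000):
        print(f"{n:>9} trials -> P(return < -10%) ~ {simulate(n):.4f}")
```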
Read more on govindhtech.com
govindhtech · 11 months ago
IBM Cloud Bare Metal Servers for VPCs Use 4th Gen Intel Xeon
The range of IBM  Cloud Bare Metal Servers for Virtual Private Clouds is being shaken up by new 4th Gen Intel Xeon processors and dynamic network bandwidth.
IBM is thrilled to announce that fourth-generation Intel Xeon CPUs are now available on IBM Cloud Bare Metal Servers for Virtual Private Clouds. IBM customers can now provision Intel’s most recent microarchitecture within their own virtual private cloud. This gives them access to a variety of performance benefits, such as increased core-to-memory ratios (21 new server profiles) and dynamic network bandwidth that is only available through IBM Cloud VPC. For those keeping track, that is three times as many provisioning options as the present second-generation Intel Xeon CPUs. Take a look around.
Are these servers made of bare metal suitable for my needs?
In addition to having rapid provisioning, excellent network speeds, and the most secure software-defined resources that are accessible within IBM, IBM  Cloud Bare Metal Servers for Virtual Private Clouds are hosted on their most recent and developer-friendly platform. Every single one of your central processing units would be based on the 4th gen Intel Xeon processors, which IBM initially introduced on IBM  Cloud Bare Metal Servers for traditional infrastructure in conjunction with Intel’s day-one release product.
The traditional IBM Cloud infrastructure is distinct from IBM Cloud Virtual Private Cloud: classic infrastructure is more suitable for large, steady-state, predictable workloads that call for the highest possible level of customisation, while IBM Cloud VPC is an excellent solution for high-availability and maximum-elasticity requirements. A brief introduction video can help you decide which environment best suits your workload requirements.
The customisation choices available to you include five pre-set profile families, which contain your number of  CPU instances, RAM, and bandwidth, in the event that IBM  Cloud Bare Metal Servers for Virtual Private  Cloud turns out to be your preferred choice. What sets IBM Cloud apart from other cloud services is the fact that each profile provides you with DDR-5 memory and dynamic network bandwidth ranging from 10 to 200 Gbps. For tasks that require a significant amount of  CPU power, such as heavy web traffic operations, production batch processing, and front-end web servers, compute profiles are the most effective solution.
Balanced profiles are designed to provide a combination of performance and scalability, making them a great choice for databases of a moderate size and cloud applications that experience moderate traffic.
Memory profiles are most effective when applied to workloads that require a significant amount of memory, such as large cache and database applications, as well as in-memory analytics.
When it comes to running small to medium in-memory databases and OLAP, such as SAP BW/4 HANA, very high profiles are the most effective solutions.
Large in-memory databases and online transaction processing workloads are both excellent for ultra-high profiles because they offer the most memory per core.
For these bare metal servers, what kinds of workloads do you propose they handle?
Over the course of this year, IBM’s beta programme was exposed to a wide variety of workloads; nonetheless, there were a few noteworthy success stories that particularly stood out:
VMware Cloud Foundation on IBM Cloud: These workloads required high core performance, interoperability with VMware, licence portability, a smaller core-count variety, and a generic operating system, which IBM recently launched. In a dedicated location, they conducted tests for both VMware-managed VMware Cloud Foundation (VCF) and build-your-own VCF.
They were happy with the customisation freedom and benchmark performance enhancements that backed up their findings. During the second half of the year, these workloads will be accessible on Intel Xeon profiles of the fourth generation within the IBM  Cloud Virtual Private Cloud.
With regard to HPCaaS, this workload was one of a kind, and IBM believes it is a primary use case for this offering. Terraform and IBM Storage Scale were used in the tests to see whether improved performance could be achieved. They were delighted with the throughput improvement and the agile provisioning experience across platforms and networking.
Financial services and banking workloads required powerful, dedicated system performance as well as the highest possible level of security and compliance. After conducting tests covering capacity expansion, user interface experience, security controls, and security management, they were thrilled to find that production times had been reduced.
Beginning the process
Bare metal servers powered by 4th Gen Intel Xeon processors are currently available in the IBM Cloud Dallas, Texas data centres. Additional sites will be added in the second half of 2024. The IBM Cloud Bare Metal Servers for Virtual Private Cloud catalogue lets you view all of the pricing and provisioning options for the new 4th Gen Intel Xeon processors and save a quote to your account. As an alternative, you can start a chat and get answers right away. You can find more information in the getting started guides and tutorials in the cloud documentation.
Get USD 1,000 in IBM Cloud credits
If you are an existing customer who is interested in provisioning new workloads or if you are inquisitive about deploying your first workload on IBM Cloud VPC, then you should be sure to take advantage of their limited time promotion for IBM Cloud VPC. By entering the promotional code VPC1000 within either the bare metal or virtual server catalogues, you will receive USD 1,000 in credits that may be used towards the purchase of your new virtual private cloud (VPC) resources. These resources include computing, network, and storage components. Only profiles based on the second generation of Intel Xeon processors and profiles from earlier generations are eligible for this promotion, which is only available for a limited period.
Read more on Govindhtech.com
govindhtech · 6 months ago
Intel Data Direct I/O Performance With Intel VTune Profiler
Improve Intel Data Direct I/O (DDIO) Workload Performance with Intel VTune Profiler.
Profile uncore hardware performance events in Intel Xeon processors with oneAPI
One hardware feature included in Intel Xeon CPUs is Intel Data Direct I/O (DDIO) technology. By making the CPU cache the primary point of entry and exit for I/O data going into and out of the Intel Ethernet controllers and adapters, it contributes to advances in I/O performance.
Gauging the effectiveness of DDIO and of Intel Virtualization Technology (Intel VT) for Directed I/O (Intel VT-d), which permits the independent execution of several operating systems and applications, requires monitoring uncore events, or events that take place outside the CPU core. By analyzing uncore hardware events, you can improve the performance of Intel Data Direct I/O (DDIO) workloads using Intel VTune Profiler, a performance analysis and debugging tool powered by oneAPI.
We’ll talk about using VTune Profiler to evaluate and enhance directed I/O performance in this blog. Let’s take a quick look at Intel Data Direct I/O technology before we go into the profiling approach.
Overview of the Intel Data Direct I/O (DDIO) Technology
Intel DDIO, part of Intel Integrated I/O technology, was launched in 2012 for the Intel Xeon processor E5 and E7 v2 generations. It aims to increase system-level I/O performance through a new processor-to-I/O data flow.
I/O operations were sluggish and processor cache was a scarce resource prior to the development of Data Direct I/O technology. It was necessary for the host processor’s main memory to store and retrieve any incoming or departing data from an Ethernet controller or adapter, respectively. It used to be necessary to move the data from main memory to the cache before working with it.
This led to a lot of read and write operations in the memory. This also caused some additional, speculative read operations from the I/O hub in some of the older designs. Excessive memory accesses often lead to higher system power consumption and deterioration of I/O performance.
Intel DDIO technology was created to rearrange the flow of I/O data by making the processor cache the primary source and destination of I/O data instead of the main memory, as the processor cache is no longer a restricted resource.
Depending on the kind of workload at the workstation or on the server, the DDIO approach offers benefits like:
Higher transaction rates, reduced power consumption, reduced latency, increased bandwidth, and more.
There is no industry enablement needed for the Data Direct I/O technology.
It has no hardware dependencies and needs no modifications to your operating system, drivers, or software.
Boost DDIO Performance Using Intel VTune Profiler
An uncore event is a function carried out in a CPU’s uncore section, outside the processor core itself, that nevertheless affects processor performance as a whole. For instance, these events may be connected to the Intel Ultra Path Interconnect (UPI) block, the memory controller, or I/O stack activity.
A new recipe in the VTune Profiler Cookbook explains how to count these kinds of uncore hardware events using the tool’s input and output analysis function. You may analyze Data Direct I/O and VT-d efficiency by using the data to better understand the traffic and behavior of the Peripheral Component Interconnect Express (PCIe).
The recipe explains how to run the input and output analysis, evaluate the findings, and classify the resulting I/O metrics. In essence, VTune Profiler v2023.2 or later and a first-generation or later Intel Xeon Scalable CPU are needed. Although the approach applies to the most recent Intel Xeon processors, the I/O metrics and events covered in the recipe are based on the 3rd Gen Intel Xeon Scalable processor.
Perform I/O Analysis with VTune Profiler
Start by analyzing your application’s input and output using VTune Profiler. The analysis lets you examine CPU, bus, and I/O subsystem use through a variety of platform-level metrics. By turning on the PCIe traffic analysis option, you get data indicating Intel Data Direct I/O (DDIO) utilization efficiency.
Analyze the I/O Metrics
VTune Profiler Web Server or VTune Profiler GUI may be used to examine the report that is produced as a consequence of the input and output analysis. Using the VTune Profiler Web Server Interface, the recipe illustrates the examination of many I/O performance indicators, including:
Platform diagram showing utilization of the physical cores, DRAM, PCIe, and Intel UPI links.
PCIe Traffic Summary, which includes metrics for both outgoing (caused by the CPU) and incoming (caused by I/O devices) PCIe traffic.
These measurements aid in the computation of CPU/IO conflicts, latency for incoming read/write requests, PCIe bandwidth and efficient use, and other factors.
Metrics to assess the workload’s effectiveness in re-mapping incoming I/O device memory locations to various host addresses using Intel VT-d technology.
Usage of DRAM and UPI bandwidth.
Read more on Govindhtech.com
0 notes
govindhtech · 8 months ago
Text
Stable Diffusion Upscale Pipeline Using PyTorch & Diffusers
Stable Diffusion is a state-of-the-art technique that uses latent diffusion models to produce high-quality images from textual descriptions. The Hugging Face diffusers package provides easy-to-use pipelines for deploying and working with the Stable Diffusion model, including generating, editing, and upscaling images.
Best way to upscale Stable Diffusion
This article shows how to use the Stable Diffusion Upscale Pipeline from the diffusers library to upscale images produced by Stable Diffusion. It covers the rationale behind upscaling and demonstrates how to use the Intel Extension for PyTorch (a Python package in which Intel releases its latest optimizations and features before upstreaming them into open-source PyTorch) to optimize the process for better performance on Intel Xeon processors.
How can the Stable Diffusion Upscale Pipeline be made more efficient for inference?
The Stable Diffusion Upscale Pipeline from the Hugging Face diffusers library uses the Stable Diffusion model to increase the resolution of an input image, specifically by a factor of four. The pipeline combines several components: a frozen CLIP text model for text encoding, a Variational Auto-Encoder (VAE) for image encoding and decoding, a UNet architecture for denoising the image latents, and schedulers that control the diffusion process during image generation.
The pipeline is well suited to bringing out detail in synthetic or real-world photos and is especially useful for applications that must produce high-quality image outputs from lower-resolution inputs. Users can set a variety of parameters, such as the number of denoising steps, to balance fidelity to the input text against image quality, and custom callbacks can be supplied during inference to monitor or modify the generation.
To improve the pipeline’s performance, each of its components is tuned separately and then recombined. The Intel Extension for PyTorch is central to this work: it extends PyTorch with advanced optimizations that provide an additional speed boost on Intel hardware by exploiting Intel Advanced Vector Extensions 512 (Intel AVX-512), Vector Neural Network Instructions (VNNI), and Intel Advanced Matrix Extensions (Intel AMX) on Intel CPUs. Its Python API, ipex.optimize(), automatically optimizes a pipeline module so that it can take advantage of these advanced hardware instructions for greater performance efficiency.
Sample Code
The code sample below shows how to upscale an image with the Stable Diffusion Upscale Pipeline from the diffusers library, using the Intel Extension for PyTorch for performance enhancements. The pipeline’s UNet, VAE, and text encoder components are each targeted individually and optimized for CPU inference.
Setting up the environment
It is recommended to perform the installations in an isolated Conda environment. Install PyTorch, the Intel Extension for PyTorch, and diffusers:
python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
python -m pip install intel-extension-for-pytorch
python -m pip install oneccl_bind_pt --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
pip install transformers
pip install diffusers
How to Optimize the Pipeline
First, let’s load the sample image that you want to upscale and import all required packages, including the Intel Extension for PyTorch.
Let’s now examine how the PyTorch Intel Extension capabilities can be used to optimize the upscaling pipeline.
To optimize, each pipeline component is targeted independently. Initially, you configure the text encoder, VAE, and UNet to use the Channels Last format. Tensor dimensions are ordered as batch, height, width, and channels using the channel’s last format. Because it better fits specific memory access patterns, this structure is more effective and improves performance. For convolutional neural networks, channels last is especially helpful because it minimizes the requirement for data reordering during operations, which can greatly increase processing speed.
Similar to this, ipex.optimize()} is used by Intel Extension for PyTorch to optimize each component, with the data type set to BFloat16. With the help of Intel AMX, which is compatible with 4th generation Xeon Scalable Processors and up, BFloat16 precision operations are optimized. You can enable Intel AMX, an integrated artificial intelligence accelerator, for lower precision data types like BFloat16 and INT8 by utilizing IPEX'soptimize()} function.
Ultimately, you can achieve the best results for upscaling by employing mixed precision, which combines the numerical stability of higher precision (e.g., FP32) with the computational speed and memory savings of lower precision arithmetic (e.g., BF16). This pipeline automatically applies mixed precision when `torch.cpu.amp.autocast()} is set. Now that the pipeline object has been optimized using Intel Extension for PyTorch, it can be utilized to achieve minimal latency and upscale images.
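Putting the three steps together, a minimal, self-contained sketch of the optimized inference flow might look as follows. It assumes the same illustrative checkpoint, image, and prompt as the previous snippet and is not the article’s exact code.

import torch
import intel_extension_for_pytorch as ipex
from diffusers import StableDiffusionUpscalePipeline
from diffusers.utils import load_image

model_id = "stabilityai/stable-diffusion-x4-upscaler"  # illustrative checkpoint
pipeline = StableDiffusionUpscalePipeline.from_pretrained(model_id)
low_res_img = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd2-upscale/low_res_cat.png"
).convert("RGB")
prompt = "a white cat"

# Step 1: switch the heavy submodules to the channels-last memory format.
pipeline.unet = pipeline.unet.to(memory_format=torch.channels_last)
pipeline.vae = pipeline.vae.to(memory_format=torch.channels_last)
pipeline.text_encoder = pipeline.text_encoder.to(memory_format=torch.channels_last)

# Step 2: optimize each component with Intel Extension for PyTorch in BFloat16.
pipeline.unet = ipex.optimize(pipeline.unet.eval(), dtype=torch.bfloat16, inplace=True)
pipeline.vae = ipex.optimize(pipeline.vae.eval(), dtype=torch.bfloat16, inplace=True)
pipeline.text_encoder = ipex.optimize(pipeline.text_encoder.eval(), dtype=torch.bfloat16, inplace=True)

# Step 3: run the whole pipeline under CPU automatic mixed precision.
with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    upscaled = pipeline(prompt=prompt, image=low_res_img).images[0]

upscaled.save("upscaled_image.png")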
Configuring an Advanced Environment
For an additional performance boost, this section explains how to configure environment variables and settings tuned for Intel Xeon processor performance, particularly for memory management and parallel processing. The script env_activate.sh sets several environment variables specific to the Intel OpenMP library and uses LD_PRELOAD to control which shared libraries are loaded ahead of others; it builds the paths to those libraries dynamically so they are loaded at runtime before the application starts.
To configure the advanced environment on Intel Xeon CPUs for optimal performance:
Install the two packages that the script depends on
pip install intel-openmp
conda install -y gperftools -c conda-forge
git clone https://github.com/intel/intel-extension-for-pytorch.git
cd intel-extension-for-pytorch
git checkout v2.3.100+cpu
cd examples/cpu/inference/python/llm
Activate environment variables
source ./tools/env_activate.sh
Run a script with the code from the previous section
python run_upscaler_pipeline.py
Your environment is now prepared to run the Stable Diffusion Upscale Pipeline that was optimized for higher performance in the previous steps, and running inference with the Intel Extension for PyTorch-optimized pipeline yields additional performance.
Read more on Govindhtech.com
WiNGPT: Intel Xeon Processor Powered Winning Health LLM
Healthcare organisations’ performance requirements are met by WiNGPT, Winning Health LLM, which uses  Intel technology.
Summary
Large language models (LLMs) are a novel technology with great promise in medical settings, something widely acknowledged in the current push toward smarter hospitals. LLM-powered applications such as medical report generation, AI-assisted imaging diagnosis, pathology analysis, medical literature analysis, healthcare Q&A, medical record sorting, and chronic disease monitoring and management improve the patient experience while lowering staffing and other costs for medical institutions.
However, the lack of high-performance yet affordable computing systems is a significant barrier to wider adoption of LLMs in healthcare facilities. Consider model inference: the scale and complexity of LLMs far exceed those of typical AI applications, making it difficult for conventional computing platforms to meet their requirements.
Winning Health has enhanced WiNGPT, a leading medical LLM, with a release based on 5th Gen Intel Xeon Scalable processors. The solution makes efficient use of the processors’ integrated accelerators, such as Intel Advanced Matrix Extensions (Intel AMX), for model inference. Through collaboration with Intel in areas such as graph optimization and weight-only quantization, inference performance has improved by nearly three times compared with the platform based on 3rd Gen Intel Xeon Scalable processors. The upgrade meets the performance demands of scenarios such as automated medical report generation and accelerates the rollout of LLM applications in healthcare organizations.
Challenge: Medical LLM Inference’s Compute Dilemma
The broad deployment of LLMs across multiple industries, including healthcare, is regarded as a milestone for the practical use of this technology. Healthcare organizations are increasing their investment and have made significant progress with LLMs for medical services, management, and diagnosis. Research suggests the healthcare industry will adopt LLMs rapidly between 2023 and 2027, by which time the market is predicted to exceed 7 billion yuan.
LLMs are typically compute-intensive applications with high computing costs because of the significant resources required for training, fine-tuning, and inference. Among these stages, model inference is one of the most important phases of LLM deployment. Healthcare facilities frequently face the following difficulties when building model inference solutions:
These complex settings demand accurate real-time responses, which requires a computing platform with sufficient inference capacity. Moreover, because of strict security requirements for medical data, healthcare organizations typically prefer to deploy the platform locally rather than in the cloud.
LLM updates may call for GPU upgrades, but hardware refreshes are infrequent, so updated models may not be compatible with older hardware.
Inference for Transformer-based LLMs now requires significantly more hardware than before. Because memory and time complexity grow steeply with the length of the input sequence, it is hard to make full use of existing computational resources, and hardware utilization remains below its potential.
From an economic standpoint, deploying machines dedicated solely to model inference is expensive and leaves them underutilized. Many healthcare organizations therefore choose CPU-based server systems for inference, reducing hardware costs while retaining the ability to handle a range of workloads.
WiNGPT based on Intel 5th Generation Xeon Scalable Processors
WiNGPT, from Winning Health, is an LLM built specifically for the healthcare industry. Based on a general-purpose LLM, it incorporates high-quality medical data and is tailored and optimized for medical scenarios, enabling it to deliver intelligent knowledge services across a variety of healthcare settings. WiNGPT has three distinguishing characteristics:
Refined and specialized: trained and fine-tuned on high-quality data for medical scenarios, WiNGPT delivers data accuracy that satisfies a variety of business needs.
Cost-effective: through algorithm optimization, the CPU-based deployment has achieved generation efficiency comparable to GPU deployment in testing.
Support for customized private deployment: private deployment keeps medical data inside healthcare facilities and prevents data leaks, while also improving system stability and reliability. It also enables tailored options to suit different budgets for organizations with different needs.
To accelerate WiNGPT’s inference, Winning Health partnered with Intel and selected the 5th Gen Intel Xeon Scalable processors. These processors deliver a lower total cost of ownership (TCO), notable performance-per-watt gains across a range of workloads, greater reliability and energy efficiency, and strong performance in data center, networking, AI, and HPC workloads. Within the same power envelope, the 5th Gen Intel Xeon Scalable processors provide faster memory and more processing capability than their predecessors, and because they are interoperable with platforms and software from previous generations, they greatly reduce the testing and validation required when deploying new systems.
AI performance is elevated by the integrated Intel AMX and other AI-optimized technologies in the 5th Gen Intel Xeon Scalable processors. Intel AMX introduces a new instruction set and circuit design for matrix operations, yielding a large increase in instructions per cycle (IPC) for AI applications and a significant performance boost for both AI training and inference workloads.
The 5th Gen Intel Xeon Scalable processor enables:
Up to 21% higher overall performance
Up to 42% higher inference performance
Up to 16% faster memory speed
Up to 2.7x larger L3 cache
Up to 10x higher performance per watt
Beyond adopting the 5th Gen Intel Xeon Scalable processors, Winning Health and Intel are investigating ways to tackle the memory access bottleneck in LLM inference on the existing hardware platform. LLMs are generally considered memory-bound because of their large number of parameters: billions or even tens of billions of model weights must be loaded into memory before computation, and large amounts of intermediate data must be stored in memory and read back for further processing. The main factor limiting inference speed has therefore shifted from processing power to memory access speed.
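A rough back-of-the-envelope calculation makes the memory-bound argument concrete. The numbers below (model size, weight precision, and effective memory bandwidth) are illustrative assumptions, not figures from Winning Health’s tests, but they show how weight traffic alone places a floor under per-token latency.

# Rough lower bound on per-token latency from weight traffic alone (illustrative numbers).
params = 7e9                 # ~7B-parameter model, e.g. LLaMA 2-7B
bytes_per_weight = 2         # BF16/FP16 storage
weight_bytes = params * bytes_per_weight      # ~14 GB of weights

mem_bandwidth = 300e9        # assumed effective DRAM bandwidth per socket, bytes/s

# During autoregressive decoding, essentially every weight is read once per generated
# token, so the time to stream the weights bounds the per-token latency from below.
min_ms_per_token = weight_bytes / mem_bandwidth * 1e3
print(f"lower bound from weight traffic alone: {min_ms_per_token:.1f} ms/token")

Under these assumptions the bound works out to roughly 47 ms/token, which is why the 52 ms/token result reported below is consistent with inference dominated by memory access rather than raw compute.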
To optimize memory access and more, Winning Health and Intel implemented the following measures:
Graph optimization: graph optimization fuses multiple operators in order to minimize the overhead of operator and kernel calls. Combining many operators into a single operation improves performance because less data has to be read into and written out of memory than when each operator runs separately. Winning Health optimized the algorithms in these procedures using the Intel Extension for PyTorch, which effectively increased performance. The Intel Extension for PyTorch plug-in, built for Intel Xeon Scalable processors and Intel Iris Xe graphics, enhances PyTorch performance on servers by leveraging acceleration libraries such as oneDNN and oneCCL. (A combined sketch of this measure and the next one appears after this list.)
Weight-only quantization: this optimization is designed specifically for LLMs. The parameter weights are converted to an INT8 data type, within the bounds of acceptable computing accuracy, and then converted back to half precision for computation. This reduces the memory consumed during model inference and speeds up the overall computation.
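Neither measure’s actual implementation is published here, so the following is a minimal, self-contained sketch on a toy module that illustrates both ideas: operator fusion via the Intel Extension for PyTorch plus TorchScript tracing and freezing, and a conceptual weight-only INT8 quantization with per-output-channel scales. It is illustrative only and is not WiNGPT’s code.

import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

# Toy stand-in for part of a transformer block; purely illustrative.
model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096)).eval()
example = torch.randn(1, 4096)

# (1) Graph optimization: ipex.optimize applies operator-level optimizations, and
# tracing + freezing lets the JIT fuse adjacent operators so fewer intermediate
# tensors are written to and read back from memory.
optimized = ipex.optimize(model, dtype=torch.bfloat16)
with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    traced = torch.jit.trace(optimized, example)
    traced = torch.jit.freeze(traced)
    _ = traced(example)  # warm-up runs trigger the fusion passes

# (2) Weight-only quantization (conceptual): store weights as INT8 with per-output-channel
# scales, then dequantize back to half precision at compute time to cut memory traffic.
def quantize_weight_int8(w: torch.Tensor):
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp((w / scale).round(), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.bfloat16) * scale.to(torch.bfloat16)

w = model[0].weight.detach()
q, scale = quantize_weight_int8(w)
w_bf16 = dequantize(q, scale)  # used in the matmul instead of the FP32 weight
print("max abs quantization error:", (w - w_bf16.float()).abs().max().item())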
WiNGPT’s inference performance has been enhanced by Winning Health and Intel working together to improve memory utilisation. In order to further accelerate inference for the deep learning framework, the two have also worked together to optimise PyTorch’s main operator algorithms on CPU platforms.
The LLaMA2 model’s inference performance reached 52 ms/token in a test-based validation environment. An automated medical report is generated in less than three seconds for a single output.
Winning Health also contrasted the 3rd Gen and 5th Gen  Intel Xeon Scalable processor-based solutions’ performance during the test. According to the results, current generation CPUs outperform third generation processors by nearly three times. Image credit to Intel
The robust performance of the 5th Gen Intel Xeon Scalable CPU is sufficient to suit customer expectations because the business situations in which WiNGPT is employed are relatively tolerant of LLM latency. In the meantime, the CPU-based approach may be readily scaled to handle inference instances and altered to run inference on several platforms.
Advantages
Healthcare facilities have benefited from WiNGPT’s solution, which is built on 5th generation Intel Xeon Scalable processors, in the following ways:
Optimized LLM performance and an improved application experience: thanks to technical improvements on both sides, the solution takes full advantage of the AI performance gains of the 5th Gen Intel Xeon Scalable processors. It meets the performance requirements for model inference in scenarios such as medical report generation, reducing generation time while keeping the user experience consistent.
Better economics with controlled platform build-out costs: the solution can run inference on the general-purpose servers already in use in healthcare facilities, avoiding the need to add specialized inference servers and lowering procurement, deployment, operation, maintenance, and energy costs.
A good balance between LLMs and other IT applications: because the solution uses CPUs for inference, healthcare organizations can allocate computing power more flexibly and agilely, dividing CPU capacity between LLM inference and other IT workloads as needed.
Looking Ahead
Combined with WiNGPT, the 5th Gen Intel Xeon Scalable processors deliver exceptional inference speed, which makes LLM deployment easier and less costly. Both parties will continue refining their work on LLMs so that more users can access and benefit from Winning Health’s latest AI technologies.
Read more on govindhtech.com