#OpenCLIP
Explore tagged Tumblr posts
govindhtech · 2 months ago
Text
Chroma and OpenCLIP Reinvent Image Search With Intel Max
OpenCLIP Image search
Building High-Performance Image Search with Intel Max GPUs, Chroma, and OpenCLIP
After reviewing the Intel Data Center GPU Max 1100 and Intel Tiber AI Cloud, Intel Liftoff mentors and AI developers put together a field guide for lean, high-throughput LLM pipelines.
All development, testing, and benchmarking in this study used the Intel Tiber AI Cloud.
Intel Tiber AI Cloud is designed to give developers and AI enterprises scalable, economical access to Intel’s cutting-edge AI hardware, including the latest Intel Xeon Scalable CPUs, the Data Center GPU Max Series, and Gaudi 2 (and 3) accelerators. Startups building compute-intensive AI models can use Intel Tiber AI Cloud’s performance-optimized environment without a large hardware investment.
AI startups are encouraged to contact Intel Liftoff for AI Startups to learn more about the Intel Data Center GPU Max Series, Intel Gaudi accelerators, and Intel Tiber AI Cloud’s optimized environment, and to make use of resources, technology, and platforms like Intel Tiber AI Cloud.
AI-powered apps increasingly use text, audio, and image data. This article shows how to build and query a multimodal database of text and images using Chroma and OpenCLIP embeddings.
These embeddings enable comparison and retrieval across modalities. The goal is a GPU- or XPU-accelerated system that can ingest image data and answer text-based search queries against it.
Advanced AI on the Intel Data Center GPU Max 1100
The performance described in this study is attainable with hardware like the Intel Data Center GPU Max Series, accelerated by the Intel Extension for PyTorch. The GPU (Max 1100) is available both as dedicated instances and through the free Intel Tiber AI Cloud JupyterLab environment:
The Xe-HPC Architecture:
Xe-cores: 56 specialized cores handle GPU compute operations.
Intel XMX engines: 448 deep systolic arrays accelerate the dense matrix and vector operations at the heart of AI and deep learning models.
Vector engines: 448 vector engines complement the XMX units for broader parallel computing workloads.
Ray tracing: 56 hardware-accelerated ray tracing units improve visualization.
Memory hierarchy
HBM2e: 48 GB delivering 1.23 TB/s of bandwidth, needed for complex models and large datasets such as multimodal embeddings.
Cache: 28 MB of L1 and 108 MB of L2 cache keep data close to the processing units to reduce latency.
Connectivity
PCIe Gen 5: a fast x16 host link moves data between the CPU and GPU.
oneAPI software ecosystem: the Intel Data Center GPU Max Series integrates straightforwardly with the open, standards-based Intel oneAPI programming model. Hugging Face Transformers, PyTorch, the Intel Extension for PyTorch, and other frameworks let developers accelerate AI pipelines on Intel architecture without being locked into proprietary software.
What this code does
This code shows how to create a multimodal database using Chroma as the vector database for image and text embeddings. It lets text queries search the database for relevant images or metadata. The code also shows how to use the Intel Extension for PyTorch (IPEX) to accelerate computation on Intel devices, including CPUs and XPUs.
This code’s main components:
OpenCLIP embeddings: text and images are embedded with OpenCLIP, an open-source implementation of the CLIP architecture, and stored in a database for easy access. OpenCLIP was chosen for its solid benchmark performance and readily available pre-trained models.
Chroma database: Chroma provides a persistent database of embeddings that can quickly return the results most similar to a text query. ChromaDB was chosen for its developer experience, Python-native API, and ease of setting up persistent multimodal collections.
Device check: a function checks whether an XPU is available for hardware acceleration. High-performance applications benefit from Intel’s hardware acceleration through IPEX, which speeds up embedding generation and data processing; a minimal sketch follows.
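A minimal sketch of such a device check, assuming the Intel Extension for PyTorch is installed (importing it registers the torch.xpu backend):

```python
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (import registers the XPU backend)

def get_device() -> str:
    """Return 'xpu' when an Intel GPU is available, otherwise fall back to 'cpu'."""
    return "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"

device = get_device()
print(f"Embeddings will be computed on: {device}")
```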
Application and Use Cases
This code can be used whenever:
Fast, scalable multimodal data storage: you need to store and retrieve text, images, or both.
Image search: textual descriptions let e-commerce platforms, image search engines, and recommendation systems query photographs. For instance, searching for “Black colour Benz” returns similar cars.
Cross-modal retrieval: finding images from text or text from images, as in caption-based photo search and visual question answering.
Recommendation systems: similarity-based search can point consumers to films, products, and other content matching their query.
AI-based apps: ideal for machine learning pipelines covering training data curation, feature extraction, and multimodal model preparation.
Requirements:
torch for deep learning.
intel_extension_for_pytorch for optimized PyTorch performance on Intel hardware.
chromadb for creating and querying a persistent multimodal vector database, and matplotlib for displaying images.
Embedding extraction and image loading use chromadb.utils’ OpenCLIPEmbeddingFunction and ImageLoader; a minimal end-to-end sketch follows.
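A minimal end-to-end sketch under these requirements. The collection name, image paths, and query string are illustrative placeholders; the Chroma calls follow ChromaDB’s documented multimodal API, but verify the exact signatures against your installed version:

```python
import chromadb
from chromadb.utils.data_loaders import ImageLoader
from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction
import matplotlib.pyplot as plt
from PIL import Image

# A persistent client writes the collection to disk so it survives restarts.
client = chromadb.PersistentClient(path="./multimodal_db")

collection = client.get_or_create_collection(
    name="image_search",                             # hypothetical collection name
    embedding_function=OpenCLIPEmbeddingFunction(),  # OpenCLIP embeds both text and images
    data_loader=ImageLoader(),                       # resolves image URIs on add/query
)

# Register images by URI; Chroma embeds them with OpenCLIP at insertion time.
collection.add(
    ids=["car-001", "car-002"],
    uris=["images/black_benz.jpg", "images/red_suv.jpg"],  # placeholder paths
)

# A text query against image embeddings: cross-modal retrieval.
results = collection.query(
    query_texts=["Black colour Benz"],
    n_results=1,
    include=["uris", "distances"],  # URIs must be requested explicitly
)

best_uri = results["uris"][0][0]
plt.imshow(Image.open(best_uri))
plt.axis("off")
plt.show()
```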
0 notes
holyjak · 1 year ago
Text
Are you also constantly hearing about vector embeddings, vector databases (e.g. usearch), and how they can be used e.g. to find an image matching a textual prompt? Adrian provides a great explanation.
See usearch.clj (a vector DB wrapper) and clip.clj (a Clojure wrapper for OpenCLIP, a neural network model for computing embeddings).
0 notes
machine-saint · 2 years ago
Text
starting in 2.0 they switched to openclip, though i know a lot of people stuck with 1.5 (due to tooling/fine-tunes being developed for it, image quality, whatever). afaict openclip was developed by a team of academics. no clue about midjourney.
ultimately though I don't think using the original CLIP makes image generation immoral any more than e.g. hosting source code on a platform owned by Microsoft or writing web applications in a framework developed by Facebook is immoral, or (to pick a closer comparison) drawing images in Photoshop using Adobe stock photos as reference. you can make an argument that people *ought* to stop using these tools (and I moved many of my projects off github!) but I think to insist that making AI art is immoral if any component of the process was made by a shitty company with shitty goals is a standard people don't apply to much else.
man something about comparing a shitpost made with a tool explicitly stated by its creators as intended to devalue creative labor to a shitpost made in order to challenge and decry a rising fascist movement feels really fucking bleak to me
2K notes
unsettlingstories · 2 years ago
Note
Where do you source the base images for your ai art from? Are they stock photos, photographs you've taken, etc? :3
The base images come from the dataset used to train OpenCLIP, a filtered version of the LAION-5B dataset.
4 notes
ai-news · 2 years ago
Link
How LAION used more data and new ML training techniques to improve image and text embeddings for various applications #AI #ML #Automation
0 notes
pulipuli · 2 years ago
Photo
Tumblr media
Describing an image in words is a very difficult task. There are now plenty of AI drawing tools that generate images from prompts, so naturally there are also AI tools that go the other way and generate prompts from images.
----
# CLIP Interrogator
https://huggingface.co/spaces/pharma/CLIP-Interrogator
The CLIP Interrogator released by Sylvain Filoni is now in its second generation. Its purpose is to generate suitable prompts from existing images. Version 2.1 uses the ViT-H-14 OpenCLIP model, the same one used in Stable Diffusion 2.0.
Using CLIP Interrogator is very simple: open the page, upload an image, wait a while, and you get the result.
But it was originally designed to produce prompts for AI-generated images. Does it still work on photos that are not AI art? The answer is... of course it does!
This is a casual photo of my lunch. Analyzing it with the first-generation CLIP Interrogator gives:
| a wooden table topped with bowls of food, a stock photo, inspired by Yokoyama Taikan, gutai group, 2019 trending photo, crowded inn in the background, android close to camera, a photo of sephiroth, hoses:10, panoramic shot, yuruyuri, breakfast, trending on pixv, round-cropped
It seems to pick up a lot of strange prompt terms. Let's try CLIP Interrogator 2.1 instead; the result:
| a wooden table topped with bowls of food, a picture, by Nōami, noodles, 2 0 2 2 photo, unedited, drink
That looks much more reasonable, doesn't it? What do you think?
----
#AI #CLIPInterrogator #HuggingFace
Read more ⇨ A new image indexing tool: CLIP Interrogator / Can CLIP Interrogator be an AI Indexer? https://blog.pulipuli.info/2023/02/can-clip-interrogator-be-an-ai-indexer.html
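For anyone who would rather run this locally than through the Hugging Face Space, here is a rough sketch using the open-source clip-interrogator Python package; the model name mirrors the ViT-H-14 OpenCLIP checkpoint mentioned above, and the exact API is an assumption to check against the package docs:

```python
from PIL import Image
from clip_interrogator import Config, Interrogator

# ViT-H-14 is the OpenCLIP model used by Stable Diffusion 2.x, as noted above.
ci = Interrogator(Config(clip_model_name="ViT-H-14/laion2b_s32b_b79k"))

image = Image.open("lunch.jpg").convert("RGB")  # placeholder path to a photo
prompt = ci.interrogate(image)                  # returns a caption plus prompt-style keywords
print(prompt)
```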
0 notes
thebourisbox · 3 years ago
Text
Release of Stable Diffusion 2.0 !
See on Scoop.it - Design, Science and Technology
The new Stable Diffusion 2.0 release features many improvements, including new base models, a 4x upscaling model, and a depth-guided stable diffusion model. These new models offer many creative possibilities for image transformation and synthesis. New features in v2 include:
• Base 512x512 and 768x768 models trained from scratch with a new OpenCLIP text encoder
• 4x upscaling text-guided diffusion model
• New “Depth2Image” functionality
Blog: https://t.co/o3udlBN8uz
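A rough sketch of trying the new base model locally with Hugging Face diffusers (the model ID “stabilityai/stable-diffusion-2” and this API reflect the 2.0 release; verify against current diffusers docs):

```python
import torch
from diffusers import StableDiffusionPipeline

# "stabilityai/stable-diffusion-2" is the 768x768 base model using the new OpenCLIP text encoder.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```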
This release is led by @robrombach @StabilityAI
Read the full article at: mem.ai
0 notes
piclaunch · 8 years ago
Text
PRESS RELEASE On Blockage of Working Capital of Exporters.
For those who search knowledge…
[Gallery: Pixabay stock images credited to LoggaWiggler, frankspandl, geralt, OpenClips, PublicDomainPictures, stux, and makamuki0]
Whatsq.com: http://whatsq.com/
Source: Check Here
0 notes
govindhtech · 2 years ago
Text
NVIDIA’s Robotics Expansion Exploits AI
NVIDIA’s AI Robotics and Generative AI
Cloud-native APIs and microservices, together with potent generative AI models, are advancing to the edge.
Thanks to generative AI, almost every industry is benefiting from the power of transformer models and large language models. That reach now extends to defect detection, real-time asset tracking, autonomous planning and navigation, human-robot interaction, and other areas at the heart of robotics and logistics systems.
On the NVIDIA Jetson platform for edge AI and robotics, NVIDIA today announced significant expansions to two frameworks: the NVIDIA Isaac ROS robotics framework is now generally available, and an expansion of NVIDIA Metropolis on Jetson will follow.
NVIDIA has also launched a Jetson Generative AI Lab, where developers can work with the latest open-source generative AI models, to accelerate the development and deployment of AI applications at the edge.
NVIDIA AI and the Jetson platform have been adopted by more than 1.2 million developers and over 10,000 clients, including Amazon Web Services, Cisco, John Deere, Medtronic, Pepsi, and Siemens.
Because the AI landscape is continuously changing to handle ever more complex scenarios, long development cycles make it hard to build AI applications for the edge. Reprogramming robots and AI systems on the fly to adapt to shifting environments, production lines, and customer automation demands takes time and expertise.
Generative AI simplifies the creation, deployment, and management of AI at the edge by delivering zero-shot learning, which lets a model recognize things it never explicitly saw during training.
AI Landscape Transformation
Generative AI significantly improves ease of use because models can be modified with natural-language commands. These models outperform conventional convolutional-neural-network-based models because they are more adaptable in detection, segmentation, tracking, search, and even reprogramming.
ABI Research estimates that by 2033, generative AI will increase manufacturing operations’ global revenue by $10.5 billion.
According to Deepu Talla, vice president of embedded and edge computing at NVIDIA, “Generative AI will significantly accelerate deployments of AI at the edge with better generalization, ease of use, and higher accuracy than previously possible. The strength of transformer models and generative AI, along with the largest-ever software expansion of our Metropolis and Isaac frameworks on Jetson, answers this demand.”
Creating at the Edge of Generative AI
Developers can utilize the Jetson Generative AI Lab’s optimized tools and tutorials to deploy open-source LLMs, diffusion models to create stunning interactive images, vision language models (VLMs), and vision transformers (ViTs), which integrate vision AI and natural language processing to provide a thorough understanding of the scene.
The NVIDIA TAO Toolkit lets developers build accurate and efficient AI models for edge applications. TAO offers a low-code interface for optimizing vision AI models, such as ViTs and vision foundation models. Developers can also adapt and fine-tune open-source models like OpenCLIP, or foundation models like NVIDIA NV-DINOv2, to build highly accurate vision AI models with very little data. TAO now also includes VisualChangeNet, a new transformer-based defect inspection model.
Using the Isaac and New Metropolis Frameworks
Enterprises may more easily and affordably adopt cutting-edge, vision AI-enabled solutions to solve pressing operational efficiency and safety issues thanks to NVIDIA Metropolis. For developers to create complicated vision-based apps fast, the platform offers a collection of strong APIs and microservices.
More than 1,000 businesses, including BMW Group, PepsiCo, Kroger, Tyson Foods, Infosys, and Siemens, use NVIDIA Metropolis developer tools to tackle operational, Internet of Things, and sensor processing challenges with vision AI, and the adoption rate is accelerating. The tools have already been downloaded more than one million times by people building vision AI applications.
By the end of the year, an enhanced set of Metropolis APIs and microservices on NVIDIA Jetson will be accessible, enabling developers to design and deploy scaled vision AI applications more rapidly.
The NVIDIA Isaac platform is used by hundreds of clients to create high-performance robotics solutions for a variety of industries, including agricultural, warehouse automation, last-mile delivery, and service robots.
With updated versions of the Isaac ROS and Isaac Sim software, NVIDIA announced significant advances in perception and simulation capabilities at ROSCon 2023. Isaac ROS, built on the widely used open-source Robot Operating System (ROS), brings perception to automation, giving moving machines eyes and ears. Using GPU-accelerated GEMs for visual odometry, depth perception, 3D scene reconstruction, localization, and planning, robotics developers can quickly build systems tailored to a wide range of applications.
With the latest Isaac ROS 2.0 release now generally available, developers can use Jetson to build and commercialize high-performance robotics products.
According to Geoff Biggs, CTO of the Open Source Robotics Foundation, “ROS continues to grow and evolve to provide open-source software for the entire robotics community.” This release’s introduction of NVIDIA’s new prebuilt ROS 2 packages, which make ROS 2 easily accessible to the sizable NVIDIA Jetson development community, will hasten this expansion.
Providing fresh examples of AI workflows
Creating a production-ready AI solution means optimizing the development and training of models suited to specific use cases, implementing strong platform security, orchestrating the application, managing fleets, and more.
NVIDIA’s curated collection of AI reference workflows is built on the Metropolis and Isaac frameworks; developers can adopt a complete workflow or integrate individual components, saving significant development time and cost. The three reference workflows are network video recording, automatic optical inspection, and autonomous mobile robots.
“NVIDIA Jetson, with its broad and diverse user base and partner ecosystem, has helped drive a revolution in robotics and AI at the edge,” said Jim McGregor, principal analyst at Tirias Research. “As application needs get more complicated, we need to make a fundamental switch to platforms that make building edge deployments easier and faster. Developers now have access to new multi-sensor models and generative AI capabilities thanks to NVIDIA’s major software expansion.”
There will be more to come
NVIDIA has also unveiled a collection of system services covering the essential capabilities every developer needs when building cutting-edge AI solutions. These services simplify workflow integration and spare developers the laborious process of building them from scratch.
The new NVIDIA JetPack 6, expected by the end of the year, will let AI developers stay on the bleeding edge of computing without requiring a full Jetson Linux upgrade, significantly shortening development schedules and freeing them from Jetson Linux dependencies. JetPack 6 will also work with Linux distribution partners to broaden the range of Linux-based options, including Canonical’s Optimized and Certified Ubuntu, Wind River Linux, Concurrent Real-Time RedHawk Linux, and numerous Yocto-based distributions.
Platform Expansion Benefits the Partner Ecosystem
A comprehensive range of support is offered by the Jetson partner ecosystem, including hardware, AI software, application design services, sensors, networking, and developer tools. These NVIDIA Partner Network innovators are essential in delivering the components and supporting systems for many commercially available devices.
Read more on Govindhtech.com
0 notes
rovingmick · 5 years ago
Text
Blackburn’s Spaceman Was Escaped German POW
Blackburn’s Spaceman Was Escaped German POW
OpenClips / Pixabay
Unreleased classified military documents could hold a key to one of Blackburn’s greatest unsolved mysteries.  But this may replace one legendary enigma with an even stranger tale from our past.
As World War II came to an end, German Prisoner of War – Otto Schwarzenbeck – was being transported by train through Blackburn, ironically our English equivalent of his surname. …
View On WordPress
0 notes