LeanVec Improves Out-of-Distribution Vector Search Accuracy

Intel LeanVec Conquers Vector Search with Smart Dimensionality Reduction
The previous article in this series highlighted how vector search underpins many applications that need accurate and fast responses. Vector search systems often perform poorly because of the memory and compute pressure caused by high vector dimensionality. Cross-modal retrieval tasks are also common, for example when a user provides a text query to find the most relevant images.
Such queries often follow a statistical distribution that differs from that of the database embeddings, which makes maintaining accuracy difficult. Intel's LeanVec combines dimensionality reduction with vector quantisation to accelerate vector search over large, high-dimensional vectors while retaining accuracy on out-of-distribution queries.
Introduction
Deep learning models have become increasingly good at producing high-dimensional embedding vectors whose spatial similarities reflect the similarity of the inputs, including images, audio, video, text, genomics, and computer code. This capability lets applications search massive vector collections for semantically meaningful results by finding the nearest neighbours of a query vector. Despite these advances in similarity search, modern vector indices degrade in performance as dimensionality increases.
The most common are graph indices: directed graphs whose vertices represent dataset vectors and whose edges encode neighbour relationships between vectors. Traversing the graph finds nearest neighbours efficiently, in sub-linear time.
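As a rough sketch of how such a graph is traversed, the snippet below shows a simplified best-first search; it is not the exact SVS or HNSWlib implementation, and `graph`, `vectors`, and `entry_point` are hypothetical names used only for illustration.

```python
import heapq
import numpy as np

def greedy_graph_search(graph, vectors, query, entry_point, k=10, beam_width=50):
    """Simplified best-first traversal of a neighbour graph.

    graph: dict mapping node id -> list of neighbour ids
    vectors: ndarray of shape (num_vectors, dim) holding database vectors
    query: ndarray of shape (dim,)
    Returns up to k (distance, node_id) pairs, closest first.
    """
    def dist(i):
        return np.linalg.norm(vectors[i] - query)

    visited = {entry_point}
    # Min-heap of (distance, node) candidates still to expand.
    candidates = [(dist(entry_point), entry_point)]
    # Current best results, kept as a max-heap of size beam_width via negation.
    results = [(-dist(entry_point), entry_point)]

    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop when the closest unexpanded candidate is worse than the worst kept result.
        if len(results) >= beam_width and d > -results[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            nd = dist(nb)
            if len(results) < beam_width or nd < -results[0][0]:
                heapq.heappush(candidates, (nd, nb))
                heapq.heappush(results, (-nd, nb))
                if len(results) > beam_width:
                    heapq.heappop(results)

    return sorted((-negd, i) for negd, i in results)[:k]
```

Each hop of this traversal touches a handful of database vectors chosen essentially at random in memory, which is why memory latency and bandwidth dominate the cost as the vectors grow wider.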
Graph-based indices excel at moderate dimensionalities (D ≈ 100) but struggle at the dimensionalities typical of deep learning models (D ≈ 512, 768, 1536). Since vectors derived from deep learning models now dominate similarity search deployments, closing this performance gap is crucial.
This slowdown in graph search is caused by the system's memory latency and bandwidth, most of which is spent fetching database vectors from memory in a random-access pattern. Vector compression seems like a natural way to reduce memory pressure, but existing techniques such as PQ and SCANN either do not compress enough or perform poorly under irregular memory access patterns.
The Out-of-Distribution Queries Challenge
Queries are out-of-distribution (OOD) when the statistical distributions of the query and database vectors diverge, which makes vector compression harder. Unfortunately, two modern use cases frequently produce this situation. The first is cross-modal search, in which a user queries with one modality to retrieve relevant items from another; in text2image search, for example, a text query retrieves thematically similar images. The second is when queries and database vectors are produced by different models, as in question-answering systems.
A two-dimensional example illustrates why query-aware dimensionality reduction matters for maximum inner product search. A query-agnostic method such as PCA would project the database vectors (𝒳) and the query vectors (Q) onto the first principal axis of 𝒳 (the large green arrow). This choice degrades the resolution of the inner products, because that direction is poorly aligned with Q's principal axis (the orange arrow), and the direction that actually matters (the second principal axis of 𝒳) is discarded.
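A tiny synthetic sketch of this effect (illustrative numbers only, not the article's figure): when the database varies mostly along one axis but the queries point mostly along the other, query-agnostic PCA keeps the direction that matters least for the inner products.

```python
import numpy as np

rng = np.random.default_rng(0)

# Database spread mostly along the x-axis, queries mostly along the y-axis.
X = rng.normal(size=(1000, 2)) * np.array([10.0, 1.0])   # database vectors
Q = rng.normal(size=(100, 2)) * np.array([0.1, 5.0])     # query vectors

# Query-agnostic PCA: keep only the first principal axis of X (the x-axis here).
_, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
p_db = Vt[0]                                              # direction PCA keeps

true_ip = Q @ X.T                                         # exact inner products
pca_ip = (Q @ p_db)[:, None] * (X @ p_db)[None, :]        # after 2D -> 1D projection

# The retained axis is nearly orthogonal to where the queries live, so the
# 1-D approximation correlates poorly with the true inner products.
corr = np.corrcoef(true_ip.ravel(), pca_ip.ravel())[0, 1]
print(f"correlation of PCA-projected vs. true inner products: {corr:.2f}")
```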
A Lightweight Dimensionality Reduction Method
To speed up similarity search over deep learning embedding vectors, LeanVec approximates the inner product between a database vector x and a query q.
How the projection works: LVQ reduces the number of bits per vector entry, while DRquery and DRDB reduce the vector dimensionality. As shown in the figure, LeanVec down-projects the query and database vectors using the linear functions DRquery and DRDB.
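A minimal sketch of the down-projection mechanics follows. It leaves out the LVQ quantisation step and uses randomly initialised stand-ins (here called A and B) for the learned projection matrices DRquery and DRDB; in LeanVec these matrices are learned from data rather than random.

```python
import numpy as np

D, d = 768, 128          # original and reduced dimensionality (illustrative values)
rng = np.random.default_rng(1)

# Stand-ins for the learned linear down-projections (random here, learned in LeanVec).
A = rng.normal(size=(d, D)) / np.sqrt(D)   # plays the role of DRquery
B = rng.normal(size=(d, D)) / np.sqrt(D)   # plays the role of DRDB

q = rng.normal(size=D)           # query vector
X = rng.normal(size=(10000, D))  # database vectors

# Exact inner products: O(D) multiply-adds per database vector.
exact = X @ q

# LeanVec-style approximation: project q once per search, then use
# only O(d) multiply-adds per database vector (d << D).
q_low = A @ q            # DRquery(q), computed once per query
X_low = X @ B.T          # DRDB(x), precomputed and stored (LVQ-compressed in practice)
approx = X_low @ q_low
```

With random A and B the approximation is crude; the point of LeanVec's learned projections is to make approx track exact closely while keeping the per-vector cost at the reduced dimensionality.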
Each database vector x is compressed twice by LeanVec:
A primary vector, LVQ(DRDB(x)), which gives a fast but moderately accurate inner-product approximation.
A secondary vector, LVQ(x), which gives a more accurate inner-product approximation.
The graph is both built and searched using the primary vectors; Intel's experiments show that graph construction is robust to LVQ quantisation and dimensionality reduction. The secondary vectors are used only during search, for re-ranking.
Searching the graph index with the primary vectors brings two benefits: their smaller memory footprint reduces the time spent fetching vectors, and their lower dimensionality requires fewer fused multiply-add operations, cutting the computational effort. The approximation is also well suited to the random memory-access pattern of graph search, because it allows inner products to be computed against individual database vectors without any batch processing.
Intel compensates for the inner-product approximation error by retrieving more candidates than requested and re-ranking them with the secondary vectors to return the final top-k. Because the query dimensionality reduction (computing DRquery(q)) is performed only once per search, it adds only a small runtime overhead.
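The sketch below illustrates this two-stage search. The function name and parameters are hypothetical, the primary and secondary representations are simplified to plain float arrays rather than LVQ-compressed vectors, and the first stage is written as a brute-force scan rather than the graph traversal used in the real system.

```python
import numpy as np

def leanvec_style_search(q, A, X_primary, X_secondary, k=10, overfetch=3):
    """Two-stage search sketch: approximate ranking, then re-ranking.

    q:           query vector, shape (D,)
    A:           query down-projection matrix, shape (d, D)  (stands in for DRquery)
    X_primary:   down-projected database vectors, shape (N, d)  (stands in for LVQ(DRDB(x)))
    X_secondary: full-dimensional database vectors, shape (N, D)  (stands in for LVQ(x))
    """
    # Stage 1: rank with the cheap primary representation, fetching more
    # than k candidates to compensate for the approximation error.
    q_low = A @ q
    approx_scores = X_primary @ q_low
    n_candidates = k * overfetch
    candidates = np.argpartition(-approx_scores, n_candidates)[:n_candidates]

    # Stage 2: re-rank the candidates with the secondary vectors and keep the top-k.
    exact_scores = X_secondary[candidates] @ q
    order = np.argsort(-exact_scores)[:k]
    return candidates[order]
```

In the actual system the first stage is a graph traversal over the primary vectors rather than a linear scan, and both representations are LVQ-compressed, but the over-fetch-then-re-rank structure is the same.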
Searches are also the core operation in graph construction, so Intel's search acceleration speeds up index building as well.
LeanVec learns DRquery and DRDB from data using novel mathematical optimisation algorithms. These methods are computationally efficient: their execution time scales with the number of dimensions rather than the number of vectors. They also take into account the statistical distributions of a small sample of representative query vectors alongside the database vectors.
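Intel's paper derives the actual optimisation algorithms; the following is only a loose and far less efficient illustration of the general idea of fitting query-aware projections from samples, with a hypothetical function name. It is not LeanVec's method.

```python
import numpy as np

def fit_query_aware_projections(Q_sample, X_sample, d):
    """Crude illustration of learning query-aware down-projections.

    Finds matrices A (for queries) and B (for database vectors) such that
    (A @ q) . (B @ x) approximates q . x on the given samples, by taking a
    rank-d factorisation of the sample inner-product matrix and solving two
    least-squares problems. This is NOT Intel's LeanVec optimisation.
    """
    # Sample inner products we want to preserve, shape (n_queries, n_db).
    S = Q_sample @ X_sample.T

    # Best rank-d factorisation S ~= L @ R with L (n_queries, d), R (d, n_db).
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    L = U[:, :d] * np.sqrt(sv[:d])
    R = np.sqrt(sv[:d])[:, None] * Vt[:d]

    # Solve Q_sample @ A.T ~= L and X_sample @ B.T ~= R.T in the least-squares sense.
    A = np.linalg.lstsq(Q_sample, L, rcond=None)[0].T    # shape (d, D)
    B = np.linalg.lstsq(X_sample, R.T, rcond=None)[0].T  # shape (d, D)
    return A, B
```

The point of the illustration is only that both the query sample and the database sample enter the fit, which is what distinguishes query-aware dimensionality reduction from query-agnostic PCA.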
Findings
The results are clear. LeanVec improves SVS performance, surpassing the leading open-source implementation of a top-performing algorithm (HNSWlib). The reduction in per-query memory footprint increases query throughput roughly 4-fold at the same accuracy (95% 10-recall@10).
Conclusion
LeanVec uses linear dimensionality reduction together with vector quantisation to accelerate similarity search on the high-dimensional vectors produced by modern embedding models. LeanVec excels when queries are out of distribution, as in text2image and question-answering systems.
#technology #technews #govindhtech #news #technologynews #AI #artificial intelligence #LeanVec #Intel LeanVec #Vector Search #Out-of-Distribution Queries #Dimensionality Reduction