-
Notifications
You must be signed in to change notification settings - Fork 3.7k
Implementation notes
Here are a few notes on implementation details for which we sometimes get questions. We describe the tradeoffs and maybe a few unexpected design choices or results.
A typical operation in IndexFlatL2
is to exhaustively compare a set of nq
query vectors and a set of nb
database vectors in dimension d (then select the top-k smallest vectors).
Faiss has two implementations of this operation:
-
direct implementation that loops over nq, nb and the dimension of the vectors.
-
an implementation that uses the decomposition d(x, y) = ||x||^2 + ||y||^2 - 2 * <x, y>. This is faster because the most expensive operation in O(nq * nb * d) can be handed over to BLAS that normally does this efficiently.
We use implementation 1 when nq < 20, and implementation 2 otherwise.
The threshold 20 can be adjusted via global variable faiss::distance_compute_bias_threshold
(accessible in Python via faiss.cvar.distance_compute_bias_threshold
).
Note that solution 2 may be less stable numerically than 1 for vectors of very different magnitudes, see discussion in issue #297.
k-means is implemented in the Clustering object
After initialization, the k-means iterates two operations:
-
assign training points to centroids
-
recompute centroids as the center of mass of the points they are assigned to.
In terms of performance, the first operation is the most costly (by far). Incidentally, it can be performed by any index, since it is a nearest-neighbor search of the vectors to the centroids. Therefore the index is a parameter of the Clustering train method. It can be replaced with a GPU index (example) or a HNSW index (example).
PQ search boils down to doing look-ups in distance tables (as explained in the original paper “Product quantization for nearest neighbor search”).
For the IVFPQ, an additional level of pre-computation can be done to speed up the construction of these tables. This is explained in "Billion-scale similarity search with GPUs", section 5.2. There is a tradeoff between memory usage to store the additional tables and speed:
-
on CPU, it is governed by
IndexIVFPQ.use_precomputed_table
. It takes 4 possible values: -1=disable, 0=decide heuristically (default: use tables only if they are <IndexIVFPQ::precomputed_tables_max_bytes
, set to 2GB by default); 1=enable (size 256 * nlist * M); 2=specific version for theMultiIndexQuantizer
that is much more compact. Callingprecompute_table()
takesuse_precomputed_table
into account and updates the data for the next search. -
on GPU, precomputed tables are enabled if the
GpuIndexIVFConfig::usePrecomputedTables
is set to true at construction. The setting can be changed at runtime withGpuIndexIVF::setPrecomputedCodes(bool enable)
.
The PCA matrix is computed by:
-
if there are more training points than dimensions: compute the covariance matrix, then use the LAPACK dsyev to extract the eigenvalues and vectors.
-
if there are more dimensions than training points: compute the Gram matrix (pairwise dot products of training vectors), and extract the eigenvalues and vectors from that.
Non-exhaustive search means that only part of the database vectors are compared with.
This is the case with IndexIVF
variants and IndexHSNW
variants.
The number of distances that are effectively computed are collected in global variables.
For IndexIVF
this is indexIVF_stats.ndis
, for HNSW
it is hnsw_stats.ndis
.
These statistics are guaranteed to be accurate only if the search
function is not called from multiple threads.
To access the variables in Python, just prefix with faiss.cvar
, eg. faiss.cvar.indexIVF_stats.ndis
will contain the number of distances computed so far.
Faiss building blocks: clustering, PCA, quantization
Index IO, cloning and hyper parameter tuning
Threads and asynchronous calls
Inverted list objects and scanners
Indexes that do not fit in RAM
Brute force search without an index
Fast accumulation of PQ and AQ codes (FastScan)
Setting search parameters for one query
Binary hashing index benchmark