-
Notifications
You must be signed in to change notification settings - Fork 3.7k
Index IO, cloning and hyper parameter tuning
Faiss indexes are often composite, which is not easy to manipulate for the individual index types. Also, they have a lot of parameters and it is often difficult to find the optimal structure for a given use case. Therefore, Faiss provides a high-level interface to manipulate indexes in bulk and automatically explore the parameter space.
All the functions below produce deep copies, i.e., you don't need to care about object ownership.
The I/O functions are:
-
write_index(index, "large.index")
: writes the given index to file large.index -
Index * index = read_index("large.index")
: reads a file
The cloning functions are:
-
Index* index2 = clone_index(index)
: returns a deep copy of the index. Note that some indexes are not supported by the cloning function. -
Index *index_cpu_to_gpu = index_cpu_to_gpu(resource, dev_no, index)
: deep copies an index, and puts the parts that can be GPU-ified on the GPU. For example, for anIndexPreCompute
that includes anIndexIVFPQ
, only theIndexIVFPQ
will be copied to GPU. A 4th argument can be provided to set the copying options (float16 precision and such), see Faiss on the GPU -
Index *index_gpu_to_cpu = index_gpu_to_cpu(index)
: copy in the other direction -
index_cpu_to_gpu_multiple
: uses anIndexShards
orIndexReplicas
to copy the index to several GPUs.
The basic I/O functions are for files. However, Faiss indexes can be serialized to any channel that supports a write operation for chunks of binary data (or read for reading). Random access to the channel is not required.
In the C++ API, this appears as the IOReader
and IOWriter
classes, see faiss/impl/io.h.
The only operation that these classes expose is operator ()
, which works similarly as stdio's fwrite
and fread
.
In Python, these operations are exposed as read_bytes
and write_bytes
.
-
FileIOReader
andFileIOWriter
: the object that are used to write to files. Note that memory mapping is supported only for these types of indexes. -
VectorIOReader
andVectorIOWriter
read/write to astd::vector<uint8_t>
. These are exposed in the Python functionsserialize_index
anddeserialize_index
, see python/faiss.py, that serialize indexes to numpy uint8 arrays. -
BufferedIOReader
andBufferedIOWriter
: wrap another index to add a buffering layer and avoid too small reads or writes.
To support I/O types that exist only in python, eg. file-like objects with a read()/write() function, use the PyCallbackIOReader
(and PyCallbackIOWriter
).
They just take a function callback (can be an object method) that reads from or writes a bytes
object (or str
in Python2).
Example here: tests/test_io.py.
In this section we focus on a subset of the Faiss indexes, that are selected because they are most useful to index 1M to 1G vectors under various constraints.
The parameters of the indexes can be classified in:
-
build-time parameters that must be set when the index is built
-
run-time parameters that can be adjusted before performing a search
Auto-tuning is performed on the runtime parameters. The parameters that are adjusted are:
index type | Index class | runtime parameter | comments |
---|---|---|---|
IVF*, IMI2x* | IndexIVF* | nprobe | the main parameter to adjust the speed-precision tradeoff |
IMI2x* | IndexIVF | max_codes | useful for the IMI, that often has unbalanced inverted lists |
HNSW* | IndexHNSW | efSearch | the depth of the HNSW search |
PQ* | IndexIVFPQ, IndexPQ | ht | Hamming threshold for polysemous |
PQ*+* | IndexIVFPQR | k_factor | determines how many result vectors are verified |
Refine | IndexRefine | kf_factor | determines how many vectors are verified per queried result |
The auto-tuning explores the speed-accuracy space and keeps the Pareto-optimal points in that space.
When a parameters applies to a coarse quantizer in an IVF index, it is prefixed by ivf_
.
For example for an IVF_HSNW32,Flat
index, the HNSW efSearch parameter can be set with ivf_efSearch
.
The AutoTuneCriterion
contains ground-truth results for the search and evaluates search results. It returns a performance measure between 0 and 1. Currently there are implementations for 1-recall@R and R-recall@R (aka. intersection) criterions.
The object stores all operating points as (performance, search time, parameter_set_id) tuples and selects the operating points that are optimal. An operating point is optimal if there is no other operating point that is faster to get at least the same performance.
The ParameterSpace object scans a given index for tunable parameters and builds a table of possible values for each parameter.
The parameter space is exponentially large with the number of parameters. We explore it in a random order.
We apply a heuristic to prune the exploration space. For each parameter, a higher value is slower and more accurate. Therefore, given two tuples of parameters a, b with their partial order, if
a >= b
then a is slower and more accurate than b. Therefore, before evaluating a set of parameters a we check if there exists (b, c) st. a >= b and c >= a. If the (perf_c, time_b) is not an optimal operating point, then we skip the experiment because it is guaranteed to produce a non-optimal operating point. This quickly prunes the set of experiments to realize.
A set of parameters is expressed as a human-readable string ("nprobe=2,ht=52") and an index (cno).
See faiss/tests/demo_sift1M.cpp
for an example usage of auto-tuning. The output reads like:
0/500: cno=0 nprobe=1,ht=2 bounds [perf<=1.000 t>=0.000] perf 0.000 t 0.061
1/500: cno=779 nprobe=2048,ht=256 bounds [perf<=1.000 t>=0.061] perf 0.627 t 16.697 *
...
7/500: cno=247 nprobe=128,ht=42 bounds [perf<=0.000 t>=0.270] skip
8/500: cno=664 nprobe=16,ht=112 bounds [perf<=0.282 t>=0.064] perf 0.017 t 0.150 *
9/500: cno=555 nprobe=8,ht=94 bounds [perf<=0.017 t>=0.064] perf 0.000 t 0.092
10/500: cno=292 nprobe=16,ht=50 bounds [perf<=0.000 t>=0.061] skip
...
This means that experiment 8 tests parameter combination 247 which sets nprobe to 128 and the Hamming threshold to 42. Previous experiments show that the performance is below 0.282 and the runtime above 0.064 s. The result performance (1-recall at 1 in this case) is 0.017 obtained in 0.150s. The star at the end of the line indicates that the operating point is optimal.
The output result is the list of operating points that are optimal:
cno=720 key=nprobe=1,ht=122 perf=0.0819 t=0.046
cno=710 key=nprobe=4,ht=120 perf=0.0895 t=0.059
cno=732 key=nprobe=1,ht=124 perf=0.1139 t=0.068
cno=744 key=nprobe=1,ht=126 perf=0.1488 t=0.069
cno=734 key=nprobe=4,ht=124 perf=0.1775 t=0.080
cno=746 key=nprobe=4,ht=126 perf=0.2264 t=0.089
cno=770 key=nprobe=4,ht=256 perf=0.5010 t=0.095
cno=771 key=nprobe=8,ht=256 perf=0.5564 t=0.156
cno=772 key=nprobe=16,ht=256 perf=0.5936 t=0.197
cno=774 key=nprobe=64,ht=256 perf=0.6226 t=0.626
cno=776 key=nprobe=256,ht=256 perf=0.6267 t=2.246
cno=778 key=nprobe=1024,ht=256 perf=0.6268 t=8.601
The string "nprobe=4,ht=126"
can be used to set the parameters on the index.
There are 3 parameters that impact the reliability of auto-tuning (or manual tuning, btw):
-
sensitivity of the query set and the criterion. If it is too easy (eg. R@1 on Twitter Glove) or too hard (eg. the random dataset), no useful signal will come from the tuning.
-
the dataset should have enough query points, at least 1000, preferably 10000, to be sensitive enough to the parameter settings.
-
reliability of the timings. Times are measured in wall-clock time. This is most reliable on an unoccupied GPU or 1 thread of CPU. It is less reliable in multi-threading. The smaller the dataset, the less reliable the timing. Multi-thread measurements below 1s are not reliable.
Faiss building blocks: clustering, PCA, quantization
Index IO, cloning and hyper parameter tuning
Threads and asynchronous calls
Inverted list objects and scanners
Indexes that do not fit in RAM
Brute force search without an index
Fast accumulation of PQ and AQ codes (FastScan)
Setting search parameters for one query
Binary hashing index benchmark