Skip to content

Index IO, cloning and hyper parameter tuning

Matthijs Douze edited this page Mar 24, 2020 · 14 revisions

Faiss indexes are often composite, which is not easy to manipulate for the individual index types. Also, they have a lot of parameters and it is often difficult to find the optimal structure for a given use case. Therefore, Faiss provides a high-level interface to manipulate indexes in bulk and automatically explore the parameter space.

I/O and deep copying indexes

All the functions below produce deep copies, i.e., you don't need to care about object ownership.

The I/O functions are:

  • write_index(index, "large.index"): writes the given index to file large.index

  • Index * index = read_index("large.index"): reads a file

The cloning functions are:

  • Index* index2 = clone_index(index): returns a deep copy of the index. Note that some indexes are not supported by the cloning function.

  • Index *index_cpu_to_gpu = index_cpu_to_gpu(resource, dev_no, index): deep copies an index, and puts the parts that can be GPU-ified on the GPU. For example, for an IndexPreCompute that includes an IndexIVFPQ, only the IndexIVFPQ will be copied to GPU. A 4th argument can be provided to set the copying options (float16 precision and such), see Faiss on the GPU

  • Index *index_gpu_to_cpu = index_gpu_to_cpu(index): copy in the other direction

  • index_cpu_to_gpu_multiple: uses an IndexShards or IndexReplicas to copy the index to several GPUs.

I/O with python callbacks

TODO

Auto-tuning the runtime parameters

In this section we focus on a subset of the Faiss indexes, that are selected because they are most useful to index 1M to 1G vectors under various constraints.

The parameters of the indexes can be classified in:

  • build-time parameters that must be set when the index is built

  • run-time parameters that can be adjusted before performing a search

Auto-tuning is performed on the runtime parameters. The parameters that are adjusted are:

key Index class runtime parameter comments
IVF*, IMI2x* IndexIVF* nprobe the main parameter to adjust the speed-precision tradeoff
IMI2x* IndexIVF max_codes useful for the IMI, that often has unbalanced inverted lists
PQ* IndexIVFPQ, IndexPQ ht Hamming threshold for polysemous
PQ*+* IndexIVFPQR k_factor determines how many result vectors are verified

The auto-tuning explores the speed-accuracy space and keeps optimal points.

The AutoTuneCriterion object

The AutoTuneCriterion contains ground-truth results for the search and evaluates search results. It returns a performance measure between 0 and 1. Currently there are implementations for 1-recall@R and R-recall@R (aka. intersection) criterions.

The OperatingPoints object

The object stores all operating points as (performance, search time, parameter_set_id) tuples and selects the operating points that are optimal. An operating point is optimal if there is no other operating point that is faster to get at least the same performance.

The ParameterSpace object

The ParameterSpace object scans a given index for tunable parameters and builds a table of possible values for each parameter.

The parameter space is exponentially large with the number of parameters. We explore it in a random order.

We apply a heuristic to prune the exploration space. For each parameter, a higher value is slower and more accurate. Therefore, given two tuples of parameters a, b with their partial order, if

a >= b

then a is slower and more accurate than b. Therefore, before evaluating a set of parameters a we check if there exists (b, c) st. a >= b and c >= a. If the (perf_c, time_b) is not an optimal operating point, then we skip the experiment because it is guaranteed to produce a non-optimal operating point. This quickly prunes the set of experiments to realize.

A set of parameters is expressed as a human-readable string ("nprobe=2,ht=52") and an index (cno).

Example usage

See faiss/tests/demo_sift1M.cpp for an example usage of auto-tuning. The output reads like:

  0/500: cno=0 nprobe=1,ht=2 bounds [perf<=1.000 t>=0.000]  perf 0.000 t 0.061
  1/500: cno=779 nprobe=2048,ht=256 bounds [perf<=1.000 t>=0.061]  perf 0.627 t 16.697 *
  ...
  7/500: cno=247 nprobe=128,ht=42 bounds [perf<=0.000 t>=0.270] skip
  8/500: cno=664 nprobe=16,ht=112 bounds [perf<=0.282 t>=0.064]  perf 0.017 t 0.150 *
  9/500: cno=555 nprobe=8,ht=94 bounds [perf<=0.017 t>=0.064]  perf 0.000 t 0.092
  10/500: cno=292 nprobe=16,ht=50 bounds [perf<=0.000 t>=0.061] skip
  ...

This means that experiment 8 tests parameter combination 247 which sets nprobe to 128 and the Hamming threshold to 42. Previous experiments show that the performance is below 0.282 and the runtime above 0.064 s. The result performance (1-recall at 1 in this case) is 0.017 obtained in 0.150s. The star at the end of the line indicates that the operating point is optimal.

The output result is the list of operating points that are optimal:

cno=720 key=nprobe=1,ht=122 perf=0.0819 t=0.046
cno=710 key=nprobe=4,ht=120 perf=0.0895 t=0.059
cno=732 key=nprobe=1,ht=124 perf=0.1139 t=0.068
cno=744 key=nprobe=1,ht=126 perf=0.1488 t=0.069
cno=734 key=nprobe=4,ht=124 perf=0.1775 t=0.080
cno=746 key=nprobe=4,ht=126 perf=0.2264 t=0.089
cno=770 key=nprobe=4,ht=256 perf=0.5010 t=0.095
cno=771 key=nprobe=8,ht=256 perf=0.5564 t=0.156
cno=772 key=nprobe=16,ht=256 perf=0.5936 t=0.197
cno=774 key=nprobe=64,ht=256 perf=0.6226 t=0.626
cno=776 key=nprobe=256,ht=256 perf=0.6267 t=2.246
cno=778 key=nprobe=1024,ht=256 perf=0.6268 t=8.601

The string "nprobe=4,ht=126" can be used to set the parameters on the index.

Reliability of the optimization

There are 3 parameters that impact the reliability of auto-tuning (or manual tuning, btw):

  • sensitivity of the query set and the criterion. If it is too easy (eg. R@1 on Twitter Glove) or too hard (eg. the random dataset), no useful signal will come from the tuning.

  • the dataset should have enough query points, at least 1000, preferably 10000, to be sensitive enough to the parameter settings.

  • reliability of the timings. Times are measured in wall-clock time. This is most reliable on an unoccupied GPU or 1 thread of CPU. It is less reliable in multi-threading. The smaller the dataset, the less reliable the timing. Multi-thread measurements below 1s are not reliable.

Clone this wiki locally