-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IndexIVFPQFastScan crashes with certain nlist values #4089
Comments
Moreover, I looked into reconstruction using import faiss
import numpy as np
n = 100
dim = 64
nlist = 256
train_vectors = np.random.random((100000, dim)).astype("float32")
ind = np.arange(n)
def run_example(IndexClass):
print("##", IndexClass.__name__)
# Create an IVF index
index = IndexClass(faiss.IndexFlatL2(dim), dim, nlist, dim // 2, 4, faiss.METRIC_L2)
index.make_direct_map(True)
index.train(train_vectors)
index.add(train_vectors)
print(
"Is reconstruction correct?",
(index.reconstruct_batch(ind) == index.sa_decode(index.sa_encode(train_vectors[ind]))).all()
)
run_example(faiss.IndexIVFPQ)
run_example(faiss.IndexIVFPQFastScan) Output:
This might be related to the issue. |
This is probably because the quantizer gets deallocated, it is missing in https://github.com/facebookresearch/faiss/blob/main/faiss/python/__init__.py#L175 for IndexIVFFastScan. Could you try def run_example(IndexClass):
# Create an IVF index
quantizer = faiss.IndexFlatL2(dim)
index = IndexClass(quantizer, dim, nlist, dim // 2, 4, faiss.METRIC_L2)
index.make_direct_map(True)
.... |
Unfortunately, still the same error. Edit: I believe this is more related to how the |
I tried to dig into it by reading the codes, and running a few experiments. I believe the main bug here is the following: void IndexIVFFastScan::reconstruct_from_offset(
int64_t list_no,
int64_t offset,
float* recons) const {
// unpack codes
InvertedLists::ScopedCodes list_codes(invlists, list_no);
std::vector<uint8_t> code(code_size, 0);
BitstringWriter bsw(code.data(), code_size);
// Possible fix: we should start by writing the encoded `list_no` to the `code` using `bsw`
for (size_t m = 0; m < M; m++) {
uint8_t c =
pq4_get_packed_element(list_codes.get(), bbs, M2, offset, m);
bsw.write(c, nbits);
}
sa_decode(1, code.data(), recons);
// add centroid to it
if (by_residual) {
std::vector<float> centroid(d);
quantizer->reconstruct(list_no, centroid.data());
for (int i = 0; i < d; ++i) {
recons[i] += centroid[i];
}
}
} We have two different ways to fix this. The first:
void IndexIVFFastScan::reconstruct_from_offset(
int64_t list_no,
int64_t offset,
float* recons) const {
InvertedLists::ScopedCodes list_codes(invlists, list_no);
std::vector<uint8_t> code(code_size, 0);
// Encode the list_no into the `code` for compatibility with the `sa_decode`
encode_listno(list_no, code.data());
// Shift the `code` to the right by `coarse_code_size()` bytes
BitstringWriter bsw(code.data() + coarse_code_size(), code_size);
for (size_t m = 0; m < M; m++) {
uint8_t c =
pq4_get_packed_element(list_codes.get(), bbs, M2, offset, m);
bsw.write(c, nbits);
}
sa_decode(1, code.data(), recons);
} The second solution is to implement the decoding in I am not sure what is better for faiss. |
IndexIVFPQFastScan crashes with certain nlist values
The
IndexIVFPQFastScan
index in the Faiss similarity search library crashes when thenlist
parameter (number of Voronoi cells) is not byte-aligned (e.g. 256, 65536). This occurs when calling eitherindex.reconstruct(i)
orindex.reconstruct_batch(ids)
after adding vectors.Platform
Reproduction Instructions
Error Message
Note: If you set
by_residual=True
in theIndexIVFPQFastScan
constructor, the error only occurs forreconstruct_batch()
, notreconstruct()
.Potential Reasons
The issue seems to stem from the values in
codes
inIndexIVFPQFastScan::sa_decode()
: here.The potential reason is that these codes are unpacked incorrectly. Which makes the
list_no
value read fromcode
invalid whennlist
is not byte-aligned, causing the bounds check'list_no >= 0 && list_no < nlist'
to fail.Additional note: The call to
decode_listno
is unnecessary whenby_residual
is false.The text was updated successfully, but these errors were encountered: