-
Can you specify the exact CLIP model parameters used and how they are evaluated on each image? I am trying to search for the k closest images in your 400 million image dataset to my own dataset of images using OpenAI's ViT-B/32 CLIP image embedder, but I cannot get my self-computed image embeddings to match your precomputed embeddings here: http://deploy.laion.ai/8f83b608504d46bb81708ec86e912220/embeddings/img_emb/

For example, in part 0 the embedding at index 100 begins as:

```python
import numpy as np

x = np.load('img_emb_0.npy')
print(x[100, :])
# [-0.003084 -0.0285 -0.02127 0.01313 0.00976 -0.0008097
#  0.05478 0.04456 -0.005177 -0.05322 0.03958 -0.00241
#  0.09863 0.02107 -0.008995 0.02281 0.0628 0.02582 ...]
```

In the corresponding parquet file I find:

```python
import pandas as pd

df_m0 = pd.read_parquet('./metadata_0.parquet', engine='pyarrow')
print(df_m0.loc[100, 'url'])
# https://preppingsurvival.com/wp-content/uploads/2021/04/ed-food-canned-apricots-4-years-expired-how-long-does-canned-food-last-bdb2xLfDwT0sddefault-300x300.jpg
```

If I download this image and run the following:

```python
import clip
import numpy as np
import torch
from PIL import Image

model, preprocess = clip.load("ViT-B/32", device='cpu')
image = preprocess(Image.open("./ed-food-canned-apricots-4-years-expired-how-long-does-canned-food-last-bdb2xLfDwT0sddefault-300x300.jpg")).unsqueeze(0)
with torch.no_grad():
    v = model.encode_image(image)
v = v.cpu().numpy()
v = v / np.linalg.norm(v)
```

then the first entries of v are:

```
[-0.01340919 -0.04178299 -0.01926079 0.02442356 -0.00797546 0.01472267
 0.05920784 0.01699294 -0.02090414 -0.03895969 0.05996456 0.01401705
 0.10676499 0.0010551 0.02233738 -0.00436341 0.02495931 0.01011568 ...]
```

These values do not match what is stored in the .npy file, so is there something I am doing wrong? On a side note, it would be helpful if you could give a few lines of sample code to load and query your precomputed faiss indices.
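For reference, here is a minimal way to quantify the mismatch, reusing `x` and `v` from the snippets above; it assumes both vectors can be cast to float32 and (re-)L2-normalized:

```python
import numpy as np

# Cosine similarity between the stored embedding and the self-computed one.
# Re-normalizing both sides keeps the check valid even if one of them
# is not already unit length.
stored = x[100, :].astype(np.float32)
stored /= np.linalg.norm(stored)
mine = v.reshape(-1).astype(np.float32)
mine /= np.linalg.norm(mine)
print(float(stored @ mine))  # 1.0 would indicate an exact match
```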
-
Hi, the difference you see is probably due to the resizing applied in img2dataset: we used the resizing mode called "border", described at https://github.com/rom1504/img2dataset#api, and you can see the code here: https://github.com/rom1504/img2dataset/blob/main/img2dataset/resizer.py#L98

Here is how to query the index locally: https://github.com/rom1504/clip-retrieval/blob/main/notebook/simple_filter.ipynb

You can also query our backend directly with https://colab.research.google.com/drive/1d234Gp_7xGI5pAQ0dE71LT4rklZS_OsK#scrollTo=Xp3EBMHsMf6n
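Roughly, the border mode amounts to something like the sketch below, applied before CLIP's own preprocess ever sees the image; the target size, padding color and interpolation here are assumptions, so treat the linked resizer.py as the source of truth:

```python
from PIL import Image, ImageOps

def border_resize(img, size=256):
    """Rough approximation of img2dataset's 'border' mode: shrink so the
    longer side fits `size`, then pad to a square with a border.
    Target size, padding color and interpolation are assumptions."""
    img = ImageOps.exif_transpose(img).convert("RGB")
    img.thumbnail((size, size), Image.BICUBIC)          # keep aspect ratio
    padded = Image.new("RGB", (size, size), (0, 0, 0))  # pad to square
    padded.paste(img, ((size - img.width) // 2, (size - img.height) // 2))
    return padded

# Then embed the padded image exactly as before:
# image = preprocess(border_resize(Image.open("downloaded.jpg"))).unsqueeze(0)
```

And as a minimal sketch of querying a downloaded index locally (the index and metadata file names are placeholders, and it assumes the metadata rows are in the same order as the index ids; the simple_filter notebook above is the reference):

```python
import faiss
import numpy as np
import pandas as pd

# Paths are placeholders; point them at the files you downloaded.
index = faiss.read_index("image.index")
meta = pd.read_parquet("metadata_0.parquet")

# `v` is an L2-normalized CLIP embedding of shape (1, dim), as computed above.
query = np.ascontiguousarray(v, dtype=np.float32)
distances, ids = index.search(query, 5)  # 5 nearest neighbours

for dist, i in zip(distances[0], ids[0]):
    print(dist, meta.loc[int(i), 'url'])
```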
-
One more question I had pertains to the clip-retrieval inference script in webdataset mode. It seems that when a cache_path is set, the files created there by the webdataset object are never deleted once they have been processed. It would be nice if there were some kind of cleanup step that could tell when all images from a given .tar file have been processed and then delete the corresponding file from the cache folder. This only becomes an issue when evaluating a very large dataset all at once, e.g. 100+ million images, depending on the size of the hard drive. But maybe this is already handled by some other function arguments that I'm missing?
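To make it concrete, something along these lines is what I have in mind; the shard naming and the `processed_shards` set are just assumptions about how such a cleanup could be wired up, not existing clip-retrieval behaviour:

```python
import os

def cleanup_cache(cache_path, processed_shards):
    """Delete cached .tar shards that have been fully processed.

    `processed_shards` is a set of shard file names that whatever drives the
    inference loop would have to maintain; this is only an illustration of
    the desired behaviour, not an existing API.
    """
    for name in os.listdir(cache_path):
        if name.endswith(".tar") and name in processed_shards:
            os.remove(os.path.join(cache_path, name))
```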