Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mieb] Fix siglip bug & add retrieval datasets #1424

Merged
merged 5 commits into from
Nov 10, 2024

Conversation

gowitheflow-1998
Copy link
Contributor

@gowitheflow-1998 gowitheflow-1998 commented Nov 10, 2024

  1. fixed SigLIP text tokenizer bug. padding=True->padding="max_length". Turns out this led to significant performance drop in zero-shot classification and retrieval that has texts (comparing the results here and https://github.com/embeddings-benchmark/tmp, e.g., MNISTZeroShot ~10 accuracy -> 80+ accuracy). Reference: https://github.com/huggingface/transformers/blob/v4.46.2/src/transformers/pipelines/zero_shot_image_classification.py#L147

  2. Remove normalization in SigLIP single-modality encoding which constantly hurts linear probing performance (comparing CIFAR and MNIST results here and https://github.com/embeddings-benchmark/tmp).

  3. Fix [mieb] google/siglip-large-patch16-384 fails on ImageNet10Clustering #1417

  4. added EDIS and GLV-v2 I2I retrieval.

cc @Muennighoff

Checklist

  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.

Copy link
Collaborator

@isaac-chung isaac-chung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I do wonder if points 1 and 2 should be applied to any other models 🤔

@gowitheflow-1998
Copy link
Contributor Author

gowitheflow-1998 commented Nov 10, 2024

Nice! I do wonder if points 1 and 2 should be applied to any other models 🤔

looks like the tokenizer thing a siglip only thing! https://github.com/huggingface/transformers/blob/v4.46.2/src/transformers/pipelines/zero_shot_image_classification.py#L147

re normalization, looks like all other models by deafult do not normalize embeddings in single-modality encoding in our implementation, although we can potentially investigate normalizing them before fusion I think!

@gowitheflow-1998 gowitheflow-1998 merged commit f60465a into mieb Nov 10, 2024
10 checks passed
@gowitheflow-1998 gowitheflow-1998 deleted the fix-siglip-add-datasets branch November 10, 2024 12:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants