[FEATURE] Implement Image Loading Function for Image Search and CLIP Support #3152

Open
mingshl opened this issue Oct 23, 2024 · 2 comments
Labels
enhancement New feature or request untriaged

Comments

@mingshl
Collaborator

mingshl commented Oct 23, 2024

Is your feature request related to a problem?
To support the CLIP model and image search, we need a connector-level function that can load images from URLs or file paths, similar to what PIL (Python Imaging Library) provides.

This function should support image search and be compatible with CLIP (Contrastive Language-Image Pre-training) for advanced image-text understanding.

What solution would you like?
Similar to:

from PIL import Image
import requests

from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)


With the image loaded, we can pass it to the CLIP model as input to run a prediction:

inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1)  # we can take the softmax to get the label probabilities
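
To make the last step concrete, here is a small dependency-free sketch of what the softmax over `logits_per_image` does. The logit values are made up for illustration and are not real CLIP output.

```python
import math


def softmax(logits):
    """Convert raw similarity scores into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["a photo of a cat", "a photo of a dog"]
logits = [24.5, 19.0]  # hypothetical image-text similarity scores

probs = softmax(logits)
best = labels[probs.index(max(probs))]
print(best, probs)
```

The label with the highest similarity score also gets the highest probability; softmax only rescales the scores so they can be read as a distribution over the candidate captions.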

Objectives:

  • Create a function that takes a URL as input and returns a PIL Image object.
  • Ensure the function can handle various image formats (JPEG, PNG, etc.).
  • Implement error handling for invalid URLs or unsupported image types.
  • Optimize the function for performance, considering potential high-volume usage in image search scenarios.
  • Ensure compatibility with CLIP for further processing and analysis.
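
The objectives above could be sketched in Python roughly as follows. This is only an illustration of the intended behavior, not the connector-level implementation: the names `ImageLoadError`, `sniff_format`, `read_bytes`, and `load_image` are hypothetical, the list of format signatures is deliberately incomplete, and converting to RGB is an assumption about what a CLIP processor expects.

```python
import io
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


class ImageLoadError(Exception):
    """Raised when a source cannot be read or is not a supported image."""


# Leading "magic" bytes of a few common formats (illustrative, not exhaustive).
_SIGNATURES = {
    b"\xff\xd8\xff": "JPEG",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
    b"BM": "BMP",
}


def sniff_format(data: bytes) -> str:
    """Return the image format implied by the leading bytes, or raise."""
    for signature, name in _SIGNATURES.items():
        if data.startswith(signature):
            return name
    raise ImageLoadError("unsupported or unrecognized image type")


def read_bytes(source: str, timeout: float = 10.0) -> bytes:
    """Read raw bytes from an http(s) URL or a local file path."""
    scheme = urlparse(source).scheme.lower()
    try:
        if scheme in ("http", "https"):
            with urllib.request.urlopen(source, timeout=timeout) as response:
                return response.read()
        if scheme == "":
            return Path(source).read_bytes()
    except (OSError, ValueError) as exc:
        # URLError and HTTPError are OSError subclasses, as are file errors.
        raise ImageLoadError(f"could not read {source!r}: {exc}") from exc
    raise ImageLoadError(f"unsupported URL scheme: {scheme!r}")


def load_image(source: str, timeout: float = 10.0):
    """Return a PIL Image decoded from a URL or file path."""
    # Pillow is imported here so the helpers above work without it installed.
    from PIL import Image

    data = read_bytes(source, timeout)
    sniff_format(data)  # reject obvious non-images before decoding
    try:
        image = Image.open(io.BytesIO(data))
        image.load()  # force a full decode so truncated files fail here
    except OSError as exc:  # includes PIL's UnidentifiedImageError
        raise ImageLoadError(f"could not decode {source!r}: {exc}") from exc
    # Assumption: downstream CLIP preprocessing works on 3-channel RGB input.
    return image.convert("RGB")
```

Wrapping every failure in a single exception type gives callers one thing to catch, and reading the whole response into memory before decoding keeps network errors separate from decode errors.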

Acceptance Criteria:

  • The function successfully loads images from valid URLs.
  • It properly handles errors for invalid URLs or unsupported image types.
  • The loaded images are compatible with our image search pipeline.
  • The function's output can be directly used with CLIP models.
  • Performance tests show the function can handle high-volume requests efficiently.
  • Code is well-documented and follows our coding standards.
  • Unit tests are implemented to cover various scenarios (successful loads, error cases, etc.).
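
For the high-volume criterion, one possible approach (an assumption on my part, not something the issue prescribes) is to cache fetched bytes by URL so that repeated requests for the same image skip the download. The sketch below uses Python's `functools.lru_cache`; `make_cached_loader` and `fake_fetch` are hypothetical names, and the stand-in fetcher replaces a real download so that the caching behavior is easy to see and to unit test.

```python
from functools import lru_cache
from typing import Callable, List


def make_cached_loader(fetch: Callable[[str], bytes], maxsize: int = 256):
    """Wrap `fetch` so repeated requests for one URL hit an in-memory LRU cache."""

    @lru_cache(maxsize=maxsize)
    def cached(url: str) -> bytes:
        return fetch(url)

    return cached


# Stand-in fetcher that records its calls; a real one would download the image.
calls: List[str] = []


def fake_fetch(url: str) -> bytes:
    calls.append(url)
    return b"bytes-for-" + url.encode()


loader = make_cached_loader(fake_fetch, maxsize=2)
loader("http://example.com/a.jpg")
loader("http://example.com/a.jpg")  # served from the cache, no second fetch
loader("http://example.com/b.jpg")

print(len(calls))                # number of real fetches
print(loader.cache_info().hits)  # number of cache hits
```

This only helps if the image behind a URL does not change while it is cached. `lru_cache` does not cache raised exceptions, so a failed fetch is retried on the next call.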

Related issue
#3054

@mingshl mingshl added enhancement New feature or request untriaged labels Oct 23, 2024
@mingshl mingshl assigned mingshl and unassigned mingshl Oct 23, 2024
@mingshl
Collaborator Author

mingshl commented Oct 23, 2024

The connector level already has a toString() method that converts lists, maps, and other data types to String. This feature can call loadImage(). Please see PR #2871 for reference.

@brianf-aws
Contributor

Hi, this looks interesting. Could I be assigned to this, please?


2 participants