Is your feature request related to a problem?
To support the CLIP model and image search, we need to implement a function at the Connector level that can load images from a URL or a file path, similar to using PIL (Python Imaging Library).
This function should support image search capabilities and be compatible with CLIP (Contrastive Language-Image Pre-training) for advanced image-text understanding.
What solution would you like?
Similar to:

```python
from PIL import Image
import requests
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
```
With the image loaded, we can use it as model input for the CLIP model to run a prediction:

```python
inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores
probs = logits_per_image.softmax(dim=1)  # softmax over labels gives the label probabilities
```
Objectives:

- Create a function that takes a URL as input and returns a PIL Image object.
- Ensure the function can handle various image formats (JPEG, PNG, etc.).
- Implement error handling for invalid URLs or unsupported image types.
- Optimize the function for performance, considering potential high-volume usage in image search scenarios.
- Ensure compatibility with CLIP for further processing and analysis.
Acceptance Criteria:

- The function successfully loads images from valid URLs.
- It properly handles errors for invalid URLs or unsupported image types.
- The loaded images are compatible with our image search pipeline.
- The function's output can be directly used with CLIP models.
- Performance tests show the function can handle high-volume requests efficiently.
- Code is well-documented and follows our coding standards.
- Unit tests are implemented to cover various scenarios (successful loads, error cases, etc.).
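The unit-test expectations could look like the sketch below. The `load_image` defined here is only a minimal local-file stand-in (included so the tests are self-contained); the real connector function would also handle URLs, as described above.

```python
import io

from PIL import Image


def load_image(source):
    # Minimal stand-in for the proposed helper: accepts a file path or
    # file-like object and raises ValueError on any load failure.
    try:
        image = Image.open(source)
        image.load()
        return image.convert("RGB")
    except OSError as e:  # covers missing files and undecodable images
        raise ValueError(f"Failed to load image: {source!r}") from e


def test_successful_load():
    # A valid in-memory PNG loads and is normalized to RGB.
    buf = io.BytesIO()
    Image.new("RGB", (2, 2), "blue").save(buf, format="PNG")
    buf.seek(0)
    img = load_image(buf)
    assert img.size == (2, 2) and img.mode == "RGB"


def test_invalid_source():
    # A missing file surfaces as ValueError, not a raw OSError.
    try:
        load_image("/no/such/file.png")
        assert False, "expected ValueError"
    except ValueError:
        pass


def test_unsupported_image_type():
    # Non-image bytes also raise ValueError.
    try:
        load_image(io.BytesIO(b"not an image"))
        assert False, "expected ValueError"
    except ValueError:
        pass
```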
There is an existing `toString()` method at the connector level that converts lists/maps and other data types to String. This feature can call `loadImage()` from there. Please see this PR as a reference: #2871
Related issue
#3054