[FEATURE] Implement Image Loading Function for Image Search and CLIP Support #3152

mingshl · 2024-10-23T22:42:42Z

Is your feature request related to a problem?
To support the CLIP model and image search, we need to implement a function at the connector level that can load images from URLs or file paths, similar to what PIL (Python Imaging Library) does.

This function should support image search capabilities and be compatible with CLIP (Contrastive Language-Image Pre-training) for advanced image-text understanding.

What solution would you like?
Similar to:

from PIL import Image
import requests

from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

With the image loaded, we can pass it to the CLIP model as input to run prediction:

inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1)  # we can take the softmax to get the label probabilities

Objectives:

Create a function that takes a URL as input and returns a PIL Image object.
Ensure the function can handle various image formats (JPEG, PNG, etc.).
Implement error handling for invalid URLs or unsupported image types.
Optimize the function for performance, considering potential high-volume usage in image search scenarios.
Ensure compatibility with CLIP for further processing and analysis.
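To make the objectives above more concrete, here is a minimal Python sketch of such a loader. It is only an illustration, not the connector-level implementation: the names `load_image` and `ImageLoadError`, the 10-second timeout, and the 20 MB size cap are all assumptions, and it uses the standard library's `urllib` rather than `requests` to stay self-contained. It accepts an http(s) URL or a local file path, and converts the result to RGB, which is a common input mode for image processors.

```python
import http.client
import io
import urllib.request
from urllib.parse import urlparse

from PIL import Image, UnidentifiedImageError


class ImageLoadError(Exception):
    """Raised when an image cannot be read or decoded (hypothetical name)."""


def load_image(source, timeout=10.0, max_bytes=20 * 1024 * 1024):
    """Load an image from an http(s) URL or a local path as an RGB PIL Image.

    `timeout` and `max_bytes` are illustrative defaults, not project settings.
    """
    scheme = urlparse(source).scheme.lower()
    try:
        if scheme in ("http", "https"):
            with urllib.request.urlopen(source, timeout=timeout) as resp:
                # Read one byte past the cap so oversized payloads are detectable.
                data = resp.read(max_bytes + 1)
        else:
            with open(source, "rb") as fh:
                data = fh.read(max_bytes + 1)
    except (OSError, ValueError, http.client.HTTPException) as exc:
        raise ImageLoadError(f"cannot read {source!r}: {exc}") from exc

    if len(data) > max_bytes:
        raise ImageLoadError(f"{source!r} exceeds {max_bytes} bytes")

    try:
        image = Image.open(io.BytesIO(data))
        image.load()  # force a full decode so truncated files fail here
    except (UnidentifiedImageError, OSError) as exc:
        raise ImageLoadError(f"{source!r} is not a supported image: {exc}") from exc

    # Pillow detects the format (JPEG, PNG, ...) itself; normalize the mode.
    return image.convert("RGB")
```

The returned object could then be passed as `images=image` to the processor call shown earlier in this issue. Wrapping every failure in a single exception type is a design choice for the sketch, so callers only need one `except` clause.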

Acceptance Criteria:

The function successfully loads images from valid URLs.
It properly handles errors for invalid URLs or unsupported image types.
The loaded images are compatible with our image search pipeline.
The function's output can be directly used with CLIP models.
Performance tests show the function can handle high-volume requests efficiently.
Code is well-documented and follows our coding standards.
Unit tests are implemented to cover various scenarios (successful loads, error cases, etc.).
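For the high-volume criterion, one possible approach (an assumption on my part, the issue does not prescribe one) is to load images concurrently, since fetching is mostly I/O-bound. The sketch below uses a thread pool and collects per-source errors instead of failing the whole batch. `load_many`, `fake_loader`, and the default of 8 workers are hypothetical; the loader is passed in as a parameter so any image-loading function can be plugged in.

```python
from concurrent.futures import ThreadPoolExecutor


def load_many(sources, loader, max_workers=8):
    """Apply `loader` to each source concurrently.

    Returns (results, errors): two dicts keyed by source.
    """
    results, errors = {}, {}

    def _safe_load(source):
        try:
            return source, loader(source), None
        except Exception as exc:  # one bad source should not fail the batch
            return source, None, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `sources`.
        for source, value, exc in pool.map(_safe_load, sources):
            if exc is None:
                results[source] = value
            else:
                errors[source] = exc
    return results, errors


# Usage with a stand-in loader; a real one would fetch and decode the image.
def fake_loader(source):
    if not source.endswith((".jpg", ".png")):
        raise ValueError(f"unsupported image type: {source}")
    return f"decoded:{source}"


results, errors = load_many(["a.jpg", "b.png", "c.txt"], fake_loader)
```

Keeping failures in a separate dict also makes the "error cases" in the unit-test criterion straightforward to assert on.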

Related issue
#3054


mingshl · 2024-10-23T22:52:03Z

The connector level already has a toString() method that converts lists, maps, and other data types to String. This feature can call loadImage(). Please see PR #2871 for reference.

brianf-aws · 2024-10-23T22:56:08Z

Hi, this looks interesting. Could I be assigned to this, please?

mingshl added enhancement New feature or request untriaged labels Oct 23, 2024

mingshl assigned mingshl and unassigned mingshl Oct 23, 2024

mingshl assigned brianf-aws Oct 23, 2024

brianf-aws mentioned this issue Oct 23, 2024

Improve image search UX opensearch-project/dashboards-flow-framework#431

Open
