A small experiment combining CLIP and SAM for open-vocabulary image segmentation.
The approach is to first segment all the parts of an image using SAM, and then use CLIP to find the regions that best match a given text prompt.
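In code, the pipeline looks roughly like the sketch below. This is a minimal illustration, not the notebook's exact code; the checkpoint filename, the image path `example.jpg`, and the prompt are placeholder assumptions.

```python
import cv2
import clip
import numpy as np
import torch
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. SAM proposes a mask for every region it can find in the image.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to(device)
mask_generator = SamAutomaticMaskGenerator(sam)
image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)  # placeholder image
masks = mask_generator.generate(image)

# 2. Encode the text prompt once with CLIP.
model, preprocess = clip.load("ViT-B/32", device=device)
text = clip.tokenize(["a photo of a kiwi"]).to(device)  # placeholder prompt
with torch.no_grad():
    text_features = model.encode_text(text)
    text_features /= text_features.norm(dim=-1, keepdim=True)

# 3. Score each masked crop against the prompt; the highest-scoring
#    mask is the region that best matches the description.
scores = []
for m in masks:
    x, y, w, h = m["bbox"]  # bbox is in XYWH format
    crop = image[y:y + h, x:x + w].copy()
    crop[~m["segmentation"][y:y + h, x:x + w]] = 0  # blank out background pixels
    clip_input = preprocess(Image.fromarray(crop)).unsqueeze(0).to(device)
    with torch.no_grad():
        image_features = model.encode_image(clip_input)
        image_features /= image_features.norm(dim=-1, keepdim=True)
    scores.append((image_features @ text_features.T).item())

best_mask = masks[int(np.argmax(scores))]["segmentation"]  # boolean HxW array
```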
- Download the SAM checkpoint weights (e.g. `sam_vit_h_4b8939.pth` from the segment-anything repo) and place them in this repo's root.
- Install dependencies:

```
pip install torch opencv-python Pillow
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/facebookresearch/segment-anything.git
```
- Run the notebook `main.ipynb`.
Example output for the prompt "kiwi":