A Python-based CLI tool for tagging images with the joy-caption-pre-alpha models.
I made this repo because I wanted to caption images cross-platform (on my old MBP, my Windows gaming PC, or a Docker-based Linux cloud server like Google Colab),
but I didn't want to install a huge WebUI just for this small task, and some cloud services are unfriendly to Gradio-based UIs.
So this repo was born.
HuggingFace hosts the original models; the ModelScope repos are pure forks of the HuggingFace ones (because HuggingFace is blocked in some regions).
Model | HuggingFace Link | ModelScope Link |
---|---|---|
joy-caption-pre-alpha | HuggingFace | ModelScope |
siglip-so400m-patch14-384(Google) | HuggingFace | ModelScope |
Meta-Llama-3.1-8B | HuggingFace | ModelScope |
TODO: make a simple UI with Jupyter widgets (once I get over my laziness 😊)
Python 3.10 works fine.
Open a shell terminal and follow the steps below:
# Clone this repo
git clone https://github.com/fireicewolf/joy-caption-cli.git
cd joy-caption-cli
# Create a Python venv
python -m venv .venv
# Activate it (Windows)
.\.venv\Scripts\activate
# Activate it (Linux/macOS)
source .venv/bin/activate
# Install torch matching your GPU driver and CUDA version, e.g.
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
# Base dependencies; inference models will be downloaded via the Python requests library.
pip install -U -r requirements.txt
# If you want to download or cache models via the HuggingFace hub, install this.
pip install -U -r huggingface-requirements.txt
# If you want to download or cache models via the ModelScope hub, install this.
pip install -U -r modelscope-requirements.txt
Make sure your Python venv has been activated first!
python caption.py your_datasets_path
To run with more options, get help by running the command below, or see the Options section:
python caption.py -h
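For example, a command combining several of the options below might look like this (the flag values here are illustrative choices, not defaults):

```shell
# Caption all supported images under ./datasets (including sub-folders),
# download models from ModelScope, and keep existing caption files untouched.
python caption.py ./datasets \
    --recursive \
    --model_site modelscope \
    --caption_extension .txt \
    --max_tokens 300 \
    --not_overwrite
```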
Advanced options
`data_path`
Path to your image datasets.
--recursive
Include all supported image formats in your input datasets path and its sub-paths.
`config`
Config JSON for LLaVA models, default is `default.json`.
--model_name MODEL_NAME
Model name for inference, default is `Joy-Caption-Pre-Alpha`; see `configs/default.json`.
--model_site MODEL_SITE
Model site to download models from (`huggingface` or `modelscope`), default is `huggingface`.
--models_save_path MODEL_SAVE_PATH
Path where models are saved, default is `models` (under the project folder).
--download_method SDK
Download models via `sdk` or `url`, default is `sdk`.
If the huggingface_hub or modelscope SDK is not installed, or the download fails, it will automatically retry with URL download.
--use_sdk_cache
Use the huggingface or modelscope SDK cache to store models; this option requires huggingface_hub or the modelscope SDK to be installed.
If enabled, `--models_save_path` will be ignored.
--custom_caption_save_path CUSTOM_CAPTION_SAVE_PATH
Save caption files to a custom path instead of alongside the images (their directory structure is preserved).
--log_level LOG_LEVEL
Log level for the terminal console and the log file, default is INFO (DEBUG, INFO, WARNING, ERROR, CRITICAL).
--save_logs
Save logs to a file; the log will be saved at the same level as `data_path`.
--caption_extension CAPTION_EXTENSION
Caption file extension, default is .txt
--not_overwrite
Do not overwrite a caption file if it already exists.
--user_prompt USER_PROMPT
User prompt for captioning.
--temperature TEMPERATURE
Temperature for the Llama model, default is 0.5.
--max_tokens MAX_TOKENS
Maximum tokens for output, default is 300.
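To illustrate how `--custom_caption_save_path` can keep the images' directory structure while saving captions elsewhere, here is a minimal Python sketch; `caption_path` is a hypothetical helper for illustration, not the repo's actual code:

```python
from pathlib import Path

def caption_path(image_path, data_path, custom_save_path=None, extension=".txt"):
    """Derive where an image's caption file would be written.

    By default the caption sits next to the image; with a custom save
    path, the image's directory structure under data_path is re-rooted
    there instead.
    """
    target = Path(image_path).with_suffix(extension)
    if custom_save_path is None:
        return target
    # Re-root the path relative to data_path under the custom directory.
    relative = target.relative_to(data_path)
    return Path(custom_save_path) / relative

# Caption saved next to the image:
print(caption_path("data/cats/01.jpg", "data").as_posix())          # data/cats/01.txt
# Caption re-rooted under "caps", sub-folder structure kept:
print(caption_path("data/cats/01.jpg", "data", "caps").as_posix())  # caps/cats/01.txt
```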
Based on joy-caption-pre-alpha.
Without their work (👏👏), this repo wouldn't exist.