Rewrite documentation about how to convert a safetensor file and add a model to the model zoo #3521

ThiloteE · 2024-11-15T16:14:26Z

How to reproduce the problem:

Check the MTEB Leaderboard for embedding models. Notice that some of them are not yet supported in DJL. Plan to convert them.
Realize that the models are in .safetensors format.
Follow the documentation at
- https://docs.djl.ai/master/extensions/tokenizers/index.html#use-djl-huggingface-model-converter
- https://docs.djl.ai/master/docs/development/add_model_to_model-zoo.html
Here is an example of a model that has already been converted to be compatible with DJL and added to the model zoo. On Windows, this embedding model is stored at: C:\Users\USER\.djl.ai\cache\repo\model\nlp\text_embedding\ai\djl\huggingface\pytorch\sentence-transformers\all-MiniLM-L12-v2\0.0.1\all-MiniLM-L12-v2
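To make the example path easier to read, the sketch below rebuilds it from its apparent parts: the application (`nlp/text_embedding`), a group id (`ai.djl.huggingface.pytorch`, with dots turned into folders), an artifact id, a version and the model name. This layout is inferred only from the single path above and is not taken from DJL's documentation; the helper name `model_cache_path` is hypothetical.

```python
from pathlib import PureWindowsPath


def model_cache_path(cache_root, application, group_id, artifact_id, version, model_name):
    """Compose a model-zoo cache path from its parts (hypothetical helper).

    Assumed layout, inferred from one example path only:
    <cache_root>/repo/model/<application>/<group_id with '.' as '/'>/<artifact_id>/<version>/<model_name>
    """
    parts = [
        "repo",
        "model",
        *application.split("/"),  # e.g. "nlp/text_embedding" -> nlp, text_embedding
        *group_id.split("."),     # e.g. "ai.djl.huggingface.pytorch" -> ai, djl, huggingface, pytorch
        *artifact_id.split("/"),  # e.g. "sentence-transformers/all-MiniLM-L12-v2"
        version,
        model_name,
    ]
    # PureWindowsPath renders with backslashes, matching the Windows example above.
    return PureWindowsPath(cache_root).joinpath(*parts)


print(model_cache_path(
    r"C:\Users\USER\.djl.ai\cache",
    "nlp/text_embedding",
    "ai.djl.huggingface.pytorch",
    "sentence-transformers/all-MiniLM-L12-v2",
    "0.0.1",
    "all-MiniLM-L12-v2",
))
```

If the assumed layout is right, it would suggest where a newly added model's files end up after download, which is part of what the documentation currently leaves unstated.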
Why is the documentation about converting a model under "tokenizers"? Why is there no section called "how to convert a model into DJL format" or something similar? Tokenizers could be a sub-section of the conversion page. Everybody understands what "converting" means, because it is an everyday word, but not everybody knows what a tokenizer is, so they would have to read up on it first. The same goes for "NLP support".
Why does the "uploading to the model zoo" section not link to instructions on how to convert the model into the right format?
.safetensors files are the de facto industry standard on Hugging Face, yet they are not mentioned anywhere in the documentation. Can I convert .safetensors files or not? If so, how?
The documentation shows no example of what DJL model files actually look like, and no example of a successfully converted model. I want to know which files are required as input and which files are produced as output.
In the screenshot below, .params files are mentioned. I have never seen a .params file anywhere. How can I create one with DJL? If a conversion is needed to end up with them, where can I find the conversion tool?
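For context on what such a file contains: under the published safetensors format, a file starts with an 8-byte little-endian unsigned integer giving the header length, followed by a UTF-8 JSON header that maps each tensor name to its `dtype`, `shape` and `data_offsets` (plus an optional `__metadata__` entry), followed by the raw tensor bytes. The minimal, standard-library-only sketch below lists the tensors in a file without loading the weights. It only inspects the file; it does not convert anything into a DJL format, and the function name `read_safetensors_header` is illustrative.

```python
import json
import struct


def read_safetensors_header(path):
    """Return {tensor_name: (dtype, shape)} for a .safetensors file, without reading tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: header length as a little-endian unsigned 64-bit integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # Next header_len bytes: JSON describing every tensor in the file.
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional string-to-string map, not a tensor.
    header.pop("__metadata__", None)
    return {name: (info["dtype"], info["shape"]) for name, info in header.items()}
```

Usage: `read_safetensors_header("model.safetensors")` returns something like `{"embeddings.word_embeddings.weight": ("F32", [vocab_size, hidden_size]), ...}` (tensor names here are illustrative), which is one way to see what a downloaded checkpoint holds before attempting a conversion.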
I used the djl-huggingface-model-converter on https://huggingface.co/HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1 and now have the following model files:

What should I do with them? Is there a link that I can upload the models to?

What I am asking for

Please rewrite the documentation in simpler language so that users who are not experienced Java programmers can understand it.
Who will benefit from this enhancement? Users who want to convert a model hosted on Hugging Face or elsewhere and add it to the DJL model zoo. I want to add the model to the publicly hosted model zoo so that everybody can use it, not just me on my local machine. It would also help to point to a pull request that can serve as an example, or to create a video showing how to do it.

ThiloteE added the enhancement New feature or request label Nov 15, 2024