

Rewrite documentation about how to convert a .safetensors file and add a model to the model zoo #3521

Open
ThiloteE opened this issue Nov 15, 2024 · 0 comments
Labels
enhancement New feature or request


@ThiloteE
Contributor

ThiloteE commented Nov 15, 2024

How to reproduce the problem:

  1. Check the MTEB Leaderboard for embedding models. Notice that some models are not yet supported in DJL. Plan to convert them.

  2. Realize that the models are in .safetensors format.

  3. Follow the documentation at

  4. Here is an example of a model that has already been converted to be compatible with DJL and added to the model zoo. On Windows, this embedding model is stored at: C:\Users\USER\.djl.ai\cache\repo\model\nlp\text_embedding\ai\djl\huggingface\pytorch\sentence-transformers\all-MiniLM-L12-v2\0.0.1\all-MiniLM-L12-v2

    [Screenshot]

  5. Why is the documentation about converting a model filed under "tokenizers"? Why is there no section such as "How to convert a model into DJL format"? Tokenizers could be a subsection of the conversion page. Everybody understands what "converting" means, because the word is used in everyday language, but not everybody knows what a tokenizer is and would need to read up on it first. The same goes for "NLP support".

  6. Why does the "Uploading to the model zoo" section not link to instructions on how to convert the model into the right format?

  7. .safetensors files are the de facto standard on Hugging Face, yet the documentation does not mention them anywhere. Can I convert .safetensors files or not? If so, how?

  8. The documentation shows no example of what DJL model files actually look like, and no example of a successfully converted model. I want to know which files are required as input and which files are produced as output.

  9. The screenshot below mentions .params files. I have never seen a .params file anywhere. How can I create one with DJL? If a conversion is needed to produce them, where can I find the conversion tool?
    [Screenshot]

  10. I used the djl-huggingface-model-converter on https://huggingface.co/HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1 and now have the following model files:
    [Screenshot: converted model files]
    What should I do with them? Is there a link where I can upload the models?
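To make step 8 more concrete, here is a minimal Java sketch (standard library only) that builds the cache path from step 4 under the current user's home directory and prints the files stored there, i.e. what an already converted zoo model looks like on disk. It assumes the cache uses the same folder layout on every OS as in the Windows path above; the class and method names are hypothetical and not part of DJL.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not a DJL API: inspects the local DJL cache folder.
public class ModelCacheInspector {

    // Builds the cache path from step 4 relative to the given home directory.
    // Assumption: the layout below ".djl.ai/cache" is the same on every OS.
    static Path modelDir(String userHome) {
        return Paths.get(userHome, ".djl.ai", "cache", "repo", "model", "nlp",
                "text_embedding", "ai", "djl", "huggingface", "pytorch",
                "sentence-transformers", "all-MiniLM-L12-v2", "0.0.1",
                "all-MiniLM-L12-v2");
    }

    // Returns the sorted file names directly inside dir, or an empty list
    // if the directory does not exist (e.g. the model was never downloaded).
    static List<String> listFiles(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return List.of();
        }
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = modelDir(System.getProperty("user.home"));
        System.out.println("Model directory: " + dir);
        for (String name : listFiles(dir)) {
            System.out.println("  " + name);
        }
    }
}
```

Running it after the model has been downloaded once would print the file names in that folder, which is the kind of "before/after" listing that step 8 asks the documentation to show.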

What I am asking for

Please rewrite the documentation in simpler language so that users who are not experienced Java programmers can understand it.
Who will benefit from this enhancement? Users who want to convert a model hosted on Hugging Face or elsewhere and add it to the DJL model zoo. I want to add the model to the publicly hosted model zoo so that everybody can use it, not just me on my local machine. It would also help to point to a pull request that can serve as an example, or to create a video showing how to do it.

@ThiloteE ThiloteE added the enhancement New feature or request label Nov 15, 2024