
Use of Huggingface Transformers for Embeddings trades inference speed for embedding speed? #251

Closed · Answered by imartinez
andrewginns asked this question in Q&A

It should not affect the speed of the LLM directly.
A change in the size of the returned pieces of context from the embeddings could affect it indirectly: the longer the prompt, the longer it takes for the LLM to process and respond. Maybe with the new embeddings we are generating slightly bigger prompts. You could adjust the chunk and overlap sizes in the ingest.py file and test it out. Also, you could reduce the number of sources from the default (4) to, for example, 2; that should have a big impact on the overall speed.
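To make those two knobs concrete, here is a minimal sketch assuming the LangChain `RecursiveCharacterTextSplitter` that ingest.py uses. The sample text, the specific `chunk_size`/`chunk_overlap` values, and the `db` retriever line are illustrative assumptions, not the repo's exact code:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Illustrative document text, not real ingested data.
sample_text = "PrivateGPT splits each ingested document into chunks. " * 50

# Smaller chunks mean shorter pieces of retrieved context, and therefore
# shorter prompts for the LLM to process at query time.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,    # hypothetical value; try lowering the ingest.py default
    chunk_overlap=30,  # hypothetical value; roughly 10% of chunk_size
)
chunks = splitter.split_text(sample_text)
print(f"{len(chunks)} chunks, longest is {max(len(c) for c in chunks)} chars")

# At query time, retrieving fewer source chunks (k=2 instead of the default 4)
# halves the context stuffed into the prompt. Assuming a LangChain vector
# store named `db`, the knob would look like:
# retriever = db.as_retriever(search_kwargs={"k": 2})
```

Re-ingesting with smaller chunks changes what the retriever can return, while lowering `k` only changes how many of those chunks end up in the prompt, so the latter is the quicker experiment to run first.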
