Faster Memory._add_to_vector_store #1888
Kowalskiexe started this conversation in Ideas · Replies: 1 comment
-
Hi, I'm currently working on a chatbot, so response times are crucial.
I've been looking into the source code of mem0, and I'm curious about one step in Memory._add_to_vector_store.
In short, it makes a single LLM call that decides what to do with the newly extracted memories and assigns each of them an action (roughly the pattern sketched below). In my application this step can easily take over 3 seconds.
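For context, here is a rough paraphrase of that pattern. Everything in it is a placeholder: the `decide_actions` name, the `llm.generate` call, and the action labels are assumptions for illustration, not mem0's actual API.

```python
import json

# Placeholder paraphrase of the single-call pattern described above -- NOT mem0's real code.
# One LLM call sees every extracted memory at once and must emit an action for each,
# so the number of output tokens (and hence latency) grows with the batch size.
def decide_actions(llm, existing_memories: list[str], new_facts: list[str]) -> list[dict]:
    prompt = (
        "Existing memories:\n" + json.dumps(existing_memories) + "\n"
        "New facts:\n" + json.dumps(new_facts) + "\n"
        'For EACH new fact, output JSON: [{"fact": ..., "action": "ADD|UPDATE|DELETE|NONE"}]'
    )
    # llm.generate is a stand-in for whatever client mem0 is configured with
    return json.loads(llm.generate(prompt))
```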
My question / idea is: since LLM response time is roughly proportional to the number of generated tokens, wouldn't this step complete much faster if we made a separate, concurrent LLM call for every extracted memory? As far as I can tell, the memories don't need to be processed sequentially.
Such an approach would consume more input tokens, but the wall-clock time would be capped by the slowest of the concurrent calls, and each call would finish much faster than the single call being made currently, since it generates far fewer output tokens. A sketch of this fan-out is below.
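A minimal sketch of the fan-out idea, assuming an OpenAI-style async client and a placeholder model name (mem0 abstracts over providers, so the real wiring would differ):

```python
import asyncio
from openai import AsyncOpenAI  # any async-capable LLM client works the same way

client = AsyncOpenAI()  # assumes OPENAI_API_KEY is set in the environment

async def decide_one(existing: list[str], fact: str) -> str:
    # One small call per extracted memory; the output is a single action word,
    # so each call finishes quickly regardless of how many facts were extracted.
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{
            "role": "user",
            "content": (
                "Existing memories:\n" + "\n".join(existing) + "\n"
                f"New fact: {fact}\n"
                "Reply with exactly one action: ADD, UPDATE, DELETE, or NONE."
            ),
        }],
        max_tokens=4,
    )
    return resp.choices[0].message.content.strip()

async def decide_all(existing: list[str], facts: list[str]) -> list[str]:
    # Fire all calls concurrently; asyncio.gather preserves input order,
    # so each returned action still lines up with its fact. Total latency
    # is bounded by the slowest single call rather than the sum of all calls.
    return await asyncio.gather(*(decide_one(existing, f) for f in facts))

# Example usage:
# actions = asyncio.run(decide_all(existing_memories, extracted_facts))
```

The trade-off is re-sending the existing-memory context in every call (more input tokens) in exchange for output lengths that no longer scale with the number of extracted facts.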
-
Great suggestion @Kowalskiexe. I will look into this.