Caching previous prompts for later reuse, part 1 #3073

Draft · wants to merge 6 commits into base: main

Commits on Oct 10, 2024
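
Illustrative C++ sketches of several of these changes follow the commit list below.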

  1. fix some missing #includes

    Signed-off-by: Jared Van Bortel <[email protected]>
    cebtenzzre committed Oct 10, 2024 · commit 2267b02
  2. llmodel: simplify tokenize()

    The value of n_past is no longer used here, so remove the parameter and
    the logic that tries to maintain it.
    
    Signed-off-by: Jared Van Bortel <[email protected]>
    cebtenzzre committed Oct 10, 2024 · commit be93986
  3. llmodel: use a span for evalTokens batch to avoid copy

    Signed-off-by: Jared Van Bortel <[email protected]>
    cebtenzzre committed Oct 10, 2024 · commit 9848c89
  4. chatapi: clean up stubs for unimplemented methods

    These were taking up too many lines, were too repetitive, and weren't
    marked [[noreturn]] even though they all throw unconditionally.
    
    Signed-off-by: Jared Van Bortel <[email protected]>
    cebtenzzre committed Oct 10, 2024 · commit 163d5fd
  5. backend: don't expose write access to n_ctx and tokens via PromptContext

    Writing to these directly behind the implementation's back is
    ill-defined, and implementing internal save/restore for `tokens` gets
    confusing when the chat UI is also trying to manage it directly.
    
    Signed-off-by: Jared Van Bortel <[email protected]>
    cebtenzzre committed Oct 10, 2024 · commit b1c22db
  6. llmodel: use the token cache to help reuse previous results

    If n_past is smaller in a subsequent call to prompt() but the input
    shares a common prefix with the token cache after n_past, we can reuse
    the tokens that are already in the KV cache.
    
    Signed-off-by: Jared Van Bortel <[email protected]>
    cebtenzzre committed Oct 10, 2024 · commit c7d1232
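
A minimal sketch of the simplification in commit be93986, assuming a free-standing tokenize() helper; the signature and the toy whitespace tokenizer below are placeholders, not the actual gpt4all-backend code.

```cpp
// Before (assumed): std::vector<Token> tokenize(int32_t &n_past, const std::string &text);
// After: the n_past parameter and the logic that maintained it are gone.
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

using Token = std::int32_t;

std::vector<Token> tokenize(const std::string &text)
{
    // Stand-in tokenizer: one placeholder id per whitespace-separated word.
    std::vector<Token> out;
    std::istringstream in(text);
    for (std::string word; in >> word; )
        out.push_back(static_cast<Token>(out.size()));
    return out;
}
```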
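
Commit 9848c89's batch-by-span idea could look like the sketch below; evalTokens and Token are assumptions about the backend interface, not its exact API. Taking std::span<const Token> lets a caller hand over a slice of an existing buffer without materializing a temporary std::vector.

```cpp
#include <cstdint>
#include <span>
#include <vector>

using Token = std::int32_t;

// Decode a batch of tokens; stubbed out here. Accepting a span means the
// batch is a view over memory the caller already owns, so nothing is copied.
bool evalTokens(std::span<const Token> batch)
{
    return !batch.empty();
}

int main()
{
    std::vector<Token> history{1, 2, 3, 4, 5, 6, 7, 8};
    // A view over the last four tokens; no copy is made for the batch.
    return evalTokens(std::span(history).subspan(4)) ? 0 : 1;
}
```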
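
For commit 163d5fd, the stub cleanup might look roughly like this; the class and method names are illustrative rather than the real chatapi declarations. Each stub throws unconditionally, so [[noreturn]] documents that and keeps the one-line form warning-free.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

class ChatAPIWorkerSketch {
public:
    // One line per unimplemented method instead of a repetitive multi-line body.
    [[noreturn]] std::vector<float> embedding(const std::string &)
    { throw std::logic_error("not implemented"); }

    [[noreturn]] std::size_t stateSize() const
    { throw std::logic_error("not implemented"); }
};
```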
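
Commit b1c22db's encapsulation could take a shape like the following, with n_ctx and the token history exposed read-only so callers such as the chat UI can no longer write to them directly; this is an assumed layout, not the real PromptContext.

```cpp
#include <cstdint>
#include <span>
#include <vector>

using Token = std::int32_t;

class PromptContextSketch {
public:
    explicit PromptContextSketch(std::int32_t nCtx) : m_nCtx(nCtx) {}

    // Read-only views for callers outside the backend.
    std::int32_t nCtx() const { return m_nCtx; }
    std::span<const Token> tokens() const { return m_tokens; }

    // Stands in for whatever internal path the model implementation uses to
    // grow the token history; the UI never mutates the members directly.
    void appendToken(Token t) { m_tokens.push_back(t); }

private:
    std::int32_t m_nCtx;
    std::vector<Token> m_tokens;
};
```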
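
Finally, the prefix-reuse check described in commit c7d1232 could be sketched as below; reusableFromCache and its arguments are hypothetical names. The idea is that if a later prompt() call rewinds n_past but the new input still matches the cached tokens from that point on, the matching prefix never has to be re-decoded into the KV cache.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Token = std::int32_t;

// How many leading tokens of `input` already match the token cache starting
// at position n_past, i.e. how far decoding can skip ahead for free.
std::size_t reusableFromCache(const std::vector<Token> &cache,
                              const std::vector<Token> &input,
                              std::size_t n_past)
{
    std::size_t reused = 0;
    while (n_past + reused < cache.size()
           && reused < input.size()
           && cache[n_past + reused] == input[reused])
        ++reused;
    return reused;
}

int main()
{
    std::vector<Token> cache {10, 11, 12, 13, 14};
    std::vector<Token> input {12, 13, 99};      // new input after n_past = 2
    std::size_t reused = reusableFromCache(cache, input, /*n_past=*/2);
    // reused == 2: only input[2] needs evaluating; n_past advances to 4.
    return reused == 2 ? 0 : 1;
}
```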