Fix difference of LLM export for the direct vs paged cache #347
thanks for doing this! is this ready to merge? Would so love to have it in main asap - blocked by this and currently using some hacky solutions. (Please make sure this works for |
rn when i try to run this i get
|
might need to manually test this because this file isn't exercised by the CI
@renxida thank you for catching that. No matter how small the change, I can always make a mistake. After the fix I also tested the direct cache path.
Until the cache interfaces are unified, there are some differences between the sharded, direct, and paged caches. The direct cache uses a list of tensors for each transformer block, while the paged cache has one slab and the sharded paged cache expects a list of shards.
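To make the difference in cache state layout concrete, here is a minimal sketch in plain Python that only computes tensor shapes. Everything in it is an assumption for illustration: `CacheConfig`, the three `*_state_shapes` functions, the choice of two tensors (K and V) per transformer block for the direct cache, the flattened page layout, and splitting heads across shards are all hypothetical and are not taken from this PR or from the project's actual API.

```python
from dataclasses import dataclass
from typing import List, Tuple

Shape = Tuple[int, ...]


@dataclass
class CacheConfig:
    # Hypothetical config; field names are illustrative only.
    block_count: int       # number of transformer blocks
    batch_size: int
    seq_len: int
    heads: int
    head_dim: int
    page_count: int        # pages in the paged slab
    block_seq_stride: int  # tokens stored per page
    shard_count: int = 1


def direct_state_shapes(cfg: CacheConfig) -> List[Shape]:
    # Direct cache: a list of tensors per transformer block.
    # Assumption: one K and one V tensor for each block.
    per_tensor = (cfg.batch_size, cfg.seq_len, cfg.heads, cfg.head_dim)
    return [per_tensor for _ in range(cfg.block_count * 2)]


def paged_state_shapes(cfg: CacheConfig) -> List[Shape]:
    # Paged cache: a single slab, one row per page.
    # Assumption: a page holds K and V for every block, flattened.
    page_elems = (
        cfg.block_count * 2 * cfg.block_seq_stride * cfg.heads * cfg.head_dim
    )
    return [(cfg.page_count, page_elems)]


def sharded_paged_state_shapes(cfg: CacheConfig) -> List[Shape]:
    # Sharded paged cache: a list of shards, one slab per shard.
    # Assumption: attention heads are split evenly across shards.
    if cfg.heads % cfg.shard_count != 0:
        raise ValueError("heads must be divisible by shard_count")
    heads_per_shard = cfg.heads // cfg.shard_count
    page_elems = (
        cfg.block_count * 2 * cfg.block_seq_stride * heads_per_shard * cfg.head_dim
    )
    return [(cfg.page_count, page_elems) for _ in range(cfg.shard_count)]


cfg = CacheConfig(
    block_count=2, batch_size=1, seq_len=8, heads=4, head_dim=16,
    page_count=3, block_seq_stride=4, shard_count=2,
)
direct = direct_state_shapes(cfg)
paged = paged_state_shapes(cfg)
sharded = sharded_paged_state_shapes(cfg)
```

Under these assumptions, an export path that takes the cache state as function arguments would see a different number of arguments in each case (many tensors, one slab, or one slab per shard), which is the kind of mismatch the description above refers to.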
e982607 to 62a037d
Ack, export works but now compile doesn't:
Saving to '/home/xidaren2/xshortfin/goldens/exported_llama_model/model.mlir'
@renxida this is the issue I'm working on