
[sharktank] Export Attention IRs for LLMs #175

Merged 25 commits into main on Oct 4, 2024

Conversation

archana-ramalingam (Collaborator)
Export Attention IRs for paged causal and non-causal LLMs.

sogartar (Contributor) commented Oct 2, 2024

We can't use torch ops directly with our tensor types.
https://github.com/nod-ai/SHARK-Platform/actions/runs/11148274191/job/30984526213?pr=175#step:6:123
In this example torch.concat would need to be substituted with sharktank.ops.cat.
In some instances we may lack the equivalent op, or it may have no implementation for certain tensor types, such as ReplicatedTensor. In that case we have to add it. For some tensor types this can be difficult, but in most cases it is easy. I have been meaning to write a general implementation for the case where all tensors are replicated across devices, but have not gotten to it yet. I will add the cat overload for the replicated tensor.
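The mechanism described above — an op that dispatches to a per-tensor-type implementation, raising when no overload is registered — can be sketched roughly as follows. This is an illustrative toy, not sharktank's actual code: `ReplicatedTensor` here is a hypothetical stand-in that models replication as plain Python lists, and the registry/decorator names are invented for the example.

```python
# Toy sketch of type-dispatched ops (hypothetical; not sharktank's implementation).
# Each op keeps a registry mapping tensor type -> implementation, so calling
# cat(...) on an unsupported type fails loudly until an overload is added.

_CAT_IMPLS = {}


def register_cat(tensor_type):
    """Register a cat implementation for a given tensor type."""
    def deco(fn):
        _CAT_IMPLS[tensor_type] = fn
        return fn
    return deco


class ReplicatedTensor:
    """Hypothetical stand-in: the same data replicated once per device."""
    def __init__(self, shards):
        self.shards = list(shards)  # one identical copy per device


def cat(tensors):
    """Dispatch on the type of the first tensor, like a type-overloaded op."""
    impl = _CAT_IMPLS.get(type(tensors[0]))
    if impl is None:
        raise NotImplementedError(
            f"cat has no implementation for {type(tensors[0]).__name__}"
        )
    return impl(tensors)


@register_cat(ReplicatedTensor)
def _cat_replicated(tensors):
    # Concatenate shard-wise: after cat, every device again holds the
    # full concatenated result.
    num_shards = len(tensors[0].shards)
    return ReplicatedTensor(
        [sum((t.shards[i] for t in tensors), []) for i in range(num_shards)]
    )
```

With this pattern, substituting `torch.concat` with the dispatching `cat` is what lets replicated (and other wrapped) tensor types participate; adding support for a new type is just registering another overload.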

@archana-ramalingam archana-ramalingam merged commit 5300b05 into main Oct 4, 2024
8 checks passed
@archana-ramalingam archana-ramalingam deleted the attention_microbenchmark branch October 4, 2024 04:58