[sharktank] Evaluation - Add Perplexity test for vmfb #306
…tform into perplexity-vmfb
    token_batch,
    self.seq_lens_batch,
    seq_block_ids,
    self.batch.cache_state[0].to(torch.float16),
@archana-ramalingam, I think this may make a copy.
First, the .to(...) call may itself make a copy.
Then, before the actual call to the IREE module function, the argument will be copied to the device if we are not targeting the CPU.
You would need to first explicitly make an iree.runtime.DeviceArray and then copy back to the cache state after the call.
I have also added some IREE-related functionality here. We should unify this aspect to reduce code duplication.
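To illustrate the copy concern raised above, here is a minimal PyTorch-only sketch. It does not call IREE at all: `fake_prefill` is a hypothetical stand-in for a compiled function that updates the cache in place, and the tensor shape and the float32 starting dtype are assumptions made for the example. It shows only that `.to(torch.float16)` on a tensor of a different dtype returns a new tensor, so in-place updates made by the callee do not reach the original cache state unless they are copied back explicitly.

```python
import torch


def fake_prefill(cache: torch.Tensor) -> None:
    """Hypothetical stand-in for a module function that mutates the cache in place."""
    cache.add_(1.0)


# Assumed for illustration: the cache state is stored as float32.
cache_state = torch.zeros(4, dtype=torch.float32)

# When dtype and device already match, .to() returns the same tensor object.
same = cache_state.to(torch.float32)
same_object = same is cache_state

# When the dtype differs, .to() returns a converted copy with its own storage.
half = cache_state.to(torch.float16)
shares_storage = half.data_ptr() == cache_state.data_ptr()

# The callee updates only the copy; the original cache state is untouched.
fake_prefill(half)
stale = cache_state.clone()

# Explicit copy back into the original tensor (copy_ converts the dtype).
cache_state.copy_(half)
```

In the IREE case the same pattern would apply one level further: the argument would presumably be held in an explicitly created device array for the call, and its contents copied back into the cache state afterwards, as suggested in the comment above.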
Add Perplexity test for vmfb