[Feature]: tensor parallelism support for bnb quantization (via IBM's fork) #767

Open
BlairSadewitz opened this issue Sep 28, 2024 · 3 comments

Comments

BlairSadewitz commented Sep 28, 2024

🚀 The feature, motivation and pitch

I don't know whether it's feasible or worthwhile to merge IBM's tensor-parallelism support for bnb quantization here, since the trees may have diverged too much, but cherry-picking commits from projects I don't fully understand is somehow a pastime of mine, so ...

Alternatives

I could always use one of the other 8.4234234*10^23 quantization methods, but, hey, variety is the spice of life--or something.

Additional context

It doesn't work for pre-quantized models. 🎉~
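
For illustration, a minimal sketch of what the requested combination could look like from the user side, assuming Aphrodite keeps a vLLM-style `LLM(...)` entry point; the import path and the `quantization`/`tensor_parallel_size` parameter names are assumptions, not the merged API:

```python
# Hypothetical usage sketch (names assumed, not the merged API): load a
# full-precision checkpoint, quantize it with bitsandbytes on load, and
# shard the quantized weights across two GPUs with tensor parallelism.
from aphrodite import LLM, SamplingParams  # assumed vLLM-style entry point

llm = LLM(
    model="meta-llama/Llama-2-13b-hf",  # full-precision weights; as noted above,
                                        # pre-quantized bnb checkpoints don't work
    quantization="bitsandbytes",        # assumed flag for on-load bnb quantization
    tensor_parallel_size=2,             # split each layer across 2 GPUs
)

outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```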

@AlpinDale
Member

Perhaps. I'll have to look into it; bnb hasn't been a priority.

@BlairSadewitz
Author

Yeah, I hear you. I'm gonna file a better PR in a second, though, so ... ;-)

@AlpinDale
Member

FYI, I'm working on new kernels to massively speed up bnb quants and add TP support for them. You might want to hold off for now, or help out with that upcoming PR if you're comfortable with CUDA.
