Better HF integration for MambaLMHeadModel
#471
base: main
Conversation
MambaLMHeadModel
@tridao @NielsRogge I've updated this PR + the description following your instructions. It is now ready to be reviewed. Only |
Thanks, looks like a nice clean-up. Have you tested pushing and pulling a Mamba model to/from the hub (to ensure it works fine)?
Yes I did: |
This PR adds https://github.com/state-spaces/mamba as an official library on the Hub. Let's wait for the Mamba integration to be merged (coming soon) + a few first models to be uploaded on the Hub. cc @osanseviero @NielsRogge @tridao Related PR on Mamba side: state-spaces/mamba#471
Hi @tridao and team,
This is a follow-up PR after #469 and in particular #469 (comment). This PR:
- Adds `huggingface_hub` as an explicit dependency. It is already a transitive dependency, since `mamba_ssm` depends on `transformers` which depends on `huggingface_hub`, but it's better to be explicit. I pinned a recent version that contains all recent `PyTorchModelHubMixin` updates and fixes.
- Removes `PyTorchModelHubMixin` from the `Mamba2` layer (introduced in Add HF integration, better discoverability #469).
- Adds `PyTorchModelHubMixin` to `MambaLMHeadModel` for a better HF integration. I removed the `from_pretrained`/`save_pretrained` methods that were previously implemented; they still exist thanks to the mixin, and in a more robust way. The mixin also adds a `push_to_hub` method to directly save a model and push it to the Hub. All three helpers support safetensors (if installed on the user's machine) and parameters like `cache_dir`/`token`/`revision`/etc. that can prove useful to users.

When doing
`MambaLMHeadModel(...).push_to_hub("username/my-cool-mamba")`, a model card will be automatically created with some metadata in it (see docs). This greatly improves the UX on the Hub: better discoverability, better documentation, etc. In particular, I have added:

- `library_name: mamba-ssm`. I've opened a PR on our side to make `mamba-ssm` recognized as a library by the HF Hub (see Add MambaSSM as a library huggingface/huggingface.js#802). Users landing on a `mamba_ssm` model will automatically get a code snippet on how to instantiate the model + a link to the mamba repo + the download count enabled.
- `https://github.com/state-spaces/mamba` as `repo_url` => all model cards will have a sentence "this model has been pushed using https://github.com/state-spaces/mamba" => better for documentation.
- `arXiv:2312.00752` and `arXiv:2405.21060` as tags, so that your papers will be automatically linked to all Mamba models uploaded on the Hub => better for referencing.
- `pipeline_tag: text-generation` => models will appear when users filter models by task on the Hub (https://huggingface.co/models?pipeline_tag=text-generation&sort=trending).

In parallel with this PR, it would be good to update existing models to add metadata as well. I opened 3 PRs to showcase what should be updated:
I have a script to open such a PR on other models. Let me know what you think and if you validate, I'll proceed with the others.