Better HF integration for MambaLMHeadModel
#471
base: main
Conversation
MambaLMHeadModel
@tridao @NielsRogge I've updated this PR + the description following your instructions. It is now ready to be reviewed. Only |
Thanks, looks like a nice clean-up. Have you tested pushing and pulling a Mamba model to/from the hub (to ensure it works fine)?
Yes I did: |
This PR adds https://github.com/state-spaces/mamba as an official library on the Hub. Let's wait for the Mamba integration to be merged (coming soon) + a few first models to be uploaded on the Hub. cc @osanseviero @NielsRogge @tridao Related PR on Mamba side: state-spaces/mamba#471
Hi @tridao and team,
This is a follow-up PR after #469 and in particular #469 (comment). This PR:
- Adds `huggingface_hub` as an explicit dependency. It is already a transitive dependency, since `mamba_ssm` depends on `transformers` which depends on `huggingface_hub`, but it's better to be explicit. I pinned a recent version that contains all recent `PyTorchModelHubMixin` updates and fixes.
- Removes `PyTorchModelHubMixin` from the `Mamba2` layer (introduced in Add HF integration, better discoverability #469).
- Adds `PyTorchModelHubMixin` to `MambaLMHeadModel` for a better HF integration. I removed the `from_pretrained`/`save_pretrained` methods that were previously implemented; they still exist thanks to the mixin, and in a more robust way. The mixin also adds a `push_to_hub` method to directly save a model and push it to the Hub. All three helpers support safetensors (if installed on the user's machine) and parameters like `cache_dir`/`token`/`revision`/etc. that can prove useful to users.

When doing
`MambaLMHeadModel(...).push_to_hub("username/my-cool-mamba")`, a model card will be automatically created with some metadata in it (see docs). This greatly improves the UX on the Hub: better discoverability, better documentation, etc. In particular, I have added:

- `library_name: mamba-ssm`. I've opened a PR on our side to make `mamba-ssm` recognized as a library by the HF Hub (see Add MambaSSM as a library huggingface/huggingface.js#802). Users landing on a `mamba_ssm` model will automatically get a code snippet on how to instantiate the model + a link to the mamba repo + the download count enabled.
- `https://github.com/state-spaces/mamba` as `repo_url` => all model cards will have a sentence "this model has been pushed using https://github.com/state-spaces/mamba" => better for documentation.
- `arXiv:2312.00752` and `arXiv:2405.21060` as tags, so that your papers will be automatically linked to all Mamba models uploaded on the Hub => better for referencing.
- `pipeline_tag: text-generation` => models will appear when users filter models by task on the Hub (https://huggingface.co/models?pipeline_tag=text-generation&sort=trending).

In parallel with this PR, it would be good to update existing models to add metadata as well. I opened 3 PRs to showcase what should be updated:
I have a script to open such a PR on other models. Let me know what you think and if you validate, I'll proceed with the others.