Better HF integration for MambaLMHeadModel #471

Open: wants to merge 8 commits into base: main

@Wauplin Wauplin commented Jul 16, 2024

Hi @tridao and team,

This is a follow-up to #469, and in particular to #469 (comment). This PR:

  • adds huggingface_hub as an explicit dependency. It is already an indirect dependency, since mamba_ssm depends on transformers, which depends on huggingface_hub, but it's better to be explicit. I pinned a recent version that contains all recent PyTorchModelHubMixin updates and fixes.
  • removes PyTorchModelHubMixin from the Mamba2 layer (introduced in "Add HF integration, better discoverability", #469)
  • adds PyTorchModelHubMixin to MambaLMHeadModel for a better HF integration. I removed the previously implemented from_pretrained / save_pretrained methods. They still exist thanks to the mixin, and in a more robust way. The mixin also adds a push_to_hub method to save a model and push it directly to the Hub. All three helpers support safetensors (if installed on the user's machine) and parameters like cache_dir/token/revision/etc. that can prove useful to users.
from mamba_ssm import MambaLMHeadModel

model = MambaLMHeadModel.from_pretrained("state-spaces/mamba2-130m")

(...)

model.push_to_hub("my-finetuned-mamba")
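
To illustrate how a mixin can supply from_pretrained / save_pretrained to a model class, here is a minimal sketch using only the standard library. ToyHubMixin and ToyModel are hypothetical names invented for this example; this is not the PyTorchModelHubMixin implementation from huggingface_hub, which also handles weights, safetensors, and Hub uploads. It only shows the pattern of passing metadata as class keyword arguments and inheriting the save/load helpers.

```python
import json
import tempfile
from pathlib import Path


class ToyHubMixin:
    """Hypothetical, simplified stand-in for a hub mixin (local disk only)."""

    def __init_subclass__(cls, library_name=None, tags=None, **kwargs):
        # Class keyword arguments are captured once, when the subclass is defined.
        super().__init_subclass__(**kwargs)
        cls._card_metadata = {"library_name": library_name, "tags": list(tags or [])}

    def save_pretrained(self, save_directory):
        # Persist only the config; a real mixin would also write the weights.
        path = Path(save_directory)
        path.mkdir(parents=True, exist_ok=True)
        (path / "config.json").write_text(json.dumps(self.config))
        return path

    @classmethod
    def from_pretrained(cls, directory):
        # Rebuild the object from the saved config.
        config = json.loads((Path(directory) / "config.json").read_text())
        return cls(**config)


class ToyModel(ToyHubMixin, library_name="mamba-ssm", tags=["arXiv:2312.00752"]):
    def __init__(self, d_model=8, n_layer=2):
        self.config = {"d_model": d_model, "n_layer": n_layer}


# Round trip through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    ToyModel(d_model=16, n_layer=4).save_pretrained(tmp)
    reloaded = ToyModel.from_pretrained(tmp)
```

The model class itself defines neither helper; both come from the mixin, which is the property this PR relies on for MambaLMHeadModel.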

When calling MambaLMHeadModel(...).push_to_hub("username/my-cool-mamba"), a model card is automatically created with some metadata in it (see docs). This greatly improves the UX on the Hub: better discoverability, better documentation, etc. In particular, I have added:

  • library_name: mamba-ssm. I've opened a PR on our side to make mamba-ssm recognized as a library by the HF Hub (see "Add MambaSSM as a library", huggingface/huggingface.js#802). Users landing on a mamba_ssm model will automatically get a code snippet showing how to instantiate the model, a link to the mamba repo, and an enabled download count
  • https://github.com/state-spaces/mamba as "repo_url" => all model cards will have a sentence "this model has been pushed using https://github.com/state-spaces/mamba" => better for documentation
  • arXiv:2312.00752 and arXiv:2405.21060 as tags so that your papers will be automatically linked to all Mamba models uploaded on the Hub => better for referencing
  • pipeline_tag: text-generation => models will appear when users filter models by task on the hub (https://huggingface.co/models?pipeline_tag=text-generation&sort=trending)
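
As a rough illustration of where this metadata ends up, Hub model cards carry their metadata as a YAML front-matter block at the top of README.md. The sketch below renders the values listed above into that shape using only the standard library. render_front_matter is a hypothetical helper written for this example, and the exact card generated by the mixin may differ (repo_url, for instance, appears in the card text rather than in the metadata block).

```python
def render_front_matter(metadata):
    """Render a flat dict (strings and lists of strings) as YAML front matter."""
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, list):
            # Lists become YAML block sequences, one item per line.
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines)


# Values taken from the list above; the layout is an assumption.
metadata = {
    "library_name": "mamba-ssm",
    "pipeline_tag": "text-generation",
    "tags": ["arXiv:2312.00752", "arXiv:2405.21060"],
}
front_matter = render_front_matter(metadata)
print(front_matter)
```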

In parallel with this PR, it would be good to update existing models to add metadata as well. I opened 3 PRs to showcase what should be updated:

I have a script to open such a PR on other models. Let me know what you think, and if you approve, I'll proceed with the others.

@Wauplin Wauplin marked this pull request as draft July 16, 2024 13:05
@Wauplin Wauplin changed the title from "Support HF integration in Mamba and Mamba2Simple + add metadata" to "Better HF integration for MambaLMHeadModel" Jul 16, 2024
@Wauplin Wauplin marked this pull request as ready for review July 16, 2024 14:41

Wauplin commented Jul 16, 2024

@tridao @NielsRogge I've updated this PR and its description following your instructions. It is now ready to be reviewed. In the end, only MambaLMHeadModel inherits from PyTorchModelHubMixin, which makes it much simpler.

@Wauplin Wauplin requested a review from NielsRogge July 16, 2024 14:49

@NielsRogge NielsRogge left a comment

Thanks, looks like a nice clean-up. Have you tested pushing and pulling a Mamba model to/from the hub (to ensure it works fine)?


Wauplin commented Jul 16, 2024

Yes I did:

osanseviero pushed a commit to huggingface/huggingface.js that referenced this pull request Jul 16, 2024
This PR adds https://github.com/state-spaces/mamba as an official
library on the Hub.
Let's wait for Mamba integration to be merged (coming soon) + having a
few first models uploaded on the Hub.

cc @osanseviero @NielsRogge @tridao

Related PR on Mamba side: state-spaces/mamba#471