HuggingFace trainer #108
Here is an example script that trains Mamba with the Hugging Face transformers library. I haven't tried it yet or looked closely, but I'd speculate they side-step the issue by overriding the `save_model` method in their `MambaTrainer` subclass of `Trainer`.
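The linked script isn't reproduced here, so the following is only a sketch of that idea and not the script's actual code. It assumes a hypothetical `DataclassConfigSaveMixin` that overrides `Trainer.save_model(output_dir=None, _internal_call=False)`. The override writes the weights with `torch.save` and the dataclass config with `dataclasses.asdict`, so it never needs `to_json_string`. The file names (`pytorch_model.bin`, `config.json`) are assumptions chosen to mirror Hugging Face conventions.

```python
import dataclasses
import json
import os

import torch


class DataclassConfigSaveMixin:
    """Hypothetical mixin for a transformers.Trainer subclass.

    Replaces Trainer.save_model so saving never depends on
    config.to_json_string(), which a dataclass-based MambaConfig lacks.
    """

    def save_model(self, output_dir=None, _internal_call=False):
        output_dir = output_dir or self.args.output_dir
        # TrainingArguments.should_save is False on non-main processes;
        # objects without the attribute are treated as the main process.
        if not getattr(self.args, "should_save", True):
            return
        os.makedirs(output_dir, exist_ok=True)

        # Weights: a plain state dict written with torch.save.
        torch.save(self.model.state_dict(), os.path.join(output_dir, "pytorch_model.bin"))

        # Config: serialize the dataclass directly with dataclasses.asdict.
        config = getattr(self.model, "config", None)
        if dataclasses.is_dataclass(config):
            with open(os.path.join(output_dir, "config.json"), "w") as f:
                json.dump(dataclasses.asdict(config), f, indent=2)

        # Tokenizer, if one was given (newer transformers versions call it processing_class).
        tokenizer = getattr(self, "processing_class", None) or getattr(self, "tokenizer", None)
        if tokenizer is not None:
            tokenizer.save_pretrained(output_dir)
```

To use it, declare `class MambaTrainer(DataclassConfigSaveMixin, Trainer)`. Listing the mixin first makes its `save_model` take precedence in the method resolution order.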
Thank you, I have implemented a very similar trainer... I'll take a close look at this one!
One other thing: it is fairly trivial to extend MambaConfig with a `to_json_string` method. This PR includes that change.
I have added the method to the config, and it works. Thank you!
Any tips on how to reduce VRAM requirements? I'm training the 2.8B Mamba and I run out of memory (OOM) at 16k context on an A100 80GB, with a batch size of 1. I guess the Hugging Face trainer is probably materializing h in high-bandwidth GPU memory, and that has to be stored for each input token of the layer currently being processed? So that's what's making memory requirements high...?
I'm not focused on NLP tasks with Mamba, but I encountered this problem too. I don't know why it worked, but I reinstalled the mamba-ssm library and VRAM usage dropped dramatically. With the same training settings, it needs only about 15 GB of VRAM, whereas before the library was installed it ran out of memory. I don't know if it will help you; it's just an idea to try.
Thanks, I appreciate that. Yeah, I'll try reinstalling mamba-ssm next time. (Email reply to lqf0624's comment of Mon, Mar 11, 2024, 11:32 AM.)
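Here is a rough number for the guess about materializing h. This is a back-of-envelope estimate that assumes the mamba-2.8b shape (d_model = 2560, 64 layers) with the default expand = 2 and d_state = 16. It also assumes fp32 storage (4 bytes per element) and a naive implementation that keeps one full (batch, d_inner, seq_len, d_state) state tensor per layer for the backward pass. The function name is hypothetical.

```python
def naive_state_bytes(batch, seq_len, d_model, n_layer, expand=2, d_state=16, bytes_per_elem=4):
    """Size of one (batch, d_inner, seq_len, d_state) SSM-state tensor.

    Returns (bytes per layer, bytes summed over all layers), assuming a naive
    scan that stores the full state h for every token.
    """
    d_inner = expand * d_model
    per_layer = batch * d_inner * seq_len * d_state * bytes_per_elem
    return per_layer, per_layer * n_layer


GiB = 1024 ** 3
per_layer, total = naive_state_bytes(batch=1, seq_len=16_384, d_model=2560, n_layer=64)
print(f"per layer: {per_layer / GiB:.1f} GiB, all layers: {total / GiB:.0f} GiB")
# prints "per layer: 5.0 GiB, all layers: 320 GiB"
```

Under these assumptions, a single such tensor costs 5 GiB per layer and 320 GiB across 64 layers, which is far beyond 80 GB. The Mamba paper's fused selective-scan kernel avoids storing the expanded state in HBM and recomputes it in the backward pass. That is consistent with, though not proof of, the large savings reported once mamba-ssm was working.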
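lqf0624 doesn't say why reinstalling helped. One plausible explanation, which is an assumption not confirmed in this thread: the Hugging Face transformers Mamba implementation uses the fused CUDA kernels only when mamba-ssm and causal-conv1d import cleanly. Otherwise it falls back to a slower sequential path that keeps far more in GPU memory. The sketch below is a stdlib-only check of whether those kernels import. The module paths and symbol names reflect recent mamba-ssm and causal-conv1d releases and should be treated as assumptions. The helper name is hypothetical.

```python
import importlib

# (module, attribute) pairs the fast path is assumed to need.
_FAST_PATH_SYMBOLS = {
    "selective_scan_fn": ("mamba_ssm.ops.selective_scan_interface", "selective_scan_fn"),
    "mamba_inner_fn": ("mamba_ssm.ops.selective_scan_interface", "mamba_inner_fn"),
    "causal_conv1d_fn": ("causal_conv1d", "causal_conv1d_fn"),
}


def fast_mamba_kernels_available() -> dict:
    """Map each kernel name to True if it imports, else False."""
    status = {}
    for name, (module_name, attr) in _FAST_PATH_SYMBOLS.items():
        try:
            module = importlib.import_module(module_name)
            status[name] = hasattr(module, attr)
        except Exception:
            # Missing package, or a compiled CUDA extension that fails to load.
            status[name] = False
    return status


print(fast_mamba_kernels_available())
```

If any entry is False after installation, a broken or mismatched build (for example against a different CUDA or PyTorch version) is one thing worth ruling out before tuning other memory settings.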
@lolofo how exactly did you add the method to the config?
@lurchyy I did something like this: `class MambaCustomConfig(MambaConfig):`
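Only the first line of that snippet survives, so here is a minimal sketch of the same idea rather than lolofo's actual code. It subclasses the dataclass and adds `to_json_string`, plus `to_dict`, which is an extra assumption since Hugging Face code paths often call it too. The signature loosely mirrors `PretrainedConfig.to_json_string(use_diff=True)`. If mamba-ssm isn't installed, a hypothetical stand-in dataclass with a subset of its fields is used so the sketch still runs.

```python
import dataclasses
import json

try:
    # The real config in mamba_ssm is a plain @dataclass.
    from mamba_ssm.models.config_mamba import MambaConfig
except ImportError:
    # Hypothetical stand-in with a subset of the real fields.
    @dataclasses.dataclass
    class MambaConfig:
        d_model: int = 2560
        n_layer: int = 64
        vocab_size: int = 50277
        ssm_cfg: dict = dataclasses.field(default_factory=dict)


@dataclasses.dataclass
class MambaCustomConfig(MambaConfig):
    """MambaConfig plus the serialization helpers Hugging Face code expects."""

    def to_dict(self):
        return dataclasses.asdict(self)

    def to_json_string(self, use_diff=True):
        # use_diff is accepted only for signature compatibility with
        # PretrainedConfig.to_json_string; the full config is always returned.
        return json.dumps(self.to_dict(), indent=2, sort_keys=True) + "\n"
```

For a model that is already loaded, one option (an assumption, not something stated in the thread) is to swap its config in place: `model.config = MambaCustomConfig(**dataclasses.asdict(model.config))`.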
Hello,
I'm trying to fine-tune the Mamba model with the Hugging Face Trainer, but I'm facing an issue:
`AttributeError: 'MambaConfig' object has no attribute 'to_json_string'`
This happens because MambaConfig does not follow the standard Hugging Face configuration format (a subclass of `PretrainedConfig`, which defines `to_json_string`).
Instead, MambaConfig is a plain dataclass, and that is the origin of this error.
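Here is a minimal reproduction of that diagnosis, using a hypothetical stand-in dataclass with the same class name. A plain `@dataclass` can be converted to a dict with `dataclasses.asdict`, but nothing gives it a `to_json_string` method, so calling it raises the same error.

```python
import dataclasses


@dataclasses.dataclass
class MambaConfig:
    # Hypothetical stand-in mirroring the shape of mamba_ssm's dataclass config.
    d_model: int = 2560
    n_layer: int = 64


config = MambaConfig()
print(dataclasses.asdict(config))  # {'d_model': 2560, 'n_layer': 64}

try:
    config.to_json_string()
except AttributeError as err:
    print(err)  # 'MambaConfig' object has no attribute 'to_json_string'
```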