
HuggingFace trainer #108

Open · lolofo opened this issue Jan 16, 2024 · 9 comments

lolofo commented Jan 16, 2024

Hello,

I'm trying to fine-tune the Mamba model with a Hugging Face Trainer, but I'm running into an issue:
AttributeError: 'MambaConfig' object has no attribute 'to_json_string'

This happens because MambaConfig does not follow the standard Hugging Face configuration format: it is a plain dataclass, so it lacks the to_json_string method the Trainer expects, which is the origin of this error.
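
To reproduce it outside the Trainer (the import path here is from mamba-ssm as I understand it):

```python
# Minimal repro: the HF Trainer eventually calls config.to_json_string
# when serializing the run, but this config is a plain dataclass.
from mamba_ssm.models.config_mamba import MambaConfig

cfg = MambaConfig()
cfg.to_json_string()
# AttributeError: 'MambaConfig' object has no attribute 'to_json_string'
```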

@jyegerlehner

Here is an example script that trains Mamba with the Hugging Face transformers library.

I haven't tried it yet or looked closely, but I'd speculate they side-step the issue by overriding the save_model method in their MambaTrainer subclass of Trainer.
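
If that's right, a minimal sketch of the pattern (my guess at the idea, not the script's actual code):

```python
import os

import torch
from transformers import Trainer


class MambaTrainer(Trainer):
    def save_model(self, output_dir=None, _internal_call=False):
        # Bypass model.save_pretrained, which would try to serialize the
        # dataclass config via to_json_string; dump raw weights instead.
        output_dir = output_dir if output_dir is not None else self.args.output_dir
        os.makedirs(output_dir, exist_ok=True)
        torch.save(self.model.state_dict(),
                   os.path.join(output_dir, "pytorch_model.bin"))
```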


lolofo commented Jan 16, 2024

Thank you, I have implemented a very similar trainer. I'll take a close look at this one, thanks again!

@jyegerlehner

One other thing: it is fairly trivial to extend MambaConfig to add a to_json_string method. This PR includes that change.


lolofo commented Jan 20, 2024

I have added the method to the config, and it works. Thank you!


RonanKMcGovern commented Jan 31, 2024

Any tips on how to reduce VRAM requirements?

I'm training the 2.8B Mamba and I'm OOM at 16k context on an A100 80GB, with a batch size of 1.

I guess the Hugging Face trainer is probably materializing the hidden state h in high-bandwidth GPU memory, and that state has to be stored for each input token of the layer currently being processed? Is that what's making the memory requirements so high?
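
For what it's worth, the stock Trainer-level levers I know of are below; whether any of them is enough for 16k context on the 2.8B model, I haven't verified (all of these are standard transformers.TrainingArguments options):

```python
from transformers import TrainingArguments

# Standard memory levers; none of these change how mamba-ssm itself
# materializes state, they only shrink everything around it.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # keep effective batch size up without extra activation memory
    bf16=True,                      # half-precision activations/gradients
    optim="adamw_bnb_8bit",         # 8-bit optimizer states (needs bitsandbytes)
)
```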


lqf0624 commented Mar 11, 2024

> Any tips on how to reduce VRAM requirements?
>
> I'm training the 2.8B Mamba and I'm OOM at 16k context on an A100 80GB, with a batch size of 1.
>
> I guess the Hugging Face trainer is probably materializing the hidden state h in high-bandwidth GPU memory, and that state has to be stored for each input token of the layer currently being processed? Is that what's making the memory requirements so high?

I'm not focused on NLP tasks with Mamba, but I ran into this problem too. I don't know exactly why it works, but after I reinstalled the mamba-ssm library, VRAM usage dropped dramatically: with the same training settings it now needs only about 15 GB of VRAM, where it would OOM before the library was installed. I don't know if this helps you, but it may give you an idea for solving it.
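
If it helps, my guess (not verified) is that the reinstall pulled in the fused CUDA kernels, and without them the library falls back to a much more memory-hungry path. A quick way to check, assuming the import paths from mamba-ssm v1.x:

```python
# If either import fails, the fused kernels are not installed and the
# slower reference implementations get used instead (my reading, not verified).
try:
    from causal_conv1d import causal_conv1d_fn  # fused conv kernel
    from mamba_ssm.ops.selective_scan_interface import selective_scan_fn  # fused scan
    print("fused kernels available")
except ImportError as exc:
    print(f"fused kernels missing: {exc}")
```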


RonanKMcGovern commented Mar 11, 2024 via email


lurchyy commented Jun 6, 2024

@lolofo how exactly did you add the method to the config?


lolofo commented Jun 6, 2024

@lurchyy I did something like this:

```python
import json

from mamba_ssm.models.config_mamba import MambaConfig


class MambaCustomConfig(MambaConfig):
    """Custom config to make the model run with the HF Trainer."""

    def to_json_string(self):
        return json.dumps(
            {
                "d_model": int(self.d_model),
                "n_layer": int(self.n_layer),
                "vocab_size": int(self.vocab_size),
                "ssm_config": self.ssm_cfg,
                "rms_norm": self.rms_norm,
                "residual_in_fp32": self.residual_in_fp32,
                "fused_add_norm": self.fused_add_norm,
                "pad_vocab_size_multiple": self.pad_vocab_size_multiple,
            }
        )
```
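
And then, roughly, building the model from it; hypothetical sizes, and MambaLMHeadModel taking the config directly is how I understand mamba-ssm's API:

```python
import torch
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

# Hypothetical sizes; use whatever matches your checkpoint.
config = MambaCustomConfig(d_model=768, n_layer=24, vocab_size=50277)
model = MambaLMHeadModel(config, device="cuda", dtype=torch.bfloat16)
```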
