Use of large custom transformer model #63

Adaickalavan opened this issue May 22, 2024 · 1 comment

Referring to the active learning for text classification example given here.

In the given example, we have:

from small_text import (
    TransformerBasedClassificationFactory,
    TransformerModelArguments,
)

transformer_model_name = 'bert-base-uncased'
transformer_model = TransformerModelArguments(transformer_model_name)
clf_factory = TransformerBasedClassificationFactory(
    transformer_model,
    num_classes,  # number of target classes, defined earlier in the example
    kwargs=dict({'device': 'cuda', 'mini_batch_size': 32,
                 'class_weight': 'balanced'})
)

In my case, I would like to use the language model meta-llama/Llama-2-7b-chat-hf as a sequence classifier, loading it as follows:

from transformers import AutoModelForSequenceClassification

base_model = AutoModelForSequenceClassification.from_pretrained(
    pretrained_model_name_or_path="meta-llama/Llama-2-7b-chat-hf",
    num_labels=1,
)

Then, I would like to perform supervised training with active learning of the Llama sequence-classifier model on the dataset Birchlabs/openai-prm800k-stepwise-critic.

Questions:

  1. How do I modify the example in the repository to get a clf_factory which uses the above base_model instead of providing TransformerModelArguments?

  2. How do I use small-text to handle the large model size of Llama and potentially distribute its training over multiple GPUs?

chschroeder commented May 23, 2024

Hi @Adaickalavan, thank you for your interest.

  1. TransformerModelArguments is just a wrapper for the Hugging Face names/paths of the model, tokenizer, and config. Some models work out of the box; others need adaptations. I cannot cover this 100%, since the transformers library does not impose many restrictions on the different models, and the newest ones can always deviate from this.
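
In principle, the factory construction from your snippet stays the same and you would only swap the checkpoint name into the wrapper (a separate tokenizer or config name/path can be passed as well). An untested sketch:

from small_text import TransformerBasedClassificationFactory, TransformerModelArguments

# Untested sketch: same factory construction as in the bert-base-uncased example,
# only with the Llama checkpoint name passed to the wrapper.
transformer_model = TransformerModelArguments('meta-llama/Llama-2-7b-chat-hf')
clf_factory = TransformerBasedClassificationFactory(
    transformer_model,
    num_classes,
    kwargs=dict({'device': 'cuda', 'mini_batch_size': 32,
                 'class_weight': 'balanced'})
)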

I briefly tried a smaller Llama model (1B):

<...>
File /path/to/site-packages/transformers/models/llama/modeling_llama.py:1372, in LlamaForSequenceClassification.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
   1369     batch_size = inputs_embeds.shape[0]
   1371 if self.config.pad_token_id is None and batch_size != 1:
-> 1372     raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")
   1373 if self.config.pad_token_id is None:
   1374     sequence_lengths = -1

ValueError: Cannot handle batch sizes > 1 if no padding token is defined.

The error seems to be known, but the workaround is difficult to achieve with the current API. I will keep this in mind for v2.0.0; for now I would recommend copying or subclassing TransformerBasedClassification and adapting it until it fits your needs (the transformers-level part of the workaround is sketched below).
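
For context, the usual workaround at the transformers level is to assign a padding token before training, so an adapted copy or subclass of TransformerBasedClassification would have to do something along these lines (untested sketch in plain transformers code, not the small-text API):

from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = 'meta-llama/Llama-2-7b-chat-hf'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)

# Llama checkpoints define no padding token, which is what triggers the
# "Cannot handle batch sizes > 1" error above; reusing the EOS token as the
# padding token lets batched sequence classification run.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id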

  2. This is currently not supported. You could write your own Classifier implementation to do that. Somewhere down my list of ideas I have a PyTorch Lightning integration, which could help with distributed training; however, I think for Llama 2 you will still need other libraries as well. A rough outline of what such a classifier could look like follows below.
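
If you go down that road, the rough shape is a class that satisfies small-text's fit/predict classifier interface while delegating training and inference to a distributed-training framework such as accelerate, DeepSpeed, or FSDP. The outline below is only a sketch; the exact abstract base class and method signatures may differ between small-text versions, so please check them before building on this:

from small_text.classifiers.classification import Classifier


class DistributedLlamaClassifier(Classifier):
    """Outline only: hides distributed Llama training/inference behind the
    classifier interface that small-text's active learning loop expects."""

    def __init__(self, num_classes, model_name='meta-llama/Llama-2-7b-chat-hf'):
        self.num_classes = num_classes
        self.model_name = model_name

    def fit(self, train_set):
        # Tokenize train_set, build the model with AutoModelForSequenceClassification,
        # and run the training loop under the distributed framework of your choice.
        raise NotImplementedError

    def predict(self, dataset, return_proba=False):
        # Run (possibly sharded) inference and return predicted labels,
        # optionally together with class probabilities.
        raise NotImplementedError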
