Harnessing the Potential of Pretrained Language Models and Active Learning for Tweets Sentiment Analysis
Authors:
- Marc-Antoine ALLARD
- Antoine MAGRON
- Paul TEILETCHE
Repository of APMA-AI team's report for CS-433 project 2.
This codebase is designed to be compatible with any model listed on the HuggingFace Hub. You can browse the available models on the HuggingFace models page.
- Requirements: Install the dependencies before using our code:
      pip install -r requirements.txt
- Experiment Arguments: You are free to set any of these arguments for your experiment:
  - Model & Data Arguments
    - BASE_MODEL: Base model used for training.
    - N: Number of instances in the dataset.
    - test_ratio: Ratio of the dataset used for testing.
  - Training Arguments
    - epochs: Number of training epochs.
    - optimizer: The training optimizer.
    - bs: Batch size used during training.
    - lr: Learning rate for the training process.
    - wd: Weight decay parameter.
    - warm_pct: Ratio of training steps used to warm up the optimizer.
  - Active Learning Arguments
    - active_learning: Boolean indicating whether active learning is enabled.
    - T: A parameter related to active learning.
    - aware_sampling: Boolean indicating whether aware sampling is enabled.
    - aware_sampling_type: Type of aware sampling.
  - Global Arguments
    - SAVE_DIR: Directory for saving the model and related files.
    - DATA_PATH: Path to the dataset.
    - seed: Random seed for reproducibility.
    - device: Device used for training (e.g., "cuda:0" for GPU).
- Launch An Experiment: Our code is simple to use. You can produce a sample AIcrowd test submission using our experiment notebook.
- Specify your arguments in the Parameters section. Here is an example:
      # Launch an experiment using the DistilBERT model with 10,000 samples and 3 epochs.
      exp = Experiment(
          N=10_000,
          epochs=3,
          BASE_MODEL='distilbert-base-uncased'
      )
      # Launch the training procedure.
      model = exp.finetune()
      # Run prediction with the fine-tuned model and store the result as a CSV file,
      # ready to be submitted on a platform such as AIcrowd.
      predictions = exp.predict(save=True)
- Run the Training section to fine-tune your model.
- Run the Predict section to generate predictions for your test data.
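For readers curious about what a saved prediction file might contain, the following is a rough sketch of writing predictions to CSV using only the standard library. The `Id`/`Prediction` column names, the 1-based ids, the `{-1, 1}` label convention, and both helper functions are assumptions made for illustration; the file produced by `exp.predict(save=True)` may be laid out differently.

```python
import csv
import io


def to_signed_labels(class_ids):
    """Map class ids {0, 1} to {-1, 1} (assumed label convention)."""
    return [1 if c == 1 else -1 for c in class_ids]


def write_submission(predictions, out):
    """Write one row per prediction to a file-like object, with 1-based ids."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Id", "Prediction"])  # assumed header names
    for i, label in enumerate(predictions, start=1):
        writer.writerow([i, label])


# Usage: write three toy predictions to an in-memory buffer.
buffer = io.StringIO()
write_submission(to_signed_labels([1, 0, 1]), buffer)
print(buffer.getvalue())
```

To write to disk instead, the same function can be called with a file opened via `open(path, "w", newline="")`.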