diff --git a/README.md b/README.md
index f003ebc..79a2606 100644
--- a/README.md
+++ b/README.md
@@ -42,13 +42,13 @@ If you want to use _MethylBERT_ as a python library, please follow the [tutorial
 MethylBERT supports a command line tool. Before using the command line tool, please check [the input file requirements](https://github.com/hanyangii/methylbert/blob/main/tutorials/01_Data_Preparation.md)
 ```
 > methylbert
-MethylBERT v0.0.1
+MethylBERT v0.0.2
 One option must be given from ['preprocess_finetune', 'finetune', 'deconvolute']
 ```
 #### 1. Data Preprocessing to fine-tune MethylBERT
 ```
 > methylbert preprocess_finetune --help
-MethylBERT v0.0.1
+MethylBERT v0.0.2
 usage: methylbert preprocess_finetune [-h] [-s SC_DATASET] [-f INPUT_FILE] -d
                                       F_DMR -o OUTPUT_DIR -r F_REF
                                       [-nm N_MERS] [-p SPLIT_RATIO]
@@ -89,20 +89,21 @@ optional arguments:
 #### 2. MethylBERT fine-tuning
 ```
 > methylbert finetune --help
-MethylBERT v0.0.1
+MethylBERT v0.0.2
 usage: methylbert finetune [-h] -c TRAIN_DATASET [-t TEST_DATASET] -o
-                           OUTPUT_PATH [-p PRETRAIN] [-nm N_MERS] [-s SEQ_LEN]
+                           OUTPUT_PATH [-p PRETRAIN] [-l N_ENCODER]
+                           [-nm N_MERS] [-s SEQ_LEN] [-b BATCH_SIZE]
+                           [--valid_batch VALID_BATCH]
+                           [--corpus_lines CORPUS_LINES]
                            [--max_grad_norm MAX_GRAD_NORM]
                            [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
-                           [-b BATCH_SIZE] [--valid_batch VALID_BATCH]
                            [-e STEPS] [--save_freq SAVE_FREQ] [-w NUM_WORKERS]
                            [--with_cuda WITH_CUDA] [--log_freq LOG_FREQ]
-                           [--eval_freq EVAL_FREQ]
-                           [--corpus_lines CORPUS_LINES] [--lr LR]
+                           [--eval_freq EVAL_FREQ] [--lr LR]
                            [--adam_weight_decay ADAM_WEIGHT_DECAY]
                            [--adam_beta1 ADAM_BETA1] [--adam_beta2 ADAM_BETA2]
-                           [--warm_up WARM_UP] [--seed SEED]
-                           [--decrease_steps DECREASE_STEPS]
+                           [--warm_up WARM_UP]
+                           [--decrease_steps DECREASE_STEPS] [--seed SEED]
 
 optional arguments:
   -h, --help            show this help message and exit
@@ -113,24 +114,30 @@ optional arguments:
   -o OUTPUT_PATH, --output_path OUTPUT_PATH
                         ex)output/bert.model
   -p PRETRAIN, --pretrain PRETRAIN
-                        a saved pretrained model to restore
+                        path to the saved pretrained model to restore
+  -l N_ENCODER, --n_encoder N_ENCODER
+                        number of encoder blocks. One of [12, 8, 6] needs to be
+                        given. A pre-trained MethylBERT model is downloaded
+                        accordingly. Ignored when -p (--pretrain) is given.
   -nm N_MERS, --n_mers N_MERS
                         n-mers (default: 3)
   -s SEQ_LEN, --seq_len SEQ_LEN
                         maximum sequence len (default: 150)
-  --max_grad_norm MAX_GRAD_NORM
-                        Max gradient norm (default: 1.0)
-  --gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS
-                        Number of updates steps to accumulate before
-                        performing a backward/update pass. (default: 1)
   -b BATCH_SIZE, --batch_size BATCH_SIZE
                         number of batch_size (default: 50)
   --valid_batch VALID_BATCH
                         number of batch_size in valid set. If it's not given,
                         valid_set batch size is set same as the train_set
                         batch size
+  --corpus_lines CORPUS_LINES
+                        total number of lines in corpus
+  --max_grad_norm MAX_GRAD_NORM
+                        Max gradient norm (default: 1.0)
+  --gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS
+                        Number of updates steps to accumulate before
+                        performing a backward/update pass. (default: 1)
   -e STEPS, --steps STEPS
-                        number of steps (default: 10)
+                        number of training steps (default: 600)
   --save_freq SAVE_FREQ
                         Steps to save the interim model
   -w NUM_WORKERS, --num_workers NUM_WORKERS
@@ -138,11 +145,9 @@ optional arguments:
   --with_cuda WITH_CUDA
                         training with CUDA: true, or false (default: True)
   --log_freq LOG_FREQ   Frequency (steps) to print the loss values (default:
-                        1000)
+                        100)
   --eval_freq EVAL_FREQ
-                        Evaluate the model every n iter (default: 100)
-  --corpus_lines CORPUS_LINES
-                        total number of lines in corpus
+                        Evaluate the model every n iter (default: 10)
   --lr LR               learning rate of adamW (default: 4e-4)
   --adam_weight_decay ADAM_WEIGHT_DECAY
                         weight_decay of adamW (default: 0.01)
@@ -150,22 +155,22 @@
                         adamW first beta value (default: 0.9)
   --adam_beta2 ADAM_BETA2
                         adamW second beta value (default: 0.98)
-  --warm_up WARM_UP     steps for warm-up (default: 10000)
-  --seed SEED           seed number (default: 950410)
+  --warm_up WARM_UP     steps for warm-up (default: 100)
   --decrease_steps DECREASE_STEPS
-                        step to decrease the learning rate (default: 1500)
+                        step to decrease the learning rate (default: 200)
+  --seed SEED           seed number (default: 950410)
 ```
 #### 3. MethylBERT tumour deconvolution
 ```
 > methylbert deconvolute --help
-MethylBERT v0.0.1
+MethylBERT v0.0.2
 usage: methylbert deconvolute [-h] -i INPUT_DATA -m MODEL_DIR [-o OUTPUT_PATH]
                               [-b BATCH_SIZE] [--save_logit] [--adjustment]
 
 optional arguments:
   -h, --help            show this help message and exit
   -i INPUT_DATA, --input_data INPUT_DATA
-                        Bulk data to deconvolve
+                        Bulk data to deconvolute
   -m MODEL_DIR, --model_dir MODEL_DIR
                         Trained methylbert model
   -o OUTPUT_PATH, --output_path OUTPUT_PATH
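
Together, the three subcommands in this diff form the v0.0.2 workflow: prepare a fine-tuning dataset from reads, fine-tune the model, then deconvolute a bulk sample. As a sketch of step 1, a `preprocess_finetune` call could look like the line below; every file name is a hypothetical placeholder (actual formats must follow the input-file requirements linked in the README), and only flags documented in the help text above are used:

```
> methylbert preprocess_finetune -f input.bam -d dmrs.csv -o dataset/ -r reference.fasta
```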
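Step 2, fine-tuning, under the same assumptions. The train/test paths here stand in for whatever files step 1 produced; per the new `-l/--n_encoder` help text, `-l 12` selects the 12-encoder pre-trained model, which is downloaded automatically since `-p` is not given:

```
> methylbert finetune -c dataset/train_seq.csv -t dataset/test_seq.csv -o model/ -l 12 -s 150 -b 50
```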
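Step 3, deconvolution of a bulk sample with the fine-tuned model. The paths are again placeholders, and `--adjustment` is simply one of the optional flags listed in the usage line above:

```
> methylbert deconvolute -i dataset/bulk_data.csv -m model/ -o results/ --adjustment
```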