Commit 4e828e1 — Update README.md (v.0.0.2 update)
hanyangii authored Feb 29, 2024
1 parent c50b287
Showing 1 changed file (README.md) with 30 additions and 25 deletions.
If you want to use _MethylBERT_ as a Python library, please follow the tutorials.
MethylBERT also provides a command line tool. Before using it, please check [the input file requirements](https://github.com/hanyangii/methylbert/blob/main/tutorials/01_Data_Preparation.md).
```
> methylbert
MethylBERT v0.0.2
One option must be given from ['preprocess_finetune', 'finetune', 'deconvolute']
```
#### 1. Data Preprocessing to fine-tune MethylBERT
```
> methylbert preprocess_finetune --help
MethylBERT v0.0.2
usage: methylbert preprocess_finetune [-h] [-s SC_DATASET] [-f INPUT_FILE] -d
F_DMR -o OUTPUT_DIR -r F_REF
[-nm N_MERS] [-p SPLIT_RATIO]
optional arguments:
  ...
```
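As a sketch only (the file paths below are placeholders, not files shipped with MethylBERT), a preprocessing run combining the flags from the usage above might look like:

```shell
# Hypothetical example — all paths are placeholders.
# -f: input file, -d: DMR file, -r: reference genome, -o: output directory
# -nm: k-mer size, -p: train/test split ratio
methylbert preprocess_finetune \
    -f bulk.bam \
    -d dmrs.csv \
    -r genome.fasta \
    -o data/processed/ \
    -nm 3 -p 0.8
```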
#### 2. MethylBERT fine-tuning
```
> methylbert finetune --help
MethylBERT v0.0.2
usage: methylbert finetune [-h] -c TRAIN_DATASET [-t TEST_DATASET] -o
                           OUTPUT_PATH [-p PRETRAIN] [-l N_ENCODER]
                           [-nm N_MERS] [-s SEQ_LEN] [-b BATCH_SIZE]
                           [--valid_batch VALID_BATCH]
                           [--corpus_lines CORPUS_LINES]
                           [--max_grad_norm MAX_GRAD_NORM]
                           [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
                           [-e STEPS] [--save_freq SAVE_FREQ] [-w NUM_WORKERS]
                           [--with_cuda WITH_CUDA] [--log_freq LOG_FREQ]
                           [--eval_freq EVAL_FREQ] [--lr LR]
                           [--adam_weight_decay ADAM_WEIGHT_DECAY]
                           [--adam_beta1 ADAM_BETA1] [--adam_beta2 ADAM_BETA2]
                           [--warm_up WARM_UP]
                           [--decrease_steps DECREASE_STEPS] [--seed SEED]
optional arguments:
  -h, --help            show this help message and exit
  ...
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        ex) output/bert.model
  -p PRETRAIN, --pretrain PRETRAIN
                        path to the saved pretrained model to restore
  -l N_ENCODER, --n_encoder N_ENCODER
                        number of encoder blocks. One of [12, 8, 6] must be
                        given. A pre-trained MethylBERT model is downloaded
                        accordingly. Ignored when -p (--pretrain) is given.
  -nm N_MERS, --n_mers N_MERS
                        n-mers (default: 3)
  -s SEQ_LEN, --seq_len SEQ_LEN
                        maximum sequence length (default: 150)
  -b BATCH_SIZE, --batch_size BATCH_SIZE
                        batch size (default: 50)
  --valid_batch VALID_BATCH
                        batch size for the validation set. If not given, it is
                        set to the same value as the training batch size
  --corpus_lines CORPUS_LINES
                        total number of lines in the corpus
  --max_grad_norm MAX_GRAD_NORM
                        max gradient norm (default: 1.0)
  --gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS
                        number of update steps to accumulate before performing
                        a backward/update pass (default: 1)
  -e STEPS, --steps STEPS
                        number of training steps (default: 600)
  --save_freq SAVE_FREQ
                        steps to save the interim model
  -w NUM_WORKERS, --num_workers NUM_WORKERS
                        dataloader worker size (default: 20)
  --with_cuda WITH_CUDA
                        training with CUDA: true or false (default: True)
  --log_freq LOG_FREQ   frequency (steps) to print the loss values (default:
                        100)
  --eval_freq EVAL_FREQ
                        evaluate the model every n steps (default: 10)
  --lr LR               learning rate of AdamW (default: 4e-4)
  --adam_weight_decay ADAM_WEIGHT_DECAY
                        weight decay of AdamW (default: 0.01)
  --adam_beta1 ADAM_BETA1
                        AdamW first beta value (default: 0.9)
  --adam_beta2 ADAM_BETA2
                        AdamW second beta value (default: 0.98)
  --warm_up WARM_UP     steps for warm-up (default: 100)
  --decrease_steps DECREASE_STEPS
                        step at which to decrease the learning rate (default:
                        200)
  --seed SEED           seed number (default: 950410)
```
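For illustration only (the paths and hyperparameter values below are placeholders, not recommended settings), a fine-tuning call using the options above might look like:

```shell
# Hypothetical example — paths and values are placeholders.
# The effective batch size is batch_size * gradient_accumulation_steps
# (50 * 4 = 200 with the values below).
methylbert finetune \
    -c data/processed/train.csv \
    -t data/processed/test.csv \
    -o output/ \
    -l 12 \
    -s 150 -b 50 --gradient_accumulation_steps 4 \
    -e 600 --lr 4e-4 --warm_up 100 --decrease_steps 200
```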
#### 3. MethylBERT tumour deconvolution
```
> methylbert deconvolute --help
MethylBERT v0.0.2
usage: methylbert deconvolute [-h] -i INPUT_DATA -m MODEL_DIR [-o OUTPUT_PATH]
[-b BATCH_SIZE] [--save_logit] [--adjustment]
optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_DATA, --input_data INPUT_DATA
                        bulk data to deconvolute
  -m MODEL_DIR, --model_dir MODEL_DIR
                        trained MethylBERT model
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        ...
```
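As a minimal sketch (the input and model paths are placeholders), a deconvolution call with the flags from the usage above might look like:

```shell
# Hypothetical example — paths are placeholders.
# --save_logit and --adjustment are the optional flags listed in the usage above.
methylbert deconvolute \
    -i data/processed/bulk.csv \
    -m output/ \
    -o deconvolution/ \
    -b 64 \
    --adjustment
```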
