Commit 4e828e1 — Update README.md (v.0.0.2 update)
hanyangii authored Feb 29, 2024
1 parent c50b287
Showing 1 changed file (README.md) with 30 additions and 25 deletions.
If you want to use _MethylBERT_ as a Python library, please follow the tutorials.
MethylBERT also provides a command line tool. Before using it, please check [the input file requirements](https://github.com/hanyangii/methylbert/blob/main/tutorials/01_Data_Preparation.md).
```
> methylbert
MethylBERT v0.0.2
One option must be given from ['preprocess_finetune', 'finetune', 'deconvolute']
```
#### 1. Data Preprocessing to fine-tune MethylBERT
```
> methylbert preprocess_finetune --help
MethylBERT v0.0.2
usage: methylbert preprocess_finetune [-h] [-s SC_DATASET] [-f INPUT_FILE] -d
F_DMR -o OUTPUT_DIR -r F_REF
[-nm N_MERS] [-p SPLIT_RATIO]
optional arguments:
  ...
```
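As a sketch only (the file paths below are placeholders, not files shipped with MethylBERT), a preprocessing run combining the flags from the usage above might look like:

```shell
# Hypothetical example — all paths are placeholders.
# -f: input file, -d: DMR file, -r: reference genome, -o: output directory
# -nm: k-mer size, -p: train/test split ratio
methylbert preprocess_finetune \
    -f bulk.bam \
    -d dmrs.csv \
    -r genome.fasta \
    -o data/processed/ \
    -nm 3 -p 0.8
```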
#### 2. MethylBERT fine-tuning
```
> methylbert finetune --help
MethylBERT v0.0.2
usage: methylbert finetune [-h] -c TRAIN_DATASET [-t TEST_DATASET] -o
                           OUTPUT_PATH [-p PRETRAIN] [-l N_ENCODER]
                           [-nm N_MERS] [-s SEQ_LEN] [-b BATCH_SIZE]
                           [--valid_batch VALID_BATCH]
                           [--corpus_lines CORPUS_LINES]
                           [--max_grad_norm MAX_GRAD_NORM]
                           [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
                           [-e STEPS] [--save_freq SAVE_FREQ] [-w NUM_WORKERS]
                           [--with_cuda WITH_CUDA] [--log_freq LOG_FREQ]
                           [--eval_freq EVAL_FREQ] [--lr LR]
                           [--adam_weight_decay ADAM_WEIGHT_DECAY]
                           [--adam_beta1 ADAM_BETA1] [--adam_beta2 ADAM_BETA2]
                           [--warm_up WARM_UP]
                           [--decrease_steps DECREASE_STEPS] [--seed SEED]
optional arguments:
  -h, --help            show this help message and exit
  ...
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        ex) output/bert.model
  -p PRETRAIN, --pretrain PRETRAIN
                        path to the saved pretrained model to restore
  -l N_ENCODER, --n_encoder N_ENCODER
                        number of encoder blocks. One of [12, 8, 6] must be
                        given. A pre-trained MethylBERT model is downloaded
                        accordingly. Ignored when -p (--pretrain) is given.
  -nm N_MERS, --n_mers N_MERS
                        n-mers (default: 3)
  -s SEQ_LEN, --seq_len SEQ_LEN
                        maximum sequence length (default: 150)
  -b BATCH_SIZE, --batch_size BATCH_SIZE
                        batch size (default: 50)
  --valid_batch VALID_BATCH
                        batch size for the validation set. If not given, it is
                        set to the same value as the training batch size
  --corpus_lines CORPUS_LINES
                        total number of lines in the corpus
  --max_grad_norm MAX_GRAD_NORM
                        max gradient norm (default: 1.0)
  --gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS
                        number of update steps to accumulate before performing
                        a backward/update pass (default: 1)
  -e STEPS, --steps STEPS
                        number of training steps (default: 600)
  --save_freq SAVE_FREQ
                        steps to save the interim model
  -w NUM_WORKERS, --num_workers NUM_WORKERS
                        dataloader worker size (default: 20)
  --with_cuda WITH_CUDA
                        training with CUDA: true or false (default: True)
  --log_freq LOG_FREQ   frequency (steps) to print the loss values (default:
                        100)
  --eval_freq EVAL_FREQ
                        evaluate the model every n steps (default: 10)
  --lr LR               learning rate of AdamW (default: 4e-4)
  --adam_weight_decay ADAM_WEIGHT_DECAY
                        weight decay of AdamW (default: 0.01)
  --adam_beta1 ADAM_BETA1
                        AdamW first beta value (default: 0.9)
  --adam_beta2 ADAM_BETA2
                        AdamW second beta value (default: 0.98)
  --warm_up WARM_UP     steps for warm-up (default: 100)
  --decrease_steps DECREASE_STEPS
                        step at which to decrease the learning rate (default:
                        200)
  --seed SEED           seed number (default: 950410)
```
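For illustration only (the paths and hyperparameter values below are placeholders, not recommended settings), a fine-tuning call using the options above might look like:

```shell
# Hypothetical example — paths and values are placeholders.
# The effective batch size is batch_size * gradient_accumulation_steps
# (50 * 4 = 200 with the values below).
methylbert finetune \
    -c data/processed/train.csv \
    -t data/processed/test.csv \
    -o output/ \
    -l 12 \
    -s 150 -b 50 --gradient_accumulation_steps 4 \
    -e 600 --lr 4e-4 --warm_up 100 --decrease_steps 200
```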
#### 3. MethylBERT tumour deconvolution
```
> methylbert deconvolute --help
MethylBERT v0.0.2
usage: methylbert deconvolute [-h] -i INPUT_DATA -m MODEL_DIR [-o OUTPUT_PATH]
[-b BATCH_SIZE] [--save_logit] [--adjustment]
optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_DATA, --input_data INPUT_DATA
                        bulk data to deconvolute
  -m MODEL_DIR, --model_dir MODEL_DIR
                        trained MethylBERT model
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        ...
```
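As a minimal sketch (the input and model paths are placeholders), a deconvolution call with the flags from the usage above might look like:

```shell
# Hypothetical example — paths are placeholders.
# --save_logit and --adjustment are the optional flags listed in the usage above.
methylbert deconvolute \
    -i data/processed/bulk.csv \
    -m output/ \
    -o deconvolution/ \
    -b 64 \
    --adjustment
```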
