Forked from the repository for "Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization" (https://arxiv.org/abs/2405.15071).
GrokkedTransformer/
├─ {composition/comparison/complex_reasoning}.ipynb: scripts for training/evaluation data generation
├─ data/: cached training/evaluation data
├─ main.py: main script for model training
├─ eval_qa.py: evaluation script for trained models
├─ causal_tracing_{composition/comparison}.py: causal tracing & logit lens
├─ LLM/: cached testing data & model outputs for LLMs based on non-parametric memory
│  ├─ {prompt/retrieval}_{directna/cot}_*.txt: inputs for the settings {without/with} retrieval augmentation and {without/with} CoT
│  ├─ answer_*.txt: ground-truth answers
│  └─ {gemini/gpt4turbo}_*.txt: predictions of Gemini-Pro-1.5 and GPT-4-Turbo
├─ LLM.ipynb: evaluation script and cached evaluation results for LLMs
└─ utils.py: other helper functions
# Install PyTorch 1.13.1 built against CUDA 11.6
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116

# Install the local copies of transformers and simpletransformers in editable mode
cd transformers
pip install -e .
cd ..
cd simpletransformers
pip install -e .
cd ..
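To check that the environment is set up correctly, a quick sanity check along the following lines can help (the exact version strings depend on your install; this is not part of the original setup instructions):

# Verify that the CUDA-enabled PyTorch build and the editable installs import correctly
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import transformers, simpletransformers; print(transformers.__version__)"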
- Download from link and unzip into data/, or alternatively, run {composition/comparison/complex_reasoning}.ipynb to generate the data (one way to run the notebooks non-interactively is sketched below).
- Download from link and unzip into LLM/.
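If you prefer to regenerate the data from the command line instead of opening the notebooks interactively, they can be executed headlessly with jupyter nbconvert (assuming Jupyter is installed; the notebook name below is one of the three listed in the directory tree):

# Execute the composition data-generation notebook and save the executed copy
jupyter nbconvert --to notebook --execute composition.ipynb --output composition_executed.ipynb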
# Positional arguments: $1 = dataset name (subdirectory of data/), $2 = weight decay,
# $3 = number of layers, $4 = GPU id
MODEL_PATH=gpt2
DATASET=data/$1/
WEIGHT_DECAY=$2
N_LAYERS=$3
GPU=$4
OUTPUT_DIR=<your_dir>/$1_$2_$3
# Optional multi-GPU launch (alternative to the single-GPU command below):
# CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --master_port 12345 main.py \
CUDA_VISIBLE_DEVICES=$GPU python main.py \
--data_dir $DATASET \
--model_name_or_path ${MODEL_PATH} \
--weight_decay $WEIGHT_DECAY \
--output_dir $OUTPUT_DIR \
--max_seq_length 10 \
--max_length 10 \
--block_size 10 \
--train_batch_size 512 \
--eval_batch_size 512 \
--learning_rate 1e-4 \
--gradient_accumulation_steps 1 \
--save_step 50000 \
--save_step_dense 40000 \
--max_steps 1500000 \
--do_train \
--scheduler constant_schedule_with_warmup \
--fp16 \
--evaluate_during_training \
--predict_during_training \
--init_weights \
--add_tokens \
--n_layer $N_LAYERS
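Assuming the block above is saved as a wrapper script (the name run_train.sh is hypothetical), a training run for an 8-layer model on the composition dataset with weight decay 0.1 on GPU 0 would be launched like this; the argument order follows the variable assignments at the top of the script:

# <dataset_name> <weight_decay> <n_layers> <gpu_id>
bash run_train.sh composition 0.1 8 0
# expands to --data_dir data/composition/ --weight_decay 0.1 --n_layer 8 on CUDA device 0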
- For the parameter sharing scheme in Appendix E.2, run the above command with --n_layer 4 and the --add_recurrence flag.
- Pretrained model checkpoints can be downloaded from here; the directories are named "<dataset_name>_<weight_decay>_<num_layers>" and contain downsampled checkpoints saved during training (full checkpoints are too large to upload), labeled "checkpoint-<training_step>/".
python eval_qa.py --dir <path_to_saved_checkpoints>
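For example, evaluating the checkpoints of an 8-layer composition run trained with weight decay 0.1 might look like the following (the path is illustrative and follows the "<dataset_name>_<weight_decay>_<num_layers>" naming described above):

# Point --dir at the output directory that holds the checkpoint-<training_step>/ subfolders
python eval_qa.py --dir <your_dir>/composition_0.1_8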
- For LLMs based on non-parametric memory, the evaluation script and cached results are included in LLM.ipynb.
python causal_tracing_{comparison/composition}.py \
--dataset <dataset_name> \
--model_dir <your_dir> \
--save_path <your_save_path> \
--num_layer <number_layer_of_model> \
--wd <weight_decay_used>
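As an illustrative invocation (the argument values are examples and should match your own training run, e.g., an 8-layer composition model trained with weight decay 0.1):

# Causal tracing & logit lens for a grokked composition model
python causal_tracing_composition.py \
    --dataset composition \
    --model_dir <your_dir> \
    --save_path <your_save_path> \
    --num_layer 8 \
    --wd 0.1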
@misc{wang2024grokked,
title={Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization},
author={Boshi Wang and Xiang Yue and Yu Su and Huan Sun},
year={2024},
eprint={2405.15071},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/pdf/2405.15071}
}