This codebase contains a collection of tools for performing experiments on LLMs (specifically, Pythia) to characterize the nature and evolution of circuits over the course of training.
To run this code, you will need a GPU with sufficient GPU RAM (ideally 50 GB or more), and to install the packages listed in `requirements.txt` and `environment.yml`. Install with the following code:
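If the conda environment does not exist yet, it can typically be created from `environment.yml` first (a standard conda command that this README does not spell out explicitly):

```
conda env create -f environment.yml
```

Then activate it: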
```
conda activate your_environment_name
```
And then install the pip packages:
```
pip install -r requirements.txt
```
In addition, you will need to clone the EAP-IG repo (https://github.com/hannamw/EAP-IG/tree/10035b88ceecf8bc7e444ba50449107d3e163069) into the `edge_attribution_patching` folder in the root of this repository.
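One way to do this is with plain git, checking out the commit from the link above (the target folder name is the one this README expects):

```
git clone https://github.com/hannamw/EAP-IG.git edge_attribution_patching
cd edge_attribution_patching
git checkout 10035b88ceecf8bc7e444ba50449107d3e163069
```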
Key scripts are all located in the root folder. They rely on settings specified in the `configs` folder.
- Circuit graphs are obtained via EAP-IG with `get_circuits_over_time.py`; this can be run with configs specifying different datasets and models (see the example invocation after this list).
- Attention head component scores can be obtained with `get_full_model_components_over_time.py`, `get_new_successor_head_scores_over_time.py`, and `get_model_cspa.py`.
- Algorithmic consistency is verified with `get_ioi_consistency.py` and its variants.
- Behavior/performance can be collected with `get_model_task_performance.py`.
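As an illustration only, running one of these scripts might look like the following; the `--config` flag and the config filename are hypothetical placeholders, so check each script's argument parsing and the `configs` folder for the actual interface:

```
# Hypothetical invocation: the flag name and config filename are placeholders,
# not the documented interface of get_circuits_over_time.py.
python get_circuits_over_time.py --config configs/your_config.yaml
```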
This folder contains a collection of notebooks documenting experiments conducted to characterize the IOI circuit, mostly in models that have already completed training. This was done to establish a baseline for each model, as well as to confirm that the circuit is similar to the one implemented in GPT-2 and that it uses similar subcomponents across model sizes and random seeds.
This folder contains various scripts/notebooks for plotting results.
Contains all the key functions for running most of the experiments.
With written permission from the authors, we include CSPA code produced by Arthur Conmy and Callum McDougall in the utils. This code is used to obtain the CSPA score for each attention head.