This codebase contains a collection of tools for performing experiments on LLMs (specifically, Pythia) to characterize the nature and evolution of circuits over the course of training.
To run this code, you will need a GPU with sufficient GPU RAM (ideally 50 GB or more), and to install the packages listed in `requirements.txt` and `environment.yml`. Install with the following code:
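If the conda environment does not exist yet, it can typically be created from `environment.yml` first (a standard conda command that this README does not spell out explicitly):

```
conda env create -f environment.yml
```

Then activate it: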
```
conda activate your_environment_name
```
And then install the pip packages:
```
pip install -r requirements.txt
```
In addition, you will need to clone the EAP-IG repo (https://github.com/hannamw/EAP-IG/tree/10035b88ceecf8bc7e444ba50449107d3e163069) into the `edge_attribution_patching` folder in the root of this repository.
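One way to do this is with plain git, checking out the commit from the link above (the target folder name is the one this README expects):

```
git clone https://github.com/hannamw/EAP-IG.git edge_attribution_patching
cd edge_attribution_patching
git checkout 10035b88ceecf8bc7e444ba50449107d3e163069
```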
Key scripts are all located in the root folder. They rely on settings specified in the `configs` folder.
- Circuit graphs are obtained via EAP-IG with `get_circuits_over_time.py`; this can be run with configs specifying different datasets and models (see the example invocation after this list).
- Attention head component scores can be obtained with `get_full_model_components_over_time.py`, `get_new_successor_head_scores_over_time.py`, and `get_model_cspa.py`.
- Algorithmic consistency is verified with `get_ioi_consistency.py` and its variants.
- Behavior/performance can be collected with `get_model_task_performance.py`.
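As an illustration only, running one of these scripts might look like the following; the `--config` flag and the config filename are hypothetical placeholders, so check each script's argument parsing and the `configs` folder for the actual interface:

```
# Hypothetical invocation: the flag name and config filename are placeholders,
# not the documented interface of get_circuits_over_time.py.
python get_circuits_over_time.py --config configs/your_config.yaml
```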
This folder contains a collection of notebooks documenting experiments conducted to characterize the IOI circuit, mostly in models that have already completed training. This was done to establish a baseline for each model, as well as to confirm that the circuit is similar to the one implemented in GPT-2 and that it uses similar subcomponents across model sizes and random seeds.
This folder contains various scripts/notebooks for plotting results.
Contains all the key functions for running most of the experiments.
With written permission from the authors, we include CSPA code produced by Arthur Conmy and Callum McDougall in the utils. This code is used to obtain the CSPA score for each attention head.