Intro

This is my application project for the MATS program.

The code is extremely ugly and hacky, so please don't judge me for it lol. I prioritized speed and getting things working quickly over good software engineering.

I have liberally copied code from Neel Nanda's tutorials, ARENA, Google, Stack Overflow and ChatGPT, but all mistakes are obviously mine.

Project

I trained a small (3-million-parameter) GPT-2-style model to play bishop-and-knight chess endgames, and I tried to use mechanistic interpretability techniques to understand how it makes decisions and to look for internal representations of the game state.

The model produced a legal move in 99% of cases and reached checkmate in 72% of games.
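The two figures above may use different denominators: legality is naturally counted per generated move, while the checkmate rate is counted per game. A minimal sketch of how such rates could be computed, assuming a hypothetical per-game record (`GameResult` and its field names are made up for illustration and are not taken from this repository's code):

```python
from dataclasses import dataclass
from typing import List


# Hypothetical per-game record; the README does not describe the actual evaluation format.
@dataclass
class GameResult:
    moves_generated: int  # number of moves the model proposed in this game
    legal_moves: int      # how many of those moves were legal in their position
    checkmate: bool       # whether the game ended with the model delivering mate


def legal_move_rate(games: List[GameResult]) -> float:
    """Fraction of all generated moves that were legal (denominator: moves)."""
    total = sum(g.moves_generated for g in games)
    return sum(g.legal_moves for g in games) / total if total else 0.0


def checkmate_rate(games: List[GameResult]) -> float:
    """Fraction of games that ended in checkmate (denominator: games)."""
    return sum(g.checkmate for g in games) / len(games) if games else 0.0


if __name__ == "__main__":
    # Toy data only, not results from this project.
    games = [
        GameResult(20, 20, True),
        GameResult(30, 29, False),
        GameResult(25, 25, True),
        GameResult(25, 25, True),
    ]
    print(legal_move_rate(games))  # 99 legal out of 100 moves
    print(checkmate_rate(games))   # 3 mates out of 4 games
```

Keeping the two denominators separate matters: a single illegal move can end or derail a whole game, so a high per-move legality rate can coexist with a noticeably lower per-game success rate.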

Report

For more information, check the report (Absent World Representations.md). There is also an HTML version (Absent World Representations.html) with a nice script for viewing the log probabilities of all moves in a given game, but it is very heavy to load.

Files

model
visualisations
.gitattributes
Absent World Representations.html
Absent World Representations.md
README.md
dataProcessor.py
games.txt
generateData.py
sampler.py
transformer.py
utils.py

Repository: VasilGeorgiev39/MechInterp

Folders and files

Latest commit

History

Repository files navigation

Intro

Project

Report

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages