Skip to content

VasilGeorgiev39/MechInterp

Repository files navigation

Intro

This is my application project for the MATS program

The code is extremely ugly and hacky, please don't judge me for it lol. I have prioritized speed and making things working quickly over good software engineering.

I have liberaly copied code from Neel Nanda's tutorials, ARENA, Google, Stackoverflow and ChatGPT but all mistakes are mine obviously.

Project

I trained a small (3 million parameters) GPT-2 style model to play bishop-and-knight chess endgames and I tried to use mechanistic interpretability techniques to understand how it makes decisions and look for internal representations of the game state.

The model was producing a legal move in 99% of the cases The model was reaching a checkmate in 72% of the games

Report

For more information check the report. (there is also an html version that has a nice script to view the log probabilities of all moves in a given game but it is very heavy to load)

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published