Commit
0 parents
commit 7c42845
Showing 107 changed files with 16,028 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
MIT License | ||
|
||
Copyright (c) 2023 Lukas Fluri, Daniel Paleka, and Florian Tramèr | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
# superhuman-ai-consistency | ||
|
||
![main figure](./docs/main_figure.png "Testing the consistency of superhuman AI via consistency checks") | ||
|
||
This repository contains the code for the paper [Evaluating Superhuman Models with Consistency Checks](https://arxiv.org/TODO) by [Lukas Fluri](https://www.linkedin.com/in/lukas-fluri-0b4721112), [Daniel Paleka](https://danielpaleka.com/), and [Florian Tramèr](https://floriantramer.com/). | ||
|
||
## tl;dr | ||
If machine learning models were to achieve *superhuman* abilities at various reasoning or decision-making tasks, | ||
how would we go about evaluating such models, given that humans would necessarily be poor proxies for ground truth? | ||
|
||
In this paper, we propose a framework for evaluating superhuman models via *consistency checks*. | ||
Our premise is that while the *correctness* of superhuman decisions may be impossible to evaluate, we can still surface mistakes if the model's decisions fail to satisfy certain logical, human-interpretable rules. | ||
|
||
We instantiate our framework on three tasks where correctness of decisions is hard to evaluate due to either superhuman model abilities, or to otherwise missing ground truth: evaluating chess positions, forecasting future events, and making legal judgments. | ||
|
||
We show that regardless of a model's (possibly superhuman) performance on these tasks, we can discover logical inconsistencies in decision making. | ||
For example: a chess engine assigning opposing valuations to semantically identical boards; GPT-4 forecasting that sports records will evolve non-monotonically over time; or an AI judge assigning bail to a defendant only after we add a felony to their criminal record. | ||
|
||
The code for our experiments is available in the following directories: | ||
|
||
- [RL testing](./chess-ai-testing): Code and data which were used for testing chess AIs for inconsistencies. | ||
- [LLMs forecasting future events](https://github.com/ethz-privsec/superhuman-ai-consistency/releases/tag/v1.0.0): Experimental data produced by GPT-4 and GPT-3.5. | ||
- [Legal AI testing](./legal-ai-testing): Code and data which were used for testing legal AIs for inconsistencies. | ||
|
||
**_Note:_** Our data files are not part of the git repository. Instead, they are packaged in [release v1.0.0](https://github.com/ethz-privsec/superhuman-ai-consistency/releases/tag/v1.0.0). | ||
|
||
## Chess AI experiments | ||
![chess failures](./docs/chess_failures.png "Chess failures") | ||
Game-playing AIs are a prime example of models that operate vastly beyond human levels. We focus on chess, a canonical example of a complex decision-making task where humans can easily evaluate end-to-end performance (i.e., did the model win?), but not individual model decisions. | ||
Nevertheless, the rules of chess imply several simple invariances that are readily apparent and verifiable even by amateur players --- a perfect application for our framework. | ||
|
||
In our experiments we test [Leela Chess Zero](https://github.com/LeelaChessZero/lc0), an open-source chess engine which plays at a superhuman level. We find large violations of various consistency constraints: | ||
- **Forced moves:** For board positions where there's only a single legal move, playing this move has no impact on the game’s outcome. Hence, the positions before and after the forced move must have the same evaluation. | ||
- **Board transformations:** For positions without pawns and castling, any change of orientation of the board (like board rotations or mirroring the board over any axis) has no effect on the game outcome. | ||
- **Position mirroring:** Mirroring the players’ position, such that White gets the piece-setup of Black and vice versa, | ||
with the rest of the game state fixed (e.g., castling rights), must result in a semantically identical position. | ||
- **Recommended move:** The model’s evaluation of a position should remain similar if we play the strongest move predicted by | ||
the model. Indeed, chess engines typically aim to measure the expected game outcome under optimal play from both players, so any optimal move should not affect this measure. | ||
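
As a rough illustration of the board-transformation check, the sketch below flips the piece-placement field of a FEN string and compares the evaluations of the original and the flipped position. It is not the repository's implementation: the helper names, the `evaluate` callable, and the `threshold` value are hypothetical, and the flips are only meaningful for positions without pawns and castling.

```python
from typing import Callable


def flip_horizontal(placement: str) -> str:
    """Swap files a<->h by reversing every rank of a FEN piece-placement field."""
    return "/".join(rank[::-1] for rank in placement.split("/"))


def flip_vertical(placement: str) -> str:
    """Swap ranks 1<->8 by reversing the order of the ranks."""
    return "/".join(reversed(placement.split("/")))


def is_inconsistent(
    placement: str,
    transform: Callable[[str], str],
    evaluate: Callable[[str], float],  # hypothetical stand-in for an engine query
    threshold: float = 0.1,            # hypothetical tolerance, not taken from the paper
) -> bool:
    """Return True if a position and its transformed copy are evaluated too differently."""
    gap = abs(evaluate(placement) - evaluate(transform(placement)))
    return gap > threshold
```

In practice `evaluate` would wrap a call to the chess engine; here it is left abstract so that the check itself stays independent of any engine API.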
|
||
The code for our experiments is available in the [chess-ai-testing](./chess-ai-testing) directory. The data files are available in [release v1.0.0](https://github.com/ethz-privsec/superhuman-ai-consistency/releases/tag/v1.0.0). | ||
|
||
## LLMs forecasting future events | ||
![llm forecasting results](./docs/llm_forecasting_results.png "LLM forecasting results") | ||
Predicting and modeling the future is an important task for which the ground truth is inherently unknown: as the saying goes, "*it is difficult to make predictions, especially about the future.*" | ||
|
||
In our experiments we test [GPT-4](https://arxiv.org/abs/2303.08774) and [gpt-3.5-turbo](https://openai.com/blog/chatgpt) on their ability to forecast future events and give probability estimates for whether the events happen. | ||
|
||
We find large violations of various consistency constraints: | ||
- **Negation:** For any event A, the model should predict complementary probabilities for A and ¬A, i.e., P(A) + P(¬A) = 1; | ||
- **Paraphrasing:** The model should predict the same probability for multiple equivalent events; | ||
- **Monotonicity:** For numbers or quantities that are known to be monotonic in time, such as sports records or the number of people accomplishing a given feat, the model's predictions should be monotonic as well; | ||
- **Bayes' rule:** For two events A and B, the model's probability forecasts for the events A, B, A | B and B | A should satisfy Bayes' theorem. | ||
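
The constraints above can be turned into simple numeric checks. The following is a minimal sketch, assuming the model's forecasts have already been parsed into floats; the function names and the choice of violation measures are illustrative and are not necessarily the metrics used in the paper.

```python
from typing import Sequence


def negation_violation(p_a: float, p_not_a: float) -> float:
    """Distance of P(A) + P(not A) from 1; zero for a consistent forecaster."""
    return abs(p_a + p_not_a - 1.0)


def bayes_violation(p_a: float, p_b: float, p_a_given_b: float, p_b_given_a: float) -> float:
    """Gap between the two sides of Bayes' rule, P(A|B) P(B) = P(B|A) P(A)."""
    return abs(p_a_given_b * p_b - p_b_given_a * p_a)


def is_monotonic(predictions: Sequence[float], increasing: bool = True) -> bool:
    """Check that predictions ordered by time never move in the wrong direction."""
    pairs = zip(predictions, predictions[1:])
    if increasing:
        return all(earlier <= later for earlier, later in pairs)
    return all(earlier >= later for earlier, later in pairs)
```

For example, a forecast of 0.7 for A and 0.6 for ¬A gives a negation violation of about 0.3, and a predicted 100 m record that first drops and then rises again fails `is_monotonic(..., increasing=False)`.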
|
||
The benchmark questions and model responses are available in [release v1.0.0](https://github.com/ethz-privsec/superhuman-ai-consistency/releases/tag/v1.0.0). | ||
|
||
## Legal AI experiments | ||
![legal failures](./docs/legal_ai_testing_pipeline.png "Legal AI testing pipeline") | ||
Reaching decisions on complex legal cases can be long and costly, and the "correctness" of decisions is often contested (e.g., as evidenced by appeal courts). | ||
The difficulties in assessing the correctness or fairness of legal decisions extend to AI tools that are used to assist or automate legal decisions. | ||
|
||
We show how to reveal clear logical inconsistencies in two different language models used for predicting legal verdicts: (1) a [BERT model that evaluates violations of the European Convention of Human Rights](https://huggingface.co/nlpaueb/legal-bert-base-uncased); (2) [gpt-3.5-turbo](https://openai.com/blog/chatgpt) prompted to predict bail decisions given a defendant's criminal record. | ||
|
||
In particular, we show violations of the following consistency constraints: | ||
- **Paraphrasing:** We test whether changing the phrasing of a legal case changes the model’s decision. | ||
- **Partial ordering:** While the "correctness" of legal decisions is hard to assess, there can still be clear | ||
ways of “ranking” different outcomes. We consider an extreme example here, where we test whether | ||
a bail-decision model could favorably switch its decision if the defendant commits more crimes. | ||
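
The partial-ordering check can be sketched as a comparison of two decisions for the same defendant, before and after crimes are added to the record. The outcome labels and their ranking below are hypothetical and only serve to illustrate the idea.

```python
# Hypothetical outcome labels, ranked from least to most favorable for the defendant.
FAVORABILITY = {"deny_bail": 0, "grant_bail": 1}


def violates_partial_order(decision_original: str, decision_more_crimes: str) -> bool:
    """True if the decision becomes more favorable after crimes are added to the record."""
    return FAVORABILITY[decision_more_crimes] > FAVORABILITY[decision_original]
```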
|
||
The code for our experiments is available in the [legal-ai-testing](./legal-ai-testing) directory. The data files are available in [release v1.0.0](https://github.com/ethz-privsec/superhuman-ai-consistency/releases/tag/v1.0.0). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,157 @@ | ||
# Byte-compiled / optimized / DLL files | ||
__pycache__/ | ||
*.py[cod] | ||
*$py.class | ||
|
||
# C extensions | ||
*.so | ||
|
||
# Distribution / packaging | ||
.Python | ||
build/ | ||
develop-eggs/ | ||
dist/ | ||
downloads/ | ||
eggs/ | ||
.eggs/ | ||
lib/ | ||
lib64/ | ||
parts/ | ||
sdist/ | ||
var/ | ||
wheels/ | ||
pip-wheel-metadata/ | ||
share/python-wheels/ | ||
*.egg-info/ | ||
.installed.cfg | ||
*.egg | ||
MANIFEST | ||
|
||
# ides | ||
.vscode/ | ||
.idea/ | ||
|
||
# virtualenv | ||
.venv | ||
|
||
# PyInstaller | ||
# Usually these files are written by a python script from a template | ||
# before PyInstaller builds the exe, so as to inject date/other infos into it. | ||
*.manifest | ||
*.spec | ||
|
||
# Installer logs | ||
pip-log.txt | ||
pip-delete-this-directory.txt | ||
|
||
# Unit test / coverage reports | ||
htmlcov/ | ||
.tox/ | ||
.nox/ | ||
.coverage | ||
.coverage.* | ||
.cache | ||
nosetests.xml | ||
coverage.xml | ||
*.cover | ||
*.py,cover | ||
.hypothesis/ | ||
.pytest_cache/ | ||
|
||
# Translations | ||
*.mo | ||
*.pot | ||
|
||
# Django stuff: | ||
*.log | ||
local_settings.py | ||
db.sqlite3 | ||
db.sqlite3-journal | ||
|
||
# Flask stuff: | ||
instance/ | ||
.webassets-cache | ||
|
||
# Scrapy stuff: | ||
.scrapy | ||
|
||
# Sphinx documentation | ||
docs/_build/ | ||
|
||
# PyBuilder | ||
target/ | ||
|
||
# Jupyter Notebook | ||
.ipynb_checkpoints | ||
|
||
# IPython | ||
profile_default/ | ||
ipython_config.py | ||
|
||
# pyenv | ||
.python-version | ||
|
||
# pipenv | ||
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. | ||
# However, in case of collaboration, if having platform-specific dependencies or dependencies | ||
# having no cross-platform support, pipenv may install dependencies that don't work, or not | ||
# install all needed dependencies. | ||
#Pipfile.lock | ||
|
||
# PEP 582; used by e.g. github.com/David-OConnor/pyflow | ||
__pypackages__/ | ||
|
||
# Celery stuff | ||
celerybeat-schedule | ||
celerybeat.pid | ||
|
||
# SageMath parsed files | ||
*.sage.py | ||
|
||
# Environments | ||
.env | ||
.venv | ||
env/ | ||
venv/ | ||
ENV/ | ||
env.bak/ | ||
venv.bak/ | ||
|
||
# Spyder project settings | ||
.spyderproject | ||
.spyproject | ||
|
||
# Rope project settings | ||
.ropeproject | ||
|
||
# mkdocs documentation | ||
/site | ||
|
||
# mypy | ||
.mypy_cache/ | ||
.dmypy.json | ||
dmypy.json | ||
|
||
# Pyre type checker | ||
.pyre/ | ||
|
||
# mlflow | ||
mlruns/ | ||
mlruns_data | ||
|
||
# Project specific stuff | ||
data/*.txt | ||
data/*.pgn | ||
experiments/results/ | ||
old/ | ||
*old.py | ||
*OLD.py | ||
*OLD.txt | ||
*.csv | ||
*.zip | ||
*OLD | ||
*old | ||
tensorboard | ||
experiments/analysis/images | ||
wandb/* | ||
logs/* |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
MIT License | ||
|
||
Copyright (c) 2022 Lukas Fluri | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,119 @@ | ||
# rl-testing-experiments | ||
|
||
## Table of contents | ||
1. [Setup](#setup) | ||
- [Creating virtual environment](#creating-virtual-environment) | ||
- [Installing the package](#installing-the-package) | ||
- [Setting up a Leela Chess Zero instance](#setting-up-a-leela-chess-zero-instance) | ||
- [Downloading a Leela Chess Zero weight file](#downloading-a-leela-chess-zero-weight-file) | ||
- [Configuration file for Leela Chess Zero instance](#configuration-file-for-leela-chess-zero-instance) | ||
- [Configuration file for data](#configuration-file-for-data) | ||
2. [Reproducing the experiments](#reproducing-the-experiments) | ||
- [Prerequisites](#prerequisites) | ||
- [Running the experiments](#running-the-experiments) | ||
|
||
## Setup | ||
### Creating virtual environment | ||
This project was developed using Python 3.8. It is recommended to install this repository in a virtual environment. | ||
Make sure you have [Python 3.8](https://www.python.org/downloads/release/python-380/) installed on your machine. Then, initialize your virtual environment in this folder, for example via the command | ||
```bash | ||
python3.8 -m venv .venv | ||
``` | ||
You can activate the virtual environment via the command | ||
```bash | ||
source .venv/bin/activate | ||
``` | ||
|
||
### Installing the package | ||
The package can be installed via the command | ||
```bash | ||
pip install -e . | ||
``` | ||
|
||
### Setting up a Leela Chess Zero instance | ||
In order to run the experiments you need access to an instance of [Leela Chess Zero](https://github.com/LeelaChessZero/lc0). You can either install it on the same machine you want to run the experiments on, or on a remote machine to which you have SSH access. Our experiments use the `release/0.29` version, compiled from source and with GPU support enabled. | ||
|
||
### Downloading a Leela Chess Zero weight file | ||
All weight files can be found on [this website](https://training.lczero.org/networks/?show_all=1). For our experiments we used the network with ID `807785`. | ||
|
||
### Configuration file for Leela Chess Zero instance | ||
Configurations for the Leela Chess Zero instance must be stored in a configuration file in the `experiments/configs/engine_configs` folder. Each config file has to specify where to find the installed Leela Chess Zero instance and which configuration parameters should be set. See the following config for an example: | ||
```ini | ||
[General] | ||
# 'engine_type' Must be either 'local_engine' or 'remote_engine' | ||
engine_type = remote_engine | ||
engine_path = /path/to/lc0/on/the/machine/where/it/has/been/installed | ||
network_base_path = /path/to/folder/where/weightfiles/are/stored | ||
|
||
# Leela Chess Zero configs used for experiments | ||
# See https://github.com/LeelaChessZero/lc0/wiki/Lc0-options | ||
# for a list of all options | ||
[EngineConfig] | ||
Backend = cuda-fp16 | ||
VerboseMoveStats = true | ||
SmartPruningFactor = 0 | ||
Threads = 1 | ||
TaskWorkers = 0 | ||
MinibatchSize = 1 | ||
MaxPrefetch = 0 | ||
NNCacheSize = 200000 | ||
TwoFoldDraws = false | ||
|
||
# For how long Leela Chess Zero should evaluate a position | ||
# See https://python-chess.readthedocs.io/en/latest/engine.html#chess.engine.Limit | ||
# for a list of options. | ||
[SearchLimits] | ||
nodes = 400 | ||
|
||
|
||
# The following parameters are only required if you installed | ||
# Leela Chess Zero on a different machine than the one you're using | ||
# to run the experiments | ||
[Remote] | ||
remote_host = uri.of.server.com | ||
remote_user = username | ||
password_required = True | ||
``` | ||
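
A file in this format can be read with Python's standard `configparser` module. The snippet below is only a sketch of how such a file could be parsed, not the loader used in this repository; the function name `load_engine_config` and the shortened example text are assumptions made for illustration.

```python
import configparser


def load_engine_config(text: str) -> configparser.ConfigParser:
    """Parse an engine config given as a string (hypothetical helper)."""
    parser = configparser.ConfigParser()
    # By default configparser lower-cases option names; keep them as written
    # so that names such as 'Backend' survive unchanged.
    parser.optionxform = str
    parser.read_string(text)
    return parser


# Shortened copy of the example config above.
EXAMPLE = """
[General]
engine_type = remote_engine

[EngineConfig]
Backend = cuda-fp16
VerboseMoveStats = true

[SearchLimits]
nodes = 400
"""

config = load_engine_config(EXAMPLE)
engine_options = dict(config["EngineConfig"])       # values are returned as strings
node_limit = config.getint("SearchLimits", "nodes")  # converted to int
```

Note that all values are read as strings unless a typed getter such as `getint` or `getboolean` is used.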
|
||
### Configuration file for data | ||
In addition to the engine config, our experiments also require a config file that specifies where to find the input data (usually chess positions). This configuration file must be stored in the `experiments/configs/data_generator_configs` folder. We support either a simple `.txt` file containing a list of [FEN](https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation)s, or a `.pgn` database containing games in [PGN](https://en.wikipedia.org/wiki/Portable_Game_Notation). All data files should be stored in the `data` folder. Alternatively, you can set the `DATASET_PATH` environment variable, in which case the data files are expected to be stored in `DATASET_PATH/chess-data`. See the following config for an example: | ||
```ini | ||
[General] | ||
# 'data_generator_type' must be either 'fen_database_board_generator' | ||
# (for a simple text file containing one fen per row) or | ||
# 'database_board_generator' (for a database file in .pgn format) | ||
data_generator_type = fen_database_board_generator | ||
|
||
[DataGeneratorConfig] | ||
database_name = name_of_data_file.txt | ||
open_now = True | ||
``` | ||
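
The lookup rule for data files described above (the `data` folder by default, `DATASET_PATH/chess-data` when the environment variable is set) could look roughly as follows. This is a sketch under that reading of the text; `resolve_data_file` is a hypothetical helper and not a function from this package.

```python
import os
from pathlib import Path


def resolve_data_file(database_name: str, default_dir: str = "data") -> Path:
    """Return the expected location of a data file.

    Uses DATASET_PATH/chess-data when DATASET_PATH is set, else `default_dir`.
    """
    dataset_root = os.environ.get("DATASET_PATH")
    if dataset_root:
        return Path(dataset_root) / "chess-data" / database_name
    return Path(default_dir) / database_name
```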
|
||
## Reproducing the experiments | ||
### Prerequisites | ||
- Leela Chess Zero instance installed and configured as described above | ||
- Data file containing chess positions stored in `data` folder. The specific chess positions used in our experiments can be extracted from the result files in the `experiments/results/final_data` folder. | ||
|
||
### Running the experiments | ||
All experiments are run in a two-step process. First, the main experiment file is run. This file handles everything from loading the data to writing results and coordinating the distributed queues. In a second step, one or more workers are started. Each worker runs a Leela Chess Zero instance and evaluates positions provided by the main experiment file. | ||
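
The division of labor between the main experiment file and the workers can be pictured with a small in-process analogy. The sketch below uses threads and `queue.Queue` from the standard library; the actual experiments distribute the queues across processes or machines, and `run_experiment` and `evaluate` are hypothetical names used only for this illustration.

```python
import queue
import threading
from typing import Callable, Dict, Iterable


def run_experiment(
    positions: Iterable[str],
    evaluate: Callable[[str], float],  # stand-in for a worker's engine call
    num_workers: int = 2,
) -> Dict[str, float]:
    """Feed positions to workers through a task queue and collect their results."""
    tasks: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()

    def worker() -> None:
        while True:
            position = tasks.get()
            if position is None:  # sentinel: no more work
                break
            results.put((position, evaluate(position)))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for position in positions:
        tasks.put(position)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()

    collected: Dict[str, float] = {}
    while not results.empty():
        position, score = results.get()
        collected[position] = score
    return collected
```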
|
||
For the forced-move and the recommended-move experiments, the main experiment file can be run via the command | ||
```bash | ||
python experiments/recommended_move_invariance_testing.py --engine_config_name your_engine_config.ini --data_config_name your_data_config.ini --num_positions number_of_positions_to_evaluate | ||
``` | ||
|
||
For the board-mirroring and board-transformation experiments, the main experiment file can be run via the command | ||
```bash | ||
# '--transformations' must be a subset of [rot90, rot180, rot270, flip_diag, flip_anti_diag, flip_hor, flip_vert, mirror] | ||
python experiments/transformation_invariance_testing.py --engine_config_name your_engine_config.ini --data_config_name your_data_config.ini --num_positions number_of_positions_to_evaluate --transformations a list of transformations to apply to the board | ||
``` | ||
|
||
For the evolutionary algorithm experiments, the main experiment file can be run via the command | ||
```bash | ||
python experiments/evolutionary_algorithms/evolutionary_algorithm_distributed_oracle_queries_async.py | ||
``` | ||
|
||
For all experiments, a worker can be started via the command | ||
```bash | ||
python rl_testing/engine_generators/worker.py --engine_config_name your_engine_config.ini --network_name name_of_weight_file | ||
``` |