Initial commit.

ethz-spylab · Jun 17, 2023 · commit 7c42845

Showing 107 changed files with 16,028 additions and 0 deletions.
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2023 Lukas Fluri, Daniel Paleka, and Florian Tramèr
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
@@ -0,0 +1,69 @@
+# superhuman-ai-consistency
+
+![main figure](./docs/main_figure.png "Testing the consistency of superhuman AI via consistency checks")
+
+This repository contains the code for the paper [Evaluating Superhuman Models with Consistency Checks](https://arxiv.org/TODO) by [Lukas Fluri](https://www.linkedin.com/in/lukas-fluri-0b4721112), [Daniel Paleka](https://danielpaleka.com/), and [Florian Tramèr](https://floriantramer.com/).
+
+## tl;dr
+If machine learning models were to achieve *superhuman* abilities at various reasoning or decision-making tasks,
+how would we go about evaluating such models, given that humans would necessarily be poor proxies for ground truth?
+
+In this paper, we propose a framework for evaluating superhuman models via *consistency checks*.
+Our premise is that while the *correctness* of superhuman decisions may be impossible to evaluate, we can still surface mistakes if the model's decisions fail to satisfy certain logical, human-interpretable rules.
+
+We instantiate our framework on three tasks where the correctness of decisions is hard to evaluate, either because of superhuman model abilities or because ground truth is otherwise missing: evaluating chess positions, forecasting future events, and making legal judgments.
+
+We show that regardless of a model's (possibly superhuman) performance on these tasks, we can discover logical inconsistencies in decision making. 
+For example: a chess engine assigning opposing valuations to semantically identical boards; GPT-4 forecasting that sports records will evolve non-monotonically over time; or an AI judge assigning bail to a defendant only after we add a felony to their criminal record.
+
+The code for our experiments is available in the following directories:
+
+- [Chess AI testing](./chess-ai-testing): Code and data used for testing chess AIs for inconsistencies.
+- [LLMs forecasting future events](https://github.com/ethz-privsec/superhuman-ai-consistency/releases/tag/v1.0.0): Experimental data produced by GPT-4 and GPT-3.5.
+- [Legal AI testing](./legal-ai-testing): Code and data used for testing legal AIs for inconsistencies.
+
+**_Note:_** Our data files are not part of the git repository. Instead, they are packaged in [release v1.0.0](https://github.com/ethz-privsec/superhuman-ai-consistency/releases/tag/v1.0.0).
+
+## Chess AI experiments
+![chess failures](./docs/chess_failures.png "Chess failures")
+Game-playing AIs are a prime example of models that operate vastly beyond human levels. We focus on chess, a canonical example of a complex decision-making task where humans can easily evaluate end-to-end performance (i.e., did the model win?), but not individual model decisions.
+Nevertheless, the rules of chess imply several simple invariances that are readily apparent and verifiable even by amateur players, which makes chess a perfect application for our framework.
+
+In our experiments we test [Leela Chess Zero](https://github.com/LeelaChessZero/lc0), an open-source chess engine which plays at a superhuman level. We find large violations of various consistency constraints:
+- **Forced moves:** For board positions where there's only a single legal move, playing this move has no impact on the game’s outcome. Hence, the positions before and after the forced move must have the same evaluation.
+- **Board transformations:** For positions without pawns and castling rights, any change of orientation of the board (e.g., rotating it or mirroring it across any axis) has no effect on the game outcome.
+- **Position mirroring:** Mirroring the players’ positions, such that White gets the piece setup of Black and vice versa,
+with the rest of the game state fixed (e.g., castling rights), must result in a semantically identical position.
+- **Recommended move:** The model’s evaluation of a position should remain similar if we play the strongest move predicted by
+the model. Indeed, chess engines typically aim to measure the expected game outcome under optimal play from both players, so any optimal move should not affect this measure.
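
The forced-move, recommended-move and board-transformation rules each reduce to comparing two engine evaluations. The following is a minimal sketch, not the repository's implementation: it assumes evaluations are scalars in [-1, 1] reported from the perspective of the side to move, and the `EvalPair` class, the `violation` helper and the 0.5 threshold are all hypothetical. Position mirroring is not covered here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EvalPair:
    """Two engine evaluations that a consistency rule says must agree.

    Assumption (hypothetical convention): each value lies in [-1, 1] and is
    given from the perspective of the side to move in its own position.
    """
    original: float
    transformed: float
    move_played: bool  # True for forced-move / recommended-move pairs


def violation(pair: EvalPair) -> float:
    """Distance of a pair of evaluations from perfect consistency (0 = consistent)."""
    if pair.move_played:
        # After a move the opponent is to move, so a consistent engine
        # should report the negated value of the original position.
        return abs(pair.original + pair.transformed)
    # Board transformations keep the side to move, so the values should match.
    return abs(pair.original - pair.transformed)


def flag_violations(pairs: List[EvalPair], threshold: float = 0.5) -> List[int]:
    """Indices of pairs whose violation exceeds a (hypothetical) threshold."""
    return [i for i, pair in enumerate(pairs) if violation(pair) > threshold]


pairs = [
    EvalPair(original=0.30, transformed=-0.28, move_played=True),   # consistent
    EvalPair(original=0.10, transformed=0.90, move_played=False),   # inconsistent
]
print(flag_violations(pairs))  # [1]
```

Because a move hands the turn to the opponent, the move-based checks compare `original` with the negated `transformed` value, while board transformations keep the side to move and compare the two values directly.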
+
+The code for our experiments is available in the [chess-ai-testing](./chess-ai-testing) directory. The data files are available in [release v1.0.0](https://github.com/ethz-privsec/superhuman-ai-consistency/releases/tag/v1.0.0).
+
+## LLMs forecasting future events
+![llm forecasting results](./docs/llm_forecasting_results.png "LLM forecasting results")
+Predicting and modeling the future is an important task for which the ground truth is inherently unknown: as the saying goes, "*it is difficult to make predictions, especially about the future.*"
+
+In our experiments we test [GPT-4](https://arxiv.org/abs/2303.08774) and [gpt-3.5-turbo](https://openai.com/blog/chatgpt) on their ability to forecast future events and give probability estimates for whether the events happen.
+
+We find large violations of various consistency constraints: 
+- **Negation:** For any event A, the model should predict complementary probabilities for A and ¬A, i.e., P(A) + P(¬A) = 1;
+- **Paraphrasing:** The model should predict the same probability for multiple equivalent events;
+- **Monotonicity:** Numbers or quantities that are known to be monotonic in time, such as sports records or numbers of people accomplishing a given feat, should receive monotonic model predictions;
+- **Bayes' rule:** For two events A and B, the model's probability forecasts for A, B, A | B, and B | A should satisfy Bayes' theorem.
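
Each of these constraints can be scored numerically, with 0 meaning perfectly consistent. The sketch below is illustrative only: the function names are hypothetical and the exact violation metrics used in the paper may differ. Bayes' rule is checked through the identity P(A | B)·P(B) = P(B | A)·P(A), which avoids dividing by a forecast that might be zero.

```python
from typing import List, Sequence


def negation_violation(p_a: float, p_not_a: float) -> float:
    """Consistent forecasts satisfy P(A) + P(not A) = 1."""
    return abs(p_a + p_not_a - 1.0)


def paraphrase_violation(probs: Sequence[float]) -> float:
    """Equivalent phrasings should get the same probability; return the spread."""
    return max(probs) - min(probs)


def monotonicity_violations(forecasts: Sequence[float], increasing: bool = True) -> List[int]:
    """Indices i where a time-ordered series of forecasts moves the wrong way."""
    bad = []
    for i in range(1, len(forecasts)):
        step = forecasts[i] - forecasts[i - 1]
        if (increasing and step < 0) or (not increasing and step > 0):
            bad.append(i)
    return bad


def bayes_violation(p_a: float, p_b: float, p_a_given_b: float, p_b_given_a: float) -> float:
    """Bayes' theorem implies P(A | B) * P(B) = P(B | A) * P(A)."""
    return abs(p_a_given_b * p_b - p_b_given_a * p_a)


print(round(negation_violation(0.7, 0.4), 3))            # 0.1
print(monotonicity_violations([2.0, 3.5, 3.0, 5.0]))     # [2]
print(round(bayes_violation(0.5, 0.4, 0.9, 0.3), 3))     # 0.21
```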
+
+The benchmark questions and model responses are available in [release v1.0.0](https://github.com/ethz-privsec/superhuman-ai-consistency/releases/tag/v1.0.0).
+
+## Legal AI experiments
+![legal failures](./docs/legal_ai_testing_pipeline.png "Legal AI testing pipeline")
+Reaching decisions on complex legal cases can be long and costly, and the "correctness" of decisions is often contested (e.g., as evidenced by appeal courts). 
+The difficulties in assessing the correctness or fairness of legal decisions extend to AI tools that are used to assist or automate legal decisions. 
+
+We show how to reveal clear logical inconsistencies in two different language models used for predicting legal verdicts: (1) a [BERT model that evaluates violations of the European Convention of Human Rights](https://huggingface.co/nlpaueb/legal-bert-base-uncased); (2) [gpt-3.5-turbo](https://openai.com/blog/chatgpt) prompted to predict bail decisions given a defendant's criminal record.
+
+In particular, we show violations of the following consistency constraints:
+- **Paraphrasing:** We test whether changing the phrasing of a legal case changes the model’s decision.
+- **Partial ordering:** While the "correctness" of legal decisions is hard to assess, there can still be clear
+ways of “ranking” different outcomes. We consider an extreme example here, where we test whether
+a bail-decision model could favorably switch its decision if the defendant commits more crimes.
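
The partial-ordering check can be phrased as a search over pairs of criminal records where one is a strict superset of the other. A minimal sketch, assuming records are modelled as sets of offense labels and the model under test is any function that returns `True` when it grants bail; `toy_model` is a hypothetical, deliberately inconsistent stand-in, not the paper's prompt or the gpt-3.5-turbo model.

```python
from typing import Callable, FrozenSet, List, Tuple

Record = FrozenSet[str]  # a criminal record, modelled as a set of offenses


def partial_order_violations(
    records: List[Record], grants_bail: Callable[[Record], bool]
) -> List[Tuple[Record, Record]]:
    """Pairs (smaller, larger) where bail is denied for a record but granted
    after more offenses are added, i.e. extra crimes helped the defendant."""
    decisions = {record: grants_bail(record) for record in records}
    violations = []
    for smaller in records:
        for larger in records:
            # `<` on frozensets is the strict-subset test.
            if smaller < larger and not decisions[smaller] and decisions[larger]:
                violations.append((smaller, larger))
    return violations


def toy_model(record: Record) -> bool:
    """Hypothetical, deliberately inconsistent stand-in for an AI judge."""
    return len(record) % 2 == 1


r1 = frozenset({"theft"})
r2 = frozenset({"theft", "fraud"})
r3 = frozenset({"theft", "fraud", "felony assault"})
print(partial_order_violations([r1, r2, r3], toy_model))
```

Here the toy model denies bail for `r2` but grants it for `r3`, which adds a felony, so `(r2, r3)` is reported.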
+
+The code for our experiments is available in the [legal-ai-testing](./legal-ai-testing) directory. The data files are available in [release v1.0.0](https://github.com/ethz-privsec/superhuman-ai-consistency/releases/tag/v1.0.0).
diff --git a/chess-ai-testing/.gitignore b/chess-ai-testing/.gitignore
@@ -0,0 +1,157 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+pip-wheel-metadata/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# ides
+.vscode/
+.idea/
+
+# virtualenv
+.venv
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+.python-version
+
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+#Pipfile.lock
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# mlflow
+mlruns/
+mlruns_data
+
+# Project specific stuff
+data/*.txt
+data/*.pgn
+experiments/results/
+old/
+*old.py
+*OLD.py
+*OLD.txt
+*.csv
+*.zip
+*OLD
+*old
+tensorboard
+experiments/analysis/images
+wandb/*
+logs/*
diff --git a/chess-ai-testing/LICENSE b/chess-ai-testing/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2022 Lukas Fluri
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/chess-ai-testing/README.md b/chess-ai-testing/README.md
@@ -0,0 +1,119 @@
+# rl-testing-experiments
+
+## Table of contents
+1. [Setup](#setup)
+    - [Creating virtual environment](#creating-virtual-environment)
+    - [Installing the package](#installing-the-package)
+    - [Setting up a Leela Chess Zero instance](#setting-up-a-leela-chess-zero-instance)
+    - [Downloading a Leela Chess Zero weight file](#downloading-a-leela-chess-zero-weight-file)
+    - [Configuration file for Leela Chess Zero instance](#configuration-file-for-leela-chess-zero-instance)
+    - [Configuration file for data](#configuration-file-for-data)
+2. [Reproducing the experiments](#reproducing-the-experiments)
+    - [Prerequisites](#prerequisites)
+    - [Running the experiments](#running-the-experiments)
+
+## Setup
+### Creating virtual environment
+This project was developed using Python 3.8. It is recommended to install this repository in a virtual environment.
+Make sure you have [Python 3.8](https://www.python.org/downloads/release/python-380/) installed on your machine. Then, initialize your virtual environment in this folder, for example via the command 
+```bash
+python3.8 -m venv .venv
+```
+You can activate the virtual environment via the command
+```bash
+source .venv/bin/activate
+```
+
+### Installing the package
+The package can be installed via the command
+```bash
+pip install -e .
+```
+
+### Setting up a Leela Chess Zero instance
+In order to run the experiments you need access to an instance of [Leela Chess Zero](https://github.com/LeelaChessZero/lc0). You can either install it on the same machine you want to run the experiments on, or on a remote machine to which you have SSH access. Our experiments use the `release/0.29` version, compiled from source and with GPU support enabled.
+
+### Downloading a Leela Chess Zero weight file
+All weight files can be found on the [Leela Chess Zero training website](https://training.lczero.org/networks/?show_all=1). For our experiments we used the network with ID `807785`.
+
+### Configuration file for Leela Chess Zero instance
+Configurations for the Leela Chess Zero instance must be stored in a configuration file in the `experiments/configs/engine_configs` folder. Each config file has to specify where to find the installed Leela Chess Zero instance and which configuration parameters should be set. See the following example config:
+```ini
+[General]
+# 'engine_type' must be either 'local_engine' or 'remote_engine'
+engine_type = remote_engine
+engine_path = /path/to/lc0/on/the/machine/where/it/has/been/installed
+network_base_path = /path/to/folder/where/weightfiles/are/stored
+
+# Leela Chess Zero configs used for experiments
+# See https://github.com/LeelaChessZero/lc0/wiki/Lc0-options
+# for a list of all options
+[EngineConfig]
+Backend = cuda-fp16
+VerboseMoveStats = true
+SmartPruningFactor = 0
+Threads = 1
+TaskWorkers = 0
+MinibatchSize = 1
+MaxPrefetch = 0
+NNCacheSize = 200000
+TwoFoldDraws = false
+
+# For how long Leela Chess Zero should evaluate a position
+# See https://python-chess.readthedocs.io/en/latest/engine.html#chess.engine.Limit
+# for a list of options.
+[SearchLimits]
+nodes = 400
+
+
+# The following parameters are only required if you installed
+# Leela Chess Zero on a different machine than the one you're using
+# to run the experiments
+[Remote]
+remote_host = uri.of.server.com
+remote_user = username
+password_required = True
+```
+
+### Configuration file for data
+In addition to the engine config, our experiments also require a config file specifying where to find the input data (usually chess positions). This configuration file must be stored in the `experiments/configs/data_generator_configs` folder. We support either a simple `.txt` file containing a list of [FEN](https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation)s, or a `.pgn` database containing games in [PGN](https://en.wikipedia.org/wiki/Portable_Game_Notation). All data files should be stored in the `data` folder. Alternatively, you can set the `DATASET_PATH` environment variable, in which case the data files are expected to be stored in `DATASET_PATH/chess-data`. See the following example config:
+```ini
+[General]
+# 'data_generator_type' must be either 'fen_database_board_generator' 
+# (for a simple text file containing one fen per row) or 
+# 'database_board_generator' (for a database file in .pgn format)
+data_generator_type = fen_database_board_generator
+
+[DataGeneratorConfig]
+database_name = name_of_data_file.txt
+open_now = True
+```
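
For the `.txt` format, each row holds one FEN. A minimal, hypothetical loader that performs basic sanity checks (six space-separated fields, eight ranks of eight squares each, a valid side to move) might look like the following; it is not the repository's `fen_database_board_generator`.

```python
from typing import Iterable, List


def load_fens(lines: Iterable[str]) -> List[str]:
    """Read one FEN per line, skipping blank lines and rejecting malformed ones."""
    fens = []
    for line_no, line in enumerate(lines, start=1):
        fen = line.strip()
        if not fen:
            continue
        fields = fen.split()
        if len(fields) != 6:
            raise ValueError(f"line {line_no}: expected 6 FEN fields, got {len(fields)}")
        ranks = fields[0].split("/")
        if len(ranks) != 8:
            raise ValueError(f"line {line_no}: expected 8 ranks, got {len(ranks)}")
        for rank in ranks:
            # Digits count empty squares; every other character is one piece.
            width = sum(int(c) if c.isdigit() else 1 for c in rank)
            if width != 8:
                raise ValueError(f"line {line_no}: rank {rank!r} covers {width} squares")
        if fields[1] not in ("w", "b"):
            raise ValueError(f"line {line_no}: side to move must be 'w' or 'b'")
        fens.append(fen)
    return fens


sample = [
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    "",
    "4k3/8/8/8/8/8/8/4K2R w K - 0 1",
]
print(len(load_fens(sample)))  # 2
```

To read a file from the `data` folder, pass an open file handle, e.g. `with open("data/name_of_data_file.txt") as f: fens = load_fens(f)`. If python-chess is installed, constructing `chess.Board(fen)` performs a much stricter validation.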
+
+## Reproducing the experiments
+### Prerequisites
+- Leela Chess Zero instance installed and configured as described above
+- Data file containing chess positions stored in `data` folder. The specific chess positions used in our experiments can be extracted from the result files in the `experiments/results/final_data` folder.
+
+### Running the experiments
+All experiments are run in a two-step process. First, the main experiment file is run. This file handles everything from loading the data to writing results and coordinating the distributed queues. In a second step, one or more workers are started. Each worker runs a Leela Chess Zero instance and evaluates positions provided by the main experiment file.
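
The main-file/worker split follows a producer-consumer pattern. The sketch below mimics it inside a single process using the standard library's `queue` and `threading` modules; the repository instead coordinates separate processes (possibly on remote machines) through its own distributed queues, and `dummy_evaluate` is a hypothetical placeholder for an actual Leela Chess Zero evaluation.

```python
import queue
import threading
from typing import Dict, Iterable


def dummy_evaluate(fen: str) -> float:
    """Hypothetical stand-in for a Leela Chess Zero evaluation."""
    return (len(fen) % 7) / 7.0


def worker(tasks: queue.Queue, results: queue.Queue) -> None:
    while True:
        fen = tasks.get()
        if fen is None:  # sentinel: the main loop has no more positions
            break
        results.put((fen, dummy_evaluate(fen)))


def run_experiment(fens: Iterable[str], num_workers: int = 2) -> Dict[str, float]:
    tasks: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for fen in fens:        # the main loop hands out positions
        tasks.put(fen)
    for _ in threads:       # one sentinel per worker so every worker stops
        tasks.put(None)
    for t in threads:
        t.join()
    evaluations = {}
    while not results.empty():  # all workers have finished, so this drains everything
        fen, value = results.get()
        evaluations[fen] = value
    return evaluations


positions = ["1K6/8/8/8/8/8/8/7k w - - 0 1", "4k3/8/8/8/8/8/8/4K2R w K - 0 1"]
print(len(run_experiment(positions)))  # 2
```

Sending one `None` sentinel per worker lets every worker exit cleanly once the positions run out.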
+
+For the forced-move and the recommended-move experiments, the main experiment file can be run via the command
+```bash
+python experiments/recommended_move_invariance_testing.py --engine_config_name your_engine_config.ini --data_config_name your_data_config.ini --num_positions number_of_positions_to_evaluate
+```
+
+For the board-mirroring and board-transformation experiments, the main experiment file can be run via the command
+```bash
+# '--transformations' must be a subset of [rot90, rot180, rot270, flip_diag, flip_anti_diag, flip_hor, flip_vert, mirror]
+python experiments/transformation_invariance_testing.py --engine_config_name your_engine_config.ini --data_config_name your_data_config.ini --num_positions number_of_positions_to_evaluate --transformations a list of transformations to apply to the board
+```
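
The transformation names accepted by `--transformations` can be illustrated on the piece-placement field of a FEN. The sketch below is hypothetical: the mapping from each name to a concrete operation (in particular which axis `flip_hor` and `flip_vert` refer to, and the direction of `rot90`) is an assumption, and `mirror`, which also swaps piece colors, is not implemented. Because the outcome is only guaranteed to be unchanged for positions without pawns or castling rights, it refuses such positions.

```python
from typing import Callable, Dict, List

Grid = List[List[str]]  # grid[0] is rank 8, grid[r][0] is the a-file, "." is empty


def parse_placement(placement: str) -> Grid:
    grid = []
    for rank in placement.split("/"):
        row: List[str] = []
        for c in rank:
            row.extend(["."] * int(c) if c.isdigit() else [c])
        grid.append(row)
    return grid


def serialize_placement(grid: Grid) -> str:
    ranks = []
    for row in grid:
        out, empty = "", 0
        for square in row:
            if square == ".":
                empty += 1
            else:
                out += (str(empty) if empty else "") + square
                empty = 0
        ranks.append(out + (str(empty) if empty else ""))
    return "/".join(ranks)


# Assumed meaning of each name (the repository's conventions may differ).
TRANSFORMS: Dict[str, Callable[[Grid], Grid]] = {
    "rot90": lambda g: [list(r) for r in zip(*g[::-1])],        # clockwise, seen from White
    "rot180": lambda g: [r[::-1] for r in g[::-1]],
    "rot270": lambda g: [list(r) for r in zip(*g)][::-1],       # counter-clockwise
    "flip_hor": lambda g: [r[::-1] for r in g],                 # a-file <-> h-file
    "flip_vert": lambda g: g[::-1],                             # rank 1 <-> rank 8
    "flip_diag": lambda g: [[g[7 - j][7 - i] for j in range(8)] for i in range(8)],  # about a1-h8
    "flip_anti_diag": lambda g: [list(r) for r in zip(*g)],     # about h1-a8
}


def transform_fen(fen: str, name: str) -> str:
    fields = fen.split()
    if fields[2] != "-" or any(p in fields[0] for p in "pP"):
        raise ValueError("outcome is only invariant without pawns and castling rights")
    fields[0] = serialize_placement(TRANSFORMS[name](parse_placement(fields[0])))
    return " ".join(fields)


print(transform_fen("1K6/8/8/8/8/8/8/7k w - - 0 1", "rot90"))  # 8/7K/8/8/8/8/8/k7 w - - 0 1
```

Only the piece placement changes; side to move and the remaining FEN fields are carried over unchanged, so the engine's evaluations of the original and transformed FEN can be compared directly.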
+
+For the evolutionary algorithm experiments, the main experiment file can be run via the command
+```bash
+python experiments/evolutionary_algorithms/evolutionary_algorithm_distributed_oracle_queries_async.py 
+```
+
+For all experiments, a worker can be started via the command
+```bash
+python rl_testing/engine_generators/worker.py --engine_config_name your_engine_config.ini --network_name name_of_weight_file
+```