Skip to content

Commit

Permalink
Merge branch 'main' into john
Browse files Browse the repository at this point in the history
  • Loading branch information
GeorgeTsoukalas authored Jul 27, 2024
2 parents 5cfa1e7 + 7b66be7 commit 864d77f
Show file tree
Hide file tree
Showing 344 changed files with 2,018 additions and 14,796 deletions.
236 changes: 0 additions & 236 deletions .docs/index.html

This file was deleted.

37 changes: 0 additions & 37 deletions .docs/static/images/favicon.svg

This file was deleted.

39 changes: 39 additions & 0 deletions .github/workflows/lean.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
on:
push:
pull_request:

name: build lean

jobs:
build:
name: Build
runs-on: ubuntu-latest
steps:
- name: install elan
run: |
set -o pipefail
curl -sSfL https://github.com/leanprover/elan/releases/download/v3.1.1/elan-x86_64-unknown-linux-gnu.tar.gz | tar xz
./elan-init -y --default-toolchain none
echo "$HOME/.elan/bin" >> $GITHUB_PATH
- uses: actions/checkout@v3
- name: Set up olean cache
uses: actions/cache@v3
with:
path: ~/.cache
key: oleans

- name: Configure
working-directory: lean4
run: |
lake exe cache get
- name: Build
working-directory: lean4
run: |
lake build
- name: Save olean cache
working-directory: lean4
run: |
lake exe cache pack
4 changes: 3 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -10,4 +10,6 @@ lean4/solutions_replaced_new/
informal/join_all_jsons*

isabelle_archive/
lean4_archive/
lean4_archive/

split_into_files.py
42 changes: 30 additions & 12 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,23 +1,29 @@
# PutnamBench

PutnamBench is a benchmark for evaluation of theorem-proving algorithms on competition mathematics problems sourced from the William Lowell Putnam Mathematical Competition years 1962 - 2023. Our formalizations currently support three formal languages : Lean 4 $\land$ Isabelle $\land$ Coq. PutnamBench comprises of over 1400 manual formalizations, aggregated over all languages.
<p align="center">
<a href="https://trishullab.github.io/PutnamBench/leaderboard.html"><img src="https://img.shields.io/badge/%F0%9F%8F%86-leaderboard-%23ff8811"></a>
<a href="https://trishullab.github.io/PutnamBench/"><img src="https://img.shields.io/badge/%F0%9F%8F%86-website-8A2BE2"></a>
<a href="https://arxiv.org/abs/2407.11214"><img src="https://img.shields.io/badge/arXiv-2407.11214-b31b1b.svg"></a>
</p>

PutnamBench is a benchmark for evaluation of theorem-proving algorithms on competition mathematics problems sourced from the William Lowell Putnam Mathematical Competition years 1962 - 2023. Our formalizations currently support three formal languages : Lean 4 $\land$ Isabelle $\land$ Coq. PutnamBench comprises of 1697 manually-crafted formalizations, aggregated over all languages.

PutnamBench aims to support research in automated mathematical reasoning by providing a multilingual benchmark for evaluating theorem-proving algorithms. It is released under permissive licenses (Apache 2.0 for Lean 4 and Isabelle, MIT for Coq). The informal statements are also available with permission from the MAA.

We encourage community contributions through pull requests, though we kindly ask not to include proofs of any problems to reduce testing contamination. (TODO: After initial release, as of 7/8 this is expected to be in about 1 week.)
PutnamBench includes factored solutions for problems which require exhibiting a numerical answer in addition to its proof of correctness. For these problems, one can attempt two tasks: proving the problem with the numerical answer written into the theorem statement, or additionally producing the answer along with the proof.

## Citation
The associated paper for PutnamBench is {TODO}. Please consider including the following citation if you find PutnamBench useful.
{TODO}
We are hosting a [**leaderboard**](https://trishullab.github.io/PutnamBench/leaderboard.html) and will readily receive evaluation results which are accompanied by a preprint or publication. **Do not** include proofs as confirmation in any public setting. Please reach out privately at `[email protected]` with any requests for additions to the leaderboard.

**We strongly encourage community feedback!** Please let us know if you have any comments for improving PutnamBench. If you notice any mistakes, please raise an issue on the repository and we will address it. We kindly ask that you do not write formal proofs for any of the problems in an effort to reduce contamination.

## Statistics
| Language | Count |
| ------------- | -------------- |
| Lean 4 | 640 |
| Isabelle | 627 |
| Coq | 354 |
| Isabelle | 640 |
| Coq | 417 |

We also report the number of problems in a certain category. Note that some problems fall under multiple categories.
We also report the number of problems in a certain category. Note that some problems fall under multiple categories. While the categories are intended to capture general features of the problems, we also note that there is a high variance of problems inside an individual category.

| Category | Total Quantity |
| ---------------- | -------------- |
Expand All @@ -31,8 +37,20 @@ We also report the number of problems in a certain category. Note that some prob
| Probability | 9 |
| Set Theory | 8 |



## Versioning
- Version: `v-1`
- Not yet officially released. Please refrain from opening issues or making PRs until the initial release.
- Version: `v0`
- In preliminary release to allow for initial community feedback. We seek to release an official first version following several weeks of discussion with the community.

## Citation
The associated paper for PutnamBench is [available at this link](https://arxiv.org/abs/2407.11214). Please consider including the following citation if you find PutnamBench useful.
```
@misc{tsoukalas2024putnambenchevaluatingneuraltheoremprovers,
title={PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition},
author={George Tsoukalas and Jasper Lee and John Jennings and Jimmy Xin and Michelle Ding and Michael Jennings and Amitayush Thakur and Swarat Chaudhuri},
year={2024},
eprint={2407.11214},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2407.11214},
}
```
3 changes: 2 additions & 1 deletion coq/README.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
Here lies the Coq formalizations, currently in the format of batches of formalizations.
Note: We are continuing to make modifications to the Coq formalizations, in particular those which were recently added. We encourage feedback, but keep this in mind. The format of the formalizations will be standardized with the other languages upon completion.

We also note that while the dependencies listed in each formalization are sufficient to *state* the problem, one may need further mathematical theory developed in Coq to write a proof.
# Installation

You need to install `opam` and then run `setup.sh` to install the necessary dependencies.
Expand Down
Loading

0 comments on commit 864d77f

Please sign in to comment.