PutnamBench

PutnamBench is a benchmark for evaluating theorem-proving algorithms on competition mathematics problems sourced from the William Lowell Putnam Mathematical Competition, years 1962 to 2023. Our formalizations currently support three formal languages: Lean 4, Isabelle, and Coq. PutnamBench comprises 1696 manually crafted formalizations, aggregated over all languages.

PutnamBench aims to support research in automated mathematical reasoning by providing a multilingual benchmark for evaluating theorem-proving algorithms. It is released under permissive licenses (Apache 2.0 for Lean 4 and Isabelle, MIT for Coq). The informal statements are also available with permission from the MAA.

PutnamBench includes factored solutions for problems that require exhibiting a numerical answer in addition to a proof of its correctness. For these problems, one can attempt two tasks: proving the theorem with the numerical answer written into the statement, or producing the answer along with the proof.
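
As a rough illustration of what a factored solution can look like in Lean 4, here is a minimal sketch. The problem, the names `toy_problem_solution` and `toy_problem`, and the use of `import Mathlib` are assumptions made for this example only; it is not a benchmark entry and may not match the exact layout of the benchmark files.

```lean
import Mathlib

-- Hypothetical toy problem, not taken from PutnamBench:
-- "Find the least natural number n such that n^2 > 50."

-- The answer is factored out into its own definition and left as `sorry`,
-- so a prover attempting the harder task must supply the value itself.
abbrev toy_problem_solution : ℕ := sorry

-- The theorem refers to the answer only through the factored definition.
theorem toy_problem :
    IsLeast {n : ℕ | 50 < n ^ 2} toy_problem_solution := by
  sorry
```

For the easier task, the first `sorry` would be replaced by the known answer (here `8`, since 7^2 = 49 and 8^2 = 64), leaving only the proof to be found.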

We host a leaderboard and welcome evaluation results that are accompanied by a preprint or publication. Please do not post proofs in any public setting as confirmation of results. Reach out privately at [email protected] with any requests for additions to the leaderboard.

We strongly encourage community feedback! Please let us know if you have any comments for improving PutnamBench. If you notice any mistakes, please raise an issue on the repository and we will address it. To reduce contamination, we kindly ask that you do not write formal proofs for any of the problems. If you do wish to write formal proofs for a subset of the problems, we ask that you first discuss it with us.

Statistics

Language	Count
Lean 4	644
Isabelle	640
Coq	412

We also report the number of problems in each category. Note that some problems fall under multiple categories. The categories are intended to capture general features of the problems, but problems within a single category can vary widely.

Category	Total Quantity
Algebra	253
Analysis	226
Number Theory	108
Geometry	69
Linear Algebra	51
Combinatorics	29
Abstract Algebra	28
Probability	10
Set Theory	8

Citation

The associated paper for PutnamBench is available on arXiv at https://arxiv.org/abs/2407.11214. Please consider including the following citation if you find PutnamBench useful.

@misc{tsoukalas2024putnambenchevaluatingneuraltheoremprovers,
      title={PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition}, 
      author={George Tsoukalas and Jasper Lee and John Jennings and Jimmy Xin and Michelle Ding and Michael Jennings and Amitayush Thakur and Swarat Chaudhuri},
      year={2024},
      eprint={2407.11214},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2407.11214}, 
}

