Skip to content

Latest commit

 

History

History
261 lines (200 loc) · 10.7 KB

tech_report.md

File metadata and controls

261 lines (200 loc) · 10.7 KB

Technical report on Minisat-ml

This is a technical report on an experiment we at Imandra have led on the comparative performance of C++, rust, and OCaml for writing SAT solvers and similar computational logic programs. C++ is widely used and can deliver very high performance, but is unsafe and sometimes unwieldy. The goal of this experiment is to assess the practicality of using OCaml (or rust) to write satisfiability solvers.

Our methodology was to adapt Minisat 2.2 in OCaml (minisat-ml) and in rust (batsat, derived from ratsat which is directly adapted from Minisat), and run them on sets of benchmarks to compare the results. Batsat is inspired from Minisat but diverged somewhat due to its parametrization over a theory, and some internal refactorings.

We also compare with Z3 4.8.4 and msat. Z3 is an alternative SMT and SAT solver in C++. Msat is an OCaml SAT solver that is roughly inspired from Minisat but has significant implementation differences, is functorized over a theory, and doesn't have the same heuristic parameters. As an OCaml codebase, its style is more idiomatic and generally more readable than Minisat-ml.

Implementation of Minisat-ml

The code of Minisat-ml was written in a way that closely mirrors Minisat's source. Some differences were unavoidable due to discrepancies between OCaml and C++.

Modules

Instead of .h and .cpp files, the OCaml version uses the module system with .ml and .mli files. The heap file was also modified to be parametrized (using an OCaml functor) over its content, rather than dealing entirely with integers. Inlining is still possible thanks to [@@inline] annotations on functors and functions.

Imperative programming

OCaml doesn't support return nor continue/break in loops, nor do {} while() loops. We had to rely on exceptions to emulate the former, and tail-recursive functions for the latter. Using OCaml 4.06 we can overload .%[ ] and .%[ ] <- to emulate operator[] overloading.

Compare this fragment of conflict analysis in OCaml and C++:

while
  let v = Lit.var self.trail.%[ !index ] in
  index := !index -1;
  not (seen self v)
do ()
done;

p := self.trail.%[ !index+1 ];
confl := reason_lit self !p;
set_seen self (Lit.var !p) false;
pathC := !pathC - 1;
while (!seen[var(trail[index--])]);
p     = trail[index+1];
confl = reason(var(p));
seen[var(p)] = 0;
pathC--;

In C++, the custom Vec type (a resizable array that is used for almost all data in Minisat) is manipulated like an array thanks to operator[] overloading.

Memory layout

OCaml doesn't provide value types, so we use explicit struct-of-arrays encoding for watch lists (where the C++ version stores struct Watcher(uint32,uint32) in a single vector, we have two int Vec.t of the same size to avoid allocating lots of small records).

32-bit integers are replaced by 63-bit integers (on x86_64) since OCaml only has word-sized values. Essentially, in Minisat-ml, almost everything is a int Vec.t or a foo Vec.t where type foo = private int (e.g. variables or literals or the ternary Lbool.t).

Optimizations

We use OCaml 4.06 + flambda, with the flags -O3 -unbox-closures -unbox-closures-factor 20. This allows a lot of function calls to be inlined, leading to large performance gains.

Some functions are annotated as [@@inline] to guide the optimizer better.

Code size

The whole of minisat-ml weights 1,901 lines of OCaml (according to David A. Wheeler's Sloccount). It implements only what's required for the default heuristics of Minisat 2.2 (e.g. full conflict minimization).

Experimental results

We compare solvers across a few benchmarks. The solver are:

  • minisat (2.2.0, C++)
  • batsat (0.2, rust)
  • minisat-ml (OCaml)
  • msat (0.8, OCaml)
  • z3 (4.8.4, C++)

and the benchmarks are:

we ran these benchmarks on a machine with 20 cores 2.20GHz Intel Xeon CPUs, and 64GB of RAM. Each solver is limited to 2GB of memory.

For each solver, we list the number of problems it solved, number of unknown, and the total and average time spent on solved problems. This is why sometimes a stronger solver has a higher average time, as it succeeds on some harder benchmarks that other solvers die on.

Raw results can be found in the snapshots directory; analysis was performed using src/scripts/analyze.py.

Basic

30s timeout

prover sat unsat solved unknown total-time avg-time
z3 2116 1103 3219 2 1289.1s 0.4s
msat 2115 1097 3212 9 1624.7s 0.51s
minisat-ml 2118 1096 3214 7 1580.8s 0.49s
minisat 2118 1102 3220 1 657.58s 0.2s
batsat 2118 1102 3220 1 846.72s 0.26s

Sat-race '06

300s timeout

prover sat unsat solved unknown total-time avg-time
z3 30 43 73 27 5031.7s 68.9s
msat 11 20 31 69 1834.5s 59.2s
minisat-ml 28 33 61 39 3866.2s 63.4s
minisat 39 41 80 20 4092.6s 51.2s
batsat 38 41 79 21 4443.7s 56.2s

graph

Sat-comp '18

300s timeout

prover sat unsat solved unknown total-time avg-time
z3 44 21 65 335 6075.8s 93.5s
msat 21 1 22 378 1070.8s 48.7s
minisat-ml 23 2 25 375 1652.7s 66.1s
minisat 37 8 45 355 3799.9s 84.4s
batsat 34 8 42 358 3002.0s 71.5s

graph

Global evaluation (courtesy of Mathias Fleury)

Mathias Fleury has run a variety of SAT solvers on 3313 preprocessed problems from SAT-comp 2009 to 2017. He ran the benchs on the MPI cluster with 30 minute timeout and 10GB RAM per solver.

First Run

prover sat unsat solved unknown total-time avg-time
isasat 371 315 686 2083 216087.0s 314.996s
versat 198 144 342 2427 69745.0s 203.933s
minisat 683 722 1405 1364 458533.0s 326.358s
microsat 448 373 821 1948 226881.0s 276.347s
cadical 512 481 993 1773 269024.8s 270.921s
glucose 617 708 1325 1444 386293.0s 291.542s
minisatml 541 409 950 1819 299271.0s 315.022s
zchaff 311 239 550 2219 143485.0s 260.882s
cms 683 722 1405 1364 458533.0s 326.358s

graph

The solvers are:

  • isasat verified SAT solver extracted from Isabelle, in SML
  • versat a verified SAT solver
  • minisat
  • microsat a tiny SAT solver in 200 lines of C
  • cadical a modern SAT solver
  • glucose a classic fork of minisat
  • minisat-ml, the current work
  • zchaff one of the first advanced CDCL solvers
  • cms (cryptominisat) an advanced SAT solver

Second Run

This time with more recent versions of glucose and Cadical:

prover sat unsat solved unknown total-time avg-time
isasat 371 315 686 2083 216087.0s 314.996s
versat 198 144 342 2427 69745.0s 203.933s
minisat 617 509 1126 1643 324602.0s 288.279s
microsat 448 373 821 1948 226881.0s 276.347s
cadical 622 712 1334 1435 434119.4s 325.427s
glucose 617 708 1325 1444 386293.0s 291.542s
minisatml 541 409 950 1819 299271.0s 315.022s
zchaff 311 239 550 2219 143485.0s 260.882s
cms 683 722 1405 1364 458533.0s 326.358s

graph

Discussion

The differences in performance can be explained by at least these factors:

  • the write barrier, which isn't easy to elide since we use a polymorphic 'a Vec.t for convenience. Specializing Vec.t on immediate/non-immediate types would certainly help here.
  • more cache misses (since all data in OCaml is stored as 64-bit words on x86_64, whereas minisat uses more compact 32-bit integers everywhere)
  • array bound checking (although we use unsafe_get and unsafe_set quite a lot)
  • more arithmetic for accessing slices in the clause allocator: to get the i-th literal of a clause, we access alloc.memory.(c + 1 + i). The clause c is an offset in the large integer array alloc.memory, and it points to the clause header, followed by literals. In contrast, the C++ version provides a local temporary struct with the header and a direct pointer to the first literal; subsequent accesses to literals are just c.lits[i]. Here rust shines by providing a bound-checked slice whose lifetime is properly limited, preventing the temporary clause struct from becoming stale.

The differences in behavior come from discrepancies in the VSIDS heuristics, due to different floating point computations. At some point in non trivial problems, variable activities in the C++ and OCaml versions diverge slightly (after decay and renormalization, which involves a multiplication by 1e-20), which leads to a different variable being picked as a decision.

Msat seems to more often die of out-of-memory issues, which is not very surprising as it was never optimized to tackle such challenging workloads — its purpose is to serve as a CDCL(T) backend, not a pure SAT solver.

A sample flamegraph obtained via ./src/scripts/profile.sh _build/default/src/bin/minisat.exe benchs/basic/3bitadd_31.cnf.gz shows the hotspots of the OCaml code (the problem takes ~5.5s to solve).

Notes

the graphs were generated using Alexey Ignatiev's mkplot script.