My experience with slowness is that slowness is most often caused by Coq doing work it doesn't need to do. Usually, this work comes in the form of retypechecking terms it shouldn't need to retypecheck, and occasionally it comes in the form of (re)doing typeclass search that can be known to be useless. I don't want to have to micro-optimize my tactic scripts to get 1000x speed-ups. In general, I have the sense that I can't trust Coq to be fast, and I can't trust scripts that are fast to remain fast in future versions of Coq unless I'm proactive about reporting bugs and performance regressions.
Some examples of Coq being really slow:
-
Bug #5148 - Ltac reification of moderate sized terms can take over 90 hours - see
slow_fiat_crypto_reify/README.md
for more details. -
Bug #5146 -
nsatz
can sometimes spend 10 minutes or more in reification - seeslow_nsatz.v
. -
Bug #3441 -
pose proof H as k
is sometimes an order of magnitude slower thanpose H as k; clearbody k
- seeslow_pose_proof.v
for an example of a 4x slowdown. -
Bug #3280
match goal with |- ?f ?x = ?g ?y => idtac end
can be arbitrarily slow - seeevar-normalization-slowness/very_silly_slow_tactic.v
andevar-normalization-slowness/exercise-tactics/exercise-tactics.sh
. Also, see this graph of the time of tactics vs the size of goal -
Bug #4819 - interactive time is impacted by large terms that don't exist anymore in the goal -
interactive_proof_state_slowness.v
(interactive_proof_state_slowness.sh
for Coq 8.5, unless you want to usecoqtop -emacs -time < interactive_proof_state_slowness.v
) -
Bug #4662 -
unfold ... in ...
should insert a cast annotation, elseDefined
can take over 6 minutes when it doesn't need to - seeslow_unfold.v
-
Bug #4637 -
vm_compute in ...
should insert vm_casts - seemissing_vm_casts.v
-
Bug #4776 - there should be a way to terminate typeclass resolution early - see
slow_failing_setoid_rewrite.v
-
Bug #4636 -
set (x := y)
can be 100x slower thanpose y as x; change y with x
- seeslow_set.v
. (The reverse can also happen, wherechange
is orders of magnitude slower thanset
. See bug #4779 in the next bullet.) -
Bug #4639 - running
simpl @snd
can take over 158 hours - seeslow_fiat_simpl_example/README.md
-
Bug #4657 -
omega
is slow onx - y - (x - y - x) = x - y
(takes about 1 s, when it solves an equivalent equation in 0.06 s) - seeslow_omega.v
-
Bug #4810 - if the type of the field is a projection from a record, then the performance of
field_simplify_eq
is linear in the number of record fields of that record, which is absurd, given that the fields of the record shouldn't matter. -
Bug #3448 -
Hint Extern (foo _) => ...
is significantly slower thanHint Extern foo => ...
; typeclass resolution is slow atintro
- seeslow_typeclasses_intro.v
-
Bug #3425 - unification can be very slow - see
slow_unification.v
. (Matthieu's explanation in the file.) -
Bug #5038 -
Definition
does head-zeta-reduction. Seeslow_lets.v
for an example of exponential behavior.
-
Bug #5096 -
vm_compute
is exponentially slower thannative_compute
,lazy
, andcompute
. Seeslow_fiat_crypto_vm_compute/README.md
for more details and instructions on running. -
Bug #3450 -
End foo.
is slower in trunk in some cases; it's also slower in batch mode than in interactive mode. The original slowdown was due to hashconsing of universes (then to substituting universes) on leaving a section. But now the example has gotten orders of magnitude slower, in earlier places. Seesuper_slow_end_section_and_record.v
(Coq 8.5 only). ThePseudofunctor
record takes 999 s to be defined, and you can watch the<projection> is definted
messages appear one by one, very slowly. -
Bug #4187 -
admit
is slow on a goal of the formG' -> Prop
when it's fast on a goal of the formG'
- seeslow_admit.v
-
Bug #4821 -
simple eapply
can take 2 hours - seeslow_fiat_eapply_example/src/Parsers/Refinement/SharpenedJavaScriptAssignmentExpression.v
-
Bug #4777 - unless
Set Silent
is on, the printing time is impacted by large terms that don't print - seeinteractive_hidden_slowness.v
(interactive_hidden_slowness.sh
for Coq 8.5, unless you want to usecoqtop -emacs -time < interactive_hidden_slowness.v
) -
Bug #4537 - Coq 8.5 is slower (sometimes by as much as 5x-6x) than Coq 8.5beta2 in typeclass resolution with identical traces. Pierre-Marie Pédrot said:
All right, I think I'm the responsible here. This is probably because of commit a895b2c0ca that introduced a performance enhancement of auto hints in the non-polymorphic case. The polymorphic case has a nice comment
(** FIXME: We're being inefficient here because we substitute the whole
evar map instead of just its metas, which are the only ones
mentioning the old universes. *)
which is exactly the hotspot I'm seeing right there. I'll write a quick fix.
and then, after fixing that, said:
Note that there is still a hotspot that looks suspicious to me. Indeed, the code for Evarsolve.noccur_evar whose goal is essentially to ensure that an evar does not appear in a term repeatedly calls the following snippet of code
let c =
try Retyping.expand_projection env evd p c []
with Retyping.RetypeError _ ->
(* Can happen when called from w_unify which doesn't assign evars/metas
eagerly enough *) c
in occur_rec acc c
which is responsible for ~50% of the time passed in one of your tactics for no good reason.
- Bug #3447 - Some
Defined
s are 30x slower in trunk - most of the time was spent hashconsing universes.
- Bug #4669 -
printing of dependent evars in ProofGeneral can slow emacs
interaction to a crawl (because printing of dependent evars in the
goal does not respect
Set Printing Width
) - seeslow_dependent_evars_in_pg.v
, and make sure to let it setcoq-end-goals-regexp-show-subgoals
tonil
appropriately.
#4640, Has to do with slow [End Section] in special case of no section variables.
- Bug #4625 -
checking
Defined
/Qed
causes coqtop to drop the most recent proof state. This makes it a pain to debug things like:
Definition foo : bar.
Proof.
some_slow_tactic.
some_tactic_that_makes_Defined_slow.
Defined.
- Bugs #4643,
#4642, and
#4779:
Defined.
sometimes takes 2 minutes;End Section
can take 30 seconds, even though there are no section variables, no tactics, no notations, noLet
s, and only one or twoDefinition
s;cbv [some identifiers]
can run through 64 GB of RAM in 15 minutes; andpose y as x; change y with x in H
can be 1300x slower thanset (x := y) in H
; respectively. Seeslow_fiat_examples/README.md
for more details and instructions on running. (Be warned, some of the examples of slowness themselves take 20 minutes to compile.)