Add GC heuristic #3997
base: master
Conversation
Hm, I'm wary of this heuristic because I tried something like it to solve the inverse problem and didn't get good results. Do you have any demonstrations of workloads where this implementation makes a huge difference?
My main immediate concern is that this can lead to significant slowdown if there are large trees that do not get compacted. So I feel like if we have such a heuristic, it should also take into account "did the big tree actually shrink on the last GC run" or something like that, and increase the GC period if the answer is "no".
What does this PR do on our benchmarks? (Unfortunately we don't have a way to automatically compare two bench reports and print "things got X% slower/faster"... see #3999)
The way this works is by initially having some exponential backoff in how often the GC runs (whenever the tree size doubled since the last GC run). Eventually that goes to linear (whenever a tree grew by X nodes since the last GC run). The reason is that I reckon most programs do not have many different provenances around (which a program would only get by, for example, storing a large array of pointers differing only in their provenance). For the tree to not get compacted by the GC, a program has to execute lots of retags and then keep all these pointers alive; on such workloads, the GC already does nothing. Even such programs would have to create and store a new reference every 7 basic blocks (with the current settings) to be slowed down by this patch. But note that for programs with a lot of memory use, the GC probably already needs to be tuned down for things to work. In the end it is all trade-offs, and this one seems to be worth it. I am not sure the current performance test suite contains programs where this shows (one way or another), but I can add some where it does.
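To make the shape of that trigger concrete, here is a minimal sketch of the doubling-then-linear condition described above. The `nodes_at_last_gc.max(50)` part mirrors the review hunk further down; the struct, the field names, and the `LINEAR_GROWTH` constant (and its value) are illustrative assumptions, not the PR's actual code.

```rust
/// Illustrative sketch only: the growth trigger described above,
/// with assumed names and constants.
struct TreeStats {
    nodes_at_last_gc: usize,
    live_nodes: usize,
}

impl TreeStats {
    /// Request a GC once the tree doubled since the last run (exponential
    /// regime for small trees) or grew by a fixed number of nodes
    /// (linear regime for large trees), whichever threshold is lower.
    fn wants_gc(&self) -> bool {
        const LINEAR_GROWTH: usize = 500; // assumed value of "X nodes"
        let last = self.nodes_at_last_gc.max(50);
        let threshold = (2 * last).min(last + LINEAR_GROWTH);
        self.live_nodes >= threshold
    }
}

fn main() {
    let small = TreeStats { nodes_at_last_gc: 60, live_nodes: 130 };
    let large = TreeStats { nodes_at_last_gc: 10_000, live_nodes: 10_600 };
    assert!(small.wants_gc()); // 130 >= min(120, 560)
    assert!(large.wants_gc()); // 10_600 >= min(20_000, 10_500)
}
```

Taking the minimum of the two thresholds is what makes the policy exponential for small trees and linear for large ones.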
After some musing with @saethlin, we figured the better answer than having a GC might be to use (some form of) ref-counting, at least in theory. Unfortunately that would require a large refactoring (and past attempts of doing it failed?).
Right, that makes sense, it just wasn't clear from the PR description. :)
That's a bit more surprising. What is the rationale for that? Did you compare the PR with and without this part? Many programs not having a ton of provenances would already be captured by the first part, no? (Also I am not sure that's true, each live reference has its own provenance and there can easily be a ton of those if someone has a large array of things that involve references.)
Yes that would be good -- preferably one benchmark demonstrating a clear benefit. (We don't want to grow our number of benchmarks too much, either...)
ProvenanceGcSettings::Regularly { interval } => this.machine.since_gc >= interval,
ProvenanceGcSettings::Heuristically =>
    // no GC run if the last one was recently
    this.machine.since_gc >= 75
75 is an oddly specific number... did you try various values? On which benchmark?
It is mostly based on vibes/previous experience with running TB on a bunch of crates. I found that usually, you find some local minimum (of the GC frequency -> performance function) around 100-500, so 75 seemed like a reasonable lower bound.
Unfortunately I don't think I can pack these all up into a nice test case. But in general, you want to put the cutoff somewhere. If you feel like a different number should be chosen, suggest a better one.
But in general, you want to put the cutoff somewhere.
This is somewhat surprising, given that the heuristic only kicks in when a tree doubled in size. Did you try different values with the heuristic already in place? Measurements done before you added the heuristic wouldn't really be valid any more.
Also note that this code here is not TB-specific.
This looks more like "guarding against an overeager heuristic" than anything else? Given it is entirely based on vibes, I'd go for 100 and add a comment for why the number was picked. But I am not sure we want a lower bound here at all; each heuristic is really responsible for ensuring a lower bound itself since it has more data.
This looks more like "guarding against an overeager heuristic" than anything else?
Indeed, that is also one of the points. It ensures that performance does not tank when a heuristic goes wrong. With the current setup, all a heuristic can do is request that the GC happen earlier, so some measure should be taken to ensure it does not happen too early.
A buggy heuristic making the GC fire all the time is bad even if we "cap" it at some number here this way. So this would paper over the issue and make it harder to notice, not fix it.
let last = self.nodes_at_last_gc.max(50);
// trigger the GC if the tree doubled since the last run,
// or otherwise got "significantly" larger.
// Note that for trees < 100 nodes, nothing happens.
- // Note that for trees < 100 nodes, nothing happens.
+ // Note that for trees <= 100 nodes, nothing happens.
Do we not? Changes like the one in this PR would be much easier to find/develop/test if the suite of benchmarks was more diverse and larger in general. In its current form, it is only useful for ensuring there are no really major performance regressions. But that's a separate issue.
More benchmarks means they take forever to run, and, at least until we figure out #3999, also forever to analyze.
☔ The latest upstream changes (presumably #4009) made this pull request unmergeable. Please resolve the merge conflicts.
Miri's GC runs every 10000 basic blocks by default. It turns out that sometimes it is better if the GC runs more often. This is especially true for some Tree Borrows programs, since in Tree Borrows, memory accesses take time linear in the size of a certain tree; that tree can grow quite large, but it is usually compacted significantly at every GC run. For some programs, it can be advantageous to run the GC every 200 blocks, or even more often.
Unfortunately, such a GC frequency would slow down other programs. To strike a balance, this PR proposes a "heuristic" that allows running the GC more often based on how large the Tree Borrows tree grows. It is quite likely that other parts of Miri will eventually want to plug into this mechanism as well. However, those concrete other use cases are future work; for now, the goal is only to make the system general enough to accommodate them, and this PR contributes just the integration with Tree Borrows. It is also possible that the actual heuristic needs more tweaks, but for now, the performance improvements look promising.
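As a rough sketch of how these pieces fit together: the `ProvenanceGcSettings` variants and `since_gc` appear in the review hunks above; everything else here, including the exact way the heuristic's request is combined with the 75-block lower bound, is an assumption for illustration, not the PR's actual code.

```rust
/// Illustrative sketch only; not the PR's actual code.
enum ProvenanceGcSettings {
    /// Run the GC every `interval` basic blocks (today's fixed-period behavior).
    Regularly { interval: u32 },
    /// Let interested components (e.g. Tree Borrows) request earlier runs.
    Heuristically,
}

struct Machine {
    settings: ProvenanceGcSettings,
    /// Basic blocks executed since the last GC run.
    since_gc: u32,
}

impl Machine {
    /// Decide whether to run the provenance GC. A heuristic can only make the
    /// GC run *earlier*; the `>= 75` lower bound keeps an overeager heuristic
    /// from firing constantly (the threshold debated in the review above).
    fn should_run_gc(&self, heuristic_requests_gc: bool) -> bool {
        match self.settings {
            ProvenanceGcSettings::Regularly { interval } => self.since_gc >= interval,
            ProvenanceGcSettings::Heuristically => {
                self.since_gc >= 75 && heuristic_requests_gc
            }
        }
    }
}

fn main() {
    let m = Machine { settings: ProvenanceGcSettings::Heuristically, since_gc: 200 };
    assert!(m.should_run_gc(true));
    assert!(!m.should_run_gc(false));
}
```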
It is further likely that the overall design needs some change, since it currently relies on `Cell`s. Also, I was not too sure where to put the new `enum`s. So this PR is WIP for now, although it is functionally complete.