Improve resolver speed again #14665

x-hgg-x · 2024-10-10T17:17:52Z

What does this PR try to resolve?

Follow-up of #14663 which has been splitted.

How should we test and review this PR?

Commit 1 enables the clippy::derive_ord_xor_partial_ord lint to be sure that Ord and PartialOrd are implemented correctly everywhere in the repository.

Commit 2 moves the ActivationKey type in core.

Commit 3 adds interning for the resolver activation keys so they can be used with nohash-hasher, decreasing resolving time by another 30%. This is possible since the interning is implemented with a static HashSet, so each interned element can be reduced to its address.

Performance comparison with the test added in the previous PR by setting LAST_CRATE_VERSION_COUNT = 100:

commit	duration
master	16s
with rustc-hash	6.7s
with nohash	4.6s

Firefox profiles, can be read with https://profiler.firefox.com:
perf.tar.gz

r? Eh2406

Eh2406 · 2024-10-10T18:04:53Z

I benchmarked against an earlier version of this code. Running across all of crates.io (including solana) went from 235.88min to 218.03min that is a solid 7.5% improvement. solana-transaction-status v1.3.10 improve the most going from 49.2s to 35.1s a 14s improvement! On the other hand solana-core v1.0.5 went from 99s to 114s that is 14s worse. Here is the disaggregated data for crates that took >=0.02 to build.
sort_join.csv

x-hgg-x · 2024-10-10T18:39:36Z

I tested solana-core v1.0.5 on my machine, it resolved in exactly the same time before and after this PR.

Eh2406 · 2024-10-10T19:00:42Z

Good to know! I will have to rerun when I get a chance. The most likely explanation is a random hiccup on my computer, that's pretty likely when carefully measuring time of a process that runs for an hour.

x-hgg-x · 2024-10-10T19:32:47Z

I tested solana-core v1.0.5 on my machine, it resolved in exactly the same time before and after this PR.

I ran solana-core v1.0.5 on another machine where I can do a perf, and I observe a 30% improvement with this PR going from 6m25 to 4m40. I may have some throttling problems on the first machine because the timings are not stable.

x-hgg-x · 2024-10-10T21:05:26Z

For solana-core v1.0.5, cargo spent more than 30s dropping the ResolverContext:
flamegraph.svg.tar.gz

Eh2406 · 2024-10-10T22:19:52Z

A more targeted run, that only looked at crates containing the word solana went from 180.61min from master to 162.47min on this PR. With this run solana-core v1.0.5 went from 139s to 112s! So that one may have been a fluke on my part.
out_join.csv

Eh2406 · 2024-10-11T16:43:14Z

src/cargo/core/resolver/context.rs

+type Activations = im_rc::HashMap<
+    ActivationKey,
+    (Summary, ContextAge),
+    nohash_hasher::BuildNoHashHasher<ActivationKey>,


Note to self: This PR does a number of different things, and I'm trying to understand which ones are critical to the performance win. I tried switching this one nohash_hasher::BuildNoHashHasher back to rustc_hash::FxBuildHasher, and the Solana only benchmark I was running yesterday runs in 162.53min (or 164.63min there seems to be a surprising amount of run to run variability).

So it seems that rustc-hash is already very optimized when hashing a single u64:
https://github.com/rust-lang/rustc-hash/blob/master/src/lib.rs#L99.

In this case the performance increase may only be due to fact that we don't hash the full ActivationKeyInner but only its address.

Eh2406 · 2024-10-12T01:25:18Z

I wanted to spend some time teasing apart exactly what part of the old hash was slow. The old hash but modified to hash the pointer of the InternedString instead of its contents got me 25% of the savings. It would be hard to believe that the SemverCompatibility is slow, it's basically two numbers. Hashing for SourceId is surprisingly complicated. (note to self, using SourceId::full_hash is fast but gets the wrong answer.) But I kept getting distracted. Not hashing the SourceId (with the InternedString fix) got about 80% of the savings. If keys only differ by their SourceId a hash map becomes O(n), but n is the number of packages with the same name and major version that only differ by source. Makes me wonder if hashing only by the pointer of the InternedString would also be effective.

With the holiday I will probably not work on this till Monday. Sorry for the delay.

x-hgg-x · 2024-10-12T07:29:52Z

I think we should also define a custom Hash for Dependency, as it is used in the conflict cache and hashing it amounts to 5-10% of the total resolving time.

Eh2406 · 2024-10-14T19:44:12Z

So I was able to get about 50% of benefit was much less churn. I change the order so that we only do the Eq test for SourceId after the others are determined to be equal, and I overrode Hash to look at the InternedString address and not look at SourceId.

#[derive(Clone, PartialEq, Eq, Debug)]
pub struct ActivationsKey(InternedString, SemverCompatibility, SourceId);

impl std::hash::Hash for ActivationsKey {
    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
        std::ptr::NonNull::from(self.0.as_str()).hash(state);
        self.1.hash(state);
        // self.2.hash(state); // Packages that only differ by SourceId are rare enough to not be worth hashing
    }
}

So the other half the performance is because despite being interned SourceId::Eq is not a ptr::eq. Which in turn is because SourceIds concept of identity is a complete mess. The code relies on having to IDs that are Eq but do not have the same values for their fields.

So that leads to two questions about maintainability. Pinging @epage for strong and useful opinions about cleaning up our code base:

Any ideas about how to clean up the definition of "what is a SourceId" so that the structs we've defined actually match the semantics were using? The only thing I can think of is splitting it up into a CanonicalSource that just has the kind and canonical URL, and a SourceInfo that has the other fields. Then struct SourceId {canonical: &'static CanonicalSource, info: &'static SourceInfo}. Then EQ and fast_hash (a new method for us to use in ActivationsKey) can look at the ptr to canonical. But I'm open to other suggestions.
Is the remaining performance benefit worth merging this PR as is, specifically involving adding core::activation_key::ActivationKey with the bike shedding and documentation work? I'm leaning toward no for now. @x-hgg-x has come across a number of good leads for making resolution faster and we should reevaluate this after we followed up on those. Ed had some initial feedback in Improve resolver speed #14663, so I wanted his opinion here.

bors · 2024-10-15T15:38:16Z

☔ The latest upstream changes (presumably #14692) made this pull request unmergeable. Please resolve the merge conflicts.

rustbot assigned Eh2406 Oct 10, 2024

rustbot added A-dependency-resolution Area: dependency resolution and the resolver S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 10, 2024

x-hgg-x mentioned this pull request Oct 10, 2024

Improve resolver speed #14663

Merged

x-hgg-x force-pushed the resolver-perf-2 branch 2 times, most recently from 8979c6c to 0ff9d29 Compare October 10, 2024 18:15

x-hgg-x added 3 commits October 10, 2024 22:53

fix: enable clippy::derive_ord_xor_partial_ord lint

972df9e

fix: move ActivationsKey into core

de3e0b8

fix: use nohash in the resolver for the activation keys

ee74105

x-hgg-x force-pushed the resolver-perf-2 branch from 0ff9d29 to ee74105 Compare October 10, 2024 20:54

x-hgg-x mentioned this pull request Oct 10, 2024

Investigate why Solana is so slow. Eh2406/pubgrub-crates-benchmark#6

Open

Eh2406 reviewed Oct 11, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve resolver speed again #14665

Improve resolver speed again #14665

x-hgg-x commented Oct 10, 2024 •

edited

Loading

Eh2406 commented Oct 10, 2024

x-hgg-x commented Oct 10, 2024 •

edited

Loading

Eh2406 commented Oct 10, 2024

x-hgg-x commented Oct 10, 2024 •

edited

Loading

x-hgg-x commented Oct 10, 2024

Eh2406 commented Oct 10, 2024

Eh2406 Oct 11, 2024 •

edited

Loading

x-hgg-x Oct 11, 2024

Eh2406 commented Oct 12, 2024

x-hgg-x commented Oct 12, 2024

Eh2406 commented Oct 14, 2024

bors commented Oct 15, 2024

Improve resolver speed again #14665

Are you sure you want to change the base?

Improve resolver speed again #14665

Conversation

x-hgg-x commented Oct 10, 2024 • edited Loading

What does this PR try to resolve?

How should we test and review this PR?

Eh2406 commented Oct 10, 2024

x-hgg-x commented Oct 10, 2024 • edited Loading

Eh2406 commented Oct 10, 2024

x-hgg-x commented Oct 10, 2024 • edited Loading

x-hgg-x commented Oct 10, 2024

Eh2406 commented Oct 10, 2024

Eh2406 Oct 11, 2024 • edited Loading

Choose a reason for hiding this comment

x-hgg-x Oct 11, 2024

Choose a reason for hiding this comment

Eh2406 commented Oct 12, 2024

x-hgg-x commented Oct 12, 2024

Eh2406 commented Oct 14, 2024

bors commented Oct 15, 2024

x-hgg-x commented Oct 10, 2024 •

edited

Loading

x-hgg-x commented Oct 10, 2024 •

edited

Loading

x-hgg-x commented Oct 10, 2024 •

edited

Loading

Eh2406 Oct 11, 2024 •

edited

Loading