Loader fixes #15285

Open
wants to merge 1 commit into
base: main

georgemitenkov (Contributor) commented Nov 15, 2024

Description

  • Capture global cache reads as well.
  • Issue read-before-write for modules at commit.

How Has This Been Tested?

Key Areas to Review

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Move Compiler
  • Other (specify)

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

trunk-io bot commented Nov 15, 2024

⏱️ 3h 1m total CI duration on this PR
| Slowest 15 Jobs | Cumulative Duration | Recent Runs |
|---|---|---|
| execution-performance / single-node-performance | 1h 55m | 🟥🟩🟥 |
| execution-performance / test-target-determinator | 13m | 🟩🟩🟩 |
| rust-move-tests | 12m | 🟩 |
| rust-move-tests | 10m | |
| check-dynamic-deps | 7m | 🟩🟩🟩🟩🟩 |
| rust-doc-tests | 5m | 🟩 |
| rust-cargo-deny | 4m | 🟩🟩 |
| test-target-determinator | 4m | 🟩 |
| check | 4m | 🟩 |
| semgrep/ci | 2m | 🟩🟩🟩🟩🟩 |
| fetch-last-released-docker-image-tag | 2m | 🟩 |
| general-lints | 1m | 🟩🟩🟩 |
| rust-move-tests | 55s | |
| file_change_determinator | 33s | 🟩🟩🟩 |
| permission-check | 15s | 🟩🟩🟩🟩🟩 |

🚨 1 job on the last run was significantly faster/slower than expected

| Job | Duration | vs 7d avg | Delta |
|---|---|---|---|
| execution-performance / single-node-performance | 38m | 16m | +140% |

@georgemitenkov marked this pull request as ready for review November 15, 2024 02:39
@georgemitenkov requested review from msmouse and igor-aptos and removed request for sasha8 and danielxiangzl November 15, 2024 02:39
@georgemitenkov added the following labels Nov 15, 2024:
  • CICD:run-e2e-tests (when this label is present, GitHub Actions will run all land-blocking e2e tests from the PR)
  • CICD:run-execution-performance-test (run execution performance test)
  • CICD:run-execution-performance-full-test (run execution performance test, full version)

Comment on lines 298 to 302
  enum ModuleRead<DC, VC, S> {
      /// Read from the cross-block module cache.
-     GlobalCache,
+     GlobalCache(Arc<ModuleCode<DC, VC, S>>),
      /// Read from per-block cache ([SyncCodeCache]) used by parallel execution.
      PerBlockCache(Option<(Arc<ModuleCode<DC, VC, S>>, Option<TxnIndex>)>),
igor-aptos (Contributor) commented Nov 15, 2024

Can you explain why we distinguish reads here based on where we got the data from? Also, what is Option<TxnIndex> in PerBlockCache?

Contributor Author

The outer Option: None means the module does not exist (not even in StateView).

Contributor Author

Different reads need different validations. We need to check that global cache reads are still valid, and that per-block reads still have the same version.
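
To illustrate the point about per-source validation, here is a minimal sketch of how the two kinds of reads might be checked. It is not the PR's implementation: `ModuleCode` is reduced to a non-generic stand-in, and `GlobalCache`, `PerBlockCache`, their `entries` maps, the validity flag, and `validate_read` are all hypothetical names introduced for this example.

```rust
use std::collections::HashMap;
use std::sync::Arc;

type TxnIndex = u32;

/// Non-generic stand-in for ModuleCode<DC, VC, S> (assumption for this sketch).
#[allow(dead_code)]
#[derive(Debug)]
struct ModuleCode {
    bytes: Vec<u8>,
}

/// Mirrors the shape of the enum under review, without the type parameters.
enum ModuleRead {
    GlobalCache(Arc<ModuleCode>),
    PerBlockCache(Option<(Arc<ModuleCode>, Option<TxnIndex>)>),
}

/// Hypothetical cross-block cache: key -> (code, "still valid" flag).
struct GlobalCache {
    entries: HashMap<String, (Arc<ModuleCode>, bool)>,
}

/// Hypothetical per-block cache: key -> (code, version).
/// A version of None stands for the storage version.
struct PerBlockCache {
    entries: HashMap<String, (Arc<ModuleCode>, Option<TxnIndex>)>,
}

/// Validates one captured read against the current cache state.
fn validate_read(
    key: &str,
    read: &ModuleRead,
    global: &GlobalCache,
    per_block: &PerBlockCache,
) -> bool {
    match read {
        // Global reads: the entry must still be present and not invalidated.
        ModuleRead::GlobalCache(_) => global
            .entries
            .get(key)
            .map_or(false, |(_, still_valid)| *still_valid),
        // Per-block reads: existence and version must match what was observed.
        ModuleRead::PerBlockCache(observed) => {
            let observed_version = observed.as_ref().map(|(_, version)| *version);
            let current_version = per_block.entries.get(key).map(|(_, version)| *version);
            observed_version == current_version
        }
    }
}
```

In this sketch a per-block read of a non-existent module (`PerBlockCache(None)`) stays valid only while the module is still absent from the per-block cache.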

Contributor

stupid formatting, didn't show I was referring to TxnIndex

Contributor Author

Ah, for Option<TxnIndex>, None means the storage version.

Contributor Author

I find it better than Result<TxnIndex, StorageVersion>
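
For comparison, the two encodings discussed in this thread can be sketched side by side. This is only an illustration: `StorageVersion` here is a hypothetical marker type, `describe_version` and `to_result` are made-up helpers, and the reading of `Some(idx)` as "written by transaction `idx` in the current block" is an assumption not stated in the thread.

```rust
type TxnIndex = u32;

/// Hypothetical marker type for the Result-based alternative.
#[derive(Debug, PartialEq, Eq)]
struct StorageVersion;

/// Option encoding: None is the storage version, Some(idx) is a version
/// assumed to come from transaction `idx` in the current block.
fn describe_version(version: Option<TxnIndex>) -> String {
    match version {
        None => "storage version".to_string(),
        Some(idx) => format!("version from transaction {idx}"),
    }
}

/// The same information expressed as Result<TxnIndex, StorageVersion>.
fn to_result(version: Option<TxnIndex>) -> Result<TxnIndex, StorageVersion> {
    version.ok_or(StorageVersion)
}
```

Both forms carry the same information. The `Result` variant names the storage case explicitly but treats it as an error, which may be part of why the `Option` form is preferred here.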

Contributor

Why the distinction between the storage version and the global cache?

Contributor

✅ Forge suite realistic_env_max_load success on 3b8ce5cfd7c7b2a6a39f4c7b6cb11da067bc8bd0

two traffics test: inner traffic : committed: 14537.54 txn/s, latency: 2736.06 ms, (p50: 2700 ms, p70: 2700, p90: 2700 ms, p99: 3000 ms), latency samples: 5527500
two traffics test : committed: 99.90 txn/s, latency: 1562.01 ms, (p50: 1400 ms, p70: 1400, p90: 1600 ms, p99: 8500 ms), latency samples: 1840
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 1.992, avg: 1.547", "ConsensusProposalToOrdered: max: 0.330, avg: 0.293", "ConsensusOrderedToCommit: max: 0.375, avg: 0.361", "ConsensusProposalToCommit: max: 0.667, avg: 0.653"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.83s no progress at version 2285665 (avg 0.20s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 8.81s no progress at version 2285663 (avg 8.81s) [limit 15].
Test Ok

Contributor

✅ Forge suite compat success on 2bb2d43037a93d883729869d65c7c6c75b028fa1 ==> 3b8ce5cfd7c7b2a6a39f4c7b6cb11da067bc8bd0

Compatibility test results for 2bb2d43037a93d883729869d65c7c6c75b028fa1 ==> 3b8ce5cfd7c7b2a6a39f4c7b6cb11da067bc8bd0 (PR)
1. Check liveness of validators at old version: 2bb2d43037a93d883729869d65c7c6c75b028fa1
compatibility::simple-validator-upgrade::liveness-check : committed: 14710.60 txn/s, latency: 2323.60 ms, (p50: 2100 ms, p70: 2100, p90: 2400 ms, p99: 7200 ms), latency samples: 471540
2. Upgrading first Validator to new version: 3b8ce5cfd7c7b2a6a39f4c7b6cb11da067bc8bd0
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 6691.53 txn/s, latency: 4135.27 ms, (p50: 4500 ms, p70: 5100, p90: 5800 ms, p99: 5900 ms), latency samples: 121180
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 7063.82 txn/s, latency: 4564.10 ms, (p50: 4700 ms, p70: 4900, p90: 6400 ms, p99: 6900 ms), latency samples: 238140
3. Upgrading rest of first batch to new version: 3b8ce5cfd7c7b2a6a39f4c7b6cb11da067bc8bd0
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 7546.09 txn/s, latency: 3741.54 ms, (p50: 4100 ms, p70: 4400, p90: 4500 ms, p99: 4600 ms), latency samples: 142600
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 6553.15 txn/s, latency: 4583.91 ms, (p50: 4700 ms, p70: 4800, p90: 5100 ms, p99: 6600 ms), latency samples: 250040
4. upgrading second batch to new version: 3b8ce5cfd7c7b2a6a39f4c7b6cb11da067bc8bd0
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 10924.15 txn/s, latency: 2531.49 ms, (p50: 2500 ms, p70: 3200, p90: 3500 ms, p99: 3700 ms), latency samples: 190920
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 10570.88 txn/s, latency: 2928.58 ms, (p50: 2700 ms, p70: 3500, p90: 3700 ms, p99: 4100 ms), latency samples: 348820
5. check swarm health
Compatibility test for 2bb2d43037a93d883729869d65c7c6c75b028fa1 ==> 3b8ce5cfd7c7b2a6a39f4c7b6cb11da067bc8bd0 passed
Test Ok
