
Track transaction performance through various stages using random mask #34789

Closed

Conversation

lijunwangs
Contributor

@lijunwangs lijunwangs commented Jan 16, 2024

Problem

Enable the system to track transaction processing performance through its various stages, sampling based on probability.

Summary of Changes

Based on https://docs.google.com/document/d/1ig1rC0dk-ddi33JIqG9EZ4ZSq9xAJpT9fQTPaZFi_vw/edit.
We use a randomly generated 12-bit integer as a mask to match against the transaction's signature. If it matches, we mark the packet for performance tracking in the packet Meta's mask; downstream stages can then check that flag cheaply without repeating the mask matching. For these matched packets we report processing time for the fetch, sigverify, and banking stages.
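The mask-matching idea can be sketched roughly as follows. This is a hypothetical illustration of the mechanism described above, not the PR's actual API: the function name `should_track` and the choice of which signature bytes to compare are assumptions.

```rust
/// Hypothetical sketch: compare the low 12 bits of the transaction signature
/// against a randomly generated 12-bit mask. Which signature bytes are used
/// here is an assumption for illustration.
fn should_track(signature_bytes: &[u8], mask: u16) -> bool {
    // Interpret the first two signature bytes as a little-endian u16 and
    // keep only the low 12 bits.
    let sig_bits = u16::from_le_bytes([signature_bytes[0], signature_bytes[1]]) & 0x0fff;
    sig_bits == (mask & 0x0fff)
}

fn main() {
    let sig = [0x34u8, 0x12]; // low 12 bits = 0x234
    assert!(should_track(&sig, 0x0234));
    assert!(!should_track(&sig, 0x0235));
    println!("ok");
}
```

With a uniformly random 12-bit mask this samples roughly one transaction in 4096, keeping the tracking overhead probabilistic rather than per-packet.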

Fixes #


codecov bot commented Jan 19, 2024

Codecov Report

Attention: Patch coverage is 73.68421%, with 40 lines in your changes missing coverage. Please review.

Project coverage is 81.7%. Comparing base (e8c87e8) to head (5ed6a58).
Report is 1 commit behind head on master.

❗ Current head 5ed6a58 differs from pull request most recent head 6a4f974. Consider uploading reports for commit 6a4f974 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff            @@
##           master   #34789     +/-   ##
=========================================
- Coverage    81.8%    81.7%   -0.1%     
=========================================
  Files         834      835      +1     
  Lines      224815   224966    +151     
=========================================
+ Hits       183919   184006     +87     
- Misses      40896    40960     +64     

@lijunwangs lijunwangs force-pushed the track_transaction_performance branch 2 times, most recently from 4d4ba9a to 0991665 Compare January 22, 2024 19:18
Contributor

@t-nelson t-nelson left a comment

can you run some benchmarks with and without this tracking? it seems extremely likely to add detrimental overhead

core/src/banking_stage/consumer.rs
@@ -206,6 +206,26 @@ impl Consumer {
.slot_metrics_tracker
.increment_retryable_packets_count(retryable_transaction_indexes.len() as u64);

// Now track performance for the sampled transactions that are not in retryable_transaction_indexes.
// We assume retryable_transaction_indexes is already sorted.
for (index, packet) in packets_to_process.iter().enumerate() {
Contributor

is there somewhere that we can put this where we aren't adding an iteration?

Contributor Author

I do not see an easy way to do that -- I would like to avoid adding a smudge on the SanitizedTransaction. But I think I can make this more efficient -- I do not need the binary search on the retryable indexes; I can do it in one simple loop.
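The "one simple loop" alternative to a per-packet binary search can be sketched like this. Names are hypothetical; it assumes the retryable indexes are sorted, as the comment in the diff states:

```rust
/// Walk the packets once, advancing a cursor through the sorted retryable
/// indexes, instead of binary-searching the retryable list per packet.
fn non_retryable_indexes(num_packets: usize, retryable_sorted: &[usize]) -> Vec<usize> {
    let mut out = Vec::new();
    let mut r = 0; // cursor into retryable_sorted
    for index in 0..num_packets {
        if r < retryable_sorted.len() && retryable_sorted[r] == index {
            r += 1; // packet is retryable: skip it
        } else {
            out.push(index); // candidate for performance tracking
        }
    }
    out
}

fn main() {
    assert_eq!(non_retryable_indexes(5, &[1, 3]), vec![0, 2, 4]);
    println!("ok");
}
```

This is O(n + r) over the batch instead of O(n log r), and avoids a second pass.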

Contributor

are the actual iterations not like one stack frame deeper in most cases?

Contributor Author

This is done many levels down using SanitizedTransaction, which does not have information about the start_time.

Contributor

@apfitzge apfitzge Feb 29, 2024

This probably needs to be done elsewhere; this is never called with the new scheduler (for non-votes), so we'd never get any metrics if that's enabled. By the time this change gets in, it will be the default.

Contributor Author

@apfitzge what is the new scheduler's name?

Contributor Author

Talked to @apfitzge offline; will address both issues raised by Trent and Andrew.

core/src/banking_stage/consumer.rs
core/src/banking_stage/consumer.rs
core/src/banking_stage/immutable_deserialized_packet.rs
streamer/src/nonblocking/quic.rs
transaction-metrics-tracker/src/lib.rs
transaction-metrics-tracker/src/lib.rs
transaction-metrics-tracker/src/lib.rs
transaction-metrics-tracker/src/lib.rs
@lijunwangs
Contributor Author

can you run some benchmarks with and without this tracking? it seems extremely likely to add detrimental overhead

I do not see any material differences with/without the change using bench-tps:

Rpc-client with fix

lijun_solana_com@lijun-dev8:~/solana$ ./cargo run --release --bin solana-bench-tps -- -u http://35.233.177.221:8899 --identity ~/.config/solana/id.json --tx_count 1000 --thread-batch-sleep-ms 0 -t 20 --duration 60 -n 35.233.177.221:8001 --read-client-keys ~/gce-keypairs.yaml --use-rpc-client

Highest TPS: 32385.70 sampling period 1s max transactions: 432146 clients: 1 drop rate: 0.63

[2024-01-30T09:42:05.249223482Z INFO solana_bench_tps::bench] Average TPS: 7171.7437

Highest TPS: 31997.50 sampling period 1s max transactions: 493701 clients: 1 drop rate: 0.57

[2024-01-30T09:43:53.539368674Z INFO solana_bench_tps::bench] Average TPS: 8209.673

Highest TPS: 25217.14 sampling period 1s max transactions: 378571 clients: 1 drop rate: 0.67

[2024-01-30T18:05:42.712776385Z INFO solana_bench_tps::bench] Average TPS: 6204.4365

Highest TPS: 41385.95 sampling period 1s max transactions: 288390 clients: 1 drop rate: 0.75

[2024-01-30T18:30:48.060532674Z INFO solana_bench_tps::bench] Average TPS: 4780.393

Highest TPS: 31080.66 sampling period 1s max transactions: 373902 clients: 1 drop rate: 0.67

[2024-01-31T00:34:06.877848441Z INFO solana_bench_tps::bench] Average TPS: 6120.311

Master without change:

Highest TPS: 28262.44 sampling period 1s max transactions: 520460 clients: 1 drop rate: 0.55

[2024-01-31T02:12:00.852692886Z INFO solana_bench_tps::bench] Average TPS: 8523.614

Highest TPS: 34402.00 sampling period 1s max transactions: 354908 clients: 1 drop rate: 0.67

[2024-01-31T02:21:19.220559104Z INFO solana_bench_tps::bench] Average TPS: 5904.669

Highest TPS: 29222.58 sampling period 1s max transactions: 264552 clients: 1 drop rate: 0.76

[2024-01-31T19:02:38.746203621Z INFO solana_bench_tps::bench] Average TPS: 4400.843

Highest TPS: 39137.56 sampling period 1s max transactions: 384987 clients: 1 drop rate: 0.65

[2024-01-31T19:04:17.896044976Z INFO solana_bench_tps::bench] Average TPS: 6338.43

Highest TPS: 29189.24 sampling period 1s max transactions: 388258 clients: 1 drop rate: 0.64

[2024-01-31T19:06:34.395401402Z INFO solana_bench_tps::bench] Average TPS: 6257.952
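Averaging the five runs quoted above shows how close the two configurations land. This is just a sanity computation over the reported numbers, not an additional benchmark:

```rust
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

fn main() {
    // Average TPS figures copied from the bench-tps logs above.
    let with_fix = [7171.7437, 8209.673, 6204.4365, 4780.393, 6120.311];
    let without = [8523.614, 5904.669, 4400.843, 6338.43, 6257.952];
    // Means land within a few percent of each other, well inside the
    // run-to-run noise visible in the individual samples.
    println!("{:.0} vs {:.0}", mean(&with_fix), mean(&without)); // prints "6497 vs 6285"
}
```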

@github-actions github-actions bot added stale [bot only] Added to stale content; results in auto-close after a week. and removed stale [bot only] Added to stale content; results in auto-close after a week. labels Feb 15, 2024
Contributor

@t-nelson t-nelson left a comment

i don't think bench-tps is how we want to be testing this. it's too far away and is biased by everything after the scheduler, which we don't care about. i think we should be using the benchmarks in perf, sigverify and banking instead. i'm especially concerned about whether we're negatively impacting our capacity in front of sigverify load shedding

core/src/banking_stage/consumer.rs
transaction-metrics-tracker/src/lib.rs
streamer/src/nonblocking/quic.rs
@@ -206,6 +206,26 @@ impl Consumer {
.slot_metrics_tracker
.increment_retryable_packets_count(retryable_transaction_indexes.len() as u64);

// Now track performance for the sampled transactions that are not in retryable_transaction_indexes.
// We assume retryable_transaction_indexes is already sorted.
for (index, packet) in packets_to_process.iter().enumerate() {
Contributor

are the actual iterations not like one stack frame deeper in most cases?

Comment on lines +222 to +225
debug!(
"Banking stage processing took {duration:?} for transaction {:?}",
packet.transaction().get_signatures().first()
);
Contributor

This is really misleading. It didn't take that amount of time to process this transaction. It took that time to process the batch of transactions.

@@ -244,6 +244,9 @@ pub(crate) struct ProcessPacketsTimings {
// Time spent running the cost model in processing transactions before executing
// transactions
pub cost_model_us: u64,

// banking stage processing time histogram for sampled packets
pub process_sampled_packets_us_hist: histogram::Histogram,
Contributor

This will almost certainly have between 0-2 counts per block on mnb, meaning it will probably be so noisy as to be useless.
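The "0-2 counts per block" estimate follows directly from the 12-bit mask. A sketch of the arithmetic, assuming a uniform mask (match probability 1/4096) and a few thousand sampling-eligible transactions per block:

```rust
/// Expected number of sampled transactions per block, assuming a uniform
/// 12-bit mask (match probability 1/4096). The 4000-transaction figure below
/// is an illustrative assumption, not a measured mainnet-beta number.
fn expected_samples_per_block(txns_per_block: f64) -> f64 {
    txns_per_block / 4096.0
}

fn main() {
    // At ~4000 eligible transactions per block the histogram receives on the
    // order of one sample per block, hence the noisy-histogram concern.
    let e = expected_samples_per_block(4000.0);
    assert!(e < 1.0);
    println!("{e:.2}"); // prints "0.98"
}
```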

Contributor

@apfitzge apfitzge Feb 29, 2024

I think we care much more about time from sigverify to banking stage picking up the packet from channel, i.e. "time-to-scheduler"
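One concrete reading of "time-to-scheduler": stamp the packet when sigverify hands it off and report the elapsed time when banking stage pulls it from the channel. The struct and field names below are illustrative assumptions, not the codebase's actual types:

```rust
use std::time::{Duration, Instant};

/// Illustrative tracked-packet wrapper; `start_time` would be set when the
/// packet leaves sigverify.
struct TrackedPacket {
    start_time: Instant,
}

/// Elapsed time between sigverify hand-off and the moment banking stage
/// dequeues the packet from the channel.
fn time_to_scheduler(p: &TrackedPacket) -> Duration {
    p.start_time.elapsed()
}

fn main() {
    let p = TrackedPacket { start_time: Instant::now() };
    std::thread::sleep(Duration::from_millis(5));
    assert!(time_to_scheduler(&p) >= Duration::from_millis(5));
    println!("ok");
}
```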

Contributor Author

The sampling mechanism's goal was to observe the system without tracking everything. Even a couple of data points per slot, accumulated over a long time, can still provide insight into where time is spent across the various stages. Time-to-scheduler is a good stat to have; I will defer it to future PRs to keep this change set small.

trying txn mask matching

output txn to figure out why txn is not exactly matched

Use 62 and 61 portion

track fetch performance using random txn mask

track sigverify performance using random txn mask

track banking stage performance using random txn mask

adding missing cargo lock file

add debug messages

Revert "add debug messages"

This reverts commit 96aead5.

fixed some clippy issues

check-crate issue

Fix a clippy issue

Fix a clippy issue

debug why txns in banking stage shows fewer performance tracking points

debug why txns in banking stage shows fewer performance tracking points

debug why txns in banking stage shows fewer performance tracking points

debug why txns in banking stage shows fewer performance tracking points

get higher PPS for testing purpose

more debug messages on why txn is skipped

display if tracer packet in log

add debug before calling processing_function

debug at the initial of banking stage

track if a txn is forwarded

dependency order

missing cargo file

clean up debug messages

Do not use TRACER_PACKET, use its own bit

rename some functions

addressed some comments from Trent

Update core/src/banking_stage/immutable_deserialized_packet.rs

Co-authored-by: Trent Nelson <[email protected]>

addressed some comments from Trent

Do not use binary_search, do simple compare in one loop
@lijunwangs lijunwangs force-pushed the track_transaction_performance branch from 697e141 to d566aa0 Compare March 1, 2024 09:48
@willhickey
Contributor

This repository is no longer in use. Please re-open this pull request in the agave repo: https://github.com/anza-xyz/agave
