Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transaction performance tracking -- streamer stage #257

Conversation

lijunwangs
Copy link

@lijunwangs lijunwangs commented Mar 14, 2024

Problem

Enable the system to track transaction processing performance through various stage based on probability. This is the resubmit of the PR solana-labs#34789 against Agave for the first part: streamer only to reduce PR size

Summary of Changes

Based on the https://docs.google.com/document/d/1ig1rC0dk-ddi33JIqG9EZ4ZSq9xAJpT9fQTPaZFi_vw/edit.
We use a randomly generated 12 bits integer as a mask to match the transaction's signature. If it is matched, we mark the packet for tracking for performance in the Meta's mask. This is used efficiently down stream without doing mask matching. For these matched packets we report processing time for: fetch, sigverify and banking stage.

@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 81.19658% with 22 lines in your changes are missing coverage. Please review.

Project coverage is 81.9%. Comparing base (b3fd87f) to head (7651985).

Additional details and impacted files
@@            Coverage Diff            @@
##           master     #257     +/-   ##
=========================================
- Coverage    81.9%    81.9%   -0.1%     
=========================================
  Files         837      838      +1     
  Lines      226792   226910    +118     
=========================================
+ Hits       185795   185878     +83     
- Misses      40997    41032     +35     

@lijunwangs lijunwangs force-pushed the track_transaction_performance_streamer_stage branch from 56835a9 to bb20eeb Compare April 4, 2024 18:48
@lijunwangs lijunwangs merged commit 2b03910 into anza-xyz:master Apr 4, 2024
50 checks passed
mergify bot pushed a commit that referenced this pull request Apr 5, 2024
* transaction performance tracking -- streamer stage

(cherry picked from commit 2b03910)

# Conflicts:
#	Cargo.toml
#	streamer/src/nonblocking/quic.rs
#	streamer/src/quic.rs
Copy link

mergify bot commented Apr 5, 2024

Backports to the stable branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule.

Copy link

mergify bot commented Apr 5, 2024

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.

mergify bot pushed a commit that referenced this pull request Apr 5, 2024
* transaction performance tracking -- streamer stage

(cherry picked from commit 2b03910)

# Conflicts:
#	Cargo.toml
mergify bot pushed a commit that referenced this pull request Apr 5, 2024
* transaction performance tracking -- streamer stage

(cherry picked from commit 2b03910)

# Conflicts:
#	Cargo.toml
#	streamer/src/nonblocking/quic.rs
#	streamer/src/quic.rs
lijunwangs added a commit that referenced this pull request Apr 5, 2024
* transaction performance tracking -- streamer stage

(cherry picked from commit 2b03910)

# Conflicts:
#	Cargo.toml
@steviez
Copy link

steviez commented Apr 6, 2024

One one of your PR's doing similar things, I had a comment offering an alternative approach but you must not have seen it:
#202 (comment)

I did end up building this branch, and I instrumented out the TVU path. I think this approach is better because we get every packet and this approach easily scales to doing the same counting down the processing path. Ie, you can see that I measure the age at 2 spots in my branch. What do you think about this approach @lijunwangs ?

Here is that branch that does this for TVU path; I was reporting metrics internally for a node:
https://github.com/anza-xyz/agave/compare/master...steviez:timestamp_packets?expand=1

@lijunwangs
Copy link
Author

One one of your PR's doing similar things, I had a comment offering an alternative approach but you must not have seen it: #202 (comment)

I did end up building this branch, and I instrumented out the TVU path. I think this approach is better because we get every packet and this approach easily scales to doing the same counting down the processing path. Ie, you can see that I measure the age at 2 spots in my branch. What do you think about this approach @lijunwangs ?

Here is that branch that does this for TVU path; I was reporting metrics internally for a node: https://github.com/anza-xyz/agave/compare/master...steviez:timestamp_packets?expand=1

I thought about the approach as well -- it is simpler. But I had two concerns -- 1. it changed the public Meta struct declared in the SDK and case compatibility issues. 2. The extra storage overhead of storing this information in every packet we handle through the stages while we are only interested in certain samples of the packets.

Beside, this change does not preclude adopting your proposal if we have a consensus doing so in the future. The changes I have here is very limited to streamer itself. If we were to change to using Meta to store the information, we could simply change the storage from PacketAccumulator to it. That what I meant by we can iterate it. For now we really need to capture some information in the streamer layer on the latency this fills the critical gap.

Copy link

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing this first PR before reviewing the backports. Left some nit comments. I have a PR up to address these: #648.

@@ -718,6 +732,32 @@ async fn packet_batch_sender(
}
}

fn track_streamer_fetch_packet_performance(
packet_perf_measure: &mut [([u8; 64], Instant)],
stats: &Arc<StreamStats>,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Passing &Arc<T> is often an anti-pattern. For this case, I believe we should use &T instead.

let mut process_sampled_packets_us_hist = stats.process_sampled_packets_us_hist.lock().unwrap();

for (signature, start_time) in packet_perf_measure.iter() {
let duration = Instant::now().duration_since(*start_time);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could hoist the Instant::now() call to a local variable above the for-loop and skip having to do the syscall each iteration. This would change the values on duration, but I don't think by much. Also, it may be better to have all measurements compared to a single "now" value. Wdyt?

Comment on lines +742 to +743
let mut measure = Measure::start("track_perf");
let mut process_sampled_packets_us_hist = stats.process_sampled_packets_us_hist.lock().unwrap();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Measurement is unbalance here. We do measure the lock time, we do not measure the unlock time. IMO either both or none should be measured.

lijunwangs added a commit that referenced this pull request Apr 24, 2024
* transaction performance tracking -- streamer stage

(cherry picked from commit 2b03910)
lijunwangs added a commit that referenced this pull request Apr 25, 2024
* transaction performance tracking -- streamer stage

(cherry picked from commit 2b03910)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants