[Performance]: Restructure slashing for more performance #19457
Comments
This also may not make sense with the SDK's plans for a staking/slashing refactor, hence my asking whether we're aligned on the problem/fix.
Nice, thanks for writing this up @ValarDragon! I'm really stoked to see you guys getting into the nitty-gritty details of improving I/O, specifically writes. Tbh, I'm less concerned with reads, especially with store/v2 on the horizon. Now with regards to lowering the number of writes, I think (1) should be uncontentious, and it provides a case where you may have no writes at all (assuming no missed blocks). So I have no objections there. I think (2) is a bit more contentious of a change. @tac0turtle wdyt?
Sweet! #1 was shockingly easy, much easier than I expected; I'm making a draft PR right now. I just have to get proto compilation working on my new laptop, haha.
On our latest benchmarks, using IAVL v1 and no fast nodes, we are now seeing the reads from slashing take 11% of our sync time! I am still shocked that this is so high, tbh; I don't really understand why it's not hitting caches. We're doing 300 reads and 150 writes in the last SDK release. So maybe it's possible that we consume the full cache size every block, and thus get a lot of misses? (We need ~30% more speed improvement on the state-machine side, over the current IAVL v1 with no fast nodes, to hit our target goal. Getting pretty close!!)

As we want to push towards lower block times, though, it does seem a bit silly for us not to batch this computation more. Would a batching PR be accepted? (Batching as in processing missed votes for jailing once every N blocks.) This relates to #19474. I don't really want to make Osmosis' code diverge from mainline that much in slashing, so I would only do it if there's alignment here. (Right now it's just backporting my commits to their v0.47.x equivalents, which we'll drop once we're on an SDK version containing it!)
If we did batching, how would you track it over a period of time? Store it in some cache?
If I wanted to process every 5th block, I just need to do the following pseudocode:

if blockHeight % 5 == 0 && blockHeight > 0 {
    for i := blockHeight - 4; i <= blockHeight; i++ {
        header, err := getBlockHeaderAtHeight(i)
        if err != nil {
            // edge case where the chain started within the last 5 blocks; skip until we have the height
            continue
        }
        for _, vote := range header.LastCommit {
            // existing code for handling votes, just pass in i instead of reading the height from ctx
        }
    }
}

Staking already stores those headers!
Yup, @ValarDragon's proposal makes sense. This doesn't require epochs; as you've pointed out, you can just use a modulo (though the signing window must be divisible by that modulus).
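As a minimal sketch of the divisibility constraint above (the function name and values are hypothetical illustrations, not SDK code), a batch interval is valid only if it evenly divides the signed-blocks window, so that batch boundaries never straddle a window edge:

```go
package main

import "fmt"

// validBatchInterval reports whether processing missed votes every
// `interval` blocks is compatible with a signing window of `window`
// blocks: the window must be an exact multiple of the interval.
func validBatchInterval(window, interval int64) bool {
	return interval > 0 && window%interval == 0
}

func main() {
	// e.g. a 10,000-block signing window with a 5-block batch interval
	fmt.Println(validBatchInterval(10000, 5)) // 5 divides 10000
	fmt.Println(validBatchInterval(10000, 7)) // 7 does not divide 10000
}
```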
Summary
Currently, slashing makes two state reads per validator, always one write, and sometimes two writes, every block. This causes excess time for:
In Osmosis this is 150 to 300 state writes per block! We see this averaging around 175 writes.
We've just started benchmarking and aiming to reduce our number of writes per block. After we changed x/distribution to only distribute staking rewards once every 50 blocks, it preliminarily looks like we're at around 1000 writes per block, so this is a big contribution to database Commit time!
Problem Definition
No response
Proposed Feature
Right now I'm concerned with lowering the number of writes at hand. (But later on I'd like to reduce the data-fetch time too.)
There are two data writes right now:
I see 3 paths forward here:
`2 * num_missed_validator_sigs`
Personally, I think we should do both solutions (1) and (2), but I see (2) as potentially contentious. (I think it's fine, but I'd understand if others disagreed.) I'm happy to PR a fix for #1 if other folks agree. I believe it would be:
However, I would not be taking on the fully burdened work scope of QA'ing the migration in the mainline SDK. I would bring it to the Osmosis fork if we're aligned, but I would only include it in Osmosis if the mainline SDK folks wanted to do it.
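The "no writes at all, assuming no missed blocks" behavior of solution (1) could be sketched roughly as below. All names here (ValidatorSigningInfo, handleVote) are hypothetical simplifications for illustration, not the slashing module's actual API; the assumption is that the per-block index can be derived from the block height instead of being stored, so only a missed block dirties the record:

```go
package main

import "fmt"

// ValidatorSigningInfo is a simplified, hypothetical stand-in for the
// slashing module's per-validator signing record.
type ValidatorSigningInfo struct {
	MissedBlocks int64
}

// handleVote processes one precommit for a validator. It reports
// whether the record actually changed, so the caller can skip the
// state write entirely in the common case where the validator signed.
func handleVote(info *ValidatorSigningInfo, signed bool) (dirty bool) {
	if !signed {
		info.MissedBlocks++
		return true // counter changed: persist this one record
	}
	return false // unchanged: zero writes for honest signers
}

func main() {
	info := &ValidatorSigningInfo{}
	fmt.Println(handleVote(info, true))  // signed: no write needed
	fmt.Println(handleVote(info, false)) // missed: one write
	fmt.Println(info.MissedBlocks)
}
```

With a full validator set signing every block, this drops slashing's write count from ~num_validators to zero, which matches the tradeoff discussed above.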