[Performance]: Restructure slashing for more performance #19457
Comments
This also may not make sense with the SDK's plans for a staking/slashing refactor, hence my asking whether we're aligned on the problem/fix.
Nice, thanks for writing this up @ValarDragon! I'm really stoked to see you guys getting into the nitty-gritty details of improving I/O, specifically writes. Tbh, I'm less concerned with reads, especially with store/v2 on the horizon. Now with regards to lowering the number of writes, I think (1) should be uncontentious, and it provides a case where you may have no writes at all (assuming no missed blocks). So I have no objections there. I think (2) is a bit more contentious of a change. @tac0turtle wdyt?
Sweet! #1 was shockingly easy, much easier than I expected; I'm making a draft PR right now. I just have to get proto compilation working on my new laptop, haha.
On our latest benchmarks, using IAVL v1 and no fast nodes, we are now seeing the reads from slashing take 11% of our sync time! I am still shocked that this is so high, tbh; I don't really understand why it's not hitting caches. We're doing 300 reads and 150 writes in the last SDK release. So maybe it's possible that we consume the full cache size every block, and thus get a lot of misses? (We need ~30% more speed improvement on the state-machine side, over the current IAVL v1 with no fast nodes, to hit our target goal. Getting pretty close!!)

As we want to push towards lower block times, though, it does seem a bit silly for us not to batch this computation more. Would a batching PR be accepted? (Batching as in processing missed votes for jailing once every N blocks.) This relates to #19474. I don't really want to make Osmosis' code diverge from mainline that much in slashing, so I would only do it if there's alignment here. (Right now it's just backporting my commits to their v0.47.x equivalents, which we'll drop once we're on an SDK version containing it!)
If we did batching, how would you track it over a period of time? Store it in some cache?
If I wanted to process every 5th block, I just need to do the following pseudocode:

if blockHeight % 5 == 0 && blockHeight > 0 {
    for i := blockHeight - 4; i <= blockHeight; i++ {
        header, err := getBlockHeaderAtHeight(i)
        if err != nil {
            // edge case where the chain started within the last 5 blocks; skip until we have the height
            continue
        }
        for _, vote := range header.LastCommit {
            // existing code for handling votes, just pass in i instead of reading the height from ctx
        }
    }
}

Staking already stores those headers!
Yup, @ValarDragon's proposal makes sense. This doesn't require epochs; as you've pointed out, you can just use a modulo (though the signing window must be divisible by that modulus).
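As a minimal sketch of the divisibility constraint above (the function name and values are hypothetical illustrations, not SDK code), a batch interval is valid only if it evenly divides the signed-blocks window, so that batch boundaries never straddle a window edge:

```go
package main

import "fmt"

// validBatchInterval reports whether processing missed votes every
// `interval` blocks is compatible with a signing window of `window`
// blocks: the window must be an exact multiple of the interval.
func validBatchInterval(window, interval int64) bool {
	return interval > 0 && window%interval == 0
}

func main() {
	// e.g. a 10,000-block signing window with a 5-block batch interval
	fmt.Println(validBatchInterval(10000, 5)) // 5 divides 10000
	fmt.Println(validBatchInterval(10000, 7)) // 7 does not divide 10000
}
```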
Summary
Currently, slashing makes two state reads per validator, always one write, and sometimes two writes, every block. This causes excess time for:
In Osmosis this is 150 to 300 state writes per block! We see this averaging around 175 writes.
We've just started benchmarking and aiming to reduce our number of writes per block. After we changed x/distribution to only distribute staking rewards once every 50 blocks, it preliminarily looks like we're at around 1000 writes per block, so this is a big contribution to database Commit time!
Problem Definition
No response
Proposed Feature
Right now I'm concerned with lowering the number of writes at hand. (But later on I'd like to reduce the data-fetch time too.)
There are two data writes right now:
I see 3 paths forward here:
`2 * num_missed_validator_sigs`
Personally, I think we should do both solutions (1) and (2), but I see (2) as potentially contentious. (I think it's fine, but I'd understand if others disagreed.) I'm happy to PR a fix for #1 if other folks agree. I believe it would be:
However, I would not be taking on the fully burdened work scope of QA'ing the migration in the mainline SDK. I would bring it to the Osmosis fork if we're aligned, but I would only include it in Osmosis if the mainline SDK folks wanted to do it.
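The "no writes at all, assuming no missed blocks" behavior of solution (1) could be sketched roughly as below. All names here (ValidatorSigningInfo, handleVote) are hypothetical simplifications for illustration, not the slashing module's actual API; the assumption is that the per-block index can be derived from the block height instead of being stored, so only a missed block dirties the record:

```go
package main

import "fmt"

// ValidatorSigningInfo is a simplified, hypothetical stand-in for the
// slashing module's per-validator signing record.
type ValidatorSigningInfo struct {
	MissedBlocks int64
}

// handleVote processes one precommit for a validator. It reports
// whether the record actually changed, so the caller can skip the
// state write entirely in the common case where the validator signed.
func handleVote(info *ValidatorSigningInfo, signed bool) (dirty bool) {
	if !signed {
		info.MissedBlocks++
		return true // counter changed: persist this one record
	}
	return false // unchanged: zero writes for honest signers
}

func main() {
	info := &ValidatorSigningInfo{}
	fmt.Println(handleVote(info, true))  // signed: no write needed
	fmt.Println(handleVote(info, false)) // missed: one write
	fmt.Println(info.MissedBlocks)
}
```

With a full validator set signing every block, this drops slashing's write count from ~num_validators to zero, which matches the tradeoff discussed above.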