Thv/fix rollback issue #147

Merged: 13 commits merged into master from thv/fix-rollback-issue on May 10, 2024

Conversation

Sword-Smith
Member

@Sword-Smith Sword-Smith commented May 7, 2024

A draft PR since there are still two tests failing.

Update: All tests pass.

This closes #143

@Sword-Smith Sword-Smith requested a review from dan-da May 7, 2024 17:46
@Sword-Smith Sword-Smith marked this pull request as draft May 7, 2024 17:46
@Sword-Smith Sword-Smith marked this pull request as ready for review May 7, 2024 21:09
Collaborator
@dan-da dan-da left a comment

It looks good to me overall.

I ran my stress test, which consists of commenting out the sleep calls in run-multiple-instances-from-genesis.sh and then running it. This causes both mining nodes to start at the same time. They usually mine blocks 1 and 2 right away independently, which forces a reorg by one of them when they connect. Sometimes they don't find any peer for some reason, in which case I just shut down and re-run. Anyway, in 3 successive runs both nodes were synced up by block 3, and once by block 2. That wasn't happening before, so this definitely seems to fix it.

That said, I am still seeing some warnings in the log of the peer that does the reorgs. I'm unsure if these are indicating real problems, or are just normal for a re-org.

Here they are:

$ grep WARN /tmp/integration_test_from_genesis-2.log 
2024-05-08T01:15:56.324546469Z  WARN ThreadId(01) neptune_core::models::state::wallet::wallet_state: Unable to find valid membership proof for UTXO with digest 08806702505947110800,10280140293633150857,05397434293386042537,13540323760646287415,06445496058825839468. UTXO was received at block height 1. Current block height is 1
2024-05-08T01:15:56.356890146Z  WARN ThreadId(01) neptune_core::models::state::wallet::wallet_state: Unable to find valid membership proof for UTXO with digest 08806702505947110800,10280140293633150857,05397434293386042537,13540323760646287415,06445496058825839468. UTXO was received at block height 1. Current block height is 2
2024-05-08T01:16:44.889471372Z  WARN ThreadId(01) neptune_core::models::state: Could not recover MSMP as transaction appears to be on an abandoned chain
2024-05-08T01:17:43.902038027Z  WARN ThreadId(01) neptune_core::models::state: Could not recover MSMP as transaction appears to be on an abandoned chain
2024-05-08T01:18:42.909408732Z  WARN ThreadId(01) neptune_core::models::state: Could not recover MSMP as transaction appears to be on an abandoned chain
2024-05-08T01:19:41.91564508Z  WARN ThreadId(01) neptune_core::models::state: Could not recover MSMP as transaction appears to be on an abandoned chain

I'd like to get feedback on these warnings before approving the PR.

Resolved review threads on:
src/models/state/archival_state.rs
src/models/state/archival_state.rs
src/models/state/mod.rs (outdated)
src/models/state/archival_state.rs
src/bin/dashboard_src/receive_screen.rs
src/bin/dashboard_src/send_screen.rs
@Sword-Smith
Member Author


These warnings point to the fact that the client has UTXOs on a branch that is no longer canonical. I think this behavior is correct and shouldn't be fixed. These abandoned UTXOs are why we have the CLI endpoint prune-abandoned-monitored-utxos, which the user can run to get rid of these warnings.

Collaborator
@dan-da dan-da left a comment

thx for clarification re warnings. it's nice to have that timelock test fixed also. It's caused some CI failures.

This logging disturbs the test output, and the callers of this function
already log correctly where appropriate.

We want to use this functionality when we set a new tip header, since
we are going to need the previous block's mutator set.

Always set the new block we are storing to disk and indexing in the
database as the tip. Previously, we would check whether this block exceeded
the proof-of-work family number of the current tip, but this
responsibility is now delegated to the caller.

This is needed to fix a bug in `store_block_internal_worker` where the
value `previous_ms_accumulator` is set incorrectly whenever there is a
fork, i.e. whenever `store_block_internal_worker` receives a block that
is not a child of what the node currently considers the tip. In order
to set `previous_ms_accumulator`, we need to know that the block we are
storing now is considered the tip (by the `ArchivalState`), *and* we need
a method to fetch the parent of the tip after `write_block_as_tip` has
been called.

Cf. #143.
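
A minimal sketch of the split described above, using toy stand-in types rather than neptune-core's real `ArchivalState` and `Block`: `write_block_as_tip` stores and indexes unconditionally, while the proof-of-work-family comparison now lives in the caller. Everything below is illustrative, not code from this PR.

```rust
// Stand-in types for illustration only; the real neptune-core structures
// carry much more state (disk indices, mutator set data, etc.).
struct Block {
    height: u64,
    pow_family: u128, // stand-in for the proof-of-work family number
}

struct ArchivalState {
    tip: Option<Block>,
}

impl ArchivalState {
    /// Unconditionally store the block and mark it as the tip; no
    /// proof-of-work comparison happens in here anymore.
    fn write_block_as_tip(&mut self, new_block: Block) {
        // ... persisting to disk and updating indices would happen here ...
        self.tip = Some(new_block);
    }
}

/// The caller now owns the decision of whether the block becomes the tip.
fn handle_incoming_block(state: &mut ArchivalState, new_block: Block) {
    let exceeds_tip = state
        .tip
        .as_ref()
        .map(|tip| new_block.pow_family > tip.pow_family)
        .unwrap_or(true);
    if exceeds_tip {
        state.write_block_as_tip(new_block);
    }
}

fn main() {
    let mut state = ArchivalState { tip: None };
    handle_incoming_block(&mut state, Block { height: 1, pow_family: 10 });
    handle_incoming_block(&mut state, Block { height: 1, pow_family: 5 }); // lighter fork, ignored
    assert_eq!(state.tip.as_ref().unwrap().pow_family, 10);
}
```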
When the global state is updated with a new block, that new block is
always considered the new tip. This change is reflected in a new name
for this function as well, and it allows us to correctly handle state updates
in the case of forks. Since the block no longer carries around the
previous state of the mutator set accumulator, we have to get that state
from somewhere else. We elect to get it from the archival state, which
can identify the parent of the current tip.

Cf. #143
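
A hedged sketch of the data flow just described, again with toy stand-ins rather than the real types: because the incoming block is always the new tip, the previous mutator set accumulator is looked up through the archival state (the parent of the block being applied) rather than read off the block itself. The type and method names below are illustrative only.

```rust
use std::collections::HashMap;

// Toy stand-ins; the real MutatorSetAccumulator and Block are far richer.
#[derive(Clone, Default)]
struct MutatorSetAccumulator {
    leaf_count: u64,
}

struct Block {
    hash: u64,
    parent_hash: u64,
}

struct ArchivalState {
    // Mutator set state as it stood *after* each archived block was applied.
    msa_after_block: HashMap<u64, MutatorSetAccumulator>,
}

impl ArchivalState {
    /// The accumulator the chain had before `block`, i.e. the parent's state.
    /// This is what `previous_ms_accumulator` must be derived from on a fork.
    fn previous_mutator_set(&self, block: &Block) -> Option<MutatorSetAccumulator> {
        self.msa_after_block.get(&block.parent_hash).cloned()
    }
}

fn set_new_tip(archival: &mut ArchivalState, new_block: &Block, msa_after: MutatorSetAccumulator) {
    // Correct even when `new_block` is not a child of the old tip, because the
    // lookup goes through the archival state rather than the old tip.
    let _previous_ms_accumulator = archival
        .previous_mutator_set(new_block)
        .expect("parent of the new tip must be archived");
    // ... wallet and mempool would be rolled forward from the parent's MSA ...
    archival.msa_after_block.insert(new_block.hash, msa_after);
}

fn main() {
    let genesis_msa = MutatorSetAccumulator::default();
    let mut archival = ArchivalState { msa_after_block: HashMap::from([(0, genesis_msa)]) };

    // Two competing children of genesis; applying either works, because the
    // previous accumulator is fetched from the parent, not from the old tip.
    let block_1a = Block { hash: 1, parent_hash: 0 };
    let block_1b = Block { hash: 2, parent_hash: 0 };
    set_new_tip(&mut archival, &block_1a, MutatorSetAccumulator { leaf_count: 3 });
    set_new_tip(&mut archival, &block_1b, MutatorSetAccumulator { leaf_count: 4 });
    assert_eq!(archival.msa_after_block.len(), 3);
}
```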
…ernal`

Previously, tests had their own helper functions for updating the global
state. With this commit, most tests now go through the same state-updater
function as the main code. This shortens multiple tests and makes
us more likely to catch errors like the one in #143.

After redefining the behavior of `set_new_tip` (the function formerly known
as `store_block`) to update global state with a new block, we need
to rearrange the order in which blocks are added in a few tests. With these
changes, all tests pass, and #143 is closed.

The timelock test was flaky since it often overflowed on the addition of
two `BFieldElement`s.

This closes #148.
Also changes the fn name of the internal worker to match that of the
outer methods.
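
A generic illustration of the overflow-driven flakiness mentioned in the timelock commit above, not the actual test from this PR: adding two unbounded random values only occasionally exceeds the representable range, so failures are intermittent, while bounding each operand makes the behavior deterministic. The `rand` crate usage and the Goldilocks-style constant are assumptions of the sketch.

```rust
use rand::Rng;

// Goldilocks-style prime, the modulus used by BFieldElement in twenty-first.
const P: u64 = 0xffff_ffff_0000_0001;

fn main() {
    let mut rng = rand::thread_rng();

    // Flaky pattern: with unbounded draws the sum can exceed the modulus (or
    // even u64::MAX), so the test fails only on unlucky random inputs.
    //
    //     let a: u64 = rng.gen();
    //     let b: u64 = rng.gen();
    //     let _sum = a + b; // may panic on overflow in debug builds
    //
    // Deterministic alternative: bound each operand so the sum stays below P.
    let a = rng.gen_range(0..P / 2);
    let b = rng.gen_range(0..P / 2);
    assert!(a + b < P);
}
```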
@Sword-Smith
Member Author

I ran my stress test, which consists of commenting the sleep calls in run-multiple-instances-from-genesis.sh and then running it. This causes both mining nodes to start at the same time. They usually mine blocks 1 and 2 right away independently, which forces a reorg by one of them when they connect.

Just wanted to add that I really appreciate that you have a way of testing fork resolution.

With this PR, more tests go through the GlobalState's tip-updater method to verify that forks are resolved. Previously only the individual components were tested for fork resolution, not the main state-updater method. That's why this problem wasn't caught by unit tests.
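
A hedged sketch of that testing pattern with toy types, not neptune-core's real test API: the test drives the same tip-updater entry point the production code uses, so fork handling is exercised by the code path under test rather than by a bespoke test helper.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Block {
    hash: u64,
    parent_hash: u64,
    height: u64,
}

#[derive(Default)]
struct GlobalState {
    tip: Option<Block>,
}

impl GlobalState {
    /// Shared tip-updater: the incoming block is always the new tip, matching
    /// the behavior described in this PR. Tests call this instead of rolling
    /// their own state-update helpers.
    fn set_new_tip(&mut self, block: Block) {
        // ... wallet, mempool, and archival updates would happen here ...
        self.tip = Some(block);
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn fork_is_applied_through_the_shared_tip_updater() {
        let mut state = GlobalState::default();
        let block_1a = Block { hash: 1, parent_hash: 0, height: 1 };
        let block_1b = Block { hash: 2, parent_hash: 0, height: 1 };

        // Apply a block, then a competing sibling, through the same entry
        // point the production code uses; the second one reorgs the state.
        state.set_new_tip(block_1a);
        state.set_new_tip(block_1b.clone());
        assert_eq!(state.tip, Some(block_1b));
    }
}
```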

@Sword-Smith Sword-Smith merged commit f1b3d57 into master May 10, 2024
3 checks passed
@Sword-Smith Sword-Smith deleted the thv/fix-rollback-issue branch May 11, 2024 07:45
Merging this pull request may close the following issue: assertion failed: Mutator set in wallet-handler must agree with that from applied block