Thv/fix rollback issue #147

Merged: 13 commits merged into master from thv/fix-rollback-issue on May 10, 2024

Conversation

Sword-Smith
Member

@Sword-Smith Sword-Smith commented May 7, 2024

A draft PR since there are still two tests failing.

Update: All tests pass.

This closes #143

@Sword-Smith Sword-Smith requested a review from dan-da May 7, 2024 17:46
@Sword-Smith Sword-Smith marked this pull request as draft May 7, 2024 17:46
@Sword-Smith Sword-Smith marked this pull request as ready for review May 7, 2024 21:09
Collaborator
@dan-da dan-da left a comment

It looks good to me overall.

I ran my stress test, which consists of commenting out the sleep calls in run-multiple-instances-from-genesis.sh and then running it. This causes both mining nodes to start at the same time. They usually mine blocks 1 and 2 right away independently, which forces a reorg by one of them when they connect. Sometimes they don't find any peer for some reason, in which case I just shut down and re-run. Anyway, in 3 successive runs both nodes were synced up by block 3, and once by block 2. That wasn't happening before, so this definitely seems to fix it.

That said, I am still seeing some warnings in the log of the peer that does the reorgs. I'm unsure if these are indicating real problems, or are just normal for a re-org.

Here they are:

$ grep WARN /tmp/integration_test_from_genesis-2.log 
2024-05-08T01:15:56.324546469Z  WARN ThreadId(01) neptune_core::models::state::wallet::wallet_state: Unable to find valid membership proof for UTXO with digest 08806702505947110800,10280140293633150857,05397434293386042537,13540323760646287415,06445496058825839468. UTXO was received at block height 1. Current block height is 1
2024-05-08T01:15:56.356890146Z  WARN ThreadId(01) neptune_core::models::state::wallet::wallet_state: Unable to find valid membership proof for UTXO with digest 08806702505947110800,10280140293633150857,05397434293386042537,13540323760646287415,06445496058825839468. UTXO was received at block height 1. Current block height is 2
2024-05-08T01:16:44.889471372Z  WARN ThreadId(01) neptune_core::models::state: Could not recover MSMP as transaction appears to be on an abandoned chain
2024-05-08T01:17:43.902038027Z  WARN ThreadId(01) neptune_core::models::state: Could not recover MSMP as transaction appears to be on an abandoned chain
2024-05-08T01:18:42.909408732Z  WARN ThreadId(01) neptune_core::models::state: Could not recover MSMP as transaction appears to be on an abandoned chain
2024-05-08T01:19:41.91564508Z  WARN ThreadId(01) neptune_core::models::state: Could not recover MSMP as transaction appears to be on an abandoned chain

I'd like to get feedback on these warnings before approving the PR.

Resolved review threads on:
src/models/state/archival_state.rs
src/models/state/archival_state.rs
src/models/state/mod.rs (outdated)
src/models/state/archival_state.rs
src/bin/dashboard_src/receive_screen.rs
src/bin/dashboard_src/send_screen.rs
@Sword-Smith
Member Author


These warnings point to the fact that the client has UTXOs on a branch that is no longer canonical. I think this behavior is correct and shouldn't be fixed. These abandoned UTXOs are why we have the CLI endpoint prune-abandoned-monitored-utxos, which the user can run to get rid of these warnings.

Collaborator
@dan-da dan-da left a comment

thx for clarification re warnings. it's nice to have that timelock test fixed also. It's caused some CI failures.

This logging disturbs the test output, and the callers of this function
already log correctly where appropriate.

We want to use this functionality when we set a new tip header, since
we are going to need the previous block's mutator set.

Always set the new block we are storing to disk and indexing in the
database as the tip. Previously, we would check whether this block exceeded
the proof-of-work family number of the current tip, but this
responsibility is now delegated to the caller.

This is needed to fix a bug in `store_block_internal_worker` where the
value `previous_ms_accumulator` is set incorrectly whenever there is a
fork, i.e. whenever `store_block_internal_worker` receives a block that
is not a child of what the node currently considers the tip. In order
to set `previous_ms_accumulator`, we need to know that the block we are
storing now is considered the tip (by the `ArchivalState`), *and* we need
a method to fetch the parent of the tip after `write_block_as_tip` has
been called.

Cf. #143.
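
A minimal sketch of the split described above, using toy stand-in types rather than neptune-core's real `ArchivalState` and `Block`: `write_block_as_tip` stores and indexes unconditionally, while the proof-of-work-family comparison now lives in the caller. Everything below is illustrative, not code from this PR.

```rust
// Stand-in types for illustration only; the real neptune-core structures
// carry much more state (disk indices, mutator set data, etc.).
struct Block {
    height: u64,
    pow_family: u128, // stand-in for the proof-of-work family number
}

struct ArchivalState {
    tip: Option<Block>,
}

impl ArchivalState {
    /// Unconditionally store the block and mark it as the tip; no
    /// proof-of-work comparison happens in here anymore.
    fn write_block_as_tip(&mut self, new_block: Block) {
        // ... persisting to disk and updating indices would happen here ...
        self.tip = Some(new_block);
    }
}

/// The caller now owns the decision of whether the block becomes the tip.
fn handle_incoming_block(state: &mut ArchivalState, new_block: Block) {
    let exceeds_tip = state
        .tip
        .as_ref()
        .map(|tip| new_block.pow_family > tip.pow_family)
        .unwrap_or(true);
    if exceeds_tip {
        state.write_block_as_tip(new_block);
    }
}

fn main() {
    let mut state = ArchivalState { tip: None };
    handle_incoming_block(&mut state, Block { height: 1, pow_family: 10 });
    handle_incoming_block(&mut state, Block { height: 1, pow_family: 5 }); // lighter fork, ignored
    assert_eq!(state.tip.as_ref().unwrap().pow_family, 10);
}
```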
When the global state is updated with a new block, that new block is
always considered the new tip. This change is reflected in a new name
for this function as well, and it allows us to correctly handle state updates
in the case of forks. Since the block no longer carries around the
previous state of the mutator set accumulator, we have to get that state
from somewhere else. We elect to get it from the archival state, which
can identify the parent of the current tip.

Cf. #143
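
A hedged sketch of the data flow just described, again with toy stand-ins rather than the real types: because the incoming block is always the new tip, the previous mutator set accumulator is looked up through the archival state (the parent of the block being applied) rather than read off the block itself. The type and method names below are illustrative only.

```rust
use std::collections::HashMap;

// Toy stand-ins; the real MutatorSetAccumulator and Block are far richer.
#[derive(Clone, Default)]
struct MutatorSetAccumulator {
    leaf_count: u64,
}

struct Block {
    hash: u64,
    parent_hash: u64,
}

struct ArchivalState {
    // Mutator set state as it stood *after* each archived block was applied.
    msa_after_block: HashMap<u64, MutatorSetAccumulator>,
}

impl ArchivalState {
    /// The accumulator the chain had before `block`, i.e. the parent's state.
    /// This is what `previous_ms_accumulator` must be derived from on a fork.
    fn previous_mutator_set(&self, block: &Block) -> Option<MutatorSetAccumulator> {
        self.msa_after_block.get(&block.parent_hash).cloned()
    }
}

fn set_new_tip(archival: &mut ArchivalState, new_block: &Block, msa_after: MutatorSetAccumulator) {
    // Correct even when `new_block` is not a child of the old tip, because the
    // lookup goes through the archival state rather than the old tip.
    let _previous_ms_accumulator = archival
        .previous_mutator_set(new_block)
        .expect("parent of the new tip must be archived");
    // ... wallet and mempool would be rolled forward from the parent's MSA ...
    archival.msa_after_block.insert(new_block.hash, msa_after);
}

fn main() {
    let genesis_msa = MutatorSetAccumulator::default();
    let mut archival = ArchivalState { msa_after_block: HashMap::from([(0, genesis_msa)]) };

    // Two competing children of genesis; applying either works, because the
    // previous accumulator is fetched from the parent, not from the old tip.
    let block_1a = Block { hash: 1, parent_hash: 0 };
    let block_1b = Block { hash: 2, parent_hash: 0 };
    set_new_tip(&mut archival, &block_1a, MutatorSetAccumulator { leaf_count: 3 });
    set_new_tip(&mut archival, &block_1b, MutatorSetAccumulator { leaf_count: 4 });
    assert_eq!(archival.msa_after_block.len(), 3);
}
```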
…ernal`

Previously, tests had their own helper functions for updating the global
state. With this commit, most tests now go through the same state-updater
function as the main code. This shortens multiple tests and makes
us more likely to catch errors like the one in #143.

After redefining the behavior of `set_new_tip` (the function formerly known
as `store_block`) to update global state with a new block, we need
to rearrange the order in which blocks are added in a few tests. With these
changes, all tests pass, and #143 is closed.

The timelock test was flaky since it often overflowed on the addition of
two `BFieldElement`s.

This closes #148.
Also changes the fn name of the internal worker to match that of the
outer methods.
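
A generic illustration of the overflow-driven flakiness mentioned in the timelock commit above, not the actual test from this PR: adding two unbounded random values only occasionally exceeds the representable range, so failures are intermittent, while bounding each operand makes the behavior deterministic. The `rand` crate usage and the Goldilocks-style constant are assumptions of the sketch.

```rust
use rand::Rng;

// Goldilocks-style prime, the modulus used by BFieldElement in twenty-first.
const P: u64 = 0xffff_ffff_0000_0001;

fn main() {
    let mut rng = rand::thread_rng();

    // Flaky pattern: with unbounded draws the sum can exceed the modulus (or
    // even u64::MAX), so the test fails only on unlucky random inputs.
    //
    //     let a: u64 = rng.gen();
    //     let b: u64 = rng.gen();
    //     let _sum = a + b; // may panic on overflow in debug builds
    //
    // Deterministic alternative: bound each operand so the sum stays below P.
    let a = rng.gen_range(0..P / 2);
    let b = rng.gen_range(0..P / 2);
    assert!(a + b < P);
}
```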
@Sword-Smith
Member Author

I ran my stress test, which consists of commenting the sleep calls in run-multiple-instances-from-genesis.sh and then running it. This causes both mining nodes to start at the same time. They usually mine blocks 1 and 2 right away independently, which forces a reorg by one of them when they connect.

Just wanted to add that I really appreciate that you have a way of testing fork resolution.

With this PR, more tests go through the GlobalState's tip-updater method to verify that forks are resolved. Previously only the individual components were tested for fork resolution, not the main state-updater method. That's why this problem wasn't caught by unit tests.
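
A hedged sketch of that testing pattern with toy types, not neptune-core's real test API: the test drives the same tip-updater entry point the production code uses, so fork handling is exercised by the code path under test rather than by a bespoke test helper.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Block {
    hash: u64,
    parent_hash: u64,
    height: u64,
}

#[derive(Default)]
struct GlobalState {
    tip: Option<Block>,
}

impl GlobalState {
    /// Shared tip-updater: the incoming block is always the new tip, matching
    /// the behavior described in this PR. Tests call this instead of rolling
    /// their own state-update helpers.
    fn set_new_tip(&mut self, block: Block) {
        // ... wallet, mempool, and archival updates would happen here ...
        self.tip = Some(block);
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn fork_is_applied_through_the_shared_tip_updater() {
        let mut state = GlobalState::default();
        let block_1a = Block { hash: 1, parent_hash: 0, height: 1 };
        let block_1b = Block { hash: 2, parent_hash: 0, height: 1 };

        // Apply a block, then a competing sibling, through the same entry
        // point the production code uses; the second one reorgs the state.
        state.set_new_tip(block_1a);
        state.set_new_tip(block_1b.clone());
        assert_eq!(state.tip, Some(block_1b));
    }
}
```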

@Sword-Smith Sword-Smith merged commit f1b3d57 into master May 10, 2024
3 checks passed
@Sword-Smith Sword-Smith deleted the thv/fix-rollback-issue branch May 11, 2024 07:45
Merging this pull request may close the following issue: assertion failed: Mutator set in wallet-handler must agree with that from applied block