Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(drive): uncommitted state if db transaction fails #2305

Merged
merged 16 commits into from
Nov 4, 2024

Conversation

shumkov
Copy link
Member

@shumkov shumkov commented Nov 2, 2024

Issue being fixed or feature implemented

We had a sequence of errors on the mainnet started since block 32326. We got RocksDB's "transaction is busy" error because of a bug (#2309). Due to another bug in Tenderdash (dashpay/tenderdash#966), validators just proceeded to the next block partially committing the state and updating the cache. Full nodes are stuck and proceeded after re-sync.

What was done?

  • Panic, if app hashes in memory and on disk, are different
  • Panic if we aren't able to commit the Drive state to restart Drive and try this block from scratch

For the mainnet chain, we enable these fixes at the block when we consider the state is consistent.

How Has This Been Tested?

Syncing testnet and mainnet

Breaking Changes

None

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have added "!" to the title and described breaking changes in the corresponding section if my code contains any
  • I have made corresponding changes to the documentation if needed

For repository code-owners and collaborators only

  • I have assigned this pull request to a milestone

Summary by CodeRabbit

  • New Features

    • Enhanced error handling and verification logic for transaction and proposal processing.
    • Improved consistency checks between in-memory and on-disk application states.
  • Bug Fixes

    • Addressed specific inconsistencies related to block heights on the Dash network, particularly for block height 32326 and 32327.
  • Refactor

    • Simplified the PlatformConfig structure by removing the initial_protocol_version field and related methods.
  • Documentation

    • Updated comments to clarify the rationale behind changes and temporary solutions for handling specific network conditions.

Copy link
Contributor

coderabbitai bot commented Nov 2, 2024

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The changes in this pull request primarily focus on enhancing the transaction handling logic across several functions in the abci handler files. Key modifications include the introduction of conditional checks related to specific block heights and the Dash network, which alter how transactions are committed and how application hashes are verified. Additionally, the PlatformConfig struct is simplified by removing the initial_protocol_version field, streamlining configuration management. Overall, these changes aim to improve error handling and ensure consistency between in-memory and on-disk states.

Changes

File Path Change Summary
packages/rs-drive-abci/src/abci/handler/finalize_block.rs Modified finalize_block function to include conditional transaction commit logic based on network and block height. Added comments for clarity.
packages/rs-drive-abci/src/abci/handler/info.rs Updated info function to replace application hash retrieval with a new variable and added checks for critical inconsistencies.
packages/rs-drive-abci/src/abci/handler/prepare_proposal.rs Enhanced prepare_proposal function with new verification logic for application hashes and added panic conditions for inconsistencies.
packages/rs-drive-abci/src/abci/handler/process_proposal.rs Modified process_proposal function to include application hash verification and refined error handling.
packages/rs-drive-abci/src/config.rs Removed initial_protocol_version from PlatformConfigIntermediate and updated related methods accordingly.

Possibly related PRs

Suggested labels

bug

Suggested reviewers

  • QuantumExplorer

🐇 In the code we hop and play,
With transactions in a new way.
Block heights checked, and hashes too,
A simpler config, all shiny and new!
Errors caught, no need to fret,
In our code garden, we’re all set! 🌼


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@shumkov shumkov added this to the v1.5.x milestone Nov 3, 2024
@shumkov shumkov self-assigned this Nov 3, 2024
@shumkov shumkov marked this pull request as ready for review November 3, 2024 10:28
@shumkov
Copy link
Member Author

shumkov commented Nov 3, 2024

@CodeRabbit review

Copy link
Contributor

coderabbitai bot commented Nov 3, 2024

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (8)
packages/rs-drive-abci/src/abci/handler/info.rs (3)

50-50: Address the TODO: Verify that the chain ID is evo1.

There's a pending TODO comment to verify that the chain ID is evo1. Implementing this verification is important to ensure that the exception for block 32326 only applies to the mainnet and does not affect other networks.

Would you like assistance in implementing this chain ID verification?


51-63: Refactor conditional logic for clarity.

The negated condition in the if statement can be simplified to enhance readability. By restructuring the condition, the logic becomes more straightforward and maintainable.

Apply this diff to simplify the condition:

 #[allow(clippy::collapsible_if)]
-if !(app.platform().config.network == Network::Dash && last_block_height == 32326) {
+if app.platform().config.network != Network::Dash || last_block_height != 32326 {
     // App hash in memory must be equal to app hash on disk
     if drive_storage_root_hash != platform_state_app_hash {
         // We panic because we can't recover from this situation.
         // Better to restart the Drive, so we might self-heal the node
-        // reloading state form the disk
+        // reloading state from the disk
         panic!(
             "drive and platform state app hash mismatch: drive_storage_root_hash: {:?}, platform_state_app_hash: {:?}",
             drive_storage_root_hash, platform_state_app_hash
         );
     }
 }

This change directly checks when the network is not Dash or the last block height is not 32326, making the condition clearer.


57-58: Fix typographical error in the comment.

In the comment, "reloading state form the disk" should be "reloading state from the disk".

Apply this diff to correct the typo:

 // Better to restart the Drive, so we might self-heal the node
-// reloading state form the disk
+// reloading state from the disk
 panic!(
     "drive and platform state app hash mismatch: drive_storage_root_hash: {:?}, platform_state_app_hash: {:?}",
     drive_storage_root_hash, platform_state_app_hash
 );
packages/rs-drive-abci/src/abci/handler/finalize_block.rs (2)

75-75: Reminder: Address the TODO comment

The TODO indicates the need to verify that the chain ID is evo1. Implementing this check is important to ensure the conditional logic applies only to the intended network, preventing unintended behavior on other networks.

Would you like assistance in implementing the chain ID verification?


76-76: Avoid hardcoding block height and network values

To improve maintainability and flexibility, consider defining constants or configuration parameters for the block height 32326 and the network Network::Dash instead of hardcoding them directly in the conditional statement.

packages/rs-drive-abci/src/abci/handler/prepare_proposal.rs (1)

39-39: Typo in comment: 'commited' should be 'committed'

Please correct the spelling in the comment: 'commited' should be 'committed'.

packages/rs-drive-abci/src/abci/handler/process_proposal.rs (2)

208-208: Simplify the conditional statement for better readability

The current condition uses a negation with !(...), which can be harder to read. Consider inverting the logic to make it more straightforward and improve code clarity.

Apply this diff:

-        if !(app.platform().config.network == Network::Dash && request.height == 32327) {
+        if app.platform().config.network != Network::Dash || request.height != 32327 {

208-208: Use a named constant for the exceptional block height

Hardcoding the block height 32327 can affect maintainability and readability. Define it as a named constant to make the code clearer and to facilitate any future changes.

Apply this diff:

+        const EXCEPTION_BLOCK_HEIGHT: i64 = 32327;
-        if !(app.platform().config.network == Network::Dash && request.height == 32327) {
+        if app.platform().config.network != Network::Dash || request.height != EXCEPTION_BLOCK_HEIGHT {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between feacde2 and 4b81209.

📒 Files selected for processing (5)
  • packages/rs-drive-abci/src/abci/handler/finalize_block.rs (2 hunks)
  • packages/rs-drive-abci/src/abci/handler/info.rs (2 hunks)
  • packages/rs-drive-abci/src/abci/handler/prepare_proposal.rs (2 hunks)
  • packages/rs-drive-abci/src/abci/handler/process_proposal.rs (2 hunks)
  • packages/rs-drive-abci/src/config.rs (0 hunks)
💤 Files with no reviewable changes (1)
  • packages/rs-drive-abci/src/config.rs
🔇 Additional comments (2)
packages/rs-drive-abci/src/abci/handler/prepare_proposal.rs (1)

64-75: ⚠️ Potential issue

Inconsistency in block height handling

The code applies an exception for request.height == 32327, but according to the PR objectives, the chain halt occurred at block 32326, and the exception should be made for that block. Please verify the correct block height to ensure consistency between the code and the PR objectives.

Run the following script to verify references to block heights 32326 and 32327 in the codebase:

packages/rs-drive-abci/src/abci/handler/process_proposal.rs (1)

15-15: Import Network module for conditional checks

The addition of use dpp::dashcore::Network; is necessary for the network-specific conditional logic introduced later in the code.

@QuantumExplorer QuantumExplorer changed the base branch from master to v1.5.1-dev November 3, 2024 10:48
…ommit' into fix/drive/patch-for-rocksdb-tx-commit

# Conflicts:
#	packages/rs-drive-abci/src/abci/handler/finalize_block.rs
#	packages/rs-drive-abci/src/abci/handler/info.rs
#	packages/rs-drive-abci/src/abci/handler/prepare_proposal.rs
#	packages/rs-drive-abci/src/abci/handler/process_proposal.rs
QuantumExplorer
QuantumExplorer previously approved these changes Nov 4, 2024
@shumkov shumkov changed the base branch from v1.5.1-dev to master November 4, 2024 09:41
@shumkov shumkov dismissed QuantumExplorer’s stale review November 4, 2024 09:41

The base branch was changed.

@shumkov shumkov merged commit dacc6db into master Nov 4, 2024
24 checks passed
@shumkov shumkov deleted the fix/drive/patch-for-rocksdb-tx-commit branch November 4, 2024 11:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants