-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(sdk)!: detect stale nodes #2254
Conversation
WalkthroughThe changes introduced in this pull request primarily focus on enhancing error handling and retry logic within the DAPI client and SDK. Key modifications include the replacement of the Changes
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (9)
packages/rs-dapi-client/src/lib.rs (1)
74-88
: Approved: Excellent improvements to error handling and retry logicThe changes to the
CanRetry
trait are well-implemented and align perfectly with the PR objectives. The newcan_retry
method provides more flexibility in error handling, while the deprecation ofis_node_failure
maintains backward compatibility.A minor suggestion for improvement:
Consider adding a brief example in the documentation for
can_retry
to illustrate its usage, especially how to handle theNone
case. This would further enhance the clarity for developers using this trait.Example:
/// Returns true if the operation can be retried. /// `None` means unspecified - in this case, you should either inspect the error /// in more details, or assume `false`. /// /// # Example /// ``` /// match error.can_retry() { /// Some(true) => retry_operation(), /// Some(false) => handle_non_retryable_error(), /// None => { /// // Inspect error in more detail or assume false /// if should_retry_based_on_error_details(error) { /// retry_operation() /// } else { /// handle_non_retryable_error() /// } /// } /// } /// ``` fn can_retry(&self) -> Option<bool>;packages/rs-sdk/src/error.rs (1)
84-100
: LGTM:CanRetry
trait implementation looks good.The implementation correctly identifies retryable errors as per the PR objectives, covering stale proofs, DAPI client errors, core client errors, and timeout errors. The use of the
matches!
macro results in concise and readable code.Consider adding a comment explaining the rationale behind categorizing these specific errors as retryable. This would enhance code maintainability and make it easier for other developers to understand and potentially extend this logic in the future.
Example:
impl CanRetry for Error { fn can_retry(&self) -> Option<bool> { // The following errors are considered retryable: // - Stale proofs: The node might have updated data on a subsequent request // - DAPI and Core client errors: These might be transient network issues // - Timeout errors: A retry might succeed if the timeout was due to temporary congestion let retry = matches!( self, Error::Proof(drive_proof_verifier::Error::StaleProof(..)) | Error::DapiClientError(_) | Error::CoreClientError(_) | Error::TimeoutReached(_, _) ); if retry { Some(true) } else { None } } }packages/rs-sdk/Cargo.toml (1)
9-9
: LGTM! Consider specifying features for chrono.The addition of chrono as a regular dependency aligns well with the PR objectives, particularly for implementing time-based checks for stale proofs. The version specified (0.4.38) is relatively recent, which is good for security and feature support.
Consider specifying only the features you need for chrono to minimize the dependency footprint. For example:
-chrono = { version = "0.4.38" } +chrono = { version = "0.4.38", default-features = false, features = ["clock"] }This assumes you only need the
clock
feature. Adjust the features based on your specific requirements.packages/rs-dapi-client/src/transport/grpc.rs (2)
Line range hint
120-141
: Approved: Improved retry logic with more flexibilityThe changes to the
can_retry
method are well-implemented and align with the PR objectives. The use ofOption<bool>
provides more flexibility in error handling, and the inverted logic makes it clearer which cases are retryable.Consider adding a brief comment explaining the logic behind the
retry
variable, e.g.:// Determine if retry is possible based on the status code // We retry for all codes except those explicitly listed let retry = !matches!( code, Ok | DataLoss | Cancelled | Unknown | DeadlineExceeded | ResourceExhausted | Aborted | Internal | Unavailable );This would enhance code readability and make the intention behind the logic more explicit.
Line range hint
1-541
: Suggestion: Improve file organization and documentationWhile the changes in this file are focused on the
can_retry
method, there are some general improvements that could enhance the overall quality of the file:
Consider grouping the
impl_transport_request_grpc!
macro invocations by client type (Platform and Core) using comments or regions for better organization.Add a brief documentation comment at the beginning of the file explaining its purpose and the main components it contains.
Consider adding a TODO comment to explore if any of the repeated
impl_transport_request_grpc!
macro invocations can be further abstracted or generated programmatically to reduce code duplication.These suggestions are not directly related to the current changes but could improve the overall maintainability of the file in the future.
packages/rs-dapi-client/src/dapi_client.rs (1)
Line range hint
236-246
: HandleNone
cases explicitly when checkingerror.can_retry()
Using
error.can_retry().unwrap_or(false)
treatsNone
asfalse
, which may not reflect the intended behavior for indeterminate retry cases. To prevent unintended consequences, handle theNone
case explicitly.Consider applying this change to handle
None
explicitly:- if !error.can_retry().unwrap_or(false) { + match error.can_retry() { + Some(true) => { + // Error is retryable; no action needed here + }, + Some(false) => { + // Error is not retryable; ban the address if necessary + if applied_settings.ban_failed_address { + let mut address_list = self + .address_list + .write() + .expect("can't get address list for write"); + + address_list.ban_address(&address) + .map_err(DapiClientError::<<R::Client as TransportClient>::Error>::AddressList)?; + } + }, + None => { + // Indeterminate retryability; decide how to handle this case + // Possibly log or take default action + }, + }packages/rs-sdk/src/sdk.rs (3)
641-641
: Incomplete 'FIXME' CommentThe comment at line 641 seems incomplete:
// FIXME: in future, we need to implement t
. Please complete or clarify the intended message.
751-751
: Typo in Documentation CommentIn the documentation comment at line 751, "fefault" should be "default".
919-919
: Typo in Documentation CommentIn the documentation comment at line 919, "fefault" should be "default".
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (7)
- packages/rs-dapi-client/src/dapi_client.rs (3 hunks)
- packages/rs-dapi-client/src/lib.rs (1 hunks)
- packages/rs-dapi-client/src/transport/grpc.rs (2 hunks)
- packages/rs-drive-proof-verifier/src/error.rs (1 hunks)
- packages/rs-sdk/Cargo.toml (1 hunks)
- packages/rs-sdk/src/error.rs (2 hunks)
- packages/rs-sdk/src/sdk.rs (13 hunks)
🧰 Additional context used
🔇 Additional comments (4)
packages/rs-sdk/src/error.rs (2)
8-8
: LGTM: Import statement updated correctly.The addition of
CanRetry
to the import statement is consistent with the new trait implementation and aligns with the PR objectives of improving error handling and retry logic.
Line range hint
1-100
: Overall assessment: Changes align well with PR objectives.The modifications to this file effectively implement the desired improvements in error handling and retry logic. The new
CanRetry
trait implementation provides a clear mechanism for identifying retryable errors, which should enhance the SDK's ability to handle transient issues.packages/rs-drive-proof-verifier/src/error.rs (1)
85-89
: Verify that all error handling covers the newStaleProof
variantThe addition of the
StaleProof
variant to theError
enum may require updates in parts of the codebase whereError
is matched or handled. Ensure that all pattern matches and error handling logic have been updated accordingly to incorporate this new variant.Run the following script to locate all match statements involving
Error
:This will help identify any areas that need to be updated to handle
StaleProof
appropriately.packages/rs-dapi-client/src/dapi_client.rs (1)
42-49
: ClarifyNone
handling incan_retry
methodThe
can_retry
method returnsOption<bool>
, whereNone
indicates indeterminate retry capability. Currently, theMock
variant returnsNone
. Please confirm if returningNone
forMock
errors is intentional, or consider returningSome(true)
orSome(false)
to specify the retry behavior explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (5)
packages/rs-drive-proof-verifier/src/error.rs (1)
95-96
: Minor improvement for consistency in error messagesThe error messages in
StaleProofError
are informative and well-structured. However, there's a small inconsistency in terminology between the two variants:
StaleProofHeight
uses "received" in both the field name and error message.Time
(soon to beStaleProofTime
) uses "received" in the field name but "actual" in the error message.For better consistency, consider using "received" in both error messages.
Suggested change for the
Time
variant's error message:#[error( "stale proof time: expected {expected_timestamp_ms}ms, received {received_timestamp_ms} ms, tolerance {tolerance_ms} ms; try another server" )]This small change will make the error messages more consistent across both variants.
Also applies to: 105-107
packages/rs-sdk/src/sdk.rs (4)
104-121
: LGTM! New fields for stale proof detection.The addition of
previous_proof_height
,proof_height_tolerance
, andproof_time_tolerance_ms
fields to theSdk
struct implements the stale proof detection mechanism as described in the PR objectives. This is a good approach to ensure the freshness of received proofs.Consider adding documentation comments for these new fields to explain their purpose and default values, especially for
proof_time_tolerance_ms
which is set toNone
by default.
577-615
: Approve time verification logic with a suggestion.The
verify_proof_time
function correctly implements the time-based stale proof detection. However, there's a potential issue with using the local system time.Consider using a more reliable time source or implementing a time synchronization mechanism. The current implementation assumes that the local system time is accurate, which may not always be the case. This could lead to false positives or negatives in stale proof detection.
One possible approach is to use a Network Time Protocol (NTP) client to periodically synchronize the local time with a reliable time server. This would help ensure more accurate time comparisons.
742-754
: LGTM! Configuration options for stale proof detection.The addition of
proof_height_tolerance
andproof_time_tolerance_ms
fields to theSdkBuilder
struct, along with their respectivewith_*
methods, provides good flexibility for users to configure the stale proof detection mechanism.In the
with_proof_time_tolerance
method documentation, there's a typo in the word "default". It's written as "fefault". Please correct this typo.- /// This is set to `None` by fefault. + /// This is set to `None` by default.Also applies to: 903-930
1090-1216
: LGTM! Comprehensive test coverage for stale proof detection.The new test module provides good coverage for the
verify_proof_height
andverify_proof_time
functions. The use oftest_matrix
allows for testing multiple scenarios efficiently.Consider adding a few more edge cases to the
test_verify_proof_time
function:
- Test with
u64::MAX
forreceived
andnow
to ensure no overflow issues.- Test with very large tolerance values to ensure they're handled correctly.
These additional tests would help ensure the robustness of the time verification logic under extreme conditions.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- packages/rs-dapi-client/src/dapi_client.rs (3 hunks)
- packages/rs-drive-proof-verifier/src/error.rs (1 hunks)
- packages/rs-sdk/src/sdk.rs (13 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/rs-dapi-client/src/dapi_client.rs
🧰 Additional context used
🔇 Additional comments (4)
packages/rs-drive-proof-verifier/src/error.rs (2)
85-89
: LGTM: StaleProof variant added correctlyThe addition of the
StaleProof
variant to theError
enum is well-implemented. The use of#[error(transparent)]
and#[from]
attributes ensures proper error propagation and conversion fromStaleProofError
. This change aligns with the PR objective of detecting stale proofs.
85-115
: Summary: Solid implementation of stale proof detectionThe changes introduced in this file effectively implement the stale proof detection mechanism as outlined in the PR objectives. The new
StaleProof
variant in theError
enum and theStaleProofError
enum provide a robust structure for handling and reporting stale proof errors.The implementation is well-thought-out, with detailed error messages that include all relevant information. The suggested improvements are minor and focus on enhancing consistency and clarity in naming and error messages.
Overall, this is a strong addition to the error handling system that will improve the ability to detect and respond to stale proofs in the system.
packages/rs-sdk/src/sdk.rs (2)
617-679
: LGTM! Robust height verification logic.The
verify_proof_height
function implements a solid approach to height-based stale proof detection. It handles various edge cases well, including:
- Same height proofs
- Initial SDK startup (when
prev
is less thantolerance
)- Concurrent updates from multiple threads
The use of atomic operations ensures thread-safety, which is crucial for this shared state.
Line range hint
1-1216
: Overall LGTM! Solid implementation of stale proof detection.The changes in this file successfully implement the stale proof detection mechanism as described in the PR objectives. Key points:
- The
Sdk
struct now includes fields for tracking and configuring proof staleness.- New helper functions
verify_proof_time
andverify_proof_height
implement the core logic for detecting stale proofs.- The
SdkBuilder
provides configuration options for users to customize the stale proof detection behavior.- Comprehensive test coverage ensures the reliability of the new functionality.
These changes will help mitigate the issue of receiving outdated information when fetching data from nodes that have experienced a chain halt.
To ensure that the new stale proof detection mechanism is properly integrated, let's verify its usage in other parts of the codebase:
This will help confirm that the new functionality is being used correctly throughout the project.
✅ Verification successful
Line range hint
1-1216
: Overall LGTM! Solid implementation of stale proof detection.The changes in this file successfully implement the stale proof detection mechanism as described in the PR objectives. Key points:
- The
Sdk
struct now includes fields for tracking and configuring proof staleness.- New helper functions
verify_proof_time
andverify_proof_height
implement the core logic for detecting stale proofs.- The
SdkBuilder
provides configuration options for users to customize the stale proof detection behavior.- Comprehensive test coverage ensures the reliability of the new functionality.
These changes will help mitigate the issue of receiving outdated information when fetching data from nodes that have experienced a chain halt.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for usage of verify_proof_metadata rg --type rust 'verify_proof_metadata' # Search for usage of with_proof_height_tolerance and with_proof_time_tolerance rg --type rust 'with_proof_height_tolerance|with_proof_time_tolerance'Length of output: 751
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (4)
packages/rs-sdk/src/sdk.rs (4)
243-248
: LGTM: Enhanced proof verification inparse_proof_with_metadata_and_proof
.The addition of the
verify_proof_metadata
call in theparse_proof_with_metadata_and_proof
method improves the robustness of proof verification. This change aligns well with the new fields added to theSdk
struct.Consider adding a comment explaining the purpose of the
verify_proof_metadata
call for better code readability.
577-615
: LGTM: Well-implementedverify_proof_time
function.The
verify_proof_time
function correctly checks if the proof time is within the specified tolerance of the current time. It includes appropriate logging and returns a detailed error when the time is out of tolerance.Consider using a constant for the log level (e.g.,
warn!
) instead of hardcoding it, to make it easier to adjust logging verbosity in the future.
617-677
: LGTM: Well-implementedverify_proof_height
function with proper concurrency handling.The
verify_proof_height
function correctly checks if the proof height is within the specified tolerance of the previous proof height. It uses atomic operations to handle concurrent access to the previous proof height, ensuring thread safety. The function includes appropriate logging and returns a detailed error when the height is out of tolerance.Consider adding a comment explaining the purpose of the while loop (lines 666-674) for better code readability, as the atomic compare-and-swap logic might not be immediately obvious to all readers.
1088-1214
: LGTM: Comprehensive test coverage for new proof verification functionality.The new test module provides excellent coverage for the
verify_proof_height
andverify_proof_time
functions, as well as the behavior of cloned SDK instances with respect to proof height verification. The use of test matrices allows for thorough testing of various input combinations, covering both valid and invalid cases.Consider adding a few more edge cases to the
test_verify_proof_time
test matrix, such as testing with very large time values or the maximum possibleu64
value, to ensure robustness against potential overflow scenarios.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- packages/rs-drive-proof-verifier/src/error.rs (1 hunks)
- packages/rs-sdk/src/sdk.rs (13 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/rs-drive-proof-verifier/src/error.rs
🧰 Additional context used
🔇 Additional comments (4)
packages/rs-sdk/src/sdk.rs (4)
104-121
: LGTM: New fields for proof validation.The addition of
previous_proof_height
,proof_height_tolerance
, andproof_time_tolerance_ms
fields to theSdk
struct enhances the SDK's ability to validate proof freshness. These fields are well-typed and appropriately scoped for their intended use.
250-267
: LGTM: Well-implementedverify_proof_metadata
method.The
verify_proof_metadata
method effectively utilizes the new fields to perform comprehensive proof validation. It checks both proof height and time when the respective tolerances are set, with appropriate error handling.
Line range hint
740-928
: LGTM: Well-implemented additions toSdkBuilder
for configuring proof tolerances.The additions to the
SdkBuilder
struct and its implementation, including the new fieldsproof_height_tolerance
andproof_time_tolerance_ms
, along with their corresponding configuration methodswith_proof_height_tolerance
andwith_proof_time_tolerance
, provide a clean and flexible way to set up proof validation tolerances during SDK initialization. These changes are consistent with the new fields added to theSdk
struct.The comments for
with_proof_time_tolerance
method provide valuable information about the implications of enabling time-based proof checks. This is excellent documentation that will help users make informed decisions when configuring the SDK.
Line range hint
1-1214
: Overall assessment: Excellent implementation of proof staleness detection.The changes introduced in this file significantly enhance the SDK's ability to detect stale proofs received from nodes. The implementation aligns well with the PR objectives and includes:
- New fields in the
Sdk
struct for tracking and configuring proof validation.- Enhanced proof verification in existing methods.
- New functions for verifying proof time and height.
- Updates to the
SdkBuilder
for flexible configuration of proof validation tolerances.- Comprehensive test coverage for the new functionality.
The implementation is thread-safe, well-documented, and provides users with the flexibility to configure proof staleness detection according to their needs. These changes should effectively address the issue of receiving outdated information when fetching data from nodes that have experienced a chain halt.
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (5)
packages/rs-sdk/src/error.rs (2)
70-73
: LGTM: NewStaleNode
variant added correctly.The addition of the
StaleNode
variant to theError
enum aligns well with the PR objective of detecting stale proofs. The use of#[from]
and#[error(transparent)]
attributes is appropriate for error conversion and propagation.Consider adding a brief doc comment to explain the purpose of this variant, e.g.:
/// Error indicating that the node provided stale data #[error(transparent)] StaleNode(#[from] StaleNodeError),
100-125
: LGTM:StaleNodeError
enum defined correctly with comprehensive error handling.The
StaleNodeError
enum is well-defined and aligns perfectly with the PR objective of detecting stale proofs based on height and time. Both variants (Height
andTime
) include detailed error messages and relevant fields, which will be helpful for debugging and error reporting.The use of the
#[error]
attribute for custom error formatting is appropriate, and the inclusion of doc comments for the enum and its fields is commendable.Consider adding a brief example in the doc comment for the
StaleNodeError
enum to illustrate its usage, e.g.:/// Server returned stale metadata /// /// # Examples /// /// ``` /// use your_crate::StaleNodeError; /// /// let height_error = StaleNodeError::Height { /// expected_height: 100, /// received_height: 90, /// tolerance_blocks: 5, /// }; /// /// let time_error = StaleNodeError::Time { /// expected_timestamp_ms: 1000, /// received_timestamp_ms: 900, /// tolerance_ms: 50, /// }; /// ``` #[derive(Debug, thiserror::Error)] pub enum StaleNodeError { // ... (rest of the enum definition) }This addition would enhance the documentation and provide a clear usage example for developers.
packages/rs-dapi-client/src/transport/grpc.rs (1)
Line range hint
120-132
: Approved: Improved error handling and retry logicThe changes to the
CanRetry
trait implementation fordapi_grpc::tonic::Status
are well-thought-out and align with the PR objectives. The newcan_retry
method name is more descriptive, and the inverted logic using thematches!
macro improves readability and maintainability.Consider adding a blank line after the
use dapi_grpc::tonic::Code::*;
statement for better code readability:fn can_retry(&self) -> bool { let code = self.code(); use dapi_grpc::tonic::Code::*; + !matches!( code, Ok | DataLoss | Cancelled | Unknown | DeadlineExceeded | ResourceExhausted | Aborted | Internal | Unavailable ) }
packages/rs-dapi-client/src/dapi_client.rs (2)
Line range hint
236-242
: Re-evaluate Address Banning on Non-Retryable ErrorsIn the error handling block, when
!error.can_retry()
istrue
(i.e., the error is non-retryable), the code bans the failed address:if !error.can_retry() { if applied_settings.ban_failed_address { // Ban the address } }This might lead to banning addresses due to non-recoverable errors, which could be outside the node's control (e.g., client-side issues).
Consider modifying the logic to ban addresses only on retryable errors that persist, indicating a problem with the node:
-if !error.can_retry() { +if error.can_retry() {This change ensures that only addresses causing persistent, retryable errors are banned, improving the robustness of the client.
272-274
: Log Retryable Errors AppropriatelyCurrently, only non-retryable errors are logged at the error level:
if !error.can_retry() { tracing::error!(?error, "request failed"); }Consider also logging retryable errors, perhaps at the
warn
level, to provide visibility into issues that require retries:if !error.can_retry() { tracing::error!(?error, "request failed"); +} else { + tracing::warn!(?error, "retryable error occurred"); }This enhancement aids in monitoring and debugging by highlighting errors that cause retries.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
- packages/rs-dapi-client/src/dapi_client.rs (3 hunks)
- packages/rs-dapi-client/src/lib.rs (1 hunks)
- packages/rs-dapi-client/src/transport/grpc.rs (1 hunks)
- packages/rs-sdk/src/error.rs (3 hunks)
- packages/rs-sdk/src/sdk.rs (15 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/rs-dapi-client/src/lib.rs
- packages/rs-sdk/src/sdk.rs
🧰 Additional context used
🔇 Additional comments (3)
packages/rs-sdk/src/error.rs (2)
8-8
: LGTM: Import statement updated correctly.The import statement has been properly updated to include the
CanRetry
trait, which is consistent with its implementation for theError
enum later in the file.
88-98
: LGTM:CanRetry
trait implemented correctly.The implementation of the
CanRetry
trait for theError
enum is well-done. It correctly includes the newStaleNode
error in the list of retryable errors, along with other relevant error types. The use of thematches!
macro is appropriate and efficient for this kind of check.This implementation aligns well with the PR objective of improving retry logic in the SDK.
packages/rs-dapi-client/src/dapi_client.rs (1)
42-47
: Verifycan_retry
Logic for All Error VariantsThe
can_retry
method returnstrue
forNoAvailableAddresses
andAddressList
errors, and delegates totransport_error.can_retry()
forTransport
errors. ForMock
errors, it returnsfalse
.Please confirm that this logic aligns with the intended retry policy:
NoAvailableAddresses
: Retrying may not be effective if there are truly no available addresses. Consider whether it should returnfalse
to prevent unnecessary retries.AddressList
Errors: Assess if these errors are recoverable through retries.Ensuring the
can_retry
logic accurately reflects which errors are retryable will prevent futile retries and potential resource wastage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (9)
packages/rs-dapi-client/src/lib.rs (1)
74-85
: Excellent improvements to theCanRetry
trait!The changes to the
CanRetry
trait are well-implemented and improve its functionality:
- The new
can_retry
method provides a clear and simple interface.- The deprecation of
is_node_failure
is handled gracefully, maintaining backward compatibility while encouraging the use of the new method.- The trait documentation has been updated to reflect these changes.
These modifications enhance the trait's clarity and align well with Rust best practices.
Consider adding a brief explanation in the trait-level documentation about when an operation might not be retryable. This could provide valuable context for users of the trait.
/// Returns true if the operation can be retried. +/// +/// Some operations may not be safely retried due to potential side effects or other constraints. +/// Implementors of this trait should carefully consider the conditions under which an operation +/// can be safely retried. pub trait CanRetry { /// Returns true if the operation can be retried safely. fn can_retry(&self) -> bool;packages/rs-sdk/Cargo.toml (1)
9-9
: LGTM! Consider adding a comment for clarity.The addition of
chrono
as a regular dependency is appropriate given the new time-based proof validation feature mentioned in the PR description. The version specified (0.4.38) is recent and exact, which is good for reproducibility.Consider adding a brief comment explaining why
chrono
is needed, e.g.:# Required for time-based proof validation chrono = { version = "0.4.38" }This will help future maintainers understand the purpose of this dependency.
packages/rs-sdk/src/error.rs (3)
70-73
: LGTM: NewStaleNode
variant added correctly.The
StaleNode
variant has been appropriately added to theError
enum, using the#[from]
attribute for automatic conversion fromStaleNodeError
. This addition aligns well with the PR objective of detecting stale proofs from nodes.Consider adding a brief doc comment to explain the purpose of this new error variant, for example:
/// Error indicating that the node provided stale data #[error(transparent)] StaleNode(#[from] StaleNodeError),
88-98
: LGTM:CanRetry
trait implemented correctly.The
CanRetry
trait has been properly implemented for theError
enum. Thecan_retry
method correctly identifies which error types are eligible for retry attempts.To improve maintainability, consider defining a constant array of retryable error variants:
impl CanRetry for Error { const RETRYABLE_ERRORS: [Error; 4] = [ Error::StaleNode(..), Error::DapiClientError(_), Error::CoreClientError(_), Error::TimeoutReached(_, _), ]; fn can_retry(&self) -> bool { Self::RETRYABLE_ERRORS.iter().any(|e| std::mem::discriminant(e) == std::mem::discriminant(self)) } }This approach centralizes the list of retryable errors, making it easier to maintain and update in the future.
100-125
: LGTM:StaleNodeError
enum added with comprehensive error information.The
StaleNodeError
enum has been well-designed to provide detailed information about stale node errors. BothHeight
andTime
variants include all necessary fields to understand the nature of the staleness.For consistency with Rust naming conventions, consider changing the field names in the
Time
variant to snake_case:Time { /// Expected time in milliseconds - is local time when the message was received expected_timestamp_ms: u64, /// Time received from the server in the message, in milliseconds received_timestamp_ms: u64, /// Tolerance in milliseconds tolerance_ms: u64, }This change would make the field names consistent with those in the
Height
variant.packages/rs-dapi-client/src/transport/grpc.rs (1)
Line range hint
120-133
: Approve changes with a minor suggestion for improvementThe refactoring of the
CanRetry
implementation fordapi_grpc::tonic::Status
is well done. The new method namecan_retry
is more descriptive and aligns better with the trait name. The inverted logic using thematches!
macro improves readability and maintainability.However, to further enhance code organization and readability, consider extracting the list of status codes that prevent retrying into a constant or a separate function. This would make the main
can_retry
function more concise and easier to understand at a glance.Here's a suggested refactoring:
impl CanRetry for dapi_grpc::tonic::Status { fn can_retry(&self) -> bool { !is_non_retryable_status(self.code()) } } fn is_non_retryable_status(code: tonic::Code) -> bool { use dapi_grpc::tonic::Code::*; matches!( code, Ok | DataLoss | Cancelled | Unknown | DeadlineExceeded | ResourceExhausted | Aborted | Internal | Unavailable ) }This refactoring separates the concerns of determining which status codes are non-retryable from the main
can_retry
logic, making the code more modular and easier to maintain.packages/rs-sdk/src/sdk.rs (3)
762-762
: Typo in Error MessageThere's a typo in the error message:
"data conttact cache size must be positive"
. It should be"data contract cache size must be positive"
.Apply this diff to fix the typo:
.expect("data conttact cache size must be positive"), +.expect("data contract cache size must be positive"),
888-922
: Clarify Documentation for Tolerance MethodsThe documentation for
with_height_tolerance
andwith_time_tolerance
methods is comprehensive. However, consider adding examples to demonstrate how to use these methods effectively and explain the implications of different tolerance settings on staleness detection.
1081-1218
: Test Coverage for Edge CasesThe added tests in the
test
module are valuable for verifying the staleness logic. To enhance test coverage, consider adding more edge case scenarios, such as:
- Testing the behavior when
metadata_height_tolerance
ormetadata_time_tolerance_ms
isNone
.- Simulating scenarios where the
expected_height
is exactly equal totolerance
.- Verifying the behavior when system time moves backward.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
- packages/rs-dapi-client/src/dapi_client.rs (3 hunks)
- packages/rs-dapi-client/src/lib.rs (1 hunks)
- packages/rs-dapi-client/src/transport/grpc.rs (1 hunks)
- packages/rs-sdk/Cargo.toml (1 hunks)
- packages/rs-sdk/src/error.rs (3 hunks)
- packages/rs-sdk/src/sdk.rs (15 hunks)
🧰 Additional context used
🔇 Additional comments (6)
packages/rs-sdk/src/error.rs (2)
8-8
: LGTM: Import statement updated correctly.The import statement has been appropriately updated to include
CanRetry
along withDapiClientError
. This change aligns with the newCanRetry
trait implementation for theError
enum.
Line range hint
1-125
: Summary: Excellent implementation of stale node detection and error handling.The changes in this file effectively implement the stale node detection mechanism as described in the PR objectives. The new
StaleNodeError
enum, along with theStaleNode
variant in theError
enum, provides a robust way to handle and report stale node issues. The implementation of theCanRetry
trait enhances the SDK's ability to handle retryable errors gracefully.These modifications will significantly improve the SDK's resilience when dealing with nodes that may have experienced a chain halt or are otherwise out of sync. The clear error messages and detailed error information will aid in debugging and maintaining the system.
Overall, the changes are well-implemented, properly documented, and align closely with the PR objectives. Great job!
packages/rs-dapi-client/src/dapi_client.rs (2)
42-47
: Implementation ofcan_retry()
is appropriateThe
can_retry()
method forDapiClientError
correctly determines the retryability of different error types, which is essential for the retry logic.
272-274
: Logging non-retryable errors enhances observabilityAppropriately logging errors that are not retryable helps in monitoring and diagnosing issues that cannot be automatically recovered.
packages/rs-sdk/src/sdk.rs (2)
134-136
: Ensure Proper Cloning of New Fields inClone
ImplementationThe
Clone
implementation now includes cloning of the new fieldsmetadata_last_seen_height
,metadata_height_tolerance
, andmetadata_time_tolerance_ms
. Verify that these fields are correctly cloned to prevent unintended sharing or concurrency issues, especially with the atomic variablemetadata_last_seen_height
.
965-969
: Initializemetadata_last_seen_height
AppropriatelyWhen initializing
metadata_last_seen_height
withAtomicU64::new(0)
, consider whether starting at zero is appropriate. If zero is not a valid or meaningful height, this could affect staleness checks. Ensure that the initial value aligns with the expected behavior when the SDK starts operating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
packages/rs-sdk/src/error.rs (3)
70-73
: LGTM: NewStaleNode
variant added correctly.The addition of the
StaleNode
variant to theError
enum is appropriate and aligns with the PR objective of detecting stale nodes. The use of#[from]
attribute for automatic conversion fromStaleNodeError
is a good practice.Consider updating the error message to be more specific:
- #[error(transparent)] + #[error("Stale node detected: {0}")] StaleNode(#[from] StaleNodeError),This change would provide more context when the error is displayed.
94-119
: LGTM:StaleNodeError
enum is well-designed and informative.The new
StaleNodeError
enum is a great addition, providing specific information about stale node errors. The separate variants forHeight
andTime
allow for precise error reporting, and the error messages are descriptive and include all necessary information.Consider moving the
StaleNodeError
enum definition above theError
enum in the file. This would improve code organization by defining the more specific error type before it's used in the mainError
enum.// Move this block above the Error enum definition /// Server returned stale metadata #[derive(Debug, thiserror::Error)] pub enum StaleNodeError { // ... (rest of the enum definition) } // Then keep the Error enum as is /// Error type for the SDK #[derive(Debug, thiserror::Error)] pub enum Error { // ... (rest of the enum definition) }This change would enhance code readability and follow the convention of defining more specific types before their usage.
Line range hint
1-119
: Overall, the changes look great and achieve the PR objectives.The modifications to
error.rs
successfully introduce new error types and handling mechanisms for stale nodes, aligning well with the PR objectives. The code is well-structured, follows Rust best practices, and improves error handling by providing more detailed information about stale node errors.To further enhance the error handling capabilities, consider adding a logging mechanism for these errors. This could help with debugging and monitoring in production environments. For example:
use log::error; impl Error { pub fn log(&self) { match self { Error::StaleNode(stale_error) => { error!("Stale node detected: {}", stale_error); } // Add other error types as needed _ => { error!("An error occurred: {}", self); } } } }This addition would allow for consistent logging of errors throughout the SDK, which could be valuable for troubleshooting and monitoring.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- packages/rs-dapi-client/src/dapi_client.rs (3 hunks)
- packages/rs-sdk/src/error.rs (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/rs-dapi-client/src/dapi_client.rs
🧰 Additional context used
🔇 Additional comments (2)
packages/rs-sdk/src/error.rs (2)
8-8
: LGTM: Import statement updated correctly.The import statement has been properly updated to include the
CanRetry
trait, which is necessary for the new functionality being added to theError
enum.
88-92
: Implementation looks good, but let's revisit the retry logic.The implementation of the
CanRetry
trait for theError
enum is well-structured and allows for more nuanced error handling. However, there's a potential concern regarding which errors should be retryable.Based on a previous comment by shumkov, there was a suggestion to limit retries to
Error::StaleNode
andError::TimeoutReached
. The current implementation follows this suggestion. However, we should verify if this is still the desired behavior or if we need to adjust the retry logic for other error types.Could you confirm if the current retry logic aligns with the team's latest decision on error handling?
Issue being fixed or feature implemented
When fetching information from a node that has chain halt, we receive outdated information.
What was done?
Implemented two mechanisms to detect stale nodes:
How Has This Been Tested?
Added tests.
Breaking Changes
Queries to nodes that are out of sync return
StaleNodeError
instead of outdated information.rs_dapi_client::CanRetry::is_node_failure()
is deprecated in favor ofcan_retry()
.Checklist:
For repository code-owners and collaborators only
Summary by CodeRabbit
New Features
StaleNodeError
type for detailed reporting of stale metadata issues.Bug Fixes
Documentation
Tests