Ensure received certificates' messages are delivered. #2696

Merged
merged 3 commits into from
Oct 24, 2024

Conversation

Contributor

@afck afck commented Oct 23, 2024

Motivation

Due to #2692, process_inbox can in some cases return without processing a message, even though a quorum of validators already had the sending certificate.

The client mutex fixed this in most cases, but it causes unnecessary contention.

Proposal

Ensure that the messages are delivered using retry_pending_cross_chain_requests.
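
As a rough illustration of the idea (a toy model, not the actual client code): the sketch below assumes that handling a certificate only queues its outgoing message, and that a separate retry step delivers queued messages to the inbox. `ToyNode`, its fields, `handle_certificate` and `demo` are hypothetical names made up for this example. Only `process_inbox` and `retry_pending_cross_chain_requests` are names taken from this PR, and their real signatures and behavior may differ.

```rust
// Hypothetical, simplified model. None of these types are the real Linera API.
#[derive(Default)]
struct ToyNode {
    /// Cross-chain requests created by handled certificates but not yet delivered.
    pending_requests: Vec<String>,
    /// Messages that have reached the recipient chain's inbox.
    inbox: Vec<String>,
    /// IDs of certificates that were already handled.
    known_certificates: Vec<u64>,
}

impl ToyNode {
    /// Records a certificate. Its message is only queued here, not delivered.
    /// Returns false if the certificate was already known.
    fn handle_certificate(&mut self, id: u64, message: &str) -> bool {
        if self.known_certificates.contains(&id) {
            return false;
        }
        self.known_certificates.push(id);
        self.pending_requests.push(message.to_string());
        true
    }

    /// Delivers every queued request to the inbox. Calling it again is a no-op.
    fn retry_pending_cross_chain_requests(&mut self) {
        self.inbox.append(&mut self.pending_requests);
    }

    /// Takes and returns whatever is currently in the inbox.
    fn process_inbox(&mut self) -> Vec<String> {
        std::mem::take(&mut self.inbox)
    }
}

/// Returns how many messages `process_inbox` sees without and with the retry.
fn demo() -> (usize, usize) {
    let mut node = ToyNode::default();
    node.handle_certificate(1, "transfer");
    // The certificate is known, but its message has not been delivered yet.
    let without_retry = node.process_inbox().len();
    // The retry pushes the queued message into the inbox.
    node.retry_pending_cross_chain_requests();
    let with_retry = node.process_inbox().len();
    (without_retry, with_retry)
}
```

In this model, `process_inbox` comes back empty until the retry has run, which mirrors the symptom described under Motivation.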

Test Plan

The tests should not be flaky anymore despite the removal of the two client locks.

Release Plan

  • These changes should be backported to the latest devnet branch, then
    • be released in a new SDK.
  • These changes should be backported to the latest testnet branch, then
    • be released in a new SDK.

Links

@afck afck requested review from ma2bd, ndr-ds, jvff and deuszx October 23, 2024 14:01
@@ -3236,16 +3233,11 @@ where
&self,
remote_node: RemoteNode<P::Node>,
) -> Result<(), ChainClientError> {
let mutex = self.state().client_mutex();
let _guard = mutex.lock_owned().await;
Contributor

TBH I don't know why we can remove it now. It protected against concurrent calls to synchronize_received_certificates_from_validator. Why do we not care about that anymore?

Contributor Author

It also protected against doing the following two things at the same time:

  • synchronize_received_certificates, and
  • processing a notification about a received certificate from a validator.

That's what can cause the issue described in #2692 and it was the only reason these were added in #2567.

With the retry_… call, this scenario should now work fine.
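
To make the scenario concrete, here is a minimal sketch under stated assumptions, not the actual client implementation. Two threads stand in for the two code paths (the synchronization and the notification handler), both receive the same certificate, and there is no client-wide lock. `SharedNode`, `handle_certificate`, `receive_certificate` and `run_both_paths` are hypothetical names. Only `retry_pending_cross_chain_requests` is taken from this PR, and its real behavior may differ from this toy version.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical toy model with fine-grained locks instead of one client mutex.
#[derive(Default)]
struct SharedNode {
    known_certificates: Mutex<HashSet<u64>>,
    pending_requests: Mutex<Vec<String>>,
    inbox: Mutex<Vec<String>>,
}

impl SharedNode {
    /// Records the certificate once and queues its message for delivery.
    /// A caller that finds the certificate already known queues nothing.
    fn handle_certificate(&self, id: u64, message: &str) {
        let mut known = self.known_certificates.lock().unwrap();
        if known.insert(id) {
            self.pending_requests.lock().unwrap().push(message.to_string());
        }
    }

    /// Moves all queued messages into the inbox. Holding the pending lock
    /// while appending means a concurrent retry waits until delivery is done.
    fn retry_pending_cross_chain_requests(&self) {
        let mut pending = self.pending_requests.lock().unwrap();
        self.inbox.lock().unwrap().append(&mut pending);
    }

    /// One code path: handle the certificate, then always retry.
    /// Returns the inbox size observed when this path finishes.
    fn receive_certificate(&self, id: u64, message: &str) -> usize {
        self.handle_certificate(id, message);
        self.retry_pending_cross_chain_requests();
        self.inbox.lock().unwrap().len()
    }
}

/// Runs both paths concurrently on the same certificate.
/// Returns (inbox size seen by path 1, seen by path 2, final inbox size).
fn run_both_paths() -> (usize, usize, usize) {
    let node = Arc::new(SharedNode::default());
    let sync_node = Arc::clone(&node);
    let notif_node = Arc::clone(&node);
    let sync = thread::spawn(move || sync_node.receive_certificate(7, "transfer"));
    let notif = thread::spawn(move || notif_node.receive_certificate(7, "transfer"));
    let seen_by_sync = sync.join().unwrap();
    let seen_by_notif = notif.join().unwrap();
    let total = node.inbox.lock().unwrap().len();
    (seen_by_sync, seen_by_notif, total)
}
```

In this toy model, whichever path loses the race still calls the retry before returning, so both paths observe the delivered message and it is delivered exactly once. Without the retry in the losing path, that path could return while the message was still only queued.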

@afck afck merged commit 2b70b99 into linera-io:main Oct 24, 2024
4 checks passed
@afck afck deleted the ensure-received-messages branch October 24, 2024 12:02
deuszx pushed a commit that referenced this pull request Oct 24, 2024
* Ensure received certificates' messages are delivered.

* Remove unnecessary client mutex locking.

* Add a comment: Why are we retrying?
Development

Successfully merging this pull request may close these issues.

Wait for outgoing messages when syncing received certificates from validators.
2 participants