Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure received certificates' messages are delivered. #2696

Merged
merged 3 commits into from
Oct 24, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 10 additions & 18 deletions linera-core/src/client/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -955,9 +955,6 @@ where
);
}
if self.state().has_other_owners(&info.manager.ownership) {
let mutex = self.state().client_mutex();
let _guard = mutex.lock_owned().await;

// For chains with any owner other than ourselves, we could be missing recent
// certificates created by other owners. Further synchronize blocks from the network.
// This is a best-effort that depends on network conditions.
Expand Down Expand Up @@ -1233,7 +1230,7 @@ where
// `advance_with_local` might have drained the whole `block_batch`.
// In that case, move to the next chain batch, but remember to wait for
// the messages to be delivered to the inboxes.
other_sender_chains.push((chain_id, last_height));
other_sender_chains.push(chain_id);
continue;
};
let batch_size = last_height.saturating_sub(first_height).0 + 1;
Expand Down Expand Up @@ -1396,17 +1393,17 @@ where
return;
}
}
for (chain_id, height) in other_sender_chains {
for chain_id in other_sender_chains {
// Certificates for this chain were omitted from `certificates` because they were
// already processed locally. If they were processed in a concurrent task, it is not
// guaranteed that their cross-chain messages were already handled.
if let Err(error) = self
.client
.local_node
.wait_for_outgoing_messages(chain_id, height)
.retry_pending_cross_chain_requests(chain_id)
deuszx marked this conversation as resolved.
Show resolved Hide resolved
.await
{
error!(
"Failed trying to wait for outgoing messages from {chain_id} \
up to {height}: {error}"
);
error!("Failed to retry outgoing messages from {chain_id}: {error}");
}
}
// Update tracker.
Expand Down Expand Up @@ -3236,16 +3233,11 @@ where
&self,
remote_node: RemoteNode<P::Node>,
) -> Result<(), ChainClientError> {
let mutex = self.state().client_mutex();
let _guard = mutex.lock_owned().await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I don't know why we can remove it now. It protected from concurrent calls to synchronize_received_certificates_from_validator - why do we not care about that anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also protected from at the same time doing

  • synchronize_received_certificates, and
  • processing a notification about a received certificate from a validator.

That's what can cause the issue described in #2692 and it was the only reason these were added in #2567.

With the retry_… call, this scenario should now work fine.


let chain_id = self.chain_id;
// Proceed to downloading received certificates.
let received_certificates = self
.synchronize_received_certificates_from_validator(chain_id, &remote_node)
.await?;

drop(_guard);
// Process received certificates. If the client state has changed during the
// network calls, we should still be fine.
self.receive_certificates_from_validator(received_certificates)
Expand Down Expand Up @@ -3287,7 +3279,7 @@ struct ReceivedCertificatesFromValidator {
/// The downloaded certificates. The signatures were already checked and they are ready
/// to be processed.
certificates: Vec<Certificate>,
/// Sender chains that were already up to date locally. We need to wait for their messages
/// to be delivered, at least up to the given block height.
other_sender_chains: Vec<(ChainId, BlockHeight)>,
/// Sender chains that were already up to date locally. We need to ensure their messages
/// are delivered.
other_sender_chains: Vec<ChainId>,
}
16 changes: 0 additions & 16 deletions linera-core/src/local_node.rs
Original file line number Diff line number Diff line change
Expand Up @@ -333,22 +333,6 @@ where
info
}

/// Returns only after the outbox of the given chain does not contain any entries up to
/// `height` anymore.
#[tracing::instrument(level = "trace", skip_all)]
pub async fn wait_for_outgoing_messages(
&self,
chain_id: ChainId,
height: BlockHeight,
) -> Result<(), LocalNodeError> {
// TODO(#2692): Implement this, once #2689 is merged.
warn!(
"Not waiting for outgoing messages from {chain_id:.8} up to height {height}: \
not implemented yet."
);
Ok(())
}

/// Returns a read-only view of the [`ChainStateView`] of a chain referenced by its
/// [`ChainId`].
///
Expand Down
Loading