Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Turn ChainManager's pending_blobs into proposed/locked specific pending blobs #2813

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ndr-ds
Copy link
Contributor

@ndr-ds ndr-ds commented Nov 4, 2024

Motivation

We were adding stuff to the chain manager's pending blobs, but never removing them. We were also not properly associating the lifecycle of the pending blobs with the locked and proposed blocks

Proposal

Associate the lifecycle of the pending blobs with the locked and proposed blocks, separately

Test Plan

CI + will add some tests

Release Plan

  • These changes should be backported to the latest devnet branch
  • These changes should be backported to the latest testnet branch

Copy link
Contributor Author

ndr-ds commented Nov 4, 2024

@ndr-ds ndr-ds force-pushed the 11-01-split_rust_ci_into_more_jobs branch from dfa4908 to 5450a3b Compare November 5, 2024 01:49
@ndr-ds ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch from 8c70aac to cfc31b3 Compare November 5, 2024 01:49
@ndr-ds ndr-ds force-pushed the 11-01-split_rust_ci_into_more_jobs branch from 5450a3b to f288c9f Compare November 5, 2024 12:30
@ndr-ds ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch from cfc31b3 to 7da2190 Compare November 5, 2024 12:30
@ndr-ds ndr-ds force-pushed the 11-01-split_rust_ci_into_more_jobs branch from f288c9f to f1966ea Compare November 5, 2024 14:26
@ndr-ds ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch from 7da2190 to 5cbadfd Compare November 5, 2024 15:13
@ndr-ds ndr-ds changed the base branch from 11-01-split_rust_ci_into_more_jobs to graphite-base/2813 November 5, 2024 15:20
@ndr-ds ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch from 5cbadfd to fee8cc8 Compare November 5, 2024 15:22
@ndr-ds ndr-ds marked this pull request as ready for review November 5, 2024 15:22
@ndr-ds ndr-ds requested review from afck, deuszx and ma2bd November 5, 2024 15:23
@ndr-ds ndr-ds changed the base branch from graphite-base/2813 to main November 5, 2024 15:23
@ndr-ds ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch 2 times, most recently from 80f5fc2 to fee8cc8 Compare November 5, 2024 15:23
requested_locked: None,
locked_pending_blobs: BTreeMap::new(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to turn these into Vec<Blob>s? (In the ChainManagerInfo, not in the ChainManager.) The recipient can't trust that the blob ID → blob association is correct anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! Good point

linera-core/src/client/mod.rs Outdated Show resolved Hide resolved
linera-core/src/remote_node.rs Outdated Show resolved Hide resolved
@ndr-ds ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch from fee8cc8 to 8205540 Compare November 5, 2024 20:35
@ndr-ds ndr-ds requested a review from afck November 5, 2024 20:40
@ndr-ds ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch from 8205540 to e00276e Compare November 5, 2024 21:02
linera-chain/src/manager.rs Outdated Show resolved Hide resolved
linera-chain/src/manager.rs Outdated Show resolved Hide resolved
linera-core/src/remote_node.rs Outdated Show resolved Hide resolved
linera-core/src/local_node.rs Show resolved Hide resolved
linera-core/src/chain_worker/state/mod.rs Outdated Show resolved Hide resolved
linera-core/src/client/mod.rs Outdated Show resolved Hide resolved
linera-chain/src/manager.rs Outdated Show resolved Hide resolved
@ndr-ds ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch from e00276e to dd4bb33 Compare November 6, 2024 19:39
@@ -305,12 +305,12 @@ where
}

/// Returns an error if the block requires a blob we don't have.
/// Looks for the blob in: chain manager's pending blobs and storage.
/// Looks for the blob in: chain manager's proposed blobs and storage.
async fn check_no_missing_blobs(
&self,
required_blob_ids: &HashSet<BlobId>,
) -> Result<(), WorkerError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in this PR but instead of returning the missing_blob_ids as the error, we could instead return Vec<BlobId> from this method and it'd contain all the missing blobs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then error out on the caller? I'm not sure I get it 🤔 We have retry logic that relies on the missing blobs being returned as this error, so we can't remove it AFAIU

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The caller would check if the returned vector is non-empty and act as required.

linera-chain/src/manager.rs Outdated Show resolved Hide resolved
@@ -305,12 +305,12 @@ where
}

/// Returns an error if the block requires a blob we don't have.
/// Looks for the blob in: chain manager's pending blobs and storage.
/// Looks for the blob in: chain manager's proposed blobs and storage.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC in storage we have blobs from confirmed blocks and in the proposed_blobs we have blobs that haven't been validated even (they'd be part of locked_blobs). Isn't it possible that the blobs from proposed_blobs will be forgotten (if we confirm another proposal)? Yet the caller of this method might be told that we do have all the blobs (which we'll later delete). Isn't that a problem?

Copy link
Contributor

@afck afck Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blobs that haven't been validated even

(They have been validated by us, but not yet by a quorum, yes.)

I'm also a bit confused, yes.

Whenever we set the proposed block, we need to make sure that the blobs we store with the proposal are exactly the ones published by it. (Whether they were provided by the client or we already had them. But maybe we should just require for now that the client provide them all regardless.)

And analogously for the locked block: We need to store with it all the blobs that it publishes or reads! (Read blobs that we already have in storage we can probably omit, since we currently never remove blobs from storage, and since that might end up being too many blobs for the chain manager.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Ah, I guess in the second case we never need to store any blobs in the chain manager that aren't being published by this block, because we presumably already asked for and processed a confirmed block that used them?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess now that we don't have proposed_blobs anymore and we just point to the blobs in the BlockProposal, we should be fine here, right? Or am I missing something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right:

  • We should check that the blobs provided in the proposal are all published by that block (and don't contain duplicates).
  • The published blobs that aren't provided in the proposal must be confirmed for us locally.

That's both already the case, is it?

Copy link
Contributor

@afck afck Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we're controlling how it gets created

In a distributed fault-tolerant network, you can never trust what you receive from another node! In general, validators can't trust clients and vice-versa. Concretely, someone could run a modified client or an alternative implementation.

And that method is run on the client side, if I understand correctly, before sending the proposal over the network.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, of course, my bad. I'll add the duplicated blobs check!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, on the server side we also always get the blobs for the block either by calling published_blob_ids or required_blob_ids. So I don't think we'll have duplicates.
The question is: should these functions error out if they detect a duplicate? For published_blob_ids I would say maybe, but for required_blob_ids I don't think so

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm still confused: In ChainWorkerState::check_for_unneeded_blobs, at least, we don't seem to check for duplicates. It would be easy to add, though, by replacing contains with remove.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I finally understand what you mean 😅 Good point! Will add. I think I'll do it differently though so we can have separate errors for unneeded and duplicated

Err(NodeError::InvalidChainInfoResponse)
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method needs a comprehensive test coverage with each test case clearly showing what we're trying to achieve. I don't think the "requirements"/rules for these are written down anywhere? i.e. It should be clear to the team (and based on the code) when a blob is expected to be in one of the collections (locked_blobs, proposed_blobs, storage).

@ndr-ds ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch 2 times, most recently from dc5272b to 0e33066 Compare November 12, 2024 04:00
linera-chain/src/manager.rs Outdated Show resolved Hide resolved
@@ -305,12 +305,12 @@ where
}

/// Returns an error if the block requires a blob we don't have.
/// Looks for the blob in: chain manager's pending blobs and storage.
/// Looks for the blob in: chain manager's proposed blobs and storage.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right:

  • We should check that the blobs provided in the proposal are all published by that block (and don't contain duplicates).
  • The published blobs that aren't provided in the proposal must be confirmed for us locally.

That's both already the case, is it?

@ndr-ds ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch from 0e33066 to b85568e Compare November 12, 2024 14:06
linera-core/src/client/mod.rs Outdated Show resolved Hide resolved
@@ -305,12 +305,12 @@ where
}

/// Returns an error if the block requires a blob we don't have.
/// Looks for the blob in: chain manager's pending blobs and storage.
/// Looks for the blob in: chain manager's proposed blobs and storage.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe so. We create the proposal's blobs based on the blobs being published by that block. We also enforce that all blobs being published must be provided in the proposal, which means they must be in some pending location locally when proposing the block.
Right now, if we're trying to publish a blob again that is already confirmed in storage, we do technically need to add that blob to pending. Later on that'll be optimized away as we will ignore blobs that are already in storage, but maybe we can do that optimization earlier as well. Let me try that.

@ndr-ds ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch 4 times, most recently from f178427 to 2e47cbb Compare November 12, 2024 17:32
}
}

#[allow(clippy::result_large_err)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still needed? I thought I boxed all the large error fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! It's not, will remove

.await?
.manager
.get()
.locked_blobs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the chain manager now always contain all blobs the block uses, or only the published ones? I think we went with the latter, didn't we? Then we also need to read them from storage here after all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chain manager's proposed blobs are always the published ones. The locked blobs have all the used ones, AFAIU. Maybe both should just have the published ones? 🤔

@@ -305,12 +305,12 @@ where
}

/// Returns an error if the block requires a blob we don't have.
/// Looks for the blob in: chain manager's pending blobs and storage.
/// Looks for the blob in: chain manager's proposed blobs and storage.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm still confused: In ChainWorkerState::check_for_unneeded_blobs, at least, we don't seem to check for duplicates. It would be easy to add, though, by replacing contains with remove.

round,
*cert,
&key_pair,
manager.locked_blobs,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're gonna keep all used blobs in the locked blobs, we actually need to distinct here between published and used locked blobs 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, good point, this should filter them and use only the published ones.

@ndr-ds ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch 2 times, most recently from b212fc5 to 8e30f57 Compare November 14, 2024 14:40
timeout_vote: None,
fallback_vote: None,
round_timeout,
current_round,
fallback_owners,
pending_blobs: BTreeMap::new(),
})
}

/// Returns the most recent vote we cast.
pub fn pending(&self) -> Option<&Vote> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should rename this function, too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Will do

/// These are the published blobs belonging to the locked block.
#[debug(skip_if = Vec::is_empty)]
pub locked_published_blobs: Vec<Blob>,
/// These are the used blobs belonging to the locked block.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Maybe remove all the These are.)

linera-chain/src/manager.rs Show resolved Hide resolved
.difference(&proposal.content.block.published_blob_ids())
.cloned()
.collect::<Vec<_>>();
let used_blobs = self.state.storage.read_blobs(&used_blob_ids).await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could avoid reading those if the proposal doesn't have a validated block.

.collect();
if !missing_blob_ids.is_empty() {
return Err(WorkerError::BlobsNotFound(missing_blob_ids));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(In theory this should be unreachable, shouldn't it?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, yes, but I think if for some weird reason this happens we should definitely at least error out 😅

/// and checks that they are otherwise in storage.
async fn get_blobs_and_checks_storage(
/// Returns the blobs requested by their `blob_ids` that are in the chain manager's locked blobs
/// and check that they are otherwise in storage.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// and check that they are otherwise in storage.
/// and checks that they are otherwise in storage.

.pending_blobs
let manager = chain_state_view.manager.get();
manager
.locked_published_blobs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this is unnecessary: At the (only) call site, in ChainClient::propose_block, we only get into this situation when we already just looked up in the chain manager that we are re-proposing a locked block. In that case, just use the corresponding blobs that are already in the chain manager. Only call read_local_blobs in the else case, when we do BlockProposal::new_initial.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

.get(&blob_id)
.or_else(|| manager.locked_used_blobs.get(&blob_id))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this is also unnecessary: We are looking for the published blobs, aren't we?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should create an issue, though: If the locked block is using blobs that we don't have in storage, we need to find another way to send them to the validator.
(Currently the validator wouldn't accept them, I think, because it expects the blobs in BlockProposal to be only the published ones?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole maybe_blob block can be removed I think. Since this will now only be called for a new block proposal, when this maybe_blob block is removed then that means this will only fail if someone forgets to call add_pending_blobs to add this blob to the ChainClient's pending blobs, which is the expected behavior, is it not?

round,
*cert,
&key_pair,
manager.locked_published_blobs,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, we're already doing that anyway! Then the change to read_local_blobs isn't necessary, and the call should move into the else branch here, shouldn't it?

linera-core/src/remote_node.rs Show resolved Hide resolved
@ndr-ds ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch 2 times, most recently from de5ae14 to 3b55663 Compare November 15, 2024 02:47
@@ -385,36 +389,46 @@ impl ChainManager {
Ok(Outcome::Accept)
}

/// If the validated block certificate is more recent, return that certificate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// If the validated block certificate is more recent, return that certificate
/// If the validated block certificate is more recent, returns that certificate

@@ -385,36 +389,46 @@ impl ChainManager {
Ok(Outcome::Accept)
}

/// If the validated block certificate is more recent, return that certificate
/// so that the locked block can be updated.
pub fn maybe_update_locked(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't actually update the locked block. Maybe validated_block_if_newer_than_locked?

validated_block_certificate: Option<LiteCertificate<'static>>,
executed_block: ExecutedBlock,
) -> Option<Certificate> {
if let Some(lite_cert) = validated_block_certificate {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Maybe return early.)

Suggested change
if let Some(lite_cert) = validated_block_certificate {
let lite_cert = validated_block_certificate?;

}
}
}
let ProposalContent { round, .. } = proposal_content;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change? Now we pass in the same block twice, basically.

Copy link
Contributor Author

@ndr-ds ndr-ds Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now instead of building the ExecutedBlock here we do it outside the function. This was done to make the whole maybe_update_locked work and not pass used_blobs that won't be used here. Maybe there's a better way? This was the best I could think of.
I could also pass in the round, the content and the blobs as separate arguments, instead of the whole proposal
Nvm, can't do that 🤔 Open to suggestions on this!

/// are still pending in the validator's chain manager. Returns `None` if not all of them
/// could be found.
/// Checks that the given blobs are present in the provided lists of locked blobs from the
/// chain manager. If they're not there, try to download the missing ones from this validator.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// chain manager. If they're not there, try to download the missing ones from this validator.
/// chain manager. If they're not there, tries to download the missing ones from this validator.

@@ -71,6 +73,12 @@ pub enum LocalNodeError {

#[error("Blobs not found: {0:?}")]
BlobsNotFound(Vec<BlobId>),

#[error("A validator requested blobs that are not required: {0:?}")]
UnrequiredBlobsRequested(Vec<BlobId>),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then that's not a LocalNodeError but a NodeError. I tried to move these things to the right place in #2895; let's not mix them up again.

Copy link
Contributor Author

@ndr-ds ndr-ds Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right 🤦🏻‍♂️ my bad! We do the check in the local node but it is related to the validator. Was trying to code past the brain mush stage, seems like it was a bad idea 😅

UnrequiredBlobsRequested(Vec<BlobId>),

#[error("Blobs requested by validator contain duplicates")]
DuplicateBlobsRequested,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Actually, even added exactly those two error variants to NodeError in #2895. Please check if we're doing these tests already, and don't re-introduce them to local_node.

@ndr-ds ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch 3 times, most recently from 74ceba0 to 73fc36f Compare November 15, 2024 14:40
@ndr-ds ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch from 73fc36f to 523ac87 Compare November 15, 2024 14:47
Comment on lines +228 to +230
return Err(NodeError::LocalError {
error: LocalNodeError::BlobsNotFound(missing_blob_ids).to_string(),
});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one should be fine, right? The validator was missing some blobs, then we try to find them in the local node, and some are also missing from the local node, so we return a local error 🤔

@ndr-ds ndr-ds requested review from afck and deuszx November 15, 2024 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants