-
Notifications
You must be signed in to change notification settings - Fork 167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Turn ChainManager's pending_blobs into proposed/locked specific pending blobs #2813
base: main
Are you sure you want to change the base?
Turn ChainManager's pending_blobs into proposed/locked specific pending blobs #2813
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
dfa4908
to
5450a3b
Compare
8c70aac
to
cfc31b3
Compare
5450a3b
to
f288c9f
Compare
cfc31b3
to
7da2190
Compare
f288c9f
to
f1966ea
Compare
7da2190
to
5cbadfd
Compare
5cbadfd
to
fee8cc8
Compare
f1966ea
to
4aa66d4
Compare
80f5fc2
to
fee8cc8
Compare
linera-chain/src/manager.rs
Outdated
requested_locked: None, | ||
locked_pending_blobs: BTreeMap::new(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to turn these into Vec<Blob>
s? (In the ChainManagerInfo
, not in the ChainManager
.) The recipient can't trust that the blob ID → blob association is correct anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! Good point
fee8cc8
to
8205540
Compare
8205540
to
e00276e
Compare
e00276e
to
dd4bb33
Compare
@@ -305,12 +305,12 @@ where | |||
} | |||
|
|||
/// Returns an error if the block requires a blob we don't have. | |||
/// Looks for the blob in: chain manager's pending blobs and storage. | |||
/// Looks for the blob in: chain manager's proposed blobs and storage. | |||
async fn check_no_missing_blobs( | |||
&self, | |||
required_blob_ids: &HashSet<BlobId>, | |||
) -> Result<(), WorkerError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in this PR but instead of returning the missing_blob_ids
as the error, we could instead return Vec<BlobId>
from this method and it'd contain all the missing blobs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And then error out on the caller? I'm not sure I get it 🤔 We have retry logic that relies on the missing blobs being returned as this error, so we can't remove it AFAIU
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The caller would check if the returned vector is non-empty and act as required.
@@ -305,12 +305,12 @@ where | |||
} | |||
|
|||
/// Returns an error if the block requires a blob we don't have. | |||
/// Looks for the blob in: chain manager's pending blobs and storage. | |||
/// Looks for the blob in: chain manager's proposed blobs and storage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC in storage we have blobs from confirmed blocks and in the proposed_blobs
we have blobs that haven't been validated even (they'd be part of locked_blobs
). Isn't it possible that the blobs from proposed_blobs
will be forgotten (if we confirm another proposal)? Yet the caller of this method might be told that we do have all the blobs (which we'll later delete). Isn't that a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blobs that haven't been validated even
(They have been validated by us, but not yet by a quorum, yes.)
I'm also a bit confused, yes.
Whenever we set the proposed block, we need to make sure that the blobs we store with the proposal are exactly the ones published by it. (Whether they were provided by the client or we already had them. But maybe we should just require for now that the client provide them all regardless.)
And analogously for the locked block: We need to store with it all the blobs that it publishes or reads! (Read blobs that we already have in storage we can probably omit, since we currently never remove blobs from storage, and since that might end up being too many blobs for the chain manager.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Ah, I guess in the second case we never need to store any blobs in the chain manager that aren't being published by this block, because we presumably already asked for and processed a confirmed block that used them?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess now that we don't have proposed_blobs
anymore and we just point to the blobs in the BlockProposal
, we should be fine here, right? Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right:
- We should check that the blobs provided in the proposal are all published by that block (and don't contain duplicates).
- The published blobs that aren't provided in the proposal must be confirmed for us locally.
That's both already the case, is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we're controlling how it gets created
In a distributed fault-tolerant network, you can never trust what you receive from another node! In general, validators can't trust clients and vice-versa. Concretely, someone could run a modified client or an alternative implementation.
And that method is run on the client side, if I understand correctly, before sending the proposal over the network.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, of course, my bad. I'll add the duplicated blobs check!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, on the server side we also always get the blobs for the block either by calling published_blob_ids or required_blob_ids. So I don't think we'll have duplicates.
The question is: should these functions error out if they detect a duplicate? For published_blob_ids
I would say maybe, but for required_blob_ids
I don't think so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm still confused: In ChainWorkerState::check_for_unneeded_blobs
, at least, we don't seem to check for duplicates. It would be easy to add, though, by replacing contains
with remove
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I finally understand what you mean 😅 Good point! Will add. I think I'll do it differently though so we can have separate errors for unneeded and duplicated
Err(NodeError::InvalidChainInfoResponse) | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method needs a comprehensive test coverage with each test case clearly showing what we're trying to achieve. I don't think the "requirements"/rules for these are written down anywhere? i.e. It should be clear to the team (and based on the code) when a blob is expected to be in one of the collections (locked_blobs, proposed_blobs, storage).
dc5272b
to
0e33066
Compare
@@ -305,12 +305,12 @@ where | |||
} | |||
|
|||
/// Returns an error if the block requires a blob we don't have. | |||
/// Looks for the blob in: chain manager's pending blobs and storage. | |||
/// Looks for the blob in: chain manager's proposed blobs and storage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right:
- We should check that the blobs provided in the proposal are all published by that block (and don't contain duplicates).
- The published blobs that aren't provided in the proposal must be confirmed for us locally.
That's both already the case, is it?
0e33066
to
b85568e
Compare
@@ -305,12 +305,12 @@ where | |||
} | |||
|
|||
/// Returns an error if the block requires a blob we don't have. | |||
/// Looks for the blob in: chain manager's pending blobs and storage. | |||
/// Looks for the blob in: chain manager's proposed blobs and storage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe so. We create the proposal's blobs based on the blobs being published by that block. We also enforce that all blobs being published must be provided in the proposal, which means they must be in some pending location locally when proposing the block.
Right now, if we're trying to publish a blob again that is already confirmed in storage, we do technically need to add that blob to pending. Later on that'll be optimized away as we will ignore blobs that are already in storage, but maybe we can do that optimization earlier as well. Let me try that.
f178427
to
2e47cbb
Compare
linera-core/src/local_node.rs
Outdated
} | ||
} | ||
|
||
#[allow(clippy::result_large_err)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still needed? I thought I boxed all the large error fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! It's not, will remove
linera-core/src/local_node.rs
Outdated
.await? | ||
.manager | ||
.get() | ||
.locked_blobs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the chain manager now always contain all blobs the block uses, or only the published ones? I think we went with the latter, didn't we? Then we also need to read them from storage here after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The chain manager's proposed blobs are always the published ones. The locked blobs have all the used ones, AFAIU. Maybe both should just have the published ones? 🤔
@@ -305,12 +305,12 @@ where | |||
} | |||
|
|||
/// Returns an error if the block requires a blob we don't have. | |||
/// Looks for the blob in: chain manager's pending blobs and storage. | |||
/// Looks for the blob in: chain manager's proposed blobs and storage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm still confused: In ChainWorkerState::check_for_unneeded_blobs
, at least, we don't seem to check for duplicates. It would be easy to add, though, by replacing contains
with remove
.
linera-core/src/client/mod.rs
Outdated
round, | ||
*cert, | ||
&key_pair, | ||
manager.locked_blobs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're gonna keep all used blobs in the locked blobs, we actually need to distinct here between published and used locked blobs 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, good point, this should filter them and use only the published ones.
b212fc5
to
8e30f57
Compare
linera-chain/src/manager.rs
Outdated
timeout_vote: None, | ||
fallback_vote: None, | ||
round_timeout, | ||
current_round, | ||
fallback_owners, | ||
pending_blobs: BTreeMap::new(), | ||
}) | ||
} | ||
|
||
/// Returns the most recent vote we cast. | ||
pub fn pending(&self) -> Option<&Vote> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should rename this function, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Will do
linera-chain/src/manager.rs
Outdated
/// These are the published blobs belonging to the locked block. | ||
#[debug(skip_if = Vec::is_empty)] | ||
pub locked_published_blobs: Vec<Blob>, | ||
/// These are the used blobs belonging to the locked block. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Maybe remove all the These are
.)
.difference(&proposal.content.block.published_blob_ids()) | ||
.cloned() | ||
.collect::<Vec<_>>(); | ||
let used_blobs = self.state.storage.read_blobs(&used_blob_ids).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could avoid reading those if the proposal doesn't have a validated block.
.collect(); | ||
if !missing_blob_ids.is_empty() { | ||
return Err(WorkerError::BlobsNotFound(missing_blob_ids)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(In theory this should be unreachable, shouldn't it?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory, yes, but I think if for some weird reason this happens we should definitely at least error out 😅
/// and checks that they are otherwise in storage. | ||
async fn get_blobs_and_checks_storage( | ||
/// Returns the blobs requested by their `blob_ids` that are in the chain manager's locked blobs | ||
/// and check that they are otherwise in storage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// and check that they are otherwise in storage. | |
/// and checks that they are otherwise in storage. |
linera-core/src/client/mod.rs
Outdated
.pending_blobs | ||
let manager = chain_state_view.manager.get(); | ||
manager | ||
.locked_published_blobs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this is unnecessary: At the (only) call site, in ChainClient::propose_block
, we only get into this situation when we already just looked up in the chain manager that we are re-proposing a locked block. In that case, just use the corresponding blobs that are already in the chain manager. Only call read_local_blobs
in the else
case, when we do BlockProposal::new_initial
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
linera-core/src/client/mod.rs
Outdated
.get(&blob_id) | ||
.or_else(|| manager.locked_used_blobs.get(&blob_id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this is also unnecessary: We are looking for the published blobs, aren't we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should create an issue, though: If the locked block is using blobs that we don't have in storage, we need to find another way to send them to the validator.
(Currently the validator wouldn't accept them, I think, because it expects the blobs in BlockProposal
to be only the published ones?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole maybe_blob
block can be removed I think. Since this will now only be called for a new block proposal, when this maybe_blob
block is removed then that means this will only fail if someone forgets to call add_pending_blobs
to add this blob to the ChainClient
's pending blobs, which is the expected behavior, is it not?
round, | ||
*cert, | ||
&key_pair, | ||
manager.locked_published_blobs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, we're already doing that anyway! Then the change to read_local_blobs
isn't necessary, and the call should move into the else
branch here, shouldn't it?
de5ae14
to
3b55663
Compare
linera-chain/src/manager.rs
Outdated
@@ -385,36 +389,46 @@ impl ChainManager { | |||
Ok(Outcome::Accept) | |||
} | |||
|
|||
/// If the validated block certificate is more recent, return that certificate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// If the validated block certificate is more recent, return that certificate | |
/// If the validated block certificate is more recent, returns that certificate |
linera-chain/src/manager.rs
Outdated
@@ -385,36 +389,46 @@ impl ChainManager { | |||
Ok(Outcome::Accept) | |||
} | |||
|
|||
/// If the validated block certificate is more recent, return that certificate | |||
/// so that the locked block can be updated. | |||
pub fn maybe_update_locked( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't actually update the locked block. Maybe validated_block_if_newer_than_locked
?
linera-chain/src/manager.rs
Outdated
validated_block_certificate: Option<LiteCertificate<'static>>, | ||
executed_block: ExecutedBlock, | ||
) -> Option<Certificate> { | ||
if let Some(lite_cert) = validated_block_certificate { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Maybe return early.)
if let Some(lite_cert) = validated_block_certificate { | |
let lite_cert = validated_block_certificate?; |
} | ||
} | ||
} | ||
let ProposalContent { round, .. } = proposal_content; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change? Now we pass in the same block twice, basically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now instead of building the ExecutedBlock
here we do it outside the function. This was done to make the whole maybe_update_locked
work and not pass used_blobs
that won't be used here. Maybe there's a better way? This was the best I could think of.
I could also pass in the round, the content and the blobs as separate arguments, instead of the whole proposal
Nvm, can't do that 🤔 Open to suggestions on this!
linera-core/src/remote_node.rs
Outdated
/// are still pending in the validator's chain manager. Returns `None` if not all of them | ||
/// could be found. | ||
/// Checks that the given blobs are present in the provided lists of locked blobs from the | ||
/// chain manager. If they're not there, try to download the missing ones from this validator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// chain manager. If they're not there, try to download the missing ones from this validator. | |
/// chain manager. If they're not there, tries to download the missing ones from this validator. |
linera-core/src/local_node.rs
Outdated
@@ -71,6 +73,12 @@ pub enum LocalNodeError { | |||
|
|||
#[error("Blobs not found: {0:?}")] | |||
BlobsNotFound(Vec<BlobId>), | |||
|
|||
#[error("A validator requested blobs that are not required: {0:?}")] | |||
UnrequiredBlobsRequested(Vec<BlobId>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then that's not a LocalNodeError
but a NodeError
. I tried to move these things to the right place in #2895; let's not mix them up again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right 🤦🏻♂️ my bad! We do the check in the local node but it is related to the validator. Was trying to code past the brain mush stage, seems like it was a bad idea 😅
linera-core/src/local_node.rs
Outdated
UnrequiredBlobsRequested(Vec<BlobId>), | ||
|
||
#[error("Blobs requested by validator contain duplicates")] | ||
DuplicateBlobsRequested, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. Actually, even added exactly those two error variants to NodeError
in #2895. Please check if we're doing these tests already, and don't re-introduce them to local_node
.
74ceba0
to
73fc36f
Compare
73fc36f
to
523ac87
Compare
return Err(NodeError::LocalError { | ||
error: LocalNodeError::BlobsNotFound(missing_blob_ids).to_string(), | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one should be fine, right? The validator was missing some blobs, then we try to find them in the local node, and some are also missing from the local node, so we return a local error 🤔
Motivation
We were adding stuff to the chain manager's pending blobs, but never removing them. We were also not properly associating the lifecycle of the pending blobs with the
locked
andproposed
blocksProposal
Associate the lifecycle of the pending blobs with the
locked
andproposed
blocks, separatelyTest Plan
CI + will add some tests
Release Plan
devnet
branchtestnet
branch