Turn ChainManager's pending_blobs into proposed/locked specific pending blobs #2813

ndr-ds · 2024-11-04T23:46:40Z

Motivation

We were adding stuff to the chain manager's pending blobs, but never removing them. We were also not properly associating the lifecycle of the pending blobs with the locked and proposed blocks

Proposal

Associate the lifecycle of the pending blobs with the locked and proposed blocks, separately

Test Plan

CI + will add some tests

Release Plan

These changes should be backported to the latest devnet branch
These changes should be backported to the latest testnet branch

ndr-ds · 2024-11-04T23:46:59Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

afck · 2024-11-05T16:16:32Z

linera-chain/src/manager.rs

            requested_locked: None,
+            locked_pending_blobs: BTreeMap::new(),


Would it be better to turn these into Vec<Blob>s? (In the ChainManagerInfo, not in the ChainManager.) The recipient can't trust that the blob ID → blob association is correct anyway.

Sure! Good point

linera-core/src/client/mod.rs

linera-core/src/remote_node.rs

linera-chain/src/manager.rs

linera-core/src/remote_node.rs

linera-core/src/local_node.rs

linera-core/src/chain_worker/state/mod.rs

linera-core/src/client/mod.rs

linera-chain/src/manager.rs

linera-core/src/client/mod.rs

linera-core/src/remote_node.rs

deuszx · 2024-11-08T13:26:10Z

linera-core/src/chain_worker/state/mod.rs

@@ -305,12 +305,12 @@ where
    }

    /// Returns an error if the block requires a blob we don't have.
-    /// Looks for the blob in: chain manager's pending blobs and storage.
+    /// Looks for the blob in: chain manager's proposed blobs and storage.
    async fn check_no_missing_blobs(
        &self,
        required_blob_ids: &HashSet<BlobId>,
    ) -> Result<(), WorkerError> {


Not in this PR but instead of returning the missing_blob_ids as the error, we could instead return Vec<BlobId> from this method and it'd contain all the missing blobs.

And then error out on the caller? I'm not sure I get it 🤔 We have retry logic that relies on the missing blobs being returned as this error, so we can't remove it AFAIU

The caller would check if the returned vector is non-empty and act as required.

linera-chain/src/manager.rs

deuszx · 2024-11-08T13:32:34Z

linera-core/src/chain_worker/state/mod.rs

@@ -305,12 +305,12 @@ where
    }

    /// Returns an error if the block requires a blob we don't have.
-    /// Looks for the blob in: chain manager's pending blobs and storage.
+    /// Looks for the blob in: chain manager's proposed blobs and storage.


IIUC in storage we have blobs from confirmed blocks and in the proposed_blobs we have blobs that haven't been validated even (they'd be part of locked_blobs). Isn't it possible that the blobs from proposed_blobs will be forgotten (if we confirm another proposal)? Yet the caller of this method might be told that we do have all the blobs (which we'll later delete). Isn't that a problem?

blobs that haven't been validated even

(They have been validated by us, but not yet by a quorum, yes.)

I'm also a bit confused, yes.

Whenever we set the proposed block, we need to make sure that the blobs we store with the proposal are exactly the ones published by it. (Whether they were provided by the client or we already had them. But maybe we should just require for now that the client provide them all regardless.)

And analogously for the locked block: We need to store with it all the blobs that it publishes ~~or reads~~! (Read blobs that we already have in storage we can probably omit, since we currently never remove blobs from storage, and since that might end up being too many blobs for the chain manager.)

(Ah, I guess in the second case we never need to store any blobs in the chain manager that aren't being published by this block, because we presumably already asked for and processed a confirmed block that used them?)

I guess now that we don't have proposed_blobs anymore and we just point to the blobs in the BlockProposal, we should be fine here, right? Or am I missing something?

I think you're right:

We should check that the blobs provided in the proposal are all published by that block (and don't contain duplicates).

The published blobs that aren't provided in the proposal must be confirmed for us locally.

That's both already the case, is it?

since we're controlling how it gets created

In a distributed fault-tolerant network, you can never trust what you receive from another node! In general, validators can't trust clients and vice-versa. Concretely, someone could run a modified client or an alternative implementation.

And that method is run on the client side, if I understand correctly, before sending the proposal over the network.

Right, of course, my bad. I'll add the duplicated blobs check!

Actually, on the server side we also always get the blobs for the block either by calling published_blob_ids or required_blob_ids. So I don't think we'll have duplicates.
The question is: should these functions error out if they detect a duplicate? For published_blob_ids I would say maybe, but for required_blob_ids I don't think so

Sorry, I'm still confused: In ChainWorkerState::check_for_unneeded_blobs, at least, we don't seem to check for duplicates. It would be easy to add, though, by replacing contains with remove.

I think I finally understand what you mean 😅 Good point! Will add. I think I'll do it differently though so we can have separate errors for unneeded and duplicated

deuszx · 2024-11-08T14:12:47Z

linera-core/src/local_node.rs

+                Err(NodeError::InvalidChainInfoResponse)
+            }
+        }
+    }


This method needs a comprehensive test coverage with each test case clearly showing what we're trying to achieve. I don't think the "requirements"/rules for these are written down anywhere? i.e. It should be clear to the team (and based on the code) when a blob is expected to be in one of the collections (locked_blobs, proposed_blobs, storage).

linera-chain/src/manager.rs

afck · 2024-11-12T10:16:05Z

linera-core/src/chain_worker/state/mod.rs

@@ -305,12 +305,12 @@ where
    }

    /// Returns an error if the block requires a blob we don't have.
-    /// Looks for the blob in: chain manager's pending blobs and storage.
+    /// Looks for the blob in: chain manager's proposed blobs and storage.


I think you're right:

We should check that the blobs provided in the proposal are all published by that block (and don't contain duplicates).

The published blobs that aren't provided in the proposal must be confirmed for us locally.

That's both already the case, is it?

linera-core/src/client/mod.rs

ndr-ds · 2024-11-12T14:13:39Z

linera-core/src/chain_worker/state/mod.rs

@@ -305,12 +305,12 @@ where
    }

    /// Returns an error if the block requires a blob we don't have.
-    /// Looks for the blob in: chain manager's pending blobs and storage.
+    /// Looks for the blob in: chain manager's proposed blobs and storage.


I believe so. We create the proposal's blobs based on the blobs being published by that block. We also enforce that all blobs being published must be provided in the proposal, which means they must be in some pending location locally when proposing the block.
Right now, if we're trying to publish a blob again that is already confirmed in storage, we do technically need to add that blob to pending. Later on that'll be optimized away as we will ignore blobs that are already in storage, but maybe we can do that optimization earlier as well. Let me try that.

afck · 2024-11-13T17:00:47Z

linera-core/src/local_node.rs

+        }
+    }
+
+    #[allow(clippy::result_large_err)]


Is this still needed? I thought I boxed all the large error fields.

Good point! It's not, will remove

afck · 2024-11-13T17:02:34Z

linera-core/src/local_node.rs

+                        .await?
+                        .manager
+                        .get()
+                        .locked_blobs


Does the chain manager now always contain all blobs the block uses, or only the published ones? I think we went with the latter, didn't we? Then we also need to read them from storage here after all.

The chain manager's proposed blobs are always the published ones. The locked blobs have all the used ones, AFAIU. Maybe both should just have the published ones? 🤔

afck · 2024-11-13T17:07:48Z

linera-core/src/chain_worker/state/mod.rs

@@ -305,12 +305,12 @@ where
    }

    /// Returns an error if the block requires a blob we don't have.
-    /// Looks for the blob in: chain manager's pending blobs and storage.
+    /// Looks for the blob in: chain manager's proposed blobs and storage.


Sorry, I'm still confused: In ChainWorkerState::check_for_unneeded_blobs, at least, we don't seem to check for duplicates. It would be easy to add, though, by replacing contains with remove.

ndr-ds · 2024-11-14T02:09:27Z

linera-core/src/client/mod.rs

+                round,
+                *cert,
+                &key_pair,
+                manager.locked_blobs,


If we're gonna keep all used blobs in the locked blobs, we actually need to distinct here between published and used locked blobs 😅

Right, good point, this should filter them and use only the published ones.

linera-chain/src/manager.rs

afck · 2024-11-14T14:44:23Z

linera-chain/src/manager.rs

            timeout_vote: None,
            fallback_vote: None,
            round_timeout,
            current_round,
            fallback_owners,
-            pending_blobs: BTreeMap::new(),
        })
    }

    /// Returns the most recent vote we cast.
    pub fn pending(&self) -> Option<&Vote> {


We should rename this function, too.

Good catch! Will do

afck · 2024-11-14T14:47:20Z

linera-chain/src/manager.rs

+    /// These are the published blobs belonging to the locked block.
+    #[debug(skip_if = Vec::is_empty)]
+    pub locked_published_blobs: Vec<Blob>,
+    /// These are the used blobs belonging to the locked block.


(Maybe remove all the These are.)

linera-chain/src/manager.rs

afck · 2024-11-14T14:50:15Z

linera-core/src/chain_worker/state/attempted_changes.rs

+            .difference(&proposal.content.block.published_blob_ids())
+            .cloned()
+            .collect::<Vec<_>>();
+        let used_blobs = self.state.storage.read_blobs(&used_blob_ids).await?;


We could avoid reading those if the proposal doesn't have a validated block.

afck · 2024-11-14T14:52:49Z

linera-core/src/chain_worker/state/attempted_changes.rs

+            .collect();
+        if !missing_blob_ids.is_empty() {
+            return Err(WorkerError::BlobsNotFound(missing_blob_ids));
+        }


(In theory this should be unreachable, shouldn't it?)

In theory, yes, but I think if for some weird reason this happens we should definitely at least error out 😅

afck · 2024-11-14T15:05:24Z

linera-core/src/chain_worker/state/mod.rs

-    /// and checks that they are otherwise in storage.
-    async fn get_blobs_and_checks_storage(
+    /// Returns the blobs requested by their `blob_ids` that are in the chain manager's locked blobs
+    /// and check that they are otherwise in storage.


Suggested change

/// and check that they are otherwise in storage.

/// and checks that they are otherwise in storage.

afck · 2024-11-14T15:22:20Z

linera-core/src/client/mod.rs

-                    .pending_blobs
+                let manager = chain_state_view.manager.get();
+                manager
+                    .locked_published_blobs


Actually, this is unnecessary: At the (only) call site, in ChainClient::propose_block, we only get into this situation when we already just looked up in the chain manager that we are re-proposing a locked block. In that case, just use the corresponding blobs that are already in the chain manager. Only call read_local_blobs in the else case, when we do BlockProposal::new_initial.

Good catch!

afck · 2024-11-14T15:23:29Z

linera-core/src/client/mod.rs

                    .get(&blob_id)
+                    .or_else(|| manager.locked_used_blobs.get(&blob_id))


And this is also unnecessary: We are looking for the published blobs, aren't we?

We should create an issue, though: If the locked block is using blobs that we don't have in storage, we need to find another way to send them to the validator.
(Currently the validator wouldn't accept them, I think, because it expects the blobs in BlockProposal to be only the published ones?)

This whole maybe_blob block can be removed I think. Since this will now only be called for a new block proposal, when this maybe_blob block is removed then that means this will only fail if someone forgets to call add_pending_blobs to add this blob to the ChainClient's pending blobs, which is the expected behavior, is it not?

afck · 2024-11-14T15:25:20Z

linera-core/src/client/mod.rs

+                round,
+                *cert,
+                &key_pair,
+                manager.locked_published_blobs,


I see, we're already doing that anyway! Then the change to read_local_blobs isn't necessary, and the call should move into the else branch here, shouldn't it?

linera-core/src/remote_node.rs

afck · 2024-11-15T10:12:49Z

linera-chain/src/manager.rs

@@ -385,36 +389,46 @@ impl ChainManager {
        Ok(Outcome::Accept)
    }

+    /// If the validated block certificate is more recent, return that certificate


Suggested change

/// If the validated block certificate is more recent, return that certificate

/// If the validated block certificate is more recent, returns that certificate

afck · 2024-11-15T10:14:12Z

linera-chain/src/manager.rs

@@ -385,36 +389,46 @@ impl ChainManager {
        Ok(Outcome::Accept)
    }

+    /// If the validated block certificate is more recent, return that certificate
+    /// so that the locked block can be updated.
+    pub fn maybe_update_locked(


This doesn't actually update the locked block. Maybe validated_block_if_newer_than_locked?

afck · 2024-11-15T10:15:18Z

linera-chain/src/manager.rs

+        validated_block_certificate: Option<LiteCertificate<'static>>,
+        executed_block: ExecutedBlock,
+    ) -> Option<Certificate> {
+        if let Some(lite_cert) = validated_block_certificate {


(Maybe return early.)

Suggested change

if let Some(lite_cert) = validated_block_certificate {

let lite_cert = validated_block_certificate?;

afck · 2024-11-15T10:17:38Z

linera-chain/src/manager.rs

-                }
-            }
-        }
+        let ProposalContent { round, .. } = proposal_content;


Why this change? Now we pass in the same block twice, basically.

Now instead of building the ExecutedBlock here we do it outside the function. This was done to make the whole maybe_update_locked work and not pass used_blobs that won't be used here. Maybe there's a better way? This was the best I could think of.
~~I could also pass in the round, the content and the blobs as separate arguments, instead of the whole proposal~~
Nvm, can't do that 🤔 Open to suggestions on this!

afck · 2024-11-15T10:19:21Z

linera-core/src/remote_node.rs

-    /// are still pending in the validator's chain manager. Returns `None` if not all of them
-    /// could be found.
+    /// Checks that the given blobs are present in the provided lists of locked blobs from the
+    /// chain manager. If they're not there, try to download the missing ones from this validator.


Suggested change

/// chain manager. If they're not there, try to download the missing ones from this validator.

/// chain manager. If they're not there, tries to download the missing ones from this validator.

afck · 2024-11-15T10:22:58Z

linera-core/src/local_node.rs

@@ -71,6 +73,12 @@ pub enum LocalNodeError {

    #[error("Blobs not found: {0:?}")]
    BlobsNotFound(Vec<BlobId>),
+
+    #[error("A validator requested blobs that are not required: {0:?}")]
+    UnrequiredBlobsRequested(Vec<BlobId>),


Then that's not a LocalNodeError but a NodeError. I tried to move these things to the right place in #2895; let's not mix them up again.

Right 🤦🏻‍♂️ my bad! We do the check in the local node but it is related to the validator. Was trying to code past the brain mush stage, seems like it was a bad idea 😅

afck · 2024-11-15T10:24:08Z

linera-core/src/local_node.rs

+    UnrequiredBlobsRequested(Vec<BlobId>),
+
+    #[error("Blobs requested by validator contain duplicates")]
+    DuplicateBlobsRequested,


Same here. Actually, even added exactly those two error variants to NodeError in #2895. Please check if we're doing these tests already, and don't re-introduce them to local_node.

…ng blobs

ndr-ds · 2024-11-15T14:48:37Z

linera-core/src/updater.rs

+                return Err(NodeError::LocalError {
+                    error: LocalNodeError::BlobsNotFound(missing_blob_ids).to_string(),
+                });


This one should be fine, right? The validator was missing some blobs, then we try to find them in the local node, and some are also missing from the local node, so we return a local error 🤔

This was referenced Nov 4, 2024

Fix lint CI and split it into multiple jobs #2785

Merged

Split rust CI into more jobs #2786

Merged

This was linked to issues Nov 4, 2024

The pending blobs in the chain manager should be exactly the ones for the locked block. #2704

Open

Fix storage DOS attack in block proposals #2199

Open

ndr-ds force-pushed the 11-01-split_rust_ci_into_more_jobs branch from dfa4908 to 5450a3b Compare November 5, 2024 01:49

ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch from 8c70aac to cfc31b3 Compare November 5, 2024 01:49

ndr-ds force-pushed the 11-01-split_rust_ci_into_more_jobs branch from 5450a3b to f288c9f Compare November 5, 2024 12:30

ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch from cfc31b3 to 7da2190 Compare November 5, 2024 12:30

ndr-ds force-pushed the 11-01-split_rust_ci_into_more_jobs branch from f288c9f to f1966ea Compare November 5, 2024 14:26

ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch from 7da2190 to 5cbadfd Compare November 5, 2024 15:13

ndr-ds changed the base branch from 11-01-split_rust_ci_into_more_jobs to graphite-base/2813 November 5, 2024 15:20

ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch from 5cbadfd to fee8cc8 Compare November 5, 2024 15:22

ndr-ds force-pushed the graphite-base/2813 branch from f1966ea to 4aa66d4 Compare November 5, 2024 15:22

ndr-ds marked this pull request as ready for review November 5, 2024 15:22

ndr-ds requested review from afck, deuszx and ma2bd November 5, 2024 15:23

ndr-ds changed the base branch from graphite-base/2813 to main November 5, 2024 15:23

ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch 2 times, most recently from 80f5fc2 to fee8cc8 Compare November 5, 2024 15:23

afck reviewed Nov 5, 2024

View reviewed changes

ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch from fee8cc8 to 8205540 Compare November 5, 2024 20:35

ndr-ds requested a review from afck November 5, 2024 20:40

ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch from 8205540 to e00276e Compare November 5, 2024 21:02

afck reviewed Nov 6, 2024

View reviewed changes

deuszx reviewed Nov 6, 2024

View reviewed changes

linera-chain/src/manager.rs Outdated Show resolved Hide resolved

deuszx reviewed Nov 6, 2024

View reviewed changes

linera-core/src/client/mod.rs Outdated Show resolved Hide resolved

deuszx reviewed Nov 6, 2024

View reviewed changes

linera-core/src/remote_node.rs Outdated Show resolved Hide resolved

ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch from e00276e to dd4bb33 Compare November 6, 2024 19:39

deuszx reviewed Nov 8, 2024

View reviewed changes

linera-chain/src/manager.rs Outdated Show resolved Hide resolved

deuszx reviewed Nov 8, 2024

View reviewed changes

ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch 2 times, most recently from dc5272b to 0e33066 Compare November 12, 2024 04:00

afck reviewed Nov 12, 2024

View reviewed changes

ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch from 0e33066 to b85568e Compare November 12, 2024 14:06

ndr-ds commented Nov 12, 2024

View reviewed changes

ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch 4 times, most recently from f178427 to 2e47cbb Compare November 12, 2024 17:32

afck reviewed Nov 13, 2024

View reviewed changes

ndr-ds commented Nov 14, 2024

View reviewed changes

ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch 2 times, most recently from b212fc5 to 8e30f57 Compare November 14, 2024 14:40

ndr-ds commented Nov 14, 2024

View reviewed changes

linera-chain/src/manager.rs Show resolved Hide resolved

afck reviewed Nov 14, 2024

View reviewed changes

ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch 2 times, most recently from de5ae14 to 3b55663 Compare November 15, 2024 02:47

afck requested changes Nov 15, 2024

View reviewed changes

ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch 3 times, most recently from 74ceba0 to 73fc36f Compare November 15, 2024 14:40

Turn ChainManager's pending_blobs into proposed/locked specific pendi…

523ac87

…ng blobs

ndr-ds force-pushed the 11-04-turn_chainmanager_s_pending_blobs_into_proposed_locked_specific_pending_blobs branch from 73fc36f to 523ac87 Compare November 15, 2024 14:47

ndr-ds commented Nov 15, 2024

View reviewed changes

ndr-ds requested review from afck and deuszx November 15, 2024 14:49

		requested_locked: None,
		locked_pending_blobs: BTreeMap::new(),

	/// and check that they are otherwise in storage.
	/// and checks that they are otherwise in storage.

		.get(&blob_id)
		.or_else(\|\| manager.locked_used_blobs.get(&blob_id))

	/// If the validated block certificate is more recent, return that certificate
	/// If the validated block certificate is more recent, returns that certificate

	if let Some(lite_cert) = validated_block_certificate {
	let lite_cert = validated_block_certificate?;

	/// chain manager. If they're not there, try to download the missing ones from this validator.
	/// chain manager. If they're not there, tries to download the missing ones from this validator.

Turn ChainManager's pending_blobs into proposed/locked specific pending blobs #2813

Are you sure you want to change the base?

Turn ChainManager's pending_blobs into proposed/locked specific pending blobs #2813

Conversation

ndr-ds commented Nov 4, 2024

Motivation

Proposal

Test Plan

Release Plan

ndr-ds commented Nov 4, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

afck Nov 8, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

afck Nov 12, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ndr-ds Nov 15, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ndr-ds Nov 15, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ndr-ds commented Nov 4, 2024 •

edited

Loading

afck Nov 8, 2024 •

edited

Loading

afck Nov 12, 2024 •

edited

Loading

ndr-ds Nov 15, 2024 •

edited

Loading

ndr-ds Nov 15, 2024 •

edited

Loading