Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement DropGuard to ensure future Completion #1300

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

leodziki
Copy link

@leodziki leodziki commented Aug 31, 2024

Description

To manage state transitions safely across await boundaries, we introduced a DropGuard wrapper. This ensures that even if a future is dropped before completion, it will continue executing in the background via tokio::spawn. This approach allows us to avoid the pitfalls of incomplete state transitions in critical futures without the overhead of spawning all futures immediately.

The DropGuard checks if the inner future has completed; if not, it safely spawns the future in the background during its drop. This solution strikes a balance between stack-based future management and the necessity to complete crucial futures.

Fixes #1156

Type of change

  • Breaking change (fix or feature that would cause existing functionality to
    not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

cargo test --all and bazel test //... passes as expected.

Checklist

  • Updated documentation if needed
  • bazel test //... passes locally
  • PR is contained in a single commit, using git amend see some docs

This change is Reviewable

@leodziki
Copy link
Author

Hope to review the implementation of DropGuard soon
cc: @allada, @MarcusSorealheis

Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 5 files at r1.
Reviewable status: 0 of 2 LGTMs obtained, and 1 of 5 files reviewed, and 8 discussions need to be resolved


nativelink-util/src/drop_guard.rs line 1 at r1 (raw file):

use std::future::Future;

nit: Needs copyright.


nativelink-util/src/drop_guard.rs line 5 at r1 (raw file):

use std::task::{Context, Poll};

pub struct DropGuard<F: Future> {

nit: F: Future + Send + 'static because our future must be able to be transferred into a spawn


nativelink-util/src/drop_guard.rs line 5 at r1 (raw file):

use std::task::{Context, Poll};

pub struct DropGuard<F: Future> {

nit: Rename to DropProtectedFuture (and rename file).


nativelink-util/src/drop_guard.rs line 6 at r1 (raw file):

pub struct DropGuard<F: Future> {
    future: Option<Pin<Box<F>>>,

nit: We should be able to do this with a F. We should not need to Box and cannot Pin here (because we may xfer our future to a spawn).


nativelink-util/src/drop_guard.rs line 10 at r1 (raw file):

impl<F: Future> DropGuard<F> {
    pub fn new(future: F) -> Self {

We need to be careful that we don't loose our global nativelink Context here.

  1. We need to have the user pass in a span here.
  2. The future that we store needs to be wrapped inContextAwareFuture::new (plus the span the user passes in).
  3. (see below in Drop call)

nativelink-util/src/drop_guard.rs line 30 at r1 (raw file):

            }
        } else {
            panic!("Future already completed");

nit: use unreachable


nativelink-util/src/drop_guard.rs line 39 at r1 (raw file):

        if let Some(future) = self.future.take() {
            // Block on the future to ensure it completes.
            futures::executor::block_on(future);

This is unsafe, instead just forward it into a spawn!() and do not wait for the future.

Normally we'd require a background_spawn!(), but in this case we want to use spawn! directly and suppress the linter. If we use background_spawn! we'll create double work, because ContextAwareFuture handles all the context stuff we need to be careful with.


nativelink-util/src/evicting_map.rs line 438 at r1 (raw file):

            };

            let fut = state.put(key, eviction_item);

nit: We don't want to do this here, we want to make this entire future/function call protected. This can be done by rewriting the function signature to:

    fn inner_insert_many(
        &self,
        mut state: &mut State<K, T>,
        inserts: impl IntoIterator<Item = (K, T)>,
        seconds_since_anchor: i32,
    ) -> impl Future<Output=Vec<T>> + Send {
        DropSafeFuture::new(async move {
          // ...
        })

We introduced a `DropProtectedFuture` wrapper to manage state transitions
safely across `await` boundaries. It ensures that the future will continue
executing in the background via `tokio::spawn` even if a future is dropped
before completion.
Copy link
Author

@leodziki leodziki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 LGTMs obtained, and 0 of 7 files reviewed, and pending CI: Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, NativeLink.com Cloud / Remote Cache (Legacy Dockerfile Test), Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (22.04), integration-tests (22.04), macos-13, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable, and 5 discussions need to be resolved (waiting on @allada)


nativelink-util/src/evicting_map.rs line 447 at r2 (raw file):

                    };

                    if let Some(old_item) = state.put(key, eviction_item).await {

I'm submitting this WIP to clarify unclear points.

As we see, there is self and state referenced here & mutated n this async move {} block.

We are gonna implement Send + static` trait on the Future itself.

How shall we handle this?

Should we change this functions parameter as Arc<Mutex> for state and add Arc<Self> as an additional parameter for self?
Or should we implement Clone trait on self and state?

Waiting for clarifying further walks from this.

Thanks.


nativelink-util/src/evicting_map.rs line 450 at r2 (raw file):

                        replaced_items.push(old_item);
                    }
                    state.sum_store_size += new_item_size;

Here.


nativelink-util/src/evicting_map.rs line 452 at r2 (raw file):

                    state.sum_store_size += new_item_size;
                    state.lifetime_inserted_bytes.add(new_item_size);
                    self.evict_items(state.deref_mut()).await;

Here.


nativelink-util/src/drop_guard.rs line 10 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

We need to be careful that we don't loose our global nativelink Context here.

  1. We need to have the user pass in a span here.
  2. The future that we store needs to be wrapped inContextAwareFuture::new (plus the span the user passes in).
  3. (see below in Drop call)

Done.


nativelink-util/src/drop_guard.rs line 39 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

This is unsafe, instead just forward it into a spawn!() and do not wait for the future.

Normally we'd require a background_spawn!(), but in this case we want to use spawn! directly and suppress the linter. If we use background_spawn! we'll create double work, because ContextAwareFuture handles all the context stuff we need to be careful with.

Done.

@leodziki
Copy link
Author

Waiting for opinion, @allada

@leodziki leodziki requested a review from allada September 16, 2024 10:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EvictingMap manages state between await boundaries
2 participants