
[Bug]: Artifact materialization can cause a race when multiple callers materialize the same artifact in the same destination. #28605

Open
tvalentyn opened this issue Sep 21, 2023 · 9 comments

Comments

@tvalentyn (Contributor) commented Sep 21, 2023

What happened?

Beam SDK containers use the helper below to download artifacts:

func Materialize(ctx context.Context, endpoint string, dependencies []*pipepb.ArtifactInformation, rt string, dest string) ([]*pipepb.ArtifactInformation, error) {

When a worker instantiates multiple SDK containers that share a destination for staged artifacts (--semi_persist_dir):

semiPersistDir = flag.String("semi_persist_dir", "/tmp", "Local semi-persistent directory (optional).")

a race can occur: while one container is writing a staged artifact into the staging location, another container may be unable to read it, even though the artifact was previously materialized.

This race was observed in Dataflow Python pipelines that didn't use the sibling SDK worker protocol.

We could consider using a file lock during materialization, and possibly skipping re-downloads of artifacts with the same SHA.
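
A minimal sketch of what that could look like in the boot container, assuming a Linux-only advisory flock and placeholder helper names (this is not the actual Beam boot code):

```go
// Hypothetical sketch only: serialize materialization into a shared --semi_persist_dir
// with an advisory file lock, and skip downloads whose SHA-256 already matches.
package artifact

import (
	"crypto/sha256"
	"encoding/hex"
	"io"
	"os"
	"path/filepath"
	"syscall"
)

// withDirLock runs fn while holding an exclusive flock on <dir>/.staging.lock.
// The lock is advisory and Linux-specific; other containers sharing the volume
// block here until the first writer finishes materializing.
func withDirLock(dir string, fn func() error) error {
	lockPath := filepath.Join(dir, ".staging.lock")
	f, err := os.OpenFile(lockPath, os.O_CREATE|os.O_RDWR, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
		return err
	}
	defer syscall.Flock(int(f.Fd()), syscall.LOCK_UN)
	return fn()
}

// sha256Matches reports whether the file at path exists and hashes to wantHex.
func sha256Matches(path, wantHex string) bool {
	f, err := os.Open(path)
	if err != nil {
		return false
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return false
	}
	return hex.EncodeToString(h.Sum(nil)) == wantHex
}
```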

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@tvalentyn (Contributor Author)

cc: @lostluck

@lostluck (Contributor)

This seems like a Python-specific issue, as Python runs multiple processes or multiple containers on a worker VM. The other SDKs (Go and Java at least) will only have a single boot cycle to download from the artifact repo. It's hard to guard against race conditions across separate processes that can't meaningfully communicate.

It's also hard to know whether that semi-persist directory is actually shared with other workers.

In more actionable commentary, that "don't download if the SHA matches" idea is probably a good one, since we can check for existing files; if a file exists with a matching SHA, then it's the expected file.
If not, it might either be wrong or still in the process of being downloaded.

Whoever works on this needs to take that into consideration.
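
A sketch of how that "check, then download" flow could avoid exposing in-progress files, reusing the hypothetical sha256Matches helper from the earlier sketch and a placeholder fetch callback (again, not the actual Beam boot code):

```go
// Hypothetical sketch: download to a temp file in the same directory and rename
// into place, so readers never observe a partially written artifact.
package artifact

import (
	"os"
	"path/filepath"
)

func materializeOne(dest, wantSHA string, fetch func(w *os.File) error) error {
	// Reuse the existing file if it already has the expected hash.
	if wantSHA != "" && sha256Matches(dest, wantSHA) {
		return nil
	}
	tmp, err := os.CreateTemp(filepath.Dir(dest), ".partial-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // best-effort cleanup; harmless after a successful rename
	if err := fetch(tmp); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	// rename(2) is atomic within a filesystem, so dest is either absent, the old
	// file, or the complete new file -- never a half-written one.
	return os.Rename(tmp.Name(), dest)
}
```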

@kennknowles (Member)

I'm gonna say that a process writing to a shared location had better assume there might be other processes that want that location, even by accident. The two choices are: (1) generate a fresh location that no one else is going to conflict with (a la mktemp -d), or (2) use an intentionally shared location. I feel like in this case intentional sharing is what you want: download once, use by all.
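
A small sketch contrasting those two choices, under the same hypothetical-helper assumptions as the earlier snippets (the function and path names are illustrative only):

```go
// Hypothetical sketch: a fresh, collision-free scratch directory (a la mktemp -d)
// versus a deliberately shared cache path that every container targets. The shared
// variant is the one that needs the locking and SHA checks sketched above.
package artifact

import (
	"os"
	"path/filepath"
)

func stagingDir(shared bool, semiPersistDir string) (string, error) {
	if shared {
		// Intentional sharing: all containers agree on one cache path, so
		// concurrent writers must be tolerated.
		dir := filepath.Join(semiPersistDir, "staged")
		return dir, os.MkdirAll(dir, 0o755)
	}
	// Private scratch space that nothing else will conflict with.
	return os.MkdirTemp(semiPersistDir, "staged-*")
}
```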

@kennknowles (Member)

Which is just me saying the same thing from the peanut gallery: I object to processes working in a way that makes assumptions about how they'll be used, such as an SDK harness assuming it has a particular relationship to Docker containers and VMs. That's just pain down the line.

@kennknowles (Member)

I think this was on my radar due to the release. Is it common enough that it makes the sibling process feature unusable, or can we defer to 2.52.0? (I know there is currently no milestone attached - I am inviting you to attach 2.51.0 if it would have a huge negative impact on users.)

@tvalentyn (Contributor Author)

The sibling worker protocol has been re-enabled now, and with sibling workers this issue doesn't happen.

@tvalentyn (Contributor Author)

At least in the Dataflow runner.

@kennknowles (Member)

Can you point to a change that addressed this? Is it already in 2.51.0? (Apologies if this was already answered and I missed it or forgot.)

@tvalentyn (Contributor Author) commented Oct 3, 2023

A change that is now fully rolled out in the Dataflow runner enables the sibling SDK worker protocol, thereby removing this failure mode in the default Dataflow configuration: Dataflow now starts only one Python container instead of many, and since artifacts are downloaded only once per worker machine, there is no race. It is a runner-controlled change.

@tvalentyn added the P3 label and removed the P2 label on Oct 3, 2023