[Task]: Improve how to handle the Dataflow-specific option impersonateServiceAccount for Beam Java #30301

Open
2 of 16 tasks
liferoad opened this issue Feb 13, 2024 · 1 comment

Comments

@liferoad
Collaborator

liferoad commented Feb 13, 2024

What needs to happen?

impersonateServiceAccount should be kept when submitting Dataflow jobs but removed when creating Dataflow workers, per the design. To fix this, #30283 added a simple solution that removes the impersonateServiceAccount key from the JSON pipeline options. This introduces some Dataflow-specific concepts, and could be improved by moving the logic into the Dataflow-specific module. See more details in this comment.
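As a rough illustration of the behavior described above (keep the key for job submission, drop it for workers), here is a minimal sketch. It is not the code from #30283 and not a Beam API: the class `WorkerOptionsSketch`, the method `forWorkers`, and the flat `Map` standing in for the JSON pipeline options are all hypothetical, chosen only to show the idea with the standard library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Beam. */
final class WorkerOptionsSketch {
  // Option name taken from the issue text.
  static final String IMPERSONATE_KEY = "impersonateServiceAccount";

  private WorkerOptionsSketch() {}

  /**
   * Returns a copy of the submission-time options without the impersonation key.
   * The input map is not modified, so job submission can still read the key.
   * Assumption: options are modeled as a flat map rather than real JSON.
   */
  static Map<String, Object> forWorkers(Map<String, Object> submissionOptions) {
    Map<String, Object> workerOptions = new LinkedHashMap<>(submissionOptions);
    workerOptions.remove(IMPERSONATE_KEY);
    return workerOptions;
  }

  public static void main(String[] args) {
    Map<String, Object> submission = new LinkedHashMap<>();
    submission.put("jobName", "example-job");
    submission.put(IMPERSONATE_KEY, "sa@example-project.iam.gserviceaccount.com");

    Map<String, Object> worker = forWorkers(submission);
    System.out.println(submission.containsKey(IMPERSONATE_KEY)); // true
    System.out.println(worker.containsKey(IMPERSONATE_KEY));     // false
  }
}
```

Keeping the filtering in a single helper like this is what would make it easy to relocate into a runner-specific module, which is the improvement this issue proposes.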

This issue tracks the potential task of improving how Dataflow-specific options are handled in the future.

Note: for Beam Python, we remove this option in the internal Dataflow apiclient module.

Issue Priority

Priority: 3 (nice-to-have improvement)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@liferoad liferoad changed the title from "[Task]: Improve how to handle the Dataflow-specific option impersonateServiceAccountfor Beam Java" to "[Task]: Improve how to handle the Dataflow-specific option impersonateServiceAccount for Beam Java" on Feb 13, 2024
@kennknowles
Member

For this particular option, the Dataflow service (the UW) should be the place where the option is removed.

The Python SDK is a real mess when it comes to isolating non-GCP and GCP things, so it is not a good example to follow.
