Remove --impersonate_service_account whenever PipelineOptions are serialized (#32031)

* Remove the impersonate_service_account pipeline option during serialization.

* Update Changes.md
tvalentyn authored Jul 31, 2024
1 parent b61ef75 commit 2824944
Showing 3 changed files with 17 additions and 1 deletion.
3 changes: 2 additions & 1 deletion CHANGES.md
@@ -79,7 +79,7 @@

## Bugfixes

* Fixed X (Java/Python) ([#X](https://github.com/apache/beam/issues/X)).
* Fixed incorrect service account impersonation flow for Python pipelines using BigQuery IOs ([#32030](https://github.com/apache/beam/issues/32030)).

## Security Fixes
* Fixed (CVE-YYYY-NNNN)[https://www.cve.org/CVERecord?id=CVE-YYYY-NNNN] (Java/Python/Go) ([#X](https://github.com/apache/beam/issues/X)).
@@ -526,6 +526,7 @@ as a workaround, a copy of "old" `CountingSource` class should be placed into a
## Known Issues

* Long-running Python pipelines might experience a memory leak: [#28246](https://github.com/apache/beam/issues/28246).
* Python pipelines using the `--impersonate_service_account` option with BigQuery IOs might fail on Dataflow ([#32030](https://github.com/apache/beam/issues/32030)). This is fixed in 2.59.0 release.


# [2.48.0] - 2023-05-31
14 changes: 14 additions & 0 deletions sdks/python/apache_beam/options/pipeline_options.py
@@ -247,6 +247,20 @@ def __init__(self, flags=None, **kwargs):
      self._all_options[option_name] = getattr(
          self._visible_options, option_name)

  def __getstate__(self):
    # The impersonate_service_account option must be used only at submission of
    # a Beam job. However, Beam IOs might store pipeline options
    # within transform implementation that becomes serialized in RunnerAPI,
    # causing this option to be inadvertently used at runtime.
    # This serialization hook removes it.
    if self.view_as(GoogleCloudOptions).impersonate_service_account:
      dict_copy = dict(self.__dict__)
      dict_copy['_all_options'] = dict(dict_copy['_all_options'])
      dict_copy['_all_options']['impersonate_service_account'] = None
      return dict_copy
    else:
      return self.__dict__

  @classmethod
  def _add_argparse_args(cls, parser):
    # type: (_BeamArgumentParser) -> None
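
Note: the `__getstate__` hook above can be tried in isolation. The following is a minimal sketch, not the Beam implementation: `SubmitOnlyOptions` is a hypothetical stand-in class, the option values are made up, and `copy.deepcopy` stands in for serialization because it goes through the same `__getstate__` hook that pickle uses.

```python
import copy


class SubmitOnlyOptions:
  """Hypothetical stand-in for an options holder; not the Beam class."""

  def __init__(self, **options):
    self._all_options = dict(options)

  def __getstate__(self):
    # Copy the instance dict and the nested options dict before clearing
    # the submission-only entry, so the live object is left untouched.
    if self._all_options.get('impersonate_service_account'):
      state = dict(self.__dict__)
      state['_all_options'] = dict(state['_all_options'])
      state['_all_options']['impersonate_service_account'] = None
      return state
    return self.__dict__


# Illustrative values only.
original = SubmitOnlyOptions(
    project='example-project',
    impersonate_service_account='sa@example.iam.gserviceaccount.com')

# deepcopy calls __getstate__ on the source object and applies the
# returned state to a fresh instance, much like a pickle round trip.
restored = copy.deepcopy(original)
```

Copying both the outer `__dict__` and the inner `_all_options` dict matters: clearing the key on a shallow copy alone would also clear it on the object still used at submission time.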
1 change: 1 addition & 0 deletions website/www/site/content/en/blog/beam-2.49.0.md
@@ -52,6 +52,7 @@ For more information on changes in 2.49.0, check out the [detailed release notes

* Long-running Python pipelines might experience a memory leak: [#28246](https://github.com/apache/beam/issues/28246).
* Python SDK's cross-language Bigtable sink mishandles records that don't have an explicit timestamp set: [#28632](https://github.com/apache/beam/issues/28632). To avoid this issue, set explicit timestamps for all records before writing to Bigtable.
* Python pipelines using the `--impersonate_service_account` option with BigQuery IOs might fail on Dataflow ([#32030](https://github.com/apache/beam/issues/32030)). This is fixed in 2.59.0 release.


## List of Contributors
