-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Task]: Update the minor version of google-cloud-storage library prior to Beam release. #27326
Comments
2.50 release manager here. Please complete work and get it into the main branch in that time, or move this issue to the 2.51 Milestone: https://github.com/apache/beam/milestone/15 |
Are there any plans to have |
@BjornPrime @cojenco did you by chance already discuss that Apache Beam has a dependency on Batch._responses ? Would it be possible to include it in a public api? It would be good to not have a risk of accidental breakages and additional overhead to change every minor version update if we can avoid it. |
2.51.0 release manager here. The release branch has already been cut. I might accept a cherrypick of this if it is vital, but dep upgrades are always high risk for making a release unusable. |
Pushing the version ceiling to 2.11 will be needed if everything goes well with the GCS client migration (#28079) and we decide to cherry-pick that into the release. Otherwise, it can wait until next release. |
@BjornPrime do you have an update on:
|
I discussed that with my contact on the GCS team and they didn't seem to consider it a priority but I could push harder on it. It would make things easier for us and seems like an easy change (though I don't want to speak with certainty on that). |
This has been punted for a few versions without any work AFAICT. I'm going to remove it from the release milestone. It can be re-added to another milestone if it's deemed release blocking (which feels unlikely as a P3). |
.take-issue |
I just checked for the latest gcsio (after the recent migration), we still rely on the internal variable
I believe we did that because we need to check the responses from running the batch request but the gcs client library does not provide a way to return this information. |
I talked to the team in charge of google-cloud-storage python library. They mentioned that they have put all the feature requests on the batch module on hold since there is a new API for batch operation under development in GCS. This is both good news and bad news for us. On one hand, this means that for a little while we won't see any changes in this module so the way we referencing the private member variable will still work. On the other, when the new API is ready, it is very likely to introduce breaking changes. I have already informed them to let us know when it happens, but at this moment, we don't have any action item. Notice that, an alternative to get rid of referencing the private member variable is to do a hack as follows: responses = None
current_batch = self.client.batch(raise_exception=False)
with current_batch:
# call and put batch operations in queue
# ...
# call finish to get the responses
responses = current_batch.finish(raise_exception=False)
# throw a blank exception so finish() will not be triggered when exiting the with statement.
raise Exception("nothing wrong, throwing an exception so finish() will not be called during __exit__()") I don't think we will go with this hack given the incoming change on the batch module. |
No need to update the library for now. Close this. |
I am fine with closing this as there is no action item for now. However, we will need to keep in mind that any upgrade of google-cloud-storage library may break our gcsio code, given we currently have the dependency of this private member variable. I submitted a feature request to google-cloud-storage: googleapis/python-storage#1214 and will follow up with them. |
@shunping could you add a comment with a link to this issue alongside the dependency in https://github.com/apache/beam/blob/master/sdks/python/setup.py? I'd like to avoid us bumping it without understanding the context. Its also worth noting that we're actually at risk today. It looks like we're pinned to |
in this scenario, will the pipeline fail at runtime or at job submission ? |
note that at runtime, an older version of cloud-storage client will likely be preinstalled in SDK containers. but if a pipeline would fail at job submission and we know it will happen and they won't budge to keep new version it backwards-compatible for us , then it would be better to add a tigher upper bound. |
I don't think this is necessarily true, my takeaway from the above conversation is that it is fairly unlikely for them to break us with a minor version update.
At runtime. So I think the only scenario where we'd run into issues is if they have an extra package (or custom container) that requires Again, I think all of this is pretty low likelihood of happening though. |
I have told the support team to keep me updated when there is any incoming change on the batch module. |
What needs to happen?
The current implementation of GCSIO uses an internal field google.cloud.storage.batch.Batch._responses from the GCS client. This is unlikely to be updated but still is potentially subject to change so the version has an upper limit set to avoid breakages. Please check if a new version of the GCS client has been released and increment the version in setup.py if it's compatible.
We can consider to close this issue when either condition is met:
Until then, don't close this issue, instead, move it to the next release milestone after updating the version in https://github.com/apache/beam/blob/master/sdks/python/setup.py
Issue Priority
Priority: 3 (nice-to-have improvement)
Issue Components
The text was updated successfully, but these errors were encountered: