-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gcsio migration #29360
Gcsio migration #29360
Conversation
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
a94b514
to
7b5367d
Compare
Assigning reviewers. If you would like to opt out of this review, comment R: @liferoad for label python. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
I am going to do some more tests before it is merged. Will let you know |
ec18fcd
to
1d66d36
Compare
stop reviewer notifications |
Stopping reviewer notifications for this pull request: requested by reviewer |
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #29360 +/- ##
==========================================
- Coverage 38.32% 37.85% -0.47%
==========================================
Files 694 690 -4
Lines 102373 101305 -1068
==========================================
- Hits 39235 38354 -881
+ Misses 61546 61359 -187
Partials 1592 1592
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
The two commits are merged into one: * Reapply "Replace StorageV1 client with GCS client (apache#28079)" (apache#28721) * added project parameter to apiclient
1d66d36
to
acb57f3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved! lets hope this attempt and our safeguards are enough!
* Cherry pick two previous commits on migrating gcs client The two commits are merged into one: * Reapply "Replace StorageV1 client with GCS client (apache#28079)" (apache#28721) * added project parameter to apiclient * Initialze storage client with project from pipeline option. --------- Co-authored-by: Bjorn Pedersen <[email protected]>
@@ -352,27 +352,20 @@ def delete(self, paths): | |||
Args: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is line 350 comment still relevant? The directories are not getting deleted recursively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GCS is currently a flat system. I think it will match and delete all the files with that prefix, which is equivalent to delete the files recursively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm..I am running few example and still the files within the matched prefix are not getting deleted.
For example,
I have path=gs://anandinguva-test/artifacts/53b617/
and when I call GCSFileSystem(options).delete([path])
, I expect it to delete the directories/buckets and files/objects within the path
. Maybe we can clarify this in comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/apache/beam/pull/29477/files - this follows a pattern to local filesystem. Would this work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you debugged the original code? I am curious of which part doesn't work. I guess it is the file matching part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the matching part. When a directory is provided, we append *
.
- expected - all the dirs and objects should be deleted.
- Actual - only objects are getting deleted.
Solution - let's use bucket.list_buckets()
and match the path with the provided dirs and delete them. I can spin up a PR if this sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense to me. Thanks for working on this! Please go ahead to create a PR.
This value was changed in PR apache#29360 to 1000, which led to internal test failure.
This value was changed in PR #29360 to 1000, which led to internal test failure.
This value was changed in PR apache#29360 to 1000, which led to internal test failure.
This is the continuation of gcsio migration work. The majority work has been done by @BjornPrime, and here I am fixing a few edge cases and trying to make sure all tests are passed before submission.
fixes #25676
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.