-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add skipLines in FileIO.ReadableFile.readFullyAsUTF8String #28728
Conversation
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control |
Hi, @damondouglas. |
@damondouglas or @shunping-google could you please take a look at this one? |
Hi gudfhr95, I am reviewing your change today and will get back to you shortly. Sorry about the delay. |
Thanks again for submitting your PRs for review! I spent some time discussing with the previous reviewer (@damondouglas) about this PR (#28728) offline and below is our opinion. While there are merits in your changes, we also see certain problems that we will have to address:
In fact, I think FileIO is not the best place for this change. In the design of Beam IO, FileIO should be mainly focusing on getting a readable/writable channel ready. It should not have any assumption on what's inside the file. If you put the header skipping feature in FileIO, you are assuming it is a text-like file that has line breaks. We prefer putting the change in a class derived from FileIO. Just like your original PR (#28502), TextIO will be the best choice as it is for text files and it has already handled different types of delimiters. However, the original PR also introduced some breaking changes to the public functions. Moreover, it will not scale well because it needs to disable isSplittable() when the header skipping feature is used. In summary, we see problems in both PRs and it will be great if we can
There is another PR (#29202) submitted recently on implementing this feature. You can refer to it for some ideas. Thanks, Shunping |
Hi, since #29202 was merged. |
Fixes #17990
Changes
skipLines
inFileIO.ReadableFile.readFullyAsUTF8String
skipLines
inFileIO.ReadableFile.readFullyAsBytes
FileIOTest
toEnclosed.class
to add parameterized testRelated PR
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.