Adding Iceberg Support #29569

Closed
wants to merge 15 commits into from

Conversation

byronellis
Contributor

This PR adds support for reads and writes via Iceberg to the Java SDK. At the moment this isn't intended to be used as a standalone IO, but rather to be integrated into a forthcoming catalog representation that should make it easier to work with more structured sources.

@brucearctor
Contributor

Great -- thanks for digging in on this @byronellis !

@jbonofre self-requested a review February 10, 2024 14:43
@kennknowles self-requested a review March 4, 2024 16:08
Byron Ellis added 4 commits March 20, 2024 16:19
…'s BatchLoad implementation since this is a pretty close analog. Has the beginnings of dynamic destination support, though it doesn't do triggered windows yet (pretty mechanical, just haven't done it yet). Successfully writes files and updates the catalog using a keyed PCollection to collect catalog updates. This appears to work much better than just doing it on bundle close, even in the test that was causing collisions and performance issues.
…for right now ("failed writes" are really spilled writes not failures)
…o defer conversion to record and eliminates the need to pass through Row
…tifier from table.name(). If it matches the namespace of our catalog, remove the catalog part of the namespace first so things will work properly.
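
The commit notes above describe collecting catalog updates under a key instead of committing on bundle close. A minimal plain-Java sketch of that idea follows, assuming the key is a table identifier. It is not code from this PR and uses neither the Beam nor the Iceberg API; `CatalogCommitSketch`, `WrittenFile`, `groupByTable`, and `commitAll` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: every name here is hypothetical, not taken from the PR. */
final class CatalogCommitSketch {

  /** A data file that some bundle finished writing for a given table. */
  static final class WrittenFile {
    final String tableId;
    final String path;

    WrittenFile(String tableId, String path) {
      this.tableId = tableId;
      this.path = path;
    }
  }

  /** Stand-in for the keyed grouping step: gather file paths under their table key. */
  static Map<String, List<String>> groupByTable(List<WrittenFile> files) {
    Map<String, List<String>> byTable = new LinkedHashMap<>();
    for (WrittenFile f : files) {
      byTable.computeIfAbsent(f.tableId, k -> new ArrayList<>()).add(f.path);
    }
    return byTable;
  }

  /** Issues one simulated catalog update per table key and returns how many were issued. */
  static int commitAll(Map<String, List<String>> byTable) {
    int commits = 0;
    for (Map.Entry<String, List<String>> e : byTable.entrySet()) {
      // Assumption: a real pipeline would perform a single catalog update here.
      System.out.println("commit " + e.getKey() + " <- " + e.getValue());
      commits++;
    }
    return commits;
  }

  public static void main(String[] args) {
    List<WrittenFile> files = Arrays.asList(
        new WrittenFile("db.events", "file-a"),
        new WrittenFile("db.users", "file-b"),
        new WrittenFile("db.events", "file-c"));
    int commits = commitAll(groupByTable(files));
    System.out.println(commits + " commits for " + files.size() + " files");
  }
}
```

Under these assumptions, each table receives a single update no matter how many bundles wrote files for it, which is one plausible reason the keyed approach would avoid the collisions mentioned in the commit note.
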
Contributor

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

@github-actions bot added the stale label May 21, 2024
@kennknowles
Member

This was integrated into #30797 and #30805
