Add Apache Iceberg IO connector #20327

Closed
damccorm opened this issue Jun 4, 2022 · 16 comments

Comments

@damccorm
Contributor

damccorm commented Jun 4, 2022

Apache Iceberg is an open table format layer on top of different filesystems.

Beam could support it in the form of an IO connector, ideally one that integrates with Beam's own SQL support.

Inspiration can be taken from the ongoing effort to support Iceberg on Flink:
apache/iceberg#788
https://docs.google.com/document/d/1idTjQCubhOCea8LkKJ_5aMTV4V_Wf1n0aYhabGsLQZc/edit

Imported from Jira BEAM-10160. Original Jira may contain additional context.
Reported by: iemejia.

@gabrywu
Member

gabrywu commented Sep 14, 2022

Any updates?

@yupbank

yupbank commented Sep 16, 2022

Any updates?

@yeshvantbhavnasi

Any updates?

@brucearctor
Contributor

@gabrywu, @yupbank, @yeshvantbhavnasi: nobody is assigned (so this is potentially not being worked on), and it is not clear how quickly this will progress. That said, the issue is open, so anyone could work on it and contribute. Contributions welcome! :-)

@pabloem
Member

pabloem commented May 18, 2023

Hi folks, I am taking a look at this...

@jotarada

Hey folks, are there any plans regarding this?

@kellen
Contributor

kellen commented Jul 13, 2023

@pabloem ping from Spotify

@chadlagore

Affirm is interested in this 👀 cc @lydian

@aromanenko-dev
Contributor

CC: @mosche

@iemejia
Member

iemejia commented Nov 7, 2023

There was some work on a sink for Iceberg by @Fokko:
apache/iceberg#1972

@Fokko, do you have any info about this? Has anyone tried to build on your work to get Beam + Iceberg working together?

@Fokko
Contributor

Fokko commented Nov 7, 2023

That was some early work, and I would probably just start from scratch. I would take inspiration from the Kafka Connect repository: https://github.com/tabular-io/iceberg-kafka-connect

@gabrywu
Member

gabrywu commented Nov 8, 2023

We use Iceberg's low-level API to add data files directly.

@RPasecky

Wanted to check in on this. Is Beam prioritizing this?

@brucearctor
Contributor

@RPasecky: it is less a matter of Beam prioritizing this and more a question of whether any individuals are prioritizing it. I'd love for this to exist, but can't devote the time. Hopefully someone will. Contributions welcome! :-)

@kennknowles
Member

Added in #30797 and released in version 2.56.0.

@github-actions github-actions bot added this to the 2.57.0 Release milestone May 17, 2024
@Fokko
Contributor

Fokko commented May 17, 2024

Awesome! This is a great milestone! Thanks for working on this 🎉
