[Feature Request]: Iceberg Support for Python #31830

Closed
1 of 16 tasks
asheeshgarg opened this issue Jul 10, 2024 · 3 comments

Comments

@asheeshgarg

What would you like to happen?

As of release 0.6.0, pyiceberg can both read from and write to Iceberg tables in Python. It would be good to have Source/Sink support for Iceberg in the Python SDK.

Issue Priority

Priority: 3 (nice-to-have improvement)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@liferoad
Collaborator

This should be available through Managed IO for Python later. @ahmedabu98

@liferoad
Collaborator

#31495 FYI

@ahmedabu98
Contributor

@asheeshgarg just merged #31495. See this example for how to use it (note that input PCollection elements are Beam Rows):

import apache_beam as beam

with beam.Pipeline() as p:
    (p 
     | beam.Create([beam.Row(...), beam.Row(...)]) 
     | beam.managed.Write(
        "iceberg",
        config={
            "table": "namespace.table",
            "catalog_name": "test-catalog",
            "catalog_properties": {
                "catalog-impl": "org.apache.iceberg.hadoop.HadoopCatalog",
                "warehouse": "path/to/warehouse",  # local file system and GCS are supported
            }}))
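
For anyone writing to more than one table, the config dict above can be built with a small helper. This is only a sketch: `iceberg_write_config` is a hypothetical name, not part of Beam, and it assumes the same Hadoop catalog settings shown in the example. It only builds the dict and does not run a pipeline.

```python
from typing import Any, Dict


def iceberg_write_config(table: str, catalog_name: str, warehouse: str) -> Dict[str, Any]:
    """Build a config dict shaped like the one in the example above.

    Hypothetical helper (not a Beam API). Assumes a Hadoop catalog, as in
    the example; `warehouse` is a path on the local file system or GCS.
    """
    return {
        "table": table,
        "catalog_name": catalog_name,
        "catalog_properties": {
            "catalog-impl": "org.apache.iceberg.hadoop.HadoopCatalog",
            "warehouse": warehouse,
        },
    }


# Same values as the example in this thread.
config = iceberg_write_config("namespace.table", "test-catalog", "path/to/warehouse")
```

The resulting `config` can then be passed as the `config=` argument of `beam.managed.Write("iceberg", ...)` in the pipeline above.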

@github-actions github-actions bot added this to the 2.61.0 Release milestone Oct 15, 2024