What happened?
I'm experimenting with HiveCatalog and noticed data loss when writing nested records: the nested fields are never committed to the table, and I read back null values instead. I can confirm this happens even without our Beam library:
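A minimal standalone repro might look like the following. This sketch is not from the original report; the metastore URI, warehouse path, table/namespace names, and field IDs are illustrative assumptions, and the actual data-file write is elided:

```java
import java.util.Map;

import org.apache.iceberg.CatalogProperties;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.data.GenericRecord;
import org.apache.iceberg.data.IcebergGenerics;
import org.apache.iceberg.data.Record;
import org.apache.iceberg.hive.HiveCatalog;
import org.apache.iceberg.io.CloseableIterable;
import org.apache.iceberg.types.Types;

public class NestedWriteRepro {

  // A schema with one nested struct field (field IDs are illustrative).
  static Schema nestedSchema() {
    return new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.optional(2, "nested", Types.StructType.of(
            Types.NestedField.optional(3, "value", Types.StringType.get()))));
  }

  public static void main(String[] args) throws Exception {
    Schema schema = nestedSchema();

    // Assumed local Hive metastore and warehouse location.
    HiveCatalog catalog = new HiveCatalog();
    catalog.initialize("hive", Map.of(
        CatalogProperties.URI, "thrift://localhost:9083",
        CatalogProperties.WAREHOUSE_LOCATION, "/tmp/warehouse"));

    Table table = catalog.createTable(TableIdentifier.of("db", "t"), schema);

    // Build a record whose "nested" field is populated.
    GenericRecord record = GenericRecord.create(schema);
    record.setField("id", 1L);
    GenericRecord nested =
        GenericRecord.create(schema.findField("nested").type().asStructType());
    nested.setField("value", "hello");
    record.setField("nested", nested);

    // ... write `record` to a data file and commit it to `table` ...

    // Read back: per this report, the top-level field survives but the
    // nested struct comes back as null when the table was created via
    // HiveCatalog (the same flow works with HadoopCatalog).
    try (CloseableIterable<Record> rows = IcebergGenerics.read(table).build()) {
      for (Record row : rows) {
        System.out.println(row.getField("nested"));
      }
    }
  }
}
```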
I've tried the same with HadoopCatalog and everything works fine, so I'm not sure why only HiveCatalog is affected. There may be a quirk in how that catalog fetches and returns the table schema.
I believe in this case we shouldn't rely on the catalog, and should instead create our writers using the schema of the records in our PCollection, i.e. line 70 here:
beam/sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/RecordWriter.java
Lines 66 to 70 in 21009e6
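Roughly, the proposed change could be sketched as follows. This is a hypothetical fragment, not the actual patch: `dataSchema` stands for an Iceberg schema derived from the PCollection's records, and the builder chain mirrors Iceberg's `Parquet.writeData` API as used when constructing the Parquet writer:

```java
// Hypothetical sketch: build the data writer from the records' own schema
// (`dataSchema`) rather than the catalog-provided table.schema(), so nested
// fields present in the incoming records are not dropped.
DataWriter<Record> writer =
    Parquet.writeData(outputFile)
        .createWriterFunc(GenericParquetWriter::buildWriter)
        .schema(dataSchema)       // previously: table.schema()
        .withSpec(table.spec())
        .overwrite()
        .build();
```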
Issue Priority
Priority: 1 (data loss / total loss of function)
Issue Components