Just as org.apache.beam.sdk.io.gcp.bigquery.WriteResult.getFailedInserts() allows a user to collect failed writes for downstream processing (e.g., sinking the records into some kind of dead-letter store), could the results of a BigQueryIO.read(SerializableFunction) be collected, so that a user can access TableRows that the provided function failed to parse and handle them downstream (e.g., with some kind of dead-letter handling)?
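For reference, this is roughly what the existing write-side pattern looks like; the table name here is hypothetical, and getFailedInserts() applies when writing via the streaming-inserts method:

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;

// Write rows; rows that fail streaming insertion surface on the WriteResult.
WriteResult result =
    rows.apply(
        "WriteToBQ",
        BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.my_table") // hypothetical table
            .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
            .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()));

// Failed rows, available for dead-letter handling downstream.
PCollection<TableRow> failedInserts = result.getFailedInserts();
```

The request is for an analogous hook on the read side, where no such accessor currently exists.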
In our use case, all data loaded into our Apache Beam pipeline must meet a specified schema, where certain fields are required to be non-null. It would be ideal to collect records that do not meet the schema and output them to some kind of dead-letter store.
Our current implementation requires us to use the slower BigQueryIO.readTableRows() and then attempt, in a subsequent transform, to parse these TableRows into a custom typed object, outputting any failures to a side output for downstream processing (a sketch of this workaround follows below). This isn't incredibly cumbersome, but it would be a nice feature of the connector itself.
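A minimal sketch of that workaround, assuming a hypothetical MyRecord type whose fromTableRow() throws on schema violations:

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

// Anonymous subclasses so Beam can capture the element type at runtime.
final TupleTag<MyRecord> parsedTag = new TupleTag<MyRecord>() {};
final TupleTag<TableRow> failedTag = new TupleTag<TableRow>() {};

// Read raw TableRows (slower than read(SerializableFunction)).
PCollection<TableRow> rows =
    pipeline.apply(
        "ReadFromBQ",
        BigQueryIO.readTableRows().from("my-project:my_dataset.my_table")); // hypothetical table

// Parse in a separate transform, routing failures to a side output.
PCollectionTuple results =
    rows.apply(
        "ParseRows",
        ParDo.of(
                new DoFn<TableRow, MyRecord>() {
                  @ProcessElement
                  public void processElement(ProcessContext c) {
                    try {
                      // Hypothetical parser; throws if required fields are null.
                      c.output(MyRecord.fromTableRow(c.element()));
                    } catch (Exception e) {
                      // Unparseable row goes to the dead-letter side output.
                      c.output(failedTag, c.element());
                    }
                  }
                })
            .withOutputTags(parsedTag, TupleTagList.of(failedTag)));

PCollection<MyRecord> parsed = results.get(parsedTag);
PCollection<TableRow> deadLetters = results.get(failedTag); // sink to a dead-letter store
```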
Imported from Jira BEAM-11919. Original Jira may contain additional context.
Reported by: jacquelynwax.