[Bug]: [Java BQ FILE_LOADS] When streaming to dynamic destinations with copy jobs and CREATE_IF_NEEDED, only the first destination's table is created
#28309
Closed
2 of 15 tasks
ahmedabu98 opened this issue
Sep 5, 2023
· 4 comments
I was testing FILE_LOADS streaming writes and found that when dynamic destinations are set, copy jobs are used (i.e., large data), and CREATE_IF_NEEDED is set, only the first destination's table is created. For example, when writing to two tables A and B, which table gets created depends on which copy job the pipeline sees first. If the copy job to table A runs first, table A is created and all subsequent copy jobs to table B fail with an error similar to the following:
The general idea is that after the first pane, we set create and write dispositions so that subsequent jobs don't overwrite previous data. Here, however, c.pane().isFirst() in streaming is true only for the first copy job. Subsequent copy jobs seem to arrive in different panes (maybe because of this GBK). As a result, Beam sets the CREATE_NEVER disposition on everything after the first copy job, even if it's the first job for a particular destination. BigQuery then tries to copy into a non-existent table and, instead of creating the table, throws the error mentioned above.
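To make the difference concrete, here is a minimal plain-Java sketch of the two decision rules. It is not Beam's actual code and not the fix that was merged. The class name, the method names, and the in-memory set of seen destinations are all hypothetical, chosen only to illustrate the paragraph above. A real runner is distributed, so a local set like this would not be a valid way to track state in a pipeline.

```java
import java.util.HashSet;
import java.util.Set;

// Illustration only: models the disposition choice described above.
// None of these names come from Beam; they are assumptions for this sketch.
public class CopyJobDispositionSketch {
    enum CreateDisposition { CREATE_IF_NEEDED, CREATE_NEVER }

    // Behavior described in the issue: only the very first pane keeps the
    // requested disposition, no matter which destination the job targets.
    static CreateDisposition paneBased(boolean isFirstPane, CreateDisposition requested) {
        return isFirstPane ? requested : CreateDisposition.CREATE_NEVER;
    }

    // Hypothetical alternative: remember which destinations already had a job.
    private final Set<String> seenDestinations = new HashSet<>();

    CreateDisposition perDestination(String destination, CreateDisposition requested) {
        // Set.add returns true only the first time a destination is seen.
        boolean firstJobForDestination = seenDestinations.add(destination);
        return firstJobForDestination ? requested : CreateDisposition.CREATE_NEVER;
    }

    public static void main(String[] args) {
        // Copy jobs arrive in this order; only index 0 is in the first pane.
        String[] copyJobs = {"A", "B", "A"};
        CopyJobDispositionSketch sketch = new CopyJobDispositionSketch();
        for (int i = 0; i < copyJobs.length; i++) {
            boolean isFirstPane = (i == 0);
            System.out.println(copyJobs[i]
                + " pane-based=" + paneBased(isFirstPane, CreateDisposition.CREATE_IF_NEEDED)
                + " per-destination=" + sketch.perDestination(copyJobs[i], CreateDisposition.CREATE_IF_NEEDED));
        }
    }
}
```

Under the pane-based rule, the first job for table B already gets CREATE_NEVER, which matches the failure described. Under the per-destination rule, it still gets CREATE_IF_NEEDED.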
I don't think a test was ever created for the changes in that PR, so I can't tell
But I see that the solution in that PR was not fully extended to the multiple partitions path. I can try implementing it there as well. Thanks @Abacn!
ahmedabu98 changed the title from "[Bug]: [Java BQ FILE_LOADS] When streaming to dynamic destinations with copy jobs and CREATE_IF_NEEDED, only the first table is created" to "[Bug]: [Java BQ FILE_LOADS] When streaming to dynamic destinations with copy jobs and CREATE_IF_NEEDED, only the first destination's table is created" on Sep 15, 2023
What happened?
What we would expect instead is for all tables to be created.
P.S. I'm not seeing this behavior in batch mode.
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)