[Bug]: Datastore's Read fails Pipeline validation and throws on DataflowRunner #28034
Comments
@rafaelsms Ignore what I said before. I too was seeing the same error as you. After spending a few hours banging my head against the wall, I discovered it has to do with the way I am building and deploying my Beam application on GCP Dataflow.
@haffar that is a strange solution. I too was building a fat jar, so I will try running it through Gradle on Monday! Thank you for spending the time to tell me :)
I can't do that in our setup; the jar is built and moved around for later use. Looking forward to any other alternative if there is one.
Just ran into this issue as well. @haffar, can you share more of the insights you gained during your analysis? Does it have to do with all the file duplicates when packaging the fat jar? According to the examples I found, you have to set the duplicatesStrategy to "EXCLUDE", which seems a little dirty anyway. So maybe we can exclude certain files or dependencies to work around this issue and get the right version of the problematic files into the fat jar. I just don't understand enough of the Apache Beam runners yet to identify any file that might be the cause.
@christopherfrieler unfortunately I am in the same boat. This was my first attempt at a Beam project, and I figured a fat jar would be the way to go, but apparently not. I did not know about setting the duplicatesStrategy to "EXCLUDE", but I just tried it and it did not make a difference.
@haffar: I can assure you that building fat jars and deploying to Dataflow worked fine at least up until v2.45.0. I hit this issue when I upgraded to 2.49.0, and it seems to be this URN validation that gets in the way somehow. I had to make do with deploying through Gradle for now, as I didn't want to stick with the older versions.
@henrihs We had the same problem with Gradle's shadowJar task, which is used to create the fat jar. It was resolved by adding mergeServiceFiles() to the shadowJar configuration.
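A minimal sketch of what such a configuration might look like in the Gradle Kotlin DSL. The plugin ID is that of the Gradle Shadow Plugin; the version shown is an assumption and is not taken from this thread, so pick one that matches your Gradle version. Java's ServiceLoader discovers implementations through files under META-INF/services, and Beam relies on that mechanism for its registrars, so a fat jar in which one dependency's service file overwrites another's can lose registrations. That would be consistent with the failure reported here, though the thread does not confirm which file is affected.

```kotlin
// build.gradle.kts (sketch; the plugin version below is an assumption)
plugins {
    java
    id("com.github.johnrengelman.shadow") version "8.1.1"
}

tasks.shadowJar {
    // Concatenate META-INF/services/* files from all dependencies
    // instead of keeping only one copy of each colliding file.
    mergeServiceFiles()
}
```

With this in place, `./gradlew shadowJar` builds the fat jar with merged service files.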
Using Gradle Shadow Plugin with However, this requires an additional plugin. If the service files didn't collide, the regular JarTask could do the job. And the example at https://github.com/apache/beam-starter-kotlin/blob/main/app/build.gradle.kts is also outdated.
Update: This still seems to be an issue. I am unable to deploy a flex template using beam The fact that the Gradle approach works specifically with the shadow plugin makes me wonder if there's some service file or other resource that Maven shadow handles differently and is getting squashed? I'm not familiar enough with the Dataflow codebase to be able to easily find which one, though. If anybody has any thoughts I'd be happy to do further investigation.
What happened?
Hello!
Sorry if this is a duplicate. To be honest, I don't know much about Apache Beam and Dataflow, so I am still learning and might be doing something wrong, let me know :)
I attempted to create a somewhat minimal example below. The exception is thrown by a pipeline that just reads an entry from Datastore when running on DataflowRunner. It works and succeeds when using DirectRunner. The validation code was added in PR #26675, throwing at this line.
Example code:
Exception thrown:
Other log related to this unique name:
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components