This repository has been archived by the owner on Nov 11, 2022. It is now read-only.

Google Cloud Dataflow removing accents and special chars with '??' #647

Open
turboT4 opened this issue Oct 7, 2019 · 1 comment


turboT4 commented Oct 7, 2019

This is going to be a bit of a hit-or-miss question, as I don't really know which context or piece of code to give you: it's a case of "it works locally" (which it does!).

The situation here is that I have several services, and there's a step where messages are put in a PubSub topic, waiting for the Dataflow consumer to handle them and save them as .parquet files (I also have another consumer that sends the payload to an HTTP endpoint).

The thing is, the message in that service looks correct prior to being sent to the PubSub topic: the Stackdriver logs show all the characters as they should be.

However, when I check the final output in the .parquet files or at the HTTP endpoint, I see, for example, h?? instead of hí, which seems pretty weird since running everything locally produces the correct output.

The only thing I can think of is a server-side encoding difference when deploying the pipeline as a Dataflow job rather than running it locally or in any of the other services.

Hope someone can shed some light on something this abstract.

We're running SDK 2.9.0 (Beam 2.9.0), in case that's relevant.
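(Editor's note: a common cause of `?` substitution on Dataflow workers is decoding bytes with the JVM's default charset, which on a worker image may not be UTF-8 even though it is locally. The sketch below is a hypothetical illustration, not the reporter's code; `decodePayload` is an assumed helper name. In the Beam SDK, `PubsubMessage.getPayload()` does return the raw `byte[]`, so decoding it with an explicit charset avoids the dependence on the worker environment.)

```java
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Decode a Pub/Sub-style byte payload with an explicit charset instead of
    // relying on the worker JVM's default charset. (Hypothetical helper:
    // in Beam, PubsubMessage.getPayload() returns the raw byte[].)
    static String decodePayload(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] payload = "hí".getBytes(StandardCharsets.UTF_8);

        // Explicit UTF-8 round-trips correctly regardless of file.encoding.
        String decoded = decodePayload(payload);

        // By contrast, new String(payload) uses Charset.defaultCharset();
        // if the worker's default is not UTF-8 (e.g. US-ASCII), the two-byte
        // sequence for 'í' cannot be represented and is replaced with '?'.
        System.out.println(decoded.equals("hí")); // true
    }
}
```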


turboT4 commented Oct 7, 2019

Just did another quick try, upgrading Beam to 2.15.0, and the same thing happens. Running Dataflow locally, the parquet file is generated without ?? and all characters are intact, but whenever I deploy with gcloud beta, the ?? appear in the parquet files.
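(Editor's note: since upgrading Beam didn't change the behavior, one way to confirm the default-charset hypothesis is to log the worker JVM's charset settings, e.g. from a `DoFn`'s setup, and compare them with the local run. A minimal standalone sketch of what to log:)

```java
import java.nio.charset.Charset;

public class CharsetCheck {
    public static void main(String[] args) {
        // On a local JVM this typically prints UTF-8. If a Dataflow worker
        // prints US-ASCII or ANSI_X3.4-1968 instead, any code path that
        // decodes or writes text via the default charset will mangle
        // accented characters into '?'.
        System.out.println("defaultCharset = " + Charset.defaultCharset());
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
    }
}
```

If the worker reports a non-UTF-8 default, the fix is to pass an explicit charset everywhere text is encoded or decoded, rather than trying to change the worker's `file.encoding`.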
