Maintainability of WikimediaLanguageCodes.java #413
Indeed. It's very painful to maintain. +1 for the SPARQL query.
Of course, the difficult bit is importing the existing data, because Wikidata does not yet contain all the Wikimedia language codes… I have started doing this, creating items such as https://www.wikidata.org/wiki/Q64363007.
I'm not sure it's useful to import all language codes. New language codes should follow BCP 47, so properly formatting them should be enough to convert most language tags. We could then have a dictionary of the exceptions, extracted from a SPARQL query or hardcoded in WikidataToolkit.
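A minimal sketch of that idea, assuming a hypothetical helper class (the exception entries below are illustrative, not an authoritative list):

```java
import java.util.Locale;
import java.util.Map;

// Sketch only: most Wikimedia codes are already valid BCP 47 tags, so plain
// normalization covers them; only the known exceptions need a lookup table.
public final class LanguageCodeNormalizer {

    // Illustrative exception dictionary; the real entries would be extracted
    // from a SPARQL query or hardcoded after review.
    private static final Map<String, String> EXCEPTIONS = Map.of(
            "als", "gsw",            // Alemannic Wikipedia
            "be-x-old", "be-tarask"  // Taraškievica Belarusian
    );

    public static String toBcp47(String wikimediaCode) {
        String key = wikimediaCode.toLowerCase(Locale.ROOT);
        if (EXCEPTIONS.containsKey(key)) {
            return EXCEPTIONS.get(key);
        }
        // Locale.forLanguageTag canonicalizes casing, e.g. "zh-hans" -> "zh-Hans"
        return Locale.forLanguageTag(key).toLanguageTag();
    }
}
```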
Makes sense! Currently I am actually using this mapping in OpenRefine to check whether a Wikimedia language code exists at all, so for that application I do need completeness… But that is not what this class is intended for, so I might as well just store the allowed language codes there directly.
If you need the full list, let's keep it there. There is no point in writing conversion code if you already have to maintain the list of language tags.
Yeah, but I am not exactly sure how the existing mapping was constructed, so it is not clear to me that I can safely import it into Wikidata. And if I don't import it, some mappings will disappear once the data is generated from SPARQL, which would be a regression. I still think it could be a good idea to maintain this in Wikidata, but not knowing the specifics of these different language codes or the application @mkroetzsch had in mind when writing this, I will abstain for now. Instead, I will store the list of valid language codes for terms and monolingual text directly.
The class WikimediaLanguageCodes contains a hand-crafted mapping between Wikimedia language codes and IETF language codes.
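For a sense of what that hand-crafted table looks like, here is an abridged sketch (illustrative entries only, not a verbatim excerpt of the class):

```java
import java.util.HashMap;
import java.util.Map;

// Abridged sketch of the kind of table the class maintains by hand.
public class LanguageCodeTable {
    static final Map<String, String> WIKIMEDIA_TO_IETF = new HashMap<>();
    static {
        WIKIMEDIA_TO_IETF.put("als", "gsw");            // Alemannic Wikipedia
        WIKIMEDIA_TO_IETF.put("be-x-old", "be-tarask"); // Taraškievica Belarusian
        WIKIMEDIA_TO_IETF.put("no", "nb");              // Norwegian (Bokmål)
        // ... hundreds more entries, each added and kept up to date by hand
    }
}
```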
Maintaining this mapping by hand is not ideal: it requires ongoing curation effort, and the data is not easy to reuse for people who need it outside the Java ecosystem. This begs the question: maybe we could use some sort of generic, collaborative open data project to maintain mappings between identifier schemes? What on earth could that project be?
… we need to push the existing data to Wikidata and update the mapping periodically by running a SPARQL query. How meta!
Example query: https://w.wiki/4Vy
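As a rough sketch of what the periodic update step could look like, assuming the linked query simply pairs P424 (Wikimedia language code) with P305 (IETF language tag), one could fetch the mapping from the Wikidata Query Service like this (class name and query are this sketch's own, not part of the toolkit):

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Sketch of the periodic regeneration step: run a SPARQL query against the
// Wikidata Query Service and dump the mapping as CSV.
public class FetchLanguageMapping {
    public static void main(String[] args) throws Exception {
        String query =
                "SELECT ?wikimediaCode ?ietfCode WHERE { "
                + "  ?lang wdt:P424 ?wikimediaCode ; "  // P424: Wikimedia language code
                + "        wdt:P305 ?ietfCode . "       // P305: IETF language tag
                + "}";
        URI uri = URI.create("https://query.wikidata.org/sparql?query="
                + URLEncoder.encode(query, StandardCharsets.UTF_8));
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Accept", "text/csv")
                .header("User-Agent", "WikimediaLanguageCodes-updater/0.1")
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // one "wikimediaCode,ietfCode" row per pair
    }
}
```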