-
Notifications
You must be signed in to change notification settings - Fork 394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Annotate airline data with start/end dates #1439
Comments
SPARQL, how do I hate thee? Let me count the ways. # Name, Alias, IATA, ICAO, Callsign, Country
SELECT ?airline ?airlineLabel ?iata ?icao ?callsign ?countryLabel ?countryIso
(xsd:date(?start) AS ?startDate) (xsd:date(?end) AS ?endDate) # format as ISO dates
WHERE
{
?airline wdt:P31 wd:Q46970 . # instance of airline
?airline wdt:P229 ?iata . # IATA code not optional
OPTIONAL{?airline wdt:P230 ?icao .}
OPTIONAL{?airline wdt:P571 ?start .}
OPTIONAL{?airline wdt:P576 ?end .}
OPTIONAL { ?airline wdt:P17 ?country .
?country wdt:P297 ?countryIso }
OPTIONAL{?airline wdt:P432 ?callsign .}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} |
Here are dry runs against prod, both the full output and a grep of rename operations only -- please find bugs! If there's no major/systemic problems I'll pull the trigger in prod tomorrow, which will update 1296 airlines. |
Woo; wikidata. I've pinged some people at Wikimedia Deutschland to let them know about the move to using wikidata :) |
from the rename log... Not reviewed every line, but this stuck out to me:
Not sure we can rename it twice? Or is there some hidden/simplified characters in there not being displayed? Or more likely, two different IATA airline codes for the same "display" name, at least, in the OpenFlights DB? If it is that, we should probably output the IATA and/or the ICAO codes for the airlines in the logs Then we rename something to the old name; which is fine itself, I think.
|
The updated log looks more reasonable though... I guess the grep/rename shows the condensed version. BUT. > MATCH Myanma Airways (UB/UBA, Wikidata) == Myanma Airways (UB/UBA, 3569)
UPDATE airlines SET source='Wikidata', start_date='1989-04-01', end_date='2014-12-01', active='N' WHERE alid=3569
...
>> MATCH WITH RENAME: from Myanma Airways to Union of Burma Airways
> MATCH Union of Burma Airways (UB/UBA, Wikidata) == Myanma Airways (UB/UBA, 3569)
>> UPDATED: field name from Myanma Airways to Union of Burma Airways
UPDATE airlines SET name='Union of Burma Airways', alias='Myanma Airways', start_date='1948-12-15', end_date='1972-12-01' WHERE alid=3569
>> MATCH WITH RENAME: from Union of Burma Airways to Burma Airways
> MATCH Burma Airways (UB/UBA, Wikidata) == Union of Burma Airways (UB/UBA, 3569)
>> UPDATED: field name from Union of Burma Airways to Burma Airways
UPDATE airlines SET name='Burma Airways', alias='Union of Burma Airways', start_date='1972-12-01', end_date='1989-04-01' WHERE alid=3569
>> MATCH WITH RENAME: from Burma Airways to Myanma Airways
> MATCH Myanma Airways (UB/UBA, Wikidata) == Burma Airways (UB/UBA, 3569)
>> UPDATED: field name from Burma Airways to Myanma Airways
UPDATE airlines SET name='Myanma Airways', alias='Burma Airways', start_date='1989-04-01', end_date='2014-12-01' WHERE alid=3569
>> MATCH WITH RENAME: from Myanma Airways to Myanmar National Airlines
> MATCH Myanmar National Airlines (UB/UBA, Wikidata) == Myanma Airways (UB/UBA, 3569)
>> UPDATED: field name from Myanma Airways to Myanmar National Airlines
UPDATE airlines SET name='Myanmar National Airlines', alias='Myanma Airways', start_date='2014-12-01' WHERE alid=3569 And trim out the comments... UPDATE airlines SET source='Wikidata', start_date='1989-04-01', end_date='2014-12-01', active='N' WHERE alid=3569
UPDATE airlines SET name='Union of Burma Airways', alias='Myanma Airways', start_date='1948-12-15', end_date='1972-12-01' WHERE alid=3569
UPDATE airlines SET name='Burma Airways', alias='Union of Burma Airways', start_date='1972-12-01', end_date='1989-04-01' WHERE alid=3569
UPDATE airlines SET name='Myanma Airways', alias='Burma Airways', start_date='1989-04-01', end_date='2014-12-01' WHERE alid=3569
UPDATE airlines SET name='Myanmar National Airlines', alias='Myanma Airways', start_date='2014-12-01' WHERE alid=3569 As they all have the same We can see it makes a few different updates (at least 3 of which are no-ops by the end; not the end of the world, but not the most useful/performant), and will result in a row that (incorrectly) has an end date, for the current name. It will also incorrectly be marked as It kinda looks like we need multiple aliases for an airline (which is probably best served via a separate table, for ease of searchability, rather than storing multiple in one column and the fun that might bring) and can store start/end dates there too... |
I don't think we need aliases: the ideal end state would be a chain of airlines with start/end dates that smoothly cover the entire date range. Unfortunately Wikidata is quite inconsistent here, there's at least three different relations for inactive/followed by/replaced by and particularly for obscure stuff like Myanmar the data quality is poor anyway. And the other problem is that the updater script mostly matches and renames the existing entry, which isn't consistent with this. So I think it's back to the drawing board here. |
The joys of crowd sourcing... Some of this could be unintentional; different people adding different things for different reasons. But they could be different succession types too, to be really unhelpful. Depending on how wide spread the issue is, I could bring it up on the wiki, and see if we can get any normalisation etc. Or just fix them myself.
Aha, so if that was the intention, we don't want to be always updating using the same airline id, but would potentially also mean updating airline ids in existing flights depending on the date... |
Yup, updating the airlines to match dates would have to be retrofitted later. This seems like it should be quite straightforward, although there's still some nuance to eg. picking the passenger airline instead of the cargo operation... |
Airline data on the live site is badly out of date because IATA codes are frequently reused and it is not possible to match airlines to flight dates.
This issue tracks extracting start (creation) and end (dissolution) dates from Wikidata and uploading this to the live DB.
The text was updated successfully, but these errors were encountered: