
Wikidata query for wikipedia articles #6

Open
juba opened this issue Jun 12, 2024 · 3 comments

juba (Member) commented Jun 12, 2024

For the new frontend, rework the Wikidata queries to retrieve metadata plus the list of available Wikipedia pages in different languages, starting from an NCBI taxid.

For now, the reference query is:

```sparql
SELECT *
WHERE { ?item p:P685 ?statement0.
       OPTIONAL{?item wdt:P627 ?iucn.}
       OPTIONAL{?item wdt:P846 ?gbif.}
       OPTIONAL{?item wdt:P3151 ?inaturalist.}
       OPTIONAL{?item wdt:P9157 ?openTreeOfLife.}
       OPTIONAL{?item wdt:P10585 ?catalogueOfLife.}
       OPTIONAL{?item wdt:P141 ?iucnStatus.}
       # wdt: yields the NCBI ID value itself (p: would yield the statement node)
       ?item wdt:P685 ?ncbi.
       ?statement0 (ps:P685) "9615".
       ?article schema:about ?item .
       # the site is an IRI, not a string literal
       ?article schema:isPartOf <https://en.wikipedia.org/> .
       SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } }
```

NCBI IDs to test: 9615 (Canis lupus familiaris) and 63221 (Neanderthal).

Queries can be tested at https://query.wikidata.org
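To call these queries from the frontend, something along the lines of the following minimal Python sketch could work. It assumes the public endpoint `https://query.wikidata.org/sparql` with `format=json` and the standard SPARQL 1.1 JSON results layout. The function names and the User-Agent string are hypothetical placeholders, not part of any existing code in this project.

```python
import json
import urllib.parse
import urllib.request

# Assumed public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_request(query: str,
                  user_agent: str = "example-app/0.1 (hypothetical)") -> urllib.request.Request:
    """Build (but do not send) a GET request asking for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        f"{WDQS_ENDPOINT}?{params}",
        headers={
            "Accept": "application/sparql-results+json",
            # WDQS asks clients to identify themselves with a descriptive User-Agent.
            "User-Agent": user_agent,
        },
    )


def flatten_bindings(results: dict) -> list[dict[str, str]]:
    """Turn SPARQL JSON results into a list of {variable: value} dicts.

    Variables left unbound (e.g. an OPTIONAL with no match) are simply
    absent from the corresponding row.
    """
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


def run_query(query: str) -> list[dict[str, str]]:
    """Send the query and return flattened rows (performs a network call)."""
    with urllib.request.urlopen(build_request(query), timeout=60) as resp:
        return flatten_bindings(json.load(resp))
```

With the reference query, each flattened row is a dict whose keys are the bound variables (`item`, `gbif`, `iucnStatus`, …), so missing identifiers can be detected with a plain `"gbif" in row` check.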

juba (Member, Author) commented Jul 15, 2024

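The following query combines three subqueries with UNION and returns Wikipedia pages that are about either:

  • the taxon item carrying the given NCBI taxid (P685);
  • an item linked to that taxon item through property P366 (the pattern `?species ^wdt:P366 ?speciesId`);
  • an item whose English label or other English string matches the taxon's scientific name (for when the species has no taxid in Wikidata).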

```sparql
SELECT DISTINCT * WHERE {
  # 1. Articles about the taxon item carrying the NCBI taxid
  #    (?taxid is the P685 statement node, ?speciesId the taxon item)
  {
    SELECT DISTINCT ?article ?lang WHERE {
      ?taxid ps:P685 "63221".
      ?speciesId p:P685 ?taxid.
      ?article schema:about ?speciesId.
      ?article schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
      ?article schema:inLanguage ?lang .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]" }
    }
  }
  # 2. Articles about items linked from the taxon item through P366
  UNION {
    SELECT DISTINCT ?article ?lang WHERE {
      ?taxid ps:P685 "63221".
      ?speciesId p:P685 ?taxid.
      ?species ^wdt:P366 ?speciesId .
      ?article schema:about ?species.
      ?article schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
      ?article schema:inLanguage ?lang .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]" }
    }
  }
  # 3. Articles about any item with the scientific name as an English string
  UNION {
    SELECT DISTINCT ?article ?lang WHERE {
      ?item ?label "Homo neanderthalensis"@en.
      ?article schema:about ?item .
      ?article schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
      ?article schema:inLanguage ?lang .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]" }
    }
  }
}
```

This query can yield duplicates (several pages for the same language). Would keeping only one of them per language be enough?

Note that when a species is missing its taxid in Wikidata, it can be added directly by editing the item's page.

Example for Homo neanderthalensis
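If keeping one page per language is enough, the deduplication can be done on the client side. A minimal sketch, assuming the result rows have already been flattened into `{"article": ..., "lang": ...}` dicts. The helper name is hypothetical, and keeping the alphabetically smallest URL is an arbitrary tie-breaker chosen only so that the result does not depend on row order.

```python
def one_article_per_language(rows: list[dict[str, str]]) -> dict[str, str]:
    """Map each language code to a single article URL.

    When several articles share a language, keep the alphabetically
    smallest URL so the choice is deterministic.
    """
    chosen: dict[str, str] = {}
    for row in rows:
        lang, article = row["lang"], row["article"]
        if lang not in chosen or article < chosen[lang]:
            chosen[lang] = article
    return chosen
```

A smarter rule (for instance, preferring the article about the taxon item itself) would need the query to also return which subquery each row came from.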

juba (Member, Author) commented Jul 15, 2024

Problem: querying on the scientific name can return articles about a different taxon, because homonyms exist (the same name can designate different taxa).

Example for Stigmatella

Should we keep searching on the scientific name?

juba (Member, Author) commented Jul 16, 2024

It seems better not to query on scientific names, to avoid irrelevant results. The query should therefore be the following (replace 63221 with the taxid of interest):

```sparql
SELECT DISTINCT * WHERE {
  # 1. Articles about the taxon item carrying the NCBI taxid
  #    (?taxid is the P685 statement node, ?speciesId the taxon item)
  {
    SELECT DISTINCT ?article ?lang WHERE {
      ?taxid ps:P685 "63221".
      ?speciesId p:P685 ?taxid.
      ?article schema:about ?speciesId.
      ?article schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
      ?article schema:inLanguage ?lang .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]" }
    }
  }
  # 2. Articles about items linked from the taxon item through P366
  UNION {
    SELECT DISTINCT ?article ?lang WHERE {
      ?taxid ps:P685 "63221".
      ?speciesId p:P685 ?taxid.
      ?species ^wdt:P366 ?speciesId .
      ?article schema:about ?species.
      ?article schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
      ?article schema:inLanguage ?lang .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]" }
    }
  }
}
```

Example for Neanderthal
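Since the taxid appears twice in this query, substituting it by hand is error-prone. Below is a minimal Python sketch that fills it in from a single parameter. It assumes NCBI taxids are plain decimal numbers and validates them before substitution, so nothing else can be injected into the SPARQL string. `build_wikipedia_query` is a hypothetical helper name. Unlike the query above, this variant leaves out the `SERVICE wikibase:label` clauses, since neither subquery selects a label variable.

```python
import re
from string import Template
from typing import Union

# Union query from the comment above, with the taxid as a $taxid placeholder.
# The label service is omitted: no *Label variable is selected.
_WIKIPEDIA_QUERY = Template("""\
SELECT DISTINCT * WHERE {
  {
    SELECT DISTINCT ?article ?lang WHERE {
      ?taxid ps:P685 "$taxid" .
      ?speciesId p:P685 ?taxid .
      ?article schema:about ?speciesId ;
               schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] ;
               schema:inLanguage ?lang .
    }
  }
  UNION {
    SELECT DISTINCT ?article ?lang WHERE {
      ?taxid ps:P685 "$taxid" .
      ?speciesId p:P685 ?taxid .
      ?species ^wdt:P366 ?speciesId .
      ?article schema:about ?species ;
               schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] ;
               schema:inLanguage ?lang .
    }
  }
}
""")


def build_wikipedia_query(taxid: Union[int, str]) -> str:
    """Return the Wikipedia-articles query for one NCBI taxid.

    The taxid must consist of ASCII digits only; anything else raises
    ValueError instead of being pasted into the query.
    """
    taxid = str(taxid)
    if not re.fullmatch(r"[0-9]+", taxid):
        raise ValueError(f"invalid NCBI taxid: {taxid!r}")
    return _WIKIPEDIA_QUERY.substitute(taxid=taxid)
```

For example, `build_wikipedia_query(9615)` produces the same query for Canis lupus familiaris, ready to be sent to the query service.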

juba changed the title from "Wikidata queries" to "Wikidata query for wikipedia articles" on Jul 16, 2024