entityfactspicturesharvester is a commandline command (Python3 program) that reads depiction information (image URLs) from given EntityFacts sheets* (as line-delimited JSON records), then retrieves and stores the pictures and thumbnails referenced in this information.
*) EntityFacts are "fact sheets" on entities of the Integrated Authority File (GND), which is provided by the German National Library (DNB).
It reads EntityFacts sheets as line-delimited JSON records from stdin.
It retrieves the pictures (and thumbnails) linked in the depiction information of the EntityFacts sheets one by one and stores each as a file in the given directory.
entityfactspicturesharvester
optional arguments:
-h, --help show this help message and exit
- example: entityfactspicturesharvester < [INPUT LINE-DELIMITED JSON FILE WITH ENTITYFACTS SHEETS]
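To illustrate the kind of input the command consumes, here is a minimal sketch of reading line-delimited JSON and pulling out the depiction URLs. It is not the tool's own code. The function name `iter_depictions` is hypothetical, and the field layout (`@id` on the sheet, `depiction.@id` for the picture, `depiction.thumbnail.@id` for the thumbnail) is an assumption that should be checked against your actual EntityFacts sheets.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_depictions(
    lines: Iterable[str],
) -> Iterator[Tuple[str, Optional[str], Optional[str]]]:
    """Yield (gnd_id, picture_url, thumbnail_url) for each sheet with a depiction."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        sheet = json.loads(line)
        depiction = sheet.get("depiction")
        if not isinstance(depiction, dict):
            continue  # this sheet carries no depiction information
        # Assumption: the sheet's "@id" is a URI that ends with the GND identifier.
        gnd_id = str(sheet.get("@id", "")).rsplit("/", 1)[-1]
        # Assumption: the picture URL is stored under depiction["@id"].
        picture_url = depiction.get("@id")
        # Assumption: the thumbnail URL is stored under depiction["thumbnail"]["@id"].
        thumbnail = depiction.get("thumbnail")
        thumbnail_url = thumbnail.get("@id") if isinstance(thumbnail, dict) else None
        yield gnd_id, picture_url, thumbnail_url
```

Passing `sys.stdin` as `lines` would process the same stream that the command reads from stdin.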
Each picture that is found will be stored with the following file name pattern: image_[GND IDENTIFIER].[ORIGINAL FILE ENDING], e.g., image_116458461.jpg (GND identifier = 116458461; file ending = jpg)
Each thumbnail that is found will be stored with the following file name pattern: thumbnail_[GND IDENTIFIER].[ORIGINAL FILE ENDING], e.g., thumbnail_172323940.png (GND identifier = 172323940; file ending = png)
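The naming patterns above can be sketched as a small helper. This is only an illustration under assumptions, not the tool's implementation: the function name `target_filename` is hypothetical, and it assumes the original file ending can be taken from the path part of the URL (ignoring any query string).

```python
import posixpath
from urllib.parse import unquote, urlparse


def target_filename(prefix: str, gnd_id: str, url: str) -> str:
    """Build '<prefix>_<gnd_id>.<ending>' from a picture or thumbnail URL."""
    # Use only the URL path, so a query string such as '?width=270' is ignored.
    path = unquote(urlparse(url).path)
    # Take the original file ending without the leading dot.
    ending = posixpath.splitext(path)[1].lstrip(".")
    if not ending:
        return f"{prefix}_{gnd_id}"  # no recognisable ending in the URL
    return f"{prefix}_{gnd_id}.{ending}"
```

With `prefix` set to `image` or `thumbnail`, this yields names of the form shown in the examples above.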
If you run into '429' responses ("too many requests", see, e.g., HTTP status code 429 at httpstatuses.com), you can try to reduce the number of threads of the thread pool schedulers (lines 31 and 32) and/or enable (and optionally adjust) the time delays before emitting the picture/thumbnail URLs (lines 68 and 146) and/or before doing a request (line 157).
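For readers who fetch pictures with their own script, a different way to cope with '429' responses is to retry with an increasing delay. The sketch below shows that technique; it is not what entityfactspicturesharvester does (the tool relies on the thread count and time delays mentioned above). The function name `fetch_with_backoff` and its parameters are hypothetical; `opener` and `sleep` are injectable only to make the function easy to test.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_with_backoff(
    url: str,
    retries: int = 3,
    delay: float = 1.0,
    opener: Callable = urllib.request.urlopen,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Fetch url, waiting longer after each 429 response before retrying."""
    for attempt in range(retries + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as error:
            # Only 429 is retried; other errors and the final attempt are re-raised.
            if error.code != 429 or attempt == retries:
                raise
            # Double the waiting time on every retry: delay, 2*delay, 4*delay, ...
            sleep(delay * 2 ** attempt)
    raise RuntimeError("unreachable: the loop either returns or raises")
```

Doubling the delay keeps the request rate low while the server is throttling, at the cost of a slower harvest.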
- clone this git repo or just download the entityfactspicturesharvester.py file
- run ./entityfactspicturesharvester.py
- for a hackish way to use entityfactspicturesharvester system-wide, copy to /usr/local/bin
sudo -H pip3 install --upgrade [ABSOLUTE PATH TO YOUR LOCAL GIT REPOSITORY OF ENTITYFACTSPICTURESHARVESTER]
(which provides you entityfactspicturesharvester as a system-wide commandline command)
- entityfactssheetsharvester - a commandline command (Python3 program) that retrieves EntityFacts sheets from a given CSV with GND identifiers and returns them as line-delimited JSON records
- entityfactspicturesmetadataharvester - a commandline command (Python3 program) that reads depiction information (image URLs) from given EntityFacts sheets (as line-delimited JSON records) and retrieves the (Wikimedia Commons file) metadata of these pictures (as line-delimited JSON records)