The data sources can be found in the folders duckDuckGo, metaGer and googleScholar, as well as in overview.md.
Due to copyright restrictions, all images and PDFs that are part of the source data are excluded from this repository.
The original image links can be reconstructed from the data.
This repository contains all scripts used during the study:
- Image extraction: google-scholar/images/extract-pdf-images.sh reads all PDFs and outputs all images contained in them.
- extract-pictures.py reads all image sources and outputs annotations.md (for requirements, see Pipfile).
- update-result-html.sh reads annotations.md and outputs results.html (needs pandoc; also uses result-header.html).
- result-explorer.js runs in the browser and provides basic filtering and statistics in results.html.
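update-result-html.sh is a shell script and its exact pandoc invocation is not described here. As a rough sketch of that step, assuming it converts annotations.md into a standalone HTML page and pulls in result-header.html through pandoc's --include-in-header option (both flags are assumptions, and the helper build_pandoc_command is hypothetical), the call could be assembled in Python like this:

```python
import shlex


def build_pandoc_command(source="annotations.md", output="results.html",
                         header="result-header.html"):
    """Return the argument list for a pandoc call that renders the annotations.

    The choice of flags is an assumption; update-result-html.sh may differ.
    """
    return [
        "pandoc",
        source,
        "--standalone",                   # emit a complete HTML document
        f"--include-in-header={header}",  # assumed use of result-header.html
        "--output", output,
    ]


if __name__ == "__main__":
    cmd = build_pandoc_command()
    # Print the shell-equivalent command; running it requires pandoc on PATH,
    # e.g. via subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting issues if file names ever contain spaces.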
The scripts extract-pictures.py and result-explorer.js are released under the MIT License.
The results can be viewed in the results.html webpage (direct link).
The documentation for the tags used can be found in tag-documentation.md.
The resulting shapes can be found in shapes.md.
The search filters and the image type filters work subtractively: if a tag (or a search text/source) is deselected, every image with that tag (or from that search) is excluded.
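A minimal sketch of this subtractive behaviour, assuming each image records its source, its search text and a set of image types. The Image class, its field names and the example values ("photo", "drawing", "diagram", "chart") are hypothetical, and result-explorer.js may implement this differently:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Image:
    """One collected image; the field names are hypothetical."""
    name: str
    source: str                      # search engine, e.g. "duckDuckGo"
    search: str                      # search text that produced the image
    image_types: set[str] = field(default_factory=set)


def apply_subtractive_filters(
    images: list[Image],
    deselected_sources: frozenset[str] = frozenset(),
    deselected_searches: frozenset[str] = frozenset(),
    deselected_types: frozenset[str] = frozenset(),
) -> list[Image]:
    """Start from all images and drop every image that matches a deselected value."""
    kept = []
    for img in images:
        if img.source in deselected_sources:
            continue  # excluded: comes from a deselected source
        if img.search in deselected_searches:
            continue  # excluded: comes from a deselected search text
        if img.image_types & deselected_types:
            continue  # excluded: has at least one deselected image type
        kept.append(img)
    return kept


if __name__ == "__main__":
    images = [
        Image("a.png", "metaGer", "diagram", {"photo"}),
        Image("b.png", "duckDuckGo", "diagram", {"drawing"}),
        Image("c.png", "googleScholar", "chart", {"drawing", "photo"}),
    ]
    kept = apply_subtractive_filters(images, deselected_types=frozenset({"photo"}))
    print([img.name for img in kept])  # ['b.png']
```

Deselecting "photo" removes c.png even though it is also a drawing: a single deselected tag is enough to exclude an image.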
The element and element-specific filters work differently: if one or more tags are selected, only images that carry all selected tags are included. The element-specific tags are provided both as standalone filters and as tag bigrams. The tag bigrams can be used to target only those elements that have that specific tag.
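A minimal sketch of these inclusive semantics, assuming each image is stored as a mapping from element names to their element-specific tags. The data layout, the "element:tag" spelling of the bigrams and the example tags arrow, box and dashed are all hypothetical, and result-explorer.js may work differently:

```python
from __future__ import annotations


def derived_tags(elements: dict[str, set[str]]) -> set[str]:
    """Turn an image's elements into the tags the element filters can match.

    `elements` maps an element name to its element-specific tags. The result
    holds each element name, each element-specific tag on its own, and one
    "element:tag" bigram per pair (the ":" separator is an assumption).
    """
    tags: set[str] = set()
    for element, specific in elements.items():
        tags.add(element)
        for tag in specific:
            tags.add(tag)                 # standalone element-specific tag
            tags.add(f"{element}:{tag}")  # bigram tied to this element
    return tags


def apply_element_filters(
    images: dict[str, dict[str, set[str]]], selected: set[str]
) -> list[str]:
    """Keep only images whose derived tags include every selected tag (AND).

    An empty selection is a subset of any set, so it keeps every image.
    """
    return [name for name, elements in images.items()
            if selected <= derived_tags(elements)]


if __name__ == "__main__":
    images = {
        "a.png": {"arrow": {"dashed"}, "box": set()},
        "b.png": {"arrow": set(), "box": {"dashed"}},
    }
    print(apply_element_filters(images, {"dashed"}))        # ['a.png', 'b.png']
    print(apply_element_filters(images, {"arrow:dashed"}))  # ['a.png']
```

The example shows the difference between the two forms: the standalone dashed tag matches both images, while the arrow:dashed bigram matches only the image whose arrow is dashed.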