This is a public repository containing scrapers created by the Stash Community.
❗ Make sure to read ALL of the instructions here before requesting any help in the Discord channel. For a more user friendly step-by-step guide you can check out the Guide to Scraping ❗
When asking for help do not forget to mention what version of Stash you are using, the scraper that is failing, the URL you are attempting to scrape, and your current Python version (but only if the scraper requires Python)
Note that some scrapers (notably ThePornDB for Movies and ThePornDB for JAV) require extra configuration. As of v0.24.0 this is not possible through the web interface so you will need to open these in a text editor and read the instructions to add the necessary fields, usually an API key or a cookie.
With the v0.24.0 release of Stash you no longer need to install scrapers manually: if you go to Settings > Metadata Providers
you can find the scrapers from this repository in the Community (stable)
feed and install them without ever needing to copy any files manually. Note that some scrapers still require manual configuration
If you still prefer to manage your scrapers manually that is still supported as well, using the same steps as before. Manually installed scrapers and ones installed through Stash can both be used at the same time.
To download all of the scrapers at once you can clone the git repository. If you only need some of the scrapers they can be downloaded individually.
When downloading directly click at the .yml
you want and then make sure to click the raw button:
and then save page as file from the browser to preserve the correct format for the .yml
file.
Any scraper file has to be stored in the path you've configured as your Scrapers Path
in Settings > System > Application Paths
, which is ~/.stash/scrapers
by default. You may recognize ~/.stash
as the folder where the config and database file are located.
After manually updating the scrapers folder contents or editing a scraper file a reload of the scrapers is needed and a refresh of the edit scene/performer page. (Scrape with... -> Reload scrapers)
Some sites block content if the user agent is not valid. If you get some kind of blocked or denied message make sure to configure the Scraping -> Scraper User Agent
setting in stash. Valid strings e.g. for firefox can be found here https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent/Firefox . Scrapers for those sites should have a comment mentioning this along with a tested and working user agent string
Scrapers with useCDP set to true require that you have properly configured the Chrome CDP path
setting in Stash. If you decide to use a remote instance the headless chromium docker image from https://hub.docker.com/r/chromedp/headless-shell/ is highly recommended.
Some scrapers require external programs to function, usually Python. All scrapers are tested with the newest stable release of Python, currently 3.12.2.
Depending on your operating system you may need to install both Python and the scrapers' dependencies before they will work. For Windows users we strongly recommend installing Python using the installers from python.org instead of through the Windows Store, and also installing it outside of the Users folder so it is accessible to the entire system: a commonly used option is C:\Python312
.
After installing Python you should install the most commonly used dependencies by running the following command in a terminal window:
python -m pip install stashapp-tools requests cloudscraper beautifulsoup4 lxml
You may need to replace python
with py
in the command if you are running on Windows.
If Stash does not detect your Python installation you can set the Python executable path
in Settings > System > Application Paths
. Note that this needs to point to the executable itself and not just the folder it is in.
Some scrapers need extra configuration before they will work. This is unfortunate if you install them through the web interface as any updates will overwrite your changes.
- ThePornDBMovies and ThePornDBJAV need to be edited to have your API key in them. Make sure you do not remove the
Bearer
part when you add your key. - Python scrapers that need to communicate with your Stash (to create markers, for example, or to search your file system) might need to be configured to talk to your local Stash: by default they will use
http://localhost:9999/graphql
with no authentication to make their queries, but if your setup requires otherwise then you can findpy_common/config.ini
and set your own values. - Python scrapers that can be configured will (usually) create a default configuration file called
config.ini
in their respective directories the first time you run them.
You can find a list of sites that currently have a scraper in SCRAPERS-LIST.md
💥 For most scrapers you have to provide the scene/performer URL
Stable build (>=v0.11.0) |
---|
Once you populate the URL field with an appropriate url, the scrape URL button will be active. |
Clicking on that button brings up a popup that lets you select which fields to update.
Some scrapers support the Scrape with...
function so you can you use that instead of adding a url. Scrape with...
usually works with either the Title
field or the filename so make sure that they provide enough data for the scraper to work with.
A Query
button is also available for scrapers that support that. Clicking the button allows you to edit the text that the scraper will use for your queries.
In case of errors/no results during scraping make sure to check stash's log section (Settings->Logs->Log Level Debug) for more info.
For more info please check the scraping help section
Contributions are always welcome! Use the Scraping Configuration help section to get started and stop by the Discord #scrapers channel with any questions.
The last line of a scraper definition (.yml
file) must be the last updated date, in the following format:
# Last Updated Month Day, Year
Month = Full month name (October
)
Day = Day of month, with leading zero (04
, 16
)
Year = Full year (2020
)
Example: # Last Updated October 04, 2020
The scrapers in this repository can be validated against a schema and checked for common errors.
First, install the validator's dependencies - inside the ./validator
folder, run: yarn
.
Then, to run the validator, use node validate.js
in the root of the repository.
Specific scrapers can be checked using: node validate.js scrapers/foo.yml scrapers/bar.yml
Instead of NodeJS being installed, Docker can be used to run the validator
docker run --rm -v .:/app node:alpine /bin/sh -c "cd /app/validator && yarn install --silent && cd .. && node validate.js --ci""