[Python] URL Scraper finder #1013

Belleyy · 2022-06-05T19:58:40Z

This scraper is triggered by any URL (it matches any URL containing a dot .).

What do I need to do?

You have a URL you want to scrape but don't know whether a scraper exists for it.
Put the URL in the URL field, as with any other URL scraper.
Press the scrape button.
The scrape window either appears or it doesn't; check the log for more info 😄

Process

It creates a folder called tmp inside your scrapers folder, downloads the whole repo (master.zip), and places it in this new folder.
- It re-downloads the zip every week (7 days).
- This folder should permanently contain only 2 files: the zip and the list.
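The weekly refresh could be decided from the age of the cached zip. This is a minimal sketch, not the code from this PR: `needs_refresh` and `MAX_AGE_SECONDS` are hypothetical names, and using the file's modification time as the download time is an assumption.

```python
import time
from pathlib import Path
from typing import Optional

# Assumption: "every week" means 7 * 24 hours since the zip was last written.
MAX_AGE_SECONDS = 7 * 24 * 60 * 60


def needs_refresh(zip_path: Path, now: Optional[float] = None) -> bool:
    """Return True if the cached archive is missing or older than one week."""
    if not zip_path.exists():
        return True
    # Allow the caller to pass a timestamp so the check is easy to test.
    now = time.time() if now is None else now
    return now - zip_path.stat().st_mtime > MAX_AGE_SECONDS
```

A caller would check something like `needs_refresh(scrapers_dir / "tmp" / "master.zip")` before downloading again, where `scrapers_dir` is whatever path your Stash scrapers folder has.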
It extracts the list (SCRAPERS-LIST.md) and looks for a scraper matching your URL. It also checks your local scraper files.
❌ If no scraper matches the URL, it doesn't do anything. Request the scraper, because it probably doesn't exist. (Some scrapers may exist that my script doesn't match, like mgstage.)
✔ If a scraper is found, it extracts it, reloads the scrapers in Stash, scrapes, removes the scraper, and reloads the scrapers again.
- ⚠ If the scraper is a Python script, it won't be extracted. Those often require setup, so I don't want to deal with that. It will warn you in the log.
- If you have the scraper file in your scrapers folder but it doesn't include the URL, the script will tell you that an update to this file added the new URL.
🔷 If you already have the scraper locally, it scrapes your scene normally.
- This script sometimes overrides the correct scraper when you press the URL scrape button, because there is no fixed order in which scrapers run when you press it.
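The lookup step above could look roughly like this. It is only a sketch under assumptions: it treats SCRAPERS-LIST.md as a pipe-delimited table whose first cell is the site and whose second cell is the scraper's .yml file, and `host_of` and `find_scraper` are hypothetical helper names, not functions from this PR.

```python
from typing import Optional
from urllib.parse import urlparse


def host_of(url: str) -> str:
    """Return the lowercase hostname of a URL, without a leading 'www.'."""
    if "://" not in url:
        url = "//" + url  # lets urlparse read a bare 'site.com/x' as a host
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def find_scraper(list_text: str, url: str) -> Optional[str]:
    """Return the scraper file listed for the URL's host, or None."""
    host = host_of(url)
    for line in list_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip headers, separators, and anything that is not a table row
        # with a .yml file in the second cell (assumed layout).
        if len(cells) < 2 or not cells[1].endswith(".yml"):
            continue
        if cells[0].lower() == host:
            return cells[1]
    return None
```

This sketch compares hosts exactly, so a URL on a subdomain that the list does not mention would return None, which corresponds to the ❌ case.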
To avoid calling itself, the script renames its own .yml (to .yml.tmp) during the operation.
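One way to make that rename safe is to restore the file even if the scrape fails. The following is a sketch of that idea, not the PR's actual code; `hidden_config` is a hypothetical name.

```python
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def hidden_config(yml_path: Path) -> Iterator[Path]:
    """Temporarily rename a scraper's .yml to .yml.tmp, then restore it."""
    tmp_path = yml_path.with_name(yml_path.name + ".tmp")
    yml_path.rename(tmp_path)
    try:
        yield tmp_path
    finally:
        # Runs on success and on error, so the .yml is always put back.
        tmp_path.rename(yml_path)
```

The reload, scrape, and cleanup steps would then run inside `with hidden_config(path_to_own_yml):`, so the finder's own config comes back even when an exception is raised part-way through.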

Why?

You are sure to have the latest scraper.
You don't need tons of files in your scrapers folder; you only keep the scrapers you use often:
- Scrapers that need setup (like Python & ThePornDB)
- Scrapers that you use for ScraperByName & Fragment
You're too lazy to check the scraper list / download the file.

Draft, because I don't know whether it should be in the repo. If someone downloads the repo without knowing about this scraper, it will most likely be triggered.

JaseNZC · 2022-06-06T09:48:29Z

Awesome idea @Belleyy

Belleyy added 8 commits May 24, 2022 22:26

Update graphql.py

6ccfc87

Add files via upload

c04b6b3

Update +AnyScraper.py

9d2f2a5

Rename +AnyScraper.py to +FindScraper.py

c047652

Update +AnyScraper.yml

4d0c91a

Rename +AnyScraper.yml to +FindScraper.yml

a11cbed

Update graphql.py

3e73da7

Update +FindScraper.py

d0d33e2

bnkai added the script Scraper executes a script label Jun 7, 2022

Maista6969 force-pushed the master branch from 48ab227 to 8e2b818 September 10, 2023 10:18
