Scrape and Store Dark Web Sites along with their Latency | Crawler for Dark Web | Search Engine Oriented
The crawler is currently implemented using BFS; in time, the implementation will be changed to A* Search (a preferential crawler).
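For reference, here is a minimal sketch of what BFS-ordered crawling looks like. This is not the repository's actual spider code; `fetch_onion_links`, the seed URL, and `max_pages` are hypothetical placeholders.

```python
# Minimal BFS crawling sketch (illustrative only, not this repo's spider code).
from collections import deque

def bfs_crawl(seed, fetch_onion_links, max_pages=100):
    """Visit pages level by level, closest to the seed first."""
    queue = deque([seed])   # FIFO frontier gives breadth-first order
    visited = {seed}
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        # fetch_onion_links is a hypothetical helper returning the .onion links on a page
        for link in fetch_onion_links(url):
            if link not in visited:
                visited.add(link)
                queue.append(link)  # an A* crawler would instead pop the best-scored link next
    return visited
```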
Features:
- fetch onion links
- recursive fetching
- store scraped data
- user-added URLs
- URL blacklisting (see the filtering sketch after this list)
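To illustrate the URL blacklisting feature, here is a small sketch of how blacklisted onion domains could be filtered out before being queued. The blacklist file name and helper functions are assumptions, not the project's actual code.

```python
# URL blacklisting sketch (file name and helpers are assumptions, not this repo's code).
from urllib.parse import urlparse

def load_blacklist(path="blacklist.txt"):
    """Read one blacklisted onion domain per line."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

def is_allowed(url, blacklist):
    """Keep only URLs whose host is not blacklisted."""
    return urlparse(url).hostname not in blacklist
```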
Planned improvements:
- Increase crawl depth (see the settings sketch after this list)
- Add more starter links
- Create more spiders with a special focus on directories
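If the crawl depth is controlled through Scrapy's standard settings (an assumption; the repository may cap depth elsewhere), increasing it would look roughly like this in settings.py:

```python
# settings.py sketch, assuming depth is controlled via Scrapy's built-in settings
DEPTH_LIMIT = 5     # follow links up to 5 hops from the start URLs (0 means unlimited)
DEPTH_PRIORITY = 1  # positive value schedules shallower requests first (breadth-first tendency)
```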
Spiders:
- DRL (Link Dir Onion): a big directory of onion URLs
- UADD (User Added): URLs added by the user
  - Presently, links are appended to user_added_urls.txt under spider_data (see the spider sketch after this list)
  - Crawled in exactly the same fashion as DRL
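Below is a rough sketch of what a UADD-style spider could look like, reading its start URLs from user_added_urls.txt. The class name, parsing logic, and yielded fields are illustrative assumptions rather than the repository's actual spider.

```python
# Illustrative UADD-style spider (parsing logic and field names are assumptions).
import scrapy

class UserAddedSpider(scrapy.Spider):
    name = "UADD"

    def start_requests(self):
        # user-added links are appended to this file, one URL per line
        with open("spider_data/user_added_urls.txt") as f:
            for line in f:
                url = line.strip()
                if url:
                    yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # store the page, then recursively follow any .onion links found on it
        yield {"url": response.url, "title": response.css("title::text").get()}
        for href in response.css("a::attr(href)").getall():
            if ".onion" in href:
                yield response.follow(href, callback=self.parse)
```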
- Py3
- Tor
pip install -r requirements.txt
Directions to execute (for the commit of Jul 10, 2020)
- This commit contains a pipeline that generates data in a CSV/JSON file
- You can run it without much effort

# start Tor so its SOCKS proxy listens on port 9150, then bridge it to an HTTP proxy on port 8181
pproxy -l http://:8181 -r socks5://127.0.0.1:9150 -vv
scrapy crawl name_of_spider  # e.g. DRL
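Scrapy has no built-in SOCKS5 support, which is presumably why pproxy is used to expose an HTTP proxy in front of Tor's SOCKS port. One common way to route requests through that HTTP proxy is Scrapy's built-in HttpProxyMiddleware via the request meta key; how this repository actually wires the proxy (settings, middleware, or per-request meta) is an assumption here.

```python
# Sketch: send requests through the pproxy HTTP endpoint on port 8181 (wiring is an assumption).
import scrapy

class ProxiedSpider(scrapy.Spider):
    name = "proxied_example"                             # hypothetical spider name
    start_urls = ["http://exampleonionaddress.onion/"]   # placeholder seed

    def start_requests(self):
        for url in self.start_urls:
            # the built-in HttpProxyMiddleware honors the 'proxy' meta key
            yield scrapy.Request(url, meta={"proxy": "http://127.0.0.1:8181"})

    def parse(self, response):
        yield {"url": response.url}
```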
- This commit contains a data pipeline that saves the scraped data to a MongoDB server (see the pipeline sketch below)
- You need to set up the MongoDB server connection credentials and URI in the settings.py / pipeline.py files
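Here is a minimal sketch of what such a MongoDB item pipeline typically looks like, modeled on the standard Scrapy + pymongo pattern. The MONGO_URI / MONGO_DATABASE setting names and the collection name are assumptions; the real values belong in this repository's settings.py / pipeline.py.

```python
# pipeline.py sketch: standard Scrapy + pymongo item pipeline (names are assumptions).
import pymongo

class MongoPipeline:
    collection_name = "onion_pages"  # hypothetical collection name

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        # read connection details from settings.py
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI"),
            mongo_db=crawler.settings.get("MONGO_DATABASE", "darkweb"),
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # each scraped item becomes one MongoDB document
        self.db[self.collection_name].insert_one(dict(item))
        return item
```

The pipeline would then be enabled through the ITEM_PIPELINES setting in settings.py.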
Idea hamster: Angad Sharma

Developer: 1UC1F3R616 (Kush Choudhary)

Developer's note: I had prior experience with web scraping, but I hadn't worked with web crawlers before, and scraping the deep web was also new to me as it required setting up a Tor proxy. This project developed my interest in web mining and encouraged me to take it up as a subject in my college curriculum.
Made with ❤️ by DSC VIT