koverholt/scrapy-site-downloader

scrapy-site-downloader

Overview

Template project for downloading a site with Scrapy. It crawls a given website, restricted to the domain and URL filters you configure, and saves each scraped page as an HTML file.
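To illustrate what restricting a crawl by domain and URL filters means, here is a minimal sketch in plain Python. It is not the code in this repository: `should_crawl`, `ALLOWED_DOMAINS`, and `ALLOW_PATTERNS` are hypothetical names, and `example.com` is a placeholder. In the actual project, the equivalent settings live in the spider file.

```python
import re
from urllib.parse import urlparse

# Hypothetical settings for illustration only; the real values are
# configured in crawler/spiders/site.py.
ALLOWED_DOMAINS = ["example.com"]
ALLOW_PATTERNS = [r"/docs/"]


def should_crawl(url, allowed_domains=ALLOWED_DOMAINS, allow_patterns=ALLOW_PATTERNS):
    """Return True if url is on an allowed domain and matches a URL filter."""
    host = urlparse(url).hostname or ""
    # Accept the domain itself and any of its subdomains.
    on_domain = any(host == d or host.endswith("." + d) for d in allowed_domains)
    # An empty pattern list means "no filter": every on-domain URL passes.
    matches = not allow_patterns or any(re.search(p, url) for p in allow_patterns)
    return on_domain and matches
```

With these placeholder values, a page under `https://example.com/docs/` would be crawled, while a page on another domain or outside `/docs/` would be skipped.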

Steps to run

  1. Clone this repository and cd into it
  2. Install the dependencies using the following command:
    pip install -r requirements.txt
    
  3. Configure the crawler/spiders/site.py file for the site you want to crawl
  4. Start the downloader using the following command (be sure to run this from the repository root!):
    scrapy crawl site
    
  5. Refer to the Scrapy documentation for best practices and other configuration options
  6. When the crawler finishes, the HTML files will be located in the /html directory
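The last step says the downloaded pages end up in an html directory. As a rough sketch of how a crawled URL could be mapped to a file in such a directory, the helper below uses only the standard library. The function name `html_path_for` and the layout (one subfolder per host, `index.html` for directory-style URLs) are assumptions for illustration, not the scheme this project necessarily uses.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def html_path_for(url, out_dir="html"):
    """Map a page URL to a relative file path under out_dir (hypothetical scheme)."""
    parsed = urlparse(url)
    path = parsed.path
    # Treat directory-style URLs ("", "/", "/docs/") as their index page.
    if path == "" or path.endswith("/"):
        path += "index.html"
    # Give extension-less pages an .html suffix so they open in a browser.
    elif not PurePosixPath(path).suffix:
        path += ".html"
    return str(PurePosixPath(out_dir, parsed.netloc, path.lstrip("/")))
```

This sketch ignores query strings, so two URLs that differ only in their query would map to the same file; a real downloader would need to decide how to handle that case.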
