arxiv-crawler

Crawl arXiv papers and organize them as a database

Modifying crawling range

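# crawler.py: edit these lists to change what gets crawled
fields = ['CV']                                        # arXiv cs.* subject classes, e.g. 'CV'
months = ['{:0>2d}'.format(i+1) for i in range(12)]    # '01' .. '12'
years = ['{:0>2d}'.format(i) for i in range(6, 17)]    # '06' .. '16', i.e. 2006-2016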
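The three lists define a grid of (field, year, month) combinations. As a rough illustration, the sketch below expands that grid into arXiv listing URLs. The URL pattern is copied from the sample log line `Retrieving http://arxiv.org/list/cs.CV/0601?show=1000`; the helper `listing_urls` and the loop order are assumptions for illustration, not necessarily how crawler.py iterates.

```python
# Hypothetical sketch: expand the crawling range into arXiv listing URLs.
# URL pattern taken from the crawler's sample output; listing_urls is not part of crawler.py.
from itertools import product

fields = ['CV']
months = ['{:0>2d}'.format(i + 1) for i in range(12)]   # '01' .. '12'
years = ['{:0>2d}'.format(i) for i in range(6, 17)]     # '06' .. '16'


def listing_urls(fields, years, months, show=1000):
    """Return one listing URL per (field, year, month) combination."""
    return [
        'http://arxiv.org/list/cs.{}/{}{}?show={}'.format(f, y, m, show)
        for f, y, m in product(fields, years, months)
    ]


urls = listing_urls(fields, years, months)
print(len(urls))   # 132  (1 field x 11 years x 12 months)
print(urls[0])     # http://arxiv.org/list/cs.CV/0601?show=1000
```

Adding a field such as `'LG'` to `fields` would add another 132 URLs, so the crawl size grows multiplicatively with each list.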

Launch the crawler

$ python crawler.py
Retrieving http://arxiv.org/list/cs.CV/0601?show=1000
...
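This README does not document how crawler.py stores what it retrieves. The following is a minimal sketch, assuming each listing page is downloaded with `urllib.request` and saved raw into SQLite. The table name `pages` and the helpers `fetch_page`, `init_db` and `save_page` are hypothetical and may not match the schema the real crawler writes to `arxiv_raw.sqlite`.

```python
# Hypothetical fetch-and-store sketch; crawler.py's real schema may differ.
import sqlite3
import urllib.request


def fetch_page(url):
    """Download one listing page and return its HTML as text (assumes UTF-8)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode('utf-8')


def init_db(path):
    """Open (or create) the database and ensure the hypothetical 'pages' table exists."""
    conn = sqlite3.connect(path)
    conn.execute('CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, html TEXT)')
    return conn


def save_page(conn, url, html):
    """Store one page; re-crawling the same URL overwrites the earlier copy."""
    conn.execute('INSERT OR REPLACE INTO pages (url, html) VALUES (?, ?)', (url, html))
    conn.commit()


# Offline demo: in-memory database and placeholder HTML, no network access.
demo = init_db(':memory:')
demo_url = 'http://arxiv.org/list/cs.CV/0601?show=1000'
save_page(demo, demo_url, '<html>first crawl</html>')
save_page(demo, demo_url, '<html>second crawl</html>')
print(demo.execute('SELECT COUNT(*) FROM pages').fetchone()[0])  # 1
```

A real run would call `save_page(conn, url, fetch_page(url))` for each listing URL. Keying the table on the URL makes re-running the crawler idempotent rather than duplicating rows.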

Check the results

$ python
>>> import sqlite3
>>> conn = sqlite3.connect('arxiv_raw.sqlite')
>>> cur = conn.cursor()
>>> cur.execute('SELECT * FROM sqlite_master')
>>> print(cur.fetchall())  # list the schema entries (tables, indexes) in the database
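The `sqlite_master` query returns raw schema rows. For a quicker overview, the sketch below lists every user table together with its row count. It relies only on the standard `sqlite3` module; the function name `table_summary` is hypothetical and no particular table names are assumed.

```python
# Sketch: summarise the tables in the crawler's SQLite database.
import sqlite3


def table_summary(conn):
    """Return {table_name: row_count} for every user table in the database."""
    cur = conn.cursor()
    # Skip SQLite's internal tables, whose names start with 'sqlite_'.
    cur.execute("SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")
    names = [row[0] for row in cur.fetchall()]
    summary = {}
    for name in names:
        # Table names cannot be bound as ? parameters, so quote the identifier instead.
        quoted = '"{}"'.format(name.replace('"', '""'))
        summary[name] = cur.execute('SELECT COUNT(*) FROM ' + quoted).fetchone()[0]
    return summary
```

Against the crawler's output this would be called as `table_summary(sqlite3.connect('arxiv_raw.sqlite'))`. Note that `sqlite3.connect` silently creates an empty file if the path does not exist, in which case the result is simply `{}`.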

Future work

Still figuring out the best way to visualize the papers.
