A Node module that parses insomnia.gr, a classifieds website that is popular (at least in Greece).
Please note that this is a personal side project I work on during my free time. I may use weird/uncommon ways of doing stuff just for learning purposes. As it stands though, the software is fully functional and I'm pushing only working prototypes.
If you wish, you can always drop me a line with suggestions or problems, either in the issue tracker or at @karavas.
insomniac has the following dependencies:
- cheerio (for parsing purposes)
- request (how else?)
- request-promise-native (because we're all ES6 pros)
- querystring
You can install this module using npm:
npm install https://github.com/iokaravas/insomniac.git --save
const insomniac = require('insomniac')
// Latest classifieds (2 pages), filtered on title by 'Playstation 4' and 'PS4'
insomniac.latest(2, {title: ['Playstation 4','PS4']}).then(function (listings) {
  listings.forEach(listing => console.log(listing))
})
Printing on screen all the available categories in URL format (you'll probably need this when requesting a category)
insomniac.listCategories().then(function (categories) {
  categories.forEach(function (category) {
    console.log(Object.entries(category)[0].join('-'))
  })
})
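To see how the snippet above produces a URL-format name, here is a small offline sketch. It assumes each category is an object with a single id-to-slug pair; that shape is only inferred from the `Object.entries(category)[0]` call and is not confirmed by the module itself. `sampleCategories` and `toCategorySlug` are hypothetical names.

```javascript
// Assumed shape: one { id: slug } pair per category (inferred, not confirmed).
const sampleCategories = [
  { 8: 'πλήρη-συστήματα' }
]

// Join the single [id, slug] pair into the string passed to category().
function toCategorySlug(category) {
  return Object.entries(category)[0].join('-')
}

const slugs = sampleCategories.map(toCategorySlug)
console.log(slugs)
```

The resulting string is what the category example passes as its third argument.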
Printing on screen classifieds in category '8-πλήρη-συστήματα' (5 pages) with title containing 'Ryzen'
insomniac.category(5, {title: ['Ryzen']},'8-πλήρη-συστήματα').then(function (listings) {
  listings.forEach(listing => console.log(listing))
})
These calls resolve to an array of objects (classified listings), each with this structure:
{
  title: 'The classified title',
  price: 50, // the price in EUR
  thumb: 'insomnia_classified_image.jpeg',
  url: 'http://link-to-classified-in-site/',
  dateScrapped: 2019-02-05T15:20:36Z // when insomniac scraped the listing
}
Unfortunately, the site does not expose date information for each listing, so we keep the date on which insomniac scraped the listing instead.
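Since every listing carries a numeric price, you can narrow the results further on your side once the promise resolves. The following is a minimal sketch, not part of insomniac: `cheapestUnder` is a hypothetical helper, and the sample listings are made-up data that only follow the structure documented above.

```javascript
// Keep listings at or below a price ceiling, cheapest first.
// Listings without a numeric price are dropped.
function cheapestUnder(listings, maxPrice) {
  return listings
    .filter(listing => typeof listing.price === 'number' && listing.price <= maxPrice)
    .sort((a, b) => a.price - b.price)
}

// Made-up sample data in the documented listing structure.
const sampleListings = [
  { title: 'Ryzen 5 desktop', price: 450, url: 'http://link-to-classified-in-site/1' },
  { title: 'Ryzen 7 workstation', price: 900, url: 'http://link-to-classified-in-site/2' },
  { title: 'Ryzen 3 mini PC', price: 250, url: 'http://link-to-classified-in-site/3' }
]

const affordable = cheapestUnder(sampleListings, 500)
affordable.forEach(listing => console.log(`${listing.title}: ${listing.price} EUR`))
```

In real use you would call the helper inside the `.then()` handler of `latest()` or `category()`.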
- Ioannis (John) Karavas - Initial work - iokaravas
See also the list of contributors who participated in this project.
DISCLAIMER:
Much of the parsing is quite fragile. Since sites change all the time, it is not uncommon for parsers to break when pages change in some way.
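Because a page change can break the parser without any error being raised, it may help to sanity-check what comes back. Below is a rough sketch under the assumption that a healthy listing has every field from the structure shown earlier. `REQUIRED_FIELDS`, `missingFields` and `partitionListings` are hypothetical names, not part of insomniac.

```javascript
// Fields taken from the listing structure documented in this README.
const REQUIRED_FIELDS = ['title', 'price', 'thumb', 'url', 'dateScrapped']

// Return the names of fields that are missing or empty on a listing.
function missingFields(listing) {
  return REQUIRED_FIELDS.filter(field =>
    listing[field] === undefined || listing[field] === null || listing[field] === ''
  )
}

// Split a batch into complete listings and ones that look broken.
function partitionListings(listings) {
  const valid = []
  const broken = []
  for (const listing of listings) {
    if (missingFields(listing).length === 0) {
      valid.push(listing)
    } else {
      broken.push(listing)
    }
  }
  return { valid, broken }
}
```

A sudden jump in the number of `broken` listings is a reasonable hint that the site markup has changed.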
Several things could be added and/or improved, including:
- Allow for more precise filtering, as currently only the title is taken into consideration
- Replace the positional parameters with a single options object
- Handle specific errors, although you can still catch the final error from Promise.all()
- Parse even more information
- Better naming
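As an illustration of the options-object and error-handling items above, here is one possible shape for such a call. This is only a sketch: `categoryWithOptions` is a hypothetical wrapper, the default of one page is an assumption, and `fakeClient` stands in for the real module so the example runs offline.

```javascript
// Hypothetical wrapper: accepts one options object and forwards to the
// positional category(pages, filters, category) call used in this README.
function categoryWithOptions(client, { pages = 1, title = [], category } = {}) {
  if (!category) {
    return Promise.reject(new Error('category is required'))
  }
  return client.category(pages, { title }, category)
}

// Stand-in for insomniac that records calls instead of hitting the network.
const calls = []
const fakeClient = {
  category(pages, filters, category) {
    calls.push({ pages, filters, category })
    return Promise.resolve([])
  }
}

categoryWithOptions(fakeClient, { pages: 5, title: ['Ryzen'], category: '8-πλήρη-συστήματα' })
  .then(listings => console.log(`Got ${listings.length} listings`))
  .catch(err => console.error('Scrape failed:', err.message)) // final error lands here
```

With the real module you would pass `insomniac` itself in place of `fakeClient`.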