Ferret
petermr edited this page May 4, 2020
·
4 revisions
Go (currently doesn't work on macOS):
brew install golang
go get github.com/MontFerret/ferret
Binary:
Download the ferret_darwin_x86_64.tar.gz binary from the Ferret releases page, extract it into a local directory, and create an alias that points to it:
alias ferret="/your/local/directory/ferret_darwin_x86_64/ferret"
To test the installation, run the ferret command:
$ ferret
Welcome to Ferret REPL 0.10.1
Please use `exit` or `Ctrl-D` to exit this program.
>
Further information and tutorials about ferret can be found here
Sample Ferret code for scraping a bioRxiv page (save it as get_data.fql):
LET doc = DOCUMENT(@url, { driver: "cdp" })
LET authors = (
    FOR auth IN ELEMENTS(doc, '.highwire-citation-authors')
        RETURN {
            firstname: INNER_TEXT(auth, '.nlm-given-names'),
            surname: INNER_TEXT(auth, '.nlm-surname'),
            orcid_id: auth.a
        }
)
RETURN {
    abstract: INNER_TEXT(doc, '.abstract'),
    acknowledgements: INNER_TEXT(doc, '.ack'),
    title: INNER_TEXT(doc, '.highwire-cite-title'),
    pub_time: ELEMENT(doc, 'meta[name="description"]'),
    authors: authors,
    sections: INNER_TEXT_ALL(doc, '[id^="sec-"]')
}
Ferret Command:
ferret --param=url:\"https://www.biorxiv.org/content/10.1101/2020.02.02.931162v2.full\" get_data.fql
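The backslash-escaped quotes in the command are there so that the URL reaches Ferret wrapped in literal double quotes, which is how a string-valued parameter appears to be expected. A minimal sketch of a helper that builds this argument and runs the query over a list of URLs follows. The names `url_param`, `scrape_all`, `FERRET_BIN`, `urls.txt`, and the `page_N.json` output files are illustrative assumptions, not part of Ferret. Shell aliases are not expanded inside scripts, so the sketch calls the binary through a variable instead of relying on the alias.

```shell
#!/usr/bin/env bash
# Path to the Ferret binary; override it if ferret is not on PATH (assumption).
FERRET_BIN="${FERRET_BIN:-ferret}"

# Print the --param argument for a string-valued @url.
# The URL is wrapped in literal double quotes so it arrives as a quoted string.
url_param() {
  printf '%s"%s"' '--param=url:' "$1"
}

# Hypothetical batch runner: read one URL per line from a list file and
# write each result to page_1.json, page_2.json, ...
scrape_all() {
  local list="$1" query="$2" n=0 url
  while IFS= read -r url; do
    [ -z "$url" ] && continue   # skip blank lines
    n=$((n + 1))
    # stdin is redirected so the binary cannot consume the URL list
    "$FERRET_BIN" "$(url_param "$url")" "$query" < /dev/null > "page_${n}.json"
  done < "$list"
}
```

With a file `urls.txt` holding one bioRxiv URL per line, `scrape_all urls.txt get_data.fql` would produce one JSON file per page in the current directory.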
Linux (Debian/Ubuntu):
Refresh the package index
sudo apt-get update
Make a folder to hold Ferret, download and extract the release into it, then make the binary executable
mkdir ~/ferret
cd ~/ferret
wget https://github.com/MontFerret/ferret/releases/download/v0.10.2/ferret_linux_x86_64.tar.gz
tar -zxvf ferret_linux_x86_64.tar.gz
chmod +x ferret
Now install Docker, then pull and run a headless Chrome image
sudo apt install docker.io
sudo docker pull alpeware/chrome-headless-stable
sudo docker run -d -p=0.0.0.0:9222:9222 --name=chrome-headless -v /tmp/chromedata/:/data alpeware/chrome-headless-stable
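Because the container is started detached (`-d`), the command returns before Chrome is necessarily ready to accept connections. A small retry helper, sketched below, can wait for the DevTools endpoint on port 9222 before Ferret is run. The function name `retry` is an assumption for illustration, and the example check relies on `curl` being installed.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY CMD...
# Run CMD until it succeeds, at most ATTEMPTS times, sleeping DELAY seconds
# after each failed try. Returns 0 on the first success, 1 if all tries fail.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (commented out so the sketch can be sourced without side effects):
# retry 10 1 curl -sf http://localhost:9222/json/version
```

The `/json/version` path is Chrome's DevTools HTTP endpoint for browser version information, so a successful response suggests the headless browser is up and reachable on the mapped port.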
Set up an alias to point to Ferret
alias ferret="~/ferret/ferret"
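An alias defined this way lasts only for the current shell session. One way to keep it, sketched here, is to append the line to a shell startup file, but only if it is not already there. The helper name `add_line_once` and the choice of `~/.bashrc` are assumptions; use whichever startup file your shell reads.

```shell
#!/usr/bin/env bash
# add_line_once FILE LINE
# Append LINE to FILE unless an identical line is already present.
add_line_once() {
  local file="$1" line="$2"
  # -x: match whole lines, -F: treat LINE as a fixed string, -q: no output
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (commented out so sourcing the sketch does not edit any file):
# add_line_once "$HOME/.bashrc" 'alias ferret="$HOME/ferret/ferret"'
```

Running the helper twice leaves a single copy of the alias in the file, so it is safe to repeat during setup.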
Create a get_data.fql file containing the query above, for example by opening it in nano and pasting the code in.
Then run the retrieval, saving the result to getdata.json
ferret --param=url:\"https://www.biorxiv.org/content/10.1101/2020.02.02.931162v2.full\" get_data.fql >getdata.json
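Since the shell redirect creates getdata.json even when the query fails, it can be worth confirming that the file holds valid JSON before using it. The following is a minimal sketch, assuming `python3` is installed (its standard `json.tool` module is used here only as a validator); the function name `check_json` is hypothetical.

```shell
#!/usr/bin/env bash
# check_json FILE
# Succeed only if FILE is non-empty and parses as JSON.
check_json() {
  local file="$1"
  if [ ! -s "$file" ]; then
    echo "error: $file is missing or empty" >&2
    return 1
  fi
  # json.tool exits non-zero on a parse error; its pretty-printed output is discarded.
  python3 -m json.tool "$file" > /dev/null
}

# Example:
# check_json getdata.json && echo "getdata.json looks valid"
```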
PLEASE SHOW THE OUTPUT