SEP Archives
Jaimie Murdock edited this page Jan 6, 2018
See also: SEP Mirror
Once the SEP Mirror has finished running, topic models for each archive can be trained with the sep-corpus-builder.
- Clone the repo:
git clone https://github.com/inpho/sep-corpus-builder/
- Run
python corpusbuilder.py
to generate copies of each unique version of the entries.
- Run
python build.py $SEASONYEAR
to generate a data_$SEASONYEAR folder.
- Run
topicexplorer init --name "Stanford Encyclopedia of Philosophy ()" data_$SEASONYEAR sep.$SEASONYEAR.ini
to generate the corpus file.
- Run
topicexplorer prep --high .25 --low .1 --lang en -q
to apply stoplists to the corpus.
- Run
topicexplorer train -k 20 40 60 80 100 120 --iter 1000 -p 8 -q
to train the models.
- TODO: Insert instructions on updating the ini file.
- Run
topicexplorer export -o /tmp/sep.$SEASONYEAR.tez sep.$SEASONYEAR.ini
to export the model.
- Upload to S3 with
aws s3 cp /tmp/sep.$SEASONYEAR.tez s3://hypershelf/sep/sep.$SEASONYEAR.tez --acl bucket-owner-full-control
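The steps above can be collected into a single shell function so that every command uses the same season-year value. This is a minimal sketch, not part of sep-corpus-builder: the function name build_sep_archive, the run helper and the DRY_RUN switch are hypothetical. It assumes you run it from inside the cloned sep-corpus-builder checkout with python, topicexplorer and the AWS CLI on your PATH. The example value win2017 is an assumption based on SEP's archive naming. The commands themselves are copied from the steps above, and the ini-file TODO is left as a comment.

```shell
#!/usr/bin/env bash
# Sketch of the SEP archive steps as one function (hypothetical helper,
# not part of sep-corpus-builder). Run from inside the cloned checkout.
set -euo pipefail

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: build_sep_archive SEASONYEAR   (e.g. win2017, an assumed value)
build_sep_archive() {
  local seasonyear="${1:?usage: build_sep_archive SEASONYEAR}"
  local ini="sep.${seasonyear}.ini"
  local tez="/tmp/sep.${seasonyear}.tez"

  # Copy each unique entry version, then build the data_<SEASONYEAR> folder.
  run python corpusbuilder.py
  run python build.py "$seasonyear"

  # Create the corpus, stoplist it, and train (flags copied from the steps above).
  run topicexplorer init --name "Stanford Encyclopedia of Philosophy ()" \
    "data_${seasonyear}" "$ini"
  run topicexplorer prep --high .25 --low .1 --lang en -q
  run topicexplorer train -k 20 40 60 80 100 120 --iter 1000 -p 8 -q

  # TODO (from the steps above): update the ini file before exporting.

  # Export the model and upload it to S3.
  run topicexplorer export -o "$tez" "$ini"
  run aws s3 cp "$tez" "s3://hypershelf/sep/sep.${seasonyear}.tez" \
    --acl bucket-owner-full-control
}
```

Calling `DRY_RUN=1 build_sep_archive win2017` prints the seven commands without running them. This is a cheap way to check file names before a long training run. When the function is called directly (not inside `$(...)` or an `if`), `set -e` stops at the first failing command, so a failed training step is not followed by an export and upload.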