
SEP Archives

Jaimie Murdock edited this page Jan 6, 2018 · 10 revisions

See also: SEP Mirror

Once the SEP Mirror has finished running, train topic models for each archive with sep-corpus-builder:

  1. Clone the repo: git clone https://github.com/inpho/sep-corpus-builder/
  2. Run python corpusbuilder.py to generate copies of each unique version of the entries.
  3. Run python build.py $SEASONYEAR to generate a data_$SEASONYEAR folder.
  4. Run topicexplorer init --name "Stanford Encyclopedia of Philosophy ()" data_$SEASONYEAR sep.$SEASONYEAR.ini to generate the corpus file.
  5. Run topicexplorer prep --high .25 --low .1 --lang en -q to stoplist the corpus.
  6. Run topicexplorer train -k 20 40 60 80 100 120 --iter 1000 -p 8 -q to train the models.
  7. TODO: Insert instructions on updating ini file.
  8. Run topicexplorer export -o /tmp/sep.$SEASONYEAR.tez sep.$SEASONYEAR.ini to export the model.
  9. Upload to S3 with aws s3 cp /tmp/sep.$SEASONYEAR.tez s3://hypershelf/sep/sep.$SEASONYEAR.tez --acl bucket-owner-full-control.
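
The steps above can be collected into one script per archive. The following is a minimal sketch rather than a tested workflow. The SEASONYEAR default (spr2018), the DRY_RUN switch, the run helper, the build_archive function, and the cd into the cloned repo are all assumptions added for illustration. Every command line is copied from the list as written, including the empty parentheses in --name and the prep and train steps, which the list shows without an ini path. Step 7 is skipped because its instructions are still a TODO.

```shell
#!/usr/bin/env bash
# Sketch of steps 1-9 for a single archive; step 7 is still a TODO in the list above.
set -eu

SEASONYEAR="${SEASONYEAR:-spr2018}"  # hypothetical value; use the archive you are building
DRY_RUN="${DRY_RUN:-1}"              # assumption of this sketch: 1 = print only, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

build_archive() {
  ini="sep.$SEASONYEAR.ini"
  tez="/tmp/sep.$SEASONYEAR.tez"

  run git clone https://github.com/inpho/sep-corpus-builder/
  run cd sep-corpus-builder   # assumption: the scripts are run from the repo root
  run python corpusbuilder.py
  run python build.py "$SEASONYEAR"
  # --name is copied verbatim from step 4, including the empty parentheses.
  run topicexplorer init --name "Stanford Encyclopedia of Philosophy ()" "data_$SEASONYEAR" "$ini"
  # Steps 5 and 6 are copied exactly as listed.
  run topicexplorer prep --high .25 --low .1 --lang en -q
  run topicexplorer train -k 20 40 60 80 100 120 --iter 1000 -p 8 -q
  # Step 7 (updating the ini file) is not documented yet, so nothing is done here.
  run topicexplorer export -o "$tez" "$ini"
  run aws s3 cp "$tez" "s3://hypershelf/sep/sep.$SEASONYEAR.tez" --acl bucket-owner-full-control
}

build_archive
```

With the default DRY_RUN=1 the script only prints each command prefixed with +, which makes it easy to check the expanded paths before anything runs. Setting DRY_RUN=0 executes the commands in order and stops at the first failure.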