
SEP Archives

Jaimie Murdock edited this page Apr 1, 2018 · 10 revisions

See also: SEP Mirror

Once the SEP Mirror has finished running, topic models for each archive can be trained with sep-corpus-builder.

To train an individual quarter, use the script at ~inphosite/sep-corpus-builder/build.py to get the documents, then zip them with zip $QUARTER.zip data_$QUARTER/*.

#!/bin/bash
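# Quarter identifier: a season code followed by a 4-digit year.
SEASONYEAR=$1
# Drop the last 4 characters (the year); a negative length needs bash 4.2 or later.
SEASON=${SEASONYEAR::-4}
# Strip the season prefix to leave the year.
YEAR=${SEASONYEAR#"$SEASON"}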

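case $SEASON in
  'win') SEASONDESC='Winter';;
  'spr') SEASONDESC='Spring';;
  'sum') SEASONDESC='Summer';;
  'fall') SEASONDESC='Fall';;
  # Stop on an unrecognized season code instead of building an empty description.
  *) echo "Unknown season code: $SEASON" >&2; exit 1;;
esac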

DESC="Stanford Encyclopedia of Philosophy ($SEASONDESC $YEAR)"
INI="sep.$SEASONYEAR.ini"

# Pipeline steps, left commented out as on the original page.
# "$DESC" is quoted because the description contains spaces and parentheses.
# python build.py "$SEASONYEAR"
# topicexplorer init --name "$DESC" "data_$SEASONYEAR" "$INI"
# topicexplorer prep "$INI" --high .25 --low .1 --lang en -q
# topicexplorer train "$INI" -k 20 40 60 80 100 120 --iter 1000 -p 8 -q
# topicexplorer export -o "/tmp/sep.$SEASONYEAR.tez" "$INI"
# aws s3 cp "/tmp/sep.$SEASONYEAR.tez" "s3://hypershelf/sep.$SEASONYEAR.tez" --acl bucket-owner-full-control
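
The script above splits its argument into a season code and a year, then builds the corpus description from them. A minimal sketch of that parsing step with input validation follows. It assumes quarter identifiers of the form fall2017 (season code plus a 4-digit year); describe_quarter is a hypothetical helper name and is not part of sep-corpus-builder or topicexplorer.

```shell
#!/bin/bash
# Sketch only: describe_quarter is a hypothetical helper, not an existing tool.
# It maps a quarter identifier such as "fall2017" to the corpus description
# string used by the script above, and rejects anything else.
describe_quarter() {
  id=$1
  case $id in
    win[0-9][0-9][0-9][0-9])  desc='Winter'; year=${id#win} ;;
    spr[0-9][0-9][0-9][0-9])  desc='Spring'; year=${id#spr} ;;
    sum[0-9][0-9][0-9][0-9])  desc='Summer'; year=${id#sum} ;;
    fall[0-9][0-9][0-9][0-9]) desc='Fall';   year=${id#fall} ;;
    *)
      # Anything that is not <season code><4 digits> is reported and refused.
      echo "usage: describe_quarter {win|spr|sum|fall}YYYY" >&2
      return 1
      ;;
  esac
  echo "Stanford Encyclopedia of Philosophy ($desc $year)"
}

# Example calls; fall2017 is an assumed identifier used for illustration.
describe_quarter fall2017
describe_quarter 2017fall || echo "rejected"
```

Matching the whole identifier in a single case statement avoids the negative-length substring expansion, so the sketch does not depend on a particular bash version, and malformed input fails before any build step would run.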