SEP Archives
Jaimie Murdock edited this page Apr 13, 2018 · 10 revisions
See also: SEP Mirror
Once the SEP Mirror has finished running, topic models for each archive can be trained with the sep-corpus-builder.
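To train an individual quarter, first run the script at ~inphosite/sep-corpus-builder/build.py to get the documents, then zip them using zip $QUARTER.zip data_$QUARTER/*.
Once the documents are in place, the script below trains and exports the models for one quarter. It takes the quarter identifier (for example spr2018) as its only argument.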
#!/bin/bash
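# Quarter identifier such as spr2018: a season prefix plus a four-digit year.
SEASONYEAR=$1
# Everything except the last four characters is the season prefix.
SEASON=${SEASONYEAR::-4}
# Stripping the season prefix leaves the year.
YEAR=${SEASONYEAR#$SEASON}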
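case $SEASON in
'win') SEASONDESC='Winter';;
'spr') SEASONDESC='Spring';;
'sum') SEASONDESC='Summer';;
'fall') SEASONDESC='Fall';;
# Stop early instead of building a description with an empty season.
*) echo "Unknown season prefix: $SEASON" >&2; exit 1;;
esac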
DESC="Stanford Encyclopedia of Philosophy ($SEASONDESC $YEAR)"
INI="sep.$SEASONYEAR.ini"
# python build.py $SEASONYEAR
# Quote $DESC: it contains spaces and parentheses and must reach --name as one argument.
topicexplorer init --name "$DESC" data_$SEASONYEAR $INI -q
topicexplorer prep $INI --high-percent 45 --low-percent 5 --lang en --min-word-len 3 -q
topicexplorer train $INI -k 20 40 60 80 100 120 --iter 500 -p 24
topicexplorer export -o /tmp/sep.$SEASONYEAR.tez $INI
aws s3 cp /tmp/sep.$SEASONYEAR.tez s3://hypershelf/sep.$SEASONYEAR.tez --acl bucket-owner-full-control
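The argument parsing at the top of the script can be tried on its own before committing to a long training run. The following is a minimal sketch, assuming quarter identifiers always consist of a season prefix (win, spr, sum, fall) followed by a four-digit year. The function name describe_quarter is hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): turn a quarter
# identifier such as "fall2017" into the corpus description string.
# Returns non-zero for anything that is not <season><four-digit year>.
describe_quarter() {
    local seasonyear="$1"
    # Last four characters; empty if the input is shorter than four.
    local year="${seasonyear#"${seasonyear%????}"}"
    # Whatever precedes the year is the season prefix.
    local season="${seasonyear%"$year"}"
    local desc

    # The year must be exactly four digits.
    case $year in
        [0-9][0-9][0-9][0-9]) ;;
        *) return 1 ;;
    esac

    # Same mapping as the training script, plus a failure branch.
    case $season in
        win)  desc='Winter' ;;
        spr)  desc='Spring' ;;
        sum)  desc='Summer' ;;
        fall) desc='Fall' ;;
        *)    return 1 ;;
    esac

    echo "Stanford Encyclopedia of Philosophy ($desc $year)"
}

describe_quarter "spr2018"
# prints: Stanford Encyclopedia of Philosophy (Spring 2018)
```

Unlike the ${SEASONYEAR::-4} form used in the script, the %???? expansion does not need a recent bash version, and malformed input yields a non-zero return status that a caller can test before invoking topicexplorer.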