These are the models used in the Vita application. They were trained using CRFSuite.
If you want to download the trained models, please contact us at truongdo[at]vais.vn. The model files are quite large, and GitHub does not allow me to upload large files.

Two models are available:
- PoS (part-of-speech tagging): models/word_pos.model
- Word segmentation: models/word_segment.model

Training data:
- Word segmentation: http://jvnsegmenter.sourceforge.net/
- PoS: https://github.com/lupanh/vTools

Results:
- Word segmentation: ~95% F1 (about the same as the original paper)
- PoS: ~89.72% accuracy
- Chunking: 86.20% accuracy
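
For reference, the two kinds of scores above (F1 for segmentation, accuracy for tagging) can be computed roughly as in the sketch below. This is not the evaluation code used for the numbers above; it assumes that the syllables of a multi-syllable word are joined with "_" (a common convention for Vietnamese, but an assumption here), and the function names are hypothetical.

```python
def word_spans(words):
    """Turn a segmented sentence into a set of (start, end) syllable spans.

    Assumption: syllables inside one word are joined with "_".
    """
    spans, start = set(), 0
    for word in words:
        n = len(word.split("_"))
        spans.add((start, start + n))
        start += n
    return spans


def segmentation_f1(gold_words, pred_words):
    """F1 over words: a predicted word counts only if both boundaries match."""
    gold, pred = word_spans(gold_words), word_spans(pred_words)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def accuracy(gold_tags, pred_tags):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold_tags) != len(pred_tags):
        raise ValueError("tag sequences must have the same length")
    hits = sum(g == p for g, p in zip(gold_tags, pred_tags))
    return hits / len(gold_tags)
```

For example, with gold `["học_sinh", "học", "sinh_học"]` and prediction `["học_sinh", "học", "sinh", "học"]`, two of the four predicted words match two of the three gold words, so precision is 1/2, recall is 2/3, and F1 is 4/7.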

The script ./run.sh shows how I trained the models.
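
Once a model file is available, it could be loaded from Python roughly as follows. This is only a sketch: it assumes the third-party python-crfsuite package, and `token_features` is a hypothetical feature template. The real features must match whatever ./run.sh used at training time, otherwise the predictions will be meaningless.

```python
def token_features(tokens, i):
    """Hypothetical feature template; replace it with the one from run.sh."""
    feats = ["bias", "w=" + tokens[i].lower()]
    if tokens[i][:1].isupper():
        feats.append("is_cap")
    # Neighbouring tokens, or sentence-boundary markers at the edges.
    feats.append("w-1=" + tokens[i - 1].lower() if i > 0 else "BOS")
    feats.append("w+1=" + tokens[i + 1].lower() if i < len(tokens) - 1 else "EOS")
    return feats


def tag(model_path, tokens):
    """Label a token sequence with a trained CRFSuite model file."""
    import pycrfsuite  # third-party: pip install python-crfsuite

    tagger = pycrfsuite.Tagger()
    tagger.open(model_path)
    try:
        xseq = [token_features(tokens, i) for i in range(len(tokens))]
        return tagger.tag(xseq)
    finally:
        tagger.close()
```

A call would then look like `tag("models/word_pos.model", ["Tôi", "đi", "học"])`, which returns one label per input token.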