Introduction

These are the models used in the Vita application. They are trained with CRFsuite.

To download the trained models, please contact us at truongdo[at]vais.vn. The models are quite large, and GitHub does not allow files of that size to be uploaded.

Two models are available:

PoS (Part of speech tagging): models/word_pos.model
Word segmentation: models/word_segment.model

Data

The training data for word segmentation comes from http://jvnsegmenter.sourceforge.net/.
The training data for PoS comes from https://github.com/lupanh/vTools.
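
For readers unfamiliar with how segmented text becomes CRF training data, the following is a minimal sketch, not the repository's actual preprocessing. It assumes that words in the corpus are written with underscores joining their syllables, and that each syllable gets a `B_W` (begins a word) or `I_W` (inside a word) label; the label names, the helper function names, and the three context features are all illustrative assumptions. The output lines follow CRFsuite's text format: the label first, then tab-separated attributes.

```python
def syllable_labels(segmented_sentence):
    """Turn 'Học_sinh học sinh_học' into [(syllable, label), ...].

    Assumption: syllables of one word are joined by '_' and words are
    separated by whitespace. B_W / I_W are hypothetical label names.
    """
    pairs = []
    for word in segmented_sentence.split():
        syllables = [s for s in word.split("_") if s]
        for i, syllable in enumerate(syllables):
            pairs.append((syllable, "B_W" if i == 0 else "I_W"))
    return pairs


def escape_attribute(text):
    # CRFsuite reads ':' in an attribute as a name/weight separator,
    # so backslashes and colons are escaped.
    return text.replace("\\", "\\\\").replace(":", "\\:")


def to_crfsuite_lines(pairs):
    """Render one sentence as CRFsuite items: label, then tab-separated attributes."""
    lines = []
    for i, (syllable, label) in enumerate(pairs):
        prev_syl = pairs[i - 1][0].lower() if i > 0 else "<s>"
        next_syl = pairs[i + 1][0].lower() if i + 1 < len(pairs) else "</s>"
        # Illustrative features only: current, previous and next syllable.
        attributes = [
            "w[0]=" + escape_attribute(syllable.lower()),
            "w[-1]=" + escape_attribute(prev_syl),
            "w[1]=" + escape_attribute(next_syl),
        ]
        lines.append("\t".join([label] + attributes))
    return lines


if __name__ == "__main__":
    for line in to_crfsuite_lines(syllable_labels("Học_sinh học sinh_học")):
        print(line)
```

In CRFsuite's text format, sentences are separated by an empty line, so a full training file would be these blocks joined with blank lines between them.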

Accuracy

Word segmentation: ~95% F1 (about the same as in the original paper)
PoS: ~89.72% accuracy
Chunking: 86.20% accuracy
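
The README does not say how these scores were computed. As a hedged illustration, the sketch below uses one common convention: accuracy is the fraction of tokens whose predicted tag matches the gold tag, and segmentation F1 is computed over word spans, where a predicted word counts as correct only if both of its boundaries match the gold segmentation. The function names are hypothetical and the underscore-joined word format is an assumption.

```python
def token_accuracy(gold_tags, predicted_tags):
    """Fraction of positions where the predicted tag equals the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences differ in length")
    if not gold_tags:
        return 0.0
    matches = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return matches / len(gold_tags)


def word_spans(words):
    """Map words (syllables joined by '_') to a set of (start, end) syllable spans."""
    spans = set()
    start = 0
    for word in words:
        length = len(word.split("_"))
        spans.add((start, start + length))
        start += length
    return spans


def segmentation_f1(gold_words, predicted_words):
    """Span-level F1: a word is correct only if both boundaries match."""
    gold = word_spans(gold_words)
    predicted = word_spans(predicted_words)
    correct = len(gold & predicted)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["hoc_sinh", "hoc", "sinh_hoc"]
    predicted = ["hoc_sinh", "hoc", "sinh", "hoc"]
    print(segmentation_f1(gold, predicted))
    print(token_accuracy(["N", "V", "N"], ["N", "V", "V"]))
```

Span-level scoring penalizes a wrongly split word in both precision and recall, which is why F1 rather than plain accuracy is usually reported for segmentation.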

Training script

The script ./run.sh shows how I trained the model.

