-
sentencepiece, an unsupervised text tokenizer and detokenizer
-
subword-nmt, preprocessing scripts to segment text into subword units
-
fastBPE, C++ implementation of "Neural Machine Translation of Rare Words with Subword Units", with a Python API
-
python-pinyin, converts Chinese characters to Pinyin
-
zhconv, conversion between Simplified and Traditional Chinese
-
jieba, Python Chinese word segmentation module
-
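Macropodus, common NLP functions for Chinese: word segmentation, part-of-speech tagging, named entity recognition, keyword extraction, text summarization, new word discovery, text similarity, a calculator, number conversion, Pinyin conversion, and Traditional/Simplified conversion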
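
Several of the entries above (subword-nmt, fastBPE, and, as one of its modes, sentencepiece) segment text into subword units using byte pair encoding (BPE). For readers unfamiliar with the idea, here is a minimal pure-Python sketch of BPE merge learning. It is a toy illustration only, not the API of any library listed here; the function names `learn_bpe`, `merge_pair`, and `segment`, the `</w>` end-of-word marker, and the sample word counts are all assumptions chosen for the example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` merge rules from a {word: frequency} mapping."""
    # Start from single characters plus an end-of-word marker (hypothetical choice).
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Greedily merge the most frequent pair everywhere in the vocabulary.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Apply the learned merges, in order, to a single (possibly unseen) word."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Made-up word frequencies, for illustration only.
corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(corpus, 3)
pieces = segment("lowest", merges)
print(merges)
print(pieces)
```

With this toy corpus, three merges are enough to fuse the frequent suffix `est` together with the end-of-word marker, so the unseen word "lowest" is split into `l`, `o`, `w`, and `est</w>`. Real tools learn tens of thousands of merges from large corpora and add details (vocabulary thresholds, separators such as `@@`) that this sketch omits.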