wmd4j

wmd4j is a Java library for computing Word Mover's Distance (WMD) between two text documents. It provides the same functionality as Word2Vec.wmdistance in Gensim.

wmd4j depends on the deeplearning4j WordVectors interface for word vector manipulation and uses an optimized version of JFastEMD (a solver for the Earth Mover's Distance transportation problem) underneath, which is about 1.8x faster.
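To make the quantity concrete, here is a minimal sketch of what WMD measures in one special case. It is not wmd4j's implementation and does not use JFastEMD. It assumes both documents have the same number n of distinct words, each with weight 1/n; under that assumption the optimal transport plan is a one-to-one matching, so a brute-force search over matchings gives the exact distance. The class name `ToyWmd`, its methods, and the plain `double[]` word vectors are hypothetical and chosen only for illustration.

```java
/** Toy illustration only; hypothetical names, not part of wmd4j. */
final class ToyWmd {

    /** Euclidean distance between two word vectors of equal dimension. */
    static double euclidean(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Exact WMD for the special case where both documents contain the same
     * number n of distinct words, each carrying weight 1/n. The result is the
     * cheapest one-to-one matching between the word vectors, averaged over n.
     * Brute force over all matchings, so only suitable for very small n.
     */
    static double uniformWmd(double[][] docA, double[][] docB) {
        if (docA.length == 0 || docA.length != docB.length) {
            throw new IllegalArgumentException("documents must be non-empty and equally sized");
        }
        int n = docA.length;
        // cost[i][j] = distance from word i of docA to word j of docB
        double[][] cost = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                cost[i][j] = euclidean(docA[i], docB[j]);
            }
        }
        double best = bestMatching(cost, 0, new boolean[n], 0.0, Double.POSITIVE_INFINITY);
        return best / n;
    }

    /** Depth-first search over matchings, pruning branches that cannot beat the best so far. */
    private static double bestMatching(double[][] cost, int row, boolean[] used,
                                       double soFar, double best) {
        if (soFar >= best) {
            return best; // costs are non-negative, so this branch cannot improve
        }
        if (row == cost.length) {
            return soFar; // every word of docA has been matched
        }
        for (int col = 0; col < cost.length; col++) {
            if (!used[col]) {
                used[col] = true;
                best = Math.min(best, bestMatching(cost, row + 1, used, soFar + cost[row][col], best));
                used[col] = false;
            }
        }
        return best;
    }
}
```

Calling `ToyWmd.uniformWmd(a, b)` with two arrays of word vectors returns 0 when the documents contain the same vectors and grows as the words move apart in embedding space. Real documents have unequal sizes and non-uniform word frequencies, which is why a general Earth Mover's Distance solver is needed.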

Usage

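// Load a pre-trained word2vec model; the second argument is the binary flag (false = text format).
WordVectors vectors = WordVectorSerializer.loadGoogleModel(new File(word2vecPath), false);
// Build a WordMovers instance backed by those vectors.
WordMovers wm = WordMovers.Builder().wordVectors(vectors).build();

// Compute the Word Mover's Distance between two sentences.
wm.distance("obama speaks to the media in illinois", "the president greets the press in chicago");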

Validation

wmd4j is validated against Gensim's wmdistance results on a custom word2vec model.