Currently, `Emit` returns only the keyword and the start and end indexes of the keyword. I'd like to get the whole word that a match was part of.
E.g.:
import java.util.Collection;
import java.util.List;
import org.ahocorasick.trie.Emit;
import org.ahocorasick.trie.Trie;

List<String> triggerList = List.of("cd");
Trie trie = Trie.builder()
        .addKeywords(triggerList)
        .build();
Collection<Emit> result = trie.parseText("abcdxyz abz cdefg");
// Ideal world. Currently you have to do some whitespace/end-of-string searching,
// starting from the given start and end indexes, with a custom handler.
List<String> words = result.stream()
        .map(Emit::matchingWord) // proposed method, does not exist today
        .toList();
// List<String>("abcdxyz", "cdefg")
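For reference, the expansion a `matchingWord`-style method would have to perform can be sketched in plain Java. This is a minimal sketch, not part of the aho-corasick API: `WordExpander` and `wholeWordAt` are hypothetical names, words are assumed to be delimited by whitespace, and the end index is assumed to be inclusive (which is how I understand `Emit.getEnd()` to behave). The match positions are hard-coded so the sketch runs without the library; in practice they would come from `emit.getStart()` and `emit.getEnd()`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a library class: expands a match [start, end]
// (end inclusive, assumed to match Emit.getEnd()) to its whitespace-delimited word.
final class WordExpander {

    private WordExpander() {}

    public static String wholeWordAt(String text, int start, int end) {
        int wordStart = start;
        // Walk left until the start of the text or a whitespace character.
        while (wordStart > 0 && !Character.isWhitespace(text.charAt(wordStart - 1))) {
            wordStart--;
        }
        int wordEnd = end + 1; // exclusive end for substring()
        // Walk right until the end of the text or a whitespace character.
        while (wordEnd < text.length() && !Character.isWhitespace(text.charAt(wordEnd))) {
            wordEnd++;
        }
        return text.substring(wordStart, wordEnd);
    }

    public static void main(String[] args) {
        String text = "abcdxyz abz cdefg";
        // Positions of "cd" as (start, inclusive end), hard-coded in place of Emit objects.
        int[][] matches = {{2, 3}, {12, 13}};
        List<String> words = new ArrayList<>();
        for (int[] m : matches) {
            words.add(wholeWordAt(text, m[0], m[1]));
        }
        System.out.println(words); // [abcdxyz, cdefg]
    }
}
```

Note that each call still walks the word in both directions, so many matches inside one long word repeat that work.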
I have a really crude implementation using an emit handler:
List<String> words = new ArrayList<>();
prefixTrie.parseText(text, emit -> {
    // All of my trie keywords are prefixes, so I can guarantee wordStart will always be the start of a word.
    // Otherwise you have to search backwards from emit.getStart() until you hit 0 or a space.
    if (emit.getStart() != 0 && text.charAt(emit.getStart() - 1) != ' ') {
        return false;
    }
    // Because this lib doesn't offer a matchPartOfWord method, there is a degenerate case where
    // we have a list of codes that all scan to the end of the text.
    // E.g. codes 11222222 and 1222222 with text 11222222 will both scan nearly the full text, and the
    // entire method will be bounded by O(n^2), if I recall my time complexity correctly.
    var wordStart = emit.getStart();
    var wordEnd = text.indexOf(" ", emit.getEnd());
    if (wordEnd < 0) {
        // No space after the match: the word runs to the end of the text.
        words.add(text.substring(wordStart));
    } else {
        words.add(text.substring(wordStart, wordEnd));
    }
    return true;
});
Honestly, I'm really hoping for a better solution than this brute-force one.
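One way to avoid rescanning the same long word for every match is to split the text into words once, then map each match start to its word with a binary search. That costs one linear pass over the text plus O(log w) per match (w = number of words), instead of a forward scan per match. This is a minimal sketch under the assumption that words are whitespace-delimited; `WordIndex`, `wordIndexAt`, and `wordsForMatches` are hypothetical names, not library methods, and the match starts are hard-coded where `emit.getStart()` values would go.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a library class: indexes word boundaries once so
// each match offset can be resolved to its word by binary search.
final class WordIndex {

    private final String text;
    private final int[] starts; // start offset of each word, ascending
    private final int[] ends;   // exclusive end offset of each word

    WordIndex(String text) {
        this.text = text;
        List<int[]> spans = new ArrayList<>();
        int n = text.length();
        int i = 0;
        while (i < n) {
            // Skip whitespace between words.
            while (i < n && Character.isWhitespace(text.charAt(i))) i++;
            if (i == n) break;
            int s = i;
            while (i < n && !Character.isWhitespace(text.charAt(i))) i++;
            spans.add(new int[] {s, i});
        }
        starts = new int[spans.size()];
        ends = new int[spans.size()];
        for (int k = 0; k < spans.size(); k++) {
            starts[k] = spans.get(k)[0];
            ends[k] = spans.get(k)[1];
        }
    }

    /** Index of the word containing offset pos, or -1 if pos falls on whitespace. */
    int wordIndexAt(int pos) {
        int idx = Arrays.binarySearch(starts, pos);
        if (idx < 0) {
            // Not a word start: binarySearch returned -(insertionPoint) - 1,
            // so this yields the last word starting before pos.
            idx = -idx - 2;
        }
        return (idx >= 0 && pos < ends[idx]) ? idx : -1;
    }

    /** Distinct words containing the given match starts, in text order. */
    List<String> wordsForMatches(int[] matchStarts) {
        boolean[] hit = new boolean[starts.length];
        for (int pos : matchStarts) {
            int idx = wordIndexAt(pos);
            if (idx >= 0) hit[idx] = true;
        }
        List<String> result = new ArrayList<>();
        for (int k = 0; k < hit.length; k++) {
            if (hit[k]) result.add(text.substring(starts[k], ends[k]));
        }
        return result;
    }

    public static void main(String[] args) {
        WordIndex index = new WordIndex("abcdxyz abz cdefg");
        System.out.println(index.wordsForMatches(new int[] {2, 12})); // [abcdxyz, cdefg]

        // The degenerate case: "11222222" matches at 0 and "1222222" at 1,
        // and both resolve to the same word without rescanning it.
        WordIndex codes = new WordIndex("11222222");
        System.out.println(codes.wordsForMatches(new int[] {0, 1})); // [11222222]
    }
}
```

With the trie, you would collect `emit.getStart()` from each `Emit` returned by `parseText` and pass those offsets to `wordsForMatches`. Returning each word once, rather than once per match, is a design choice; keep a plain list of `wordIndexAt` results instead of the `hit` array if you want one entry per match.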