Diff English sentences via Levenshtein distance, calculate word error rate, and list word-by-word differences

utunga/sentence_diff

Sentence Differences - sentence_diff

Package to diff English sentences via Levenshtein distance, calculate the word error rate, and list word-by-word differences.

Basic usage

from sentence_diff import SentenceDiff

d = SentenceDiff("can i has 7 loaves of bread please ", "Can I have seven loaves, please?")
assert d.mistakes() == [
  ('has', 'have', 2, 'changed'),
  ('of', None, 5, 'added'),
  ('bread', None, 6, 'added')]

Word Error Rate - wer()

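# One word ("really") is missing relative to the six-word second sentence
d = SentenceDiff("I like to meet people", "I really like to meet people")
assert d.wer() == 1/6
# One extra word ("really") relative to the five-word second sentence
d = SentenceDiff("I really like to meet people", "I like to meet people")
assert d.wer() == 1/5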
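
The two results above are consistent with dividing a word-level Levenshtein distance by the number of words in the second sentence. The following is a minimal standalone sketch of that calculation, not the package's actual implementation; the names `word_edit_distance` and `wer` are hypothetical helpers, and lowercasing plus whitespace splitting is an assumed simplification of whatever normalization the package performs.

```python
def word_edit_distance(actual, target):
    """Levenshtein distance between two lists of words."""
    # prev[j] is the distance between the words of `actual` processed so far
    # and the first j words of `target`.
    prev = list(range(len(target) + 1))
    for i, a_word in enumerate(actual, start=1):
        curr = [i]
        for j, t_word in enumerate(target, start=1):
            cost = 0 if a_word == t_word else 1
            curr.append(min(
                prev[j] + 1,         # extra word in `actual`
                curr[j - 1] + 1,     # word missing from `actual`
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(actual, target):
    """Word error rate, assuming the second sentence is the reference."""
    actual_words = actual.lower().split()
    target_words = target.lower().split()
    # Assumes a non-empty reference; an empty one would divide by zero.
    return word_edit_distance(actual_words, target_words) / len(target_words)
```

Under these assumptions, `wer("I like to meet people", "I really like to meet people")` has one edit over six reference words, and swapping the arguments gives one edit over five.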

Changes - mistakes()

Added words

# 'added' marks a word in the first sentence that has no counterpart in the second
d = SentenceDiff("I like Like to eat people", "I like to eat people")
assert d.mistakes() == [
  ("Like", None, 2, 'added')]

Changed words

d = SentenceDiff("How do you", "how are you")
assert d.mistakes() == [
  ("do", "are", 1, 'changed')]

Skipped words

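# 'skipped' marks a word in the second sentence that is missing from the first;
# both skipped words are reported at index 1, where "see" sits in the first sentence
d = SentenceDiff("How see you", "how good to see you")
assert d.mistakes() == [
  (None, "good", 1, 'skipped'),
  (None, "to", 1, 'skipped')]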

No differences (ignores punctuation and case)

d = SentenceDiff("my name is joe", "My name is Joe!")
assert d.mistakes() == []
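
As an illustration of what ignoring punctuation and case can look like, here is a minimal sketch of a normalization step. It is an assumption about the general approach, not the package's code: `normalize` is a hypothetical helper, it strips only ASCII punctuation, and it does not attempt the digit-to-word matching ("7" versus "seven") shown elsewhere in this README.

```python
import string


def normalize(sentence):
    """Lowercase a sentence, drop ASCII punctuation, and split into words."""
    # Translation table that deletes every character in string.punctuation.
    strip_punctuation = str.maketrans("", "", string.punctuation)
    return sentence.translate(strip_punctuation).lower().split()
```

With this helper, `normalize("my name is joe")` and `normalize("My name is Joe!")` produce the same word list, so a word-by-word comparison finds no differences.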

Which words from the original are OK or not - yes_no_words()

# "7" is accepted as a match for "seven"
d = SentenceDiff("can i have 7 loaves please", "Can I have seven loaves, please?")
assert d.yes_no_words() == [
  ("can", True),
  ("i", True),
  ("have", True),
  ("7", True),
  ("loaves", True),
  ("please", True)]

Full list of changes - scored_words()

d = SentenceDiff("can i has 7 loaves of bread please ", "Can I have seven loaves, please?")
assert d.scored_words() == [
  ('can', 'Can', 0, None),
  ('i', 'I', 1, None),
  ('has', 'have', 2, 'changed'),
  ('7', 'seven', 3, None),
  ('loaves', 'loaves', 4, None),
  ('of', None, 5, 'added'),
  ('bread', None, 6, 'added'),
  ('please', 'please', 7, None)]
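
In the examples in this README, the output of mistakes() matches the entries of scored_words() whose last element is not None. The sketch below shows that relationship on the literal data above. Whether the package derives one from the other internally is an assumption, and the variable names `scored` and `mistakes` are illustrative only.

```python
# Tuples copied from the scored_words() example:
# (word in first sentence, word in second sentence, index, tag)
scored = [
    ('can', 'Can', 0, None),
    ('i', 'I', 1, None),
    ('has', 'have', 2, 'changed'),
    ('7', 'seven', 3, None),
    ('loaves', 'loaves', 4, None),
    ('of', None, 5, 'added'),
    ('bread', None, 6, 'added'),
    ('please', 'please', 7, None),
]

# Keep only the entries that carry a tag, i.e. the actual differences.
mistakes = [entry for entry in scored if entry[3] is not None]
```

The filtered list contains the same three tuples as the mistakes() example under Basic usage.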
