Port of Python's difflib library to Rust. It provides all the necessary tools for comparing word sequences.
Simply add `difflib` to the `[dependencies]` section of your `Cargo.toml`:
[dependencies]
difflib = "0.2"
The `Sequence` trait is implemented for `str`, `Vec<&str>`, `String` and `Vec<String>`, so all generic functions and structs accept only these types.
fn context_diff<T: Sequence>(first_sequence: &T, second_sequence: &T, from_file: &str, to_file: &str, from_file_date: &str, to_file_date: &str, n: usize) -> Vec<String>
Compare `first_sequence` and `second_sequence` (for example, vectors of strings) and return the delta as a vector of strings in context diff format. Context diffs are a compact way of showing just the lines that have changed plus a few lines of context. The changes are shown in a before/after style. The number of context lines is set by `n`; Python's difflib defaults it to three, but here it must always be passed explicitly. Context diff format has a header for filenames and modification times, taken from the `from_file`, `to_file`, `from_file_date` and `to_file_date` arguments. All four must be passed, but any of them may be an empty string.
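Python's difflib labels each context-diff hunk with 1-based line ranges, as in `*** 1,4 ****` for the first file and `--- 1,4 ----` for the second. The sketch below reproduces that range notation in plain Rust. `format_range_context` is a hypothetical helper modelled on Python's private `_format_range_context`; it is not part of this crate's API, and we assume (without having checked) that the port formats its hunk headers the same way.

```rust
/// Hypothetical helper, modelled on Python difflib's `_format_range_context`.
/// Converts the half-open range [start, stop) into the 1-based "first,last"
/// notation used in context diff hunk headers. Assumes stop >= start.
fn format_range_context(start: usize, stop: usize) -> String {
    let mut beginning = start + 1; // diff line numbers are 1-based
    let length = stop - start;
    if length == 0 {
        // An empty range is reported as the line just before it.
        beginning -= 1;
    }
    if length <= 1 {
        // Single-line and empty ranges are written as one number.
        return beginning.to_string();
    }
    format!("{},{}", beginning, beginning + length - 1)
}

fn main() {
    println!("*** {} ****", format_range_context(0, 4)); // *** 1,4 ****
    println!("--- {} ----", format_range_context(2, 3)); // --- 3 ----
}
```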
fn unified_diff<T: Sequence>(first_sequence: &T, second_sequence: &T, from_file: &str, to_file: &str, from_file_date: &str, to_file_date: &str, n: usize) -> Vec<String>
Compare `first_sequence` and `second_sequence` (for example, vectors of strings) and return the delta as a vector of strings in unified diff format. Unified diffs are a compact way of showing just the lines that have changed plus a few lines of context. The changes are shown inline rather than as separate before/after blocks. The number of context lines is set by `n`; Python's difflib defaults it to three, but here it must always be passed explicitly. Unified diff format has a header for filenames and modification times, taken from the `from_file`, `to_file`, `from_file_date` and `to_file_date` arguments. All four must be passed, but any of them may be an empty string.
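Unified diff hunks are introduced by a header such as `@@ -1,4 +1,4 @@`, where each range is written as "start,length" rather than "first,last". The sketch below shows that conversion. `format_range_unified` is a hypothetical helper modelled on Python's private `_format_range_unified`, not an API of this crate, and we assume rather than verify that the port uses the same notation.

```rust
/// Hypothetical helper, modelled on Python difflib's `_format_range_unified`.
/// Converts the half-open range [start, stop) into the "start,length"
/// notation of unified diff hunk headers. Assumes stop >= start.
fn format_range_unified(start: usize, stop: usize) -> String {
    let mut beginning = start + 1; // diff line numbers are 1-based
    let length = stop - start;
    if length == 1 {
        // A one-line range omits the length.
        return beginning.to_string();
    }
    if length == 0 {
        // An empty range points at the line just before it.
        beginning -= 1;
    }
    format!("{},{}", beginning, length)
}

fn main() {
    // A hunk covering lines 1-4 of both files.
    println!("@@ -{} +{} @@", format_range_unified(0, 4), format_range_unified(0, 4)); // @@ -1,4 +1,4 @@
}
```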
fn get_close_matches<'a>(word: &str, possibilities: Vec<&'a str>, n: usize, cutoff: f32) -> Vec<&'a str>
Return a vector of up to `n` best matches to `word` from the `possibilities` vector. `cutoff` is a float in the range [0, 1]; possibilities that score below `cutoff` are ignored. If `cutoff` is outside [0, 1], the function panics.
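Python's `get_close_matches` scores each possibility with `SequenceMatcher.ratio()`, discards those scoring below `cutoff`, and keeps the best `n`, highest score first. The sketch below mirrors that flow without depending on the crate, but as a simplification it scores with a longest-common-subsequence ratio (2 × LCS length / total length) instead of the Ratcliff/Obershelp block matching that `SequenceMatcher` uses, and it breaks ties alphabetically. `close_matches`, `similarity` and `lcs_len` are hypothetical names. For the "appel" example in this README both scorings agree (0.8 for "apple", 0.75 for "ape").

```rust
/// Length of the longest common subsequence of two char slices (classic DP).
fn lcs_len(a: &[char], b: &[char]) -> usize {
    let mut prev = vec![0usize; b.len() + 1];
    for &ca in a {
        let mut cur = vec![0usize; b.len() + 1];
        for (j, &cb) in b.iter().enumerate() {
            cur[j + 1] = if ca == cb { prev[j] + 1 } else { prev[j + 1].max(cur[j]) };
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Simplified stand-in for SequenceMatcher::ratio: 2 * LCS / total length.
fn similarity(a: &str, b: &str) -> f32 {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let total = a.len() + b.len();
    if total == 0 {
        return 1.0;
    }
    2.0 * lcs_len(&a, &b) as f32 / total as f32
}

/// Hypothetical sketch of get_close_matches' filtering and ranking.
fn close_matches<'a>(word: &str, possibilities: Vec<&'a str>, n: usize, cutoff: f32) -> Vec<&'a str> {
    // Same contract as documented above: panic on an out-of-range cutoff.
    assert!((0.0..=1.0).contains(&cutoff), "cutoff must be in [0, 1]");
    let mut scored: Vec<(f32, &'a str)> = possibilities
        .into_iter()
        .map(|p| (similarity(word, p), p))
        .filter(|&(score, _)| score >= cutoff)
        .collect();
    // Highest score first; equal scores fall back to alphabetical order.
    scored.sort_by(|x, y| y.0.partial_cmp(&x.0).unwrap().then(x.1.cmp(y.1)));
    scored.into_iter().take(n).map(|(_, p)| p).collect()
}

fn main() {
    let words = vec!["ape", "apple", "peach", "puppy"];
    println!("{:?}", close_matches("appel", words, 3, 0.6)); // ["apple", "ape"]
}
```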
// unified_diff
let first_text = "one two three four".split(" ").collect::<Vec<&str>>();
let second_text = "zero one tree four".split(" ").collect::<Vec<&str>>();
let diff = difflib::unified_diff(&first_text, &second_text, "Original", "Current",
    "2005-01-26 23:30:50", "2010-04-02 10:20:52", 3);
for line in &diff {
    println!("{:?}", line);
}
// context_diff
let diff = difflib::context_diff(&first_text, &second_text, "Original", "Current",
    "2005-01-26 23:30:50", "2010-04-02 10:20:52", 3);
for line in &diff {
    println!("{:?}", line);
}
// get_close_matches
let words = vec!["ape", "apple", "peach", "puppy"];
let result = difflib::get_close_matches("appel", words, 3, 0.6);
println!("{:?}", result);
fn new() -> Differ
Differ constructor. `Differ` has two optional public fields, `line_junk: Option<fn(&str) -> bool>` and `char_junk: Option<fn(&str) -> bool>`. Both are `None` by default, but you can set them to your own functions. `line_junk` filters lines: when it returns `true`, the line passed to it is treated as junk. `char_junk` filters characters: it is called with a one-character string and returns `true` if that character is junk.
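Any function with the signature `fn(&str) -> bool` can be stored in these fields. The sketch below defines two such predicates modelled on Python difflib's `IS_LINE_JUNK` and `IS_CHARACTER_JUNK`. `is_line_junk`, `is_char_junk` and `JunkFilters` are hypothetical names; `JunkFilters` only stands in for `Differ` so the example compiles without the crate. With the crate itself you would assign the functions to `differ.line_junk` and `differ.char_junk` instead.

```rust
/// Hypothetical stand-in for the two public junk fields on Differ.
struct JunkFilters {
    line_junk: Option<fn(&str) -> bool>,
    char_junk: Option<fn(&str) -> bool>,
}

/// Blank lines, or lines holding a lone '#', count as junk
/// (modelled on Python's difflib.IS_LINE_JUNK).
fn is_line_junk(line: &str) -> bool {
    let trimmed = line.trim();
    trimmed.is_empty() || trimmed == "#"
}

/// Spaces and tabs count as junk characters
/// (modelled on Python's difflib.IS_CHARACTER_JUNK).
fn is_char_junk(ch: &str) -> bool {
    ch == " " || ch == "\t"
}

fn main() {
    // Plain function items coerce to the fn-pointer type of the fields.
    let filters = JunkFilters {
        line_junk: Some(is_line_junk),
        char_junk: Some(is_char_junk),
    };
    let lines = ["fn main() {", "", "    #", "}"];
    // Call the stored predicate to see which lines it flags.
    if let Some(f) = filters.line_junk {
        let junk: Vec<&str> = lines.iter().copied().filter(|l| f(l)).collect();
        println!("junk lines: {:?}", junk); // junk lines: ["", "    #"]
    }
    if let Some(f) = filters.char_junk {
        println!("tab is junk: {}", f("\t")); // tab is junk: true
    }
}
```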
fn compare<T: ?Sized + Sequence>(&self, first_sequence: &T, second_sequence: &T) -> Vec<String>
Compare two sequences of lines and return the delta (a vector of lines).
fn restore(delta: &Vec<String>, which: usize) -> Vec<String>
Return one of the two sequences that generated a delta. `delta` is the result of the `compare` method, and `which` selects the sequence to restore: it must be 1 (the first sequence) or 2 (the second). Any other value causes a panic.
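A Differ delta marks each line with a two-character prefix: "  " for lines common to both sequences, "- " for lines only in the first, "+ " for lines only in the second, and "? " for intraline hints. Assuming this crate keeps Python's ndiff prefixes, restoring a sequence amounts to keeping the common lines plus the lines tagged for the requested side and stripping the prefix. `restore_delta` is a hypothetical name, and the delta in `main` is written by hand rather than produced by `Differ::compare`.

```rust
/// Hypothetical re-implementation of the restore logic described above,
/// assuming Python-style two-character line prefixes.
fn restore_delta(delta: &[String], which: usize) -> Vec<String> {
    // Lines unique to the requested sequence carry this prefix.
    let tag = match which {
        1 => "- ",
        2 => "+ ",
        _ => panic!("which must be 1 or 2, got {}", which),
    };
    delta
        .iter()
        // Keep shared lines and lines from the chosen side; "? " hint lines
        // and lines from the other side are dropped.
        .filter(|line| line.starts_with("  ") || line.starts_with(tag))
        // Both kept prefixes are two ASCII bytes, so index 2 is a char boundary.
        .map(|line| line[2..].to_string())
        .collect()
}

fn main() {
    // Hand-written delta; the "? " line is included only to show it is skipped.
    let delta: Vec<String> = ["  one", "- two", "+ too", "?  ^", "  three"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    println!("{:?}", restore_delta(&delta, 1)); // ["one", "two", "three"]
    println!("{:?}", restore_delta(&delta, 2)); // ["one", "too", "three"]
}
```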
#### Example
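use difflib::differ::Differ;
// first_text and second_text are the word vectors from the example above
let differ = Differ::new();
let diff = differ.compare(&first_text, &second_text);
for line in &diff {
    println!("{:?}", line);
}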
fn new(first_sequence: &'a T, second_sequence: &'a T) -> SequenceMatcher<'a, T>
SequenceMatcher constructor. Accepts the first and second sequences.
fn set_is_junk(&mut self, is_junk: Option<fn(&str) -> bool>)
Set a user-defined function if you want certain elements to be ignored as junk.
fn set_seqs(&mut self, first_sequence: &'a T, second_sequence: &'a T)
fn set_first_seq(&mut self, sequence: &'a T)
fn set_second_seq(&mut self, sequence: &'a T)
Use these methods to replace both sequences, or only the first or the second one.
fn find_longest_match(&self, first_start: usize, first_end: usize, second_start: usize, second_end: usize) -> Match
Return a `Match` struct describing the longest common block of elements in `self.first_sequence[first_start..first_end]` and `self.second_sequence[second_start..second_end]`. `Match` has the fields `first_start`, `second_start` and `size`.
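Python's `SequenceMatcher` finds this block with a dynamic-programming pass that, for each position `i` in the first range, records the length of the match ending at every position `j` in the second range. The standalone sketch below follows that idea. It omits junk handling and the "autojunk" heuristic, `Block` is a hypothetical stand-in for `Match`, and it is not this crate's code. Ties are resolved as in Python: the earliest block in the first sequence wins, then the earliest in the second.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's Match struct.
#[derive(Debug, PartialEq)]
struct Block {
    first_start: usize,
    second_start: usize,
    size: usize,
}

/// Longest common contiguous block of a[alo..ahi] and b[blo..bhi].
fn longest_match<T: PartialEq>(a: &[T], b: &[T], alo: usize, ahi: usize, blo: usize, bhi: usize) -> Block {
    let (mut best_i, mut best_j, mut best_size) = (alo, blo, 0);
    // j2len[j] = length of the match ending at a[i - 1] and b[j].
    let mut j2len: HashMap<usize, usize> = HashMap::new();
    for i in alo..ahi {
        let mut new_j2len = HashMap::new();
        for j in blo..bhi {
            if a[i] == b[j] {
                // Extend the match that ended one step earlier in both sequences.
                let k = j.checked_sub(1).and_then(|p| j2len.get(&p)).copied().unwrap_or(0) + 1;
                new_j2len.insert(j, k);
                // Strict '>' keeps the earliest block when lengths tie.
                if k > best_size {
                    best_i = i + 1 - k;
                    best_j = j + 1 - k;
                    best_size = k;
                }
            }
        }
        j2len = new_j2len;
    }
    Block { first_start: best_i, second_start: best_j, size: best_size }
}

fn main() {
    let a: Vec<char> = "one two three four".chars().collect();
    let b: Vec<char> = "zero one tree four".chars().collect();
    // The shared tail "ree four" starts at index 10 in both strings.
    println!("{:?}", longest_match(&a, &b, 0, a.len(), 0, b.len()));
    // Block { first_start: 10, second_start: 10, size: 8 }
}
```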
fn get_matching_blocks(&mut self) -> Vec<Match>
Return all matching blocks (as `Match` structs) shared by the two sequences.
fn ratio(&mut self) -> f32
Return a measure of the sequences' similarity as a float in the range [0, 1].
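Python's difflib defines this measure as 2.0 × M / T, where T is the total number of elements in both sequences and M is the number of matched elements (the sum of the matching-block sizes), returning 1.0 when both sequences are empty. Assuming the port uses the same formula, the worked sketch below computes it from hand-written blocks. `ratio_from_blocks` is a hypothetical helper. For the `"aaaaa"` / `"aaaab"` pair in the example at the end of this page, the four leading `a`s form one block, so the ratio is 2 × 4 / 10 = 0.8.

```rust
/// Hypothetical helper: similarity ratio from matching blocks given as
/// (first_start, second_start, size) triples.
fn ratio_from_blocks(blocks: &[(usize, usize, usize)], len_first: usize, len_second: usize) -> f32 {
    // M: total number of matched elements across all blocks.
    let matches: usize = blocks.iter().map(|&(_, _, size)| size).sum();
    // T: total number of elements in both sequences.
    let total = len_first + len_second;
    if total == 0 {
        return 1.0; // two empty sequences are identical
    }
    2.0 * matches as f32 / total as f32
}

fn main() {
    // "aaaaa" vs "aaaab": one block of four matching 'a's, plus the
    // zero-size sentinel block that Python appends at the end.
    let blocks = [(0, 0, 4), (5, 5, 0)];
    println!("{}", ratio_from_blocks(&blocks, 5, 5)); // 0.8
}
```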
fn get_opcodes(&mut self) -> Vec<Opcode>
Return a vector of `Opcode` structs describing how to turn `self.first_sequence` into `self.second_sequence`. `Opcode` has the fields `tag`, `first_start`, `first_end`, `second_start` and `second_end`.
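Python's `SequenceMatcher` derives opcodes by walking the matching blocks in order: a gap before a block becomes "replace" (gap on both sides), "delete" (gap only in the first sequence) or "insert" (gap only in the second), and each non-empty block becomes "equal". The sketch below reproduces that walk over hand-derived blocks for the word lists `one two three four` and `zero one tree four`. `opcodes_from_blocks` and `Op` are hypothetical names (`Op` reuses `Opcode`'s field names), and we assume rather than verify that the port emits the same tags.

```rust
/// Hypothetical stand-in for the crate's Opcode struct.
#[derive(Debug, PartialEq)]
struct Op {
    tag: &'static str,
    first_start: usize,
    first_end: usize,
    second_start: usize,
    second_end: usize,
}

/// Turn (first_start, second_start, size) blocks, ending with a zero-size
/// sentinel, into edit operations.
fn opcodes_from_blocks(blocks: &[(usize, usize, usize)]) -> Vec<Op> {
    let (mut i, mut j) = (0, 0);
    let mut ops = Vec::new();
    for &(ai, bj, size) in blocks {
        // Classify the gap between the previous block and this one.
        let tag = if i < ai && j < bj {
            "replace"
        } else if i < ai {
            "delete"
        } else if j < bj {
            "insert"
        } else {
            ""
        };
        if !tag.is_empty() {
            ops.push(Op { tag, first_start: i, first_end: ai, second_start: j, second_end: bj });
        }
        i = ai + size;
        j = bj + size;
        if size > 0 {
            ops.push(Op { tag: "equal", first_start: ai, first_end: i, second_start: bj, second_end: j });
        }
    }
    ops
}

fn main() {
    // Blocks: "one" at (0, 1), "four" at (3, 3), then the sentinel (4, 4, 0).
    for op in opcodes_from_blocks(&[(0, 1, 1), (3, 3, 1), (4, 4, 0)]) {
        println!("{} a[{}..{}] b[{}..{}]", op.tag, op.first_start, op.first_end, op.second_start, op.second_end);
    }
}
```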
fn get_grouped_opcodes(&mut self, n: usize) -> Vec<Vec<Opcode>>
Return a vector of groups with up to n lines of context.
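In Python's difflib, grouping first trims the leading and trailing "equal" opcodes to at most `n` elements, then splits the list wherever an "equal" run is longer than 2n, keeping `n` elements of context on each side of the split. The sketch below ports that procedure to plain Rust tuples. `group_opcodes` and the `Op` tuple alias are hypothetical, the input opcodes are hand-written, and we have not checked that this crate groups identically.

```rust
/// Hypothetical opcode tuple: (tag, first_start, first_end, second_start, second_end).
type Op = (&'static str, usize, usize, usize, usize);

/// Split opcodes into hunks with at most `n` elements of context around each change.
fn group_opcodes(mut codes: Vec<Op>, n: usize) -> Vec<Vec<Op>> {
    if codes.is_empty() {
        codes.push(("equal", 0, 1, 0, 1));
    }
    // Trim a leading "equal" run to its last n elements.
    if codes[0].0 == "equal" {
        let (tag, i1, i2, j1, j2) = codes[0];
        codes[0] = (tag, i1.max(i2.saturating_sub(n)), i2, j1.max(j2.saturating_sub(n)), j2);
    }
    // Trim a trailing "equal" run to its first n elements.
    let last = codes.len() - 1;
    if codes[last].0 == "equal" {
        let (tag, i1, i2, j1, j2) = codes[last];
        codes[last] = (tag, i1, i2.min(i1 + n), j1, j2.min(j1 + n));
    }
    let mut groups = Vec::new();
    let mut group: Vec<Op> = Vec::new();
    for (tag, mut i1, i2, mut j1, j2) in codes {
        // A long unchanged run closes the current hunk and opens a new one.
        if tag == "equal" && i2 - i1 > 2 * n {
            group.push((tag, i1, i2.min(i1 + n), j1, j2.min(j1 + n)));
            groups.push(std::mem::take(&mut group));
            i1 = i1.max(i2 - n);
            j1 = j1.max(j2 - n);
        }
        group.push((tag, i1, i2, j1, j2));
    }
    // Drop a final group that contains no actual change.
    if !group.is_empty() && !(group.len() == 1 && group[0].0 == "equal") {
        groups.push(group);
    }
    groups
}

fn main() {
    let codes = vec![
        ("equal", 0, 10, 0, 10),
        ("replace", 10, 11, 10, 11),
        ("equal", 11, 20, 11, 20),
        ("delete", 20, 21, 20, 20),
        ("equal", 21, 30, 20, 29),
    ];
    for group in group_opcodes(codes, 2) {
        println!("{:?}", group);
    }
}
```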
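use difflib::sequencematcher::SequenceMatcher;
let mut matcher = SequenceMatcher::new("one two three four", "zero one tree four");
// Both strings are 18 characters long, so this searches them in full.
let m = matcher.find_longest_match(0, 18, 0, 18);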
println!("{:?}", m);
let all_matches = matcher.get_matching_blocks();
println!("{:?}", all_matches);
let opcode = matcher.get_opcodes();
println!("{:?}", opcode);
let grouped_opcodes = matcher.get_grouped_opcodes(2);
println!("{:?}", grouped_opcodes);
let ratio = matcher.ratio();
println!("{:?}", ratio);
matcher.set_seqs("aaaaa", "aaaab");
println!("{:?}", matcher.ratio());