Dima Kudosh edited this page Jan 16, 2017 · 10 revisions

Difflib

A port of Python's difflib library to Rust. It provides all the tools needed for comparing word sequences.

Installation

Add difflib to the dependencies block in Cargo.toml:

[dependencies]
difflib = "0.2"

Usage

The Sequence trait is implemented for str, Vec<&str>, String and Vec<String>, so all generic functions and structs accept only these types.

  • fn context_diff<T: Sequence>(first_sequence: &T, second_sequence: &T, from_file: &str, to_file: &str, from_file_date: &str, to_file_date: &str, n: usize) -> Vec<String>

Compare first_sequence and second_sequence (vectors of strings) and return the delta as a vector of strings in context diff format. Context diffs are a compact way of showing just the lines that have changed plus a few lines of context. The changes are shown in a before/after style. The number of context lines is set by n; Python's difflib defaults it to three, but here it must always be passed explicitly. The context diff format has a header for filenames and modification times. Any or all of these may be specified using the strings from_file, to_file, from_file_date, and to_file_date.

  • fn unified_diff<T: Sequence>(first_sequence: &T, second_sequence: &T, from_file: &str, to_file: &str, from_file_date: &str, to_file_date: &str, n: usize) -> Vec<String>

Compare first_sequence and second_sequence (vectors of strings) and return the delta as a vector of strings in unified diff format. Unified diffs are a compact way of showing just the lines that have changed plus a few lines of context. The changes are shown inline rather than in separate before/after blocks. The number of context lines is set by n; Python's difflib defaults it to three, but here it must always be passed explicitly. The unified diff format has a header for filenames and modification times. Any or all of these may be specified using the strings from_file, to_file, from_file_date, and to_file_date.

  • fn get_close_matches<'a>(word: &str, possibilities: Vec<&'a str>, n: usize, cutoff: f32) -> Vec<&'a str>

Return a vector of the n best matches to word from the possibilities vector. cutoff is a float in the range [0..1]. Possibilities that score below cutoff are ignored. If cutoff is not in [0..1], the program will panic.
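The selection rule described above (score every candidate, drop those below cutoff, keep the n best) can be sketched without the crate. This is only an illustration: pick_close_matches and prefix_score are hypothetical names, and the toy scorer is a stand-in for the crate's real similarity measure, so the results will differ from what get_close_matches returns.

```rust
/// Hypothetical helper: keep candidates scoring at least `cutoff`, best first, at most `n`.
/// `score` stands in for the crate's similarity measure (an assumption of this sketch).
fn pick_close_matches<'a>(
    word: &str,
    possibilities: &[&'a str],
    n: usize,
    cutoff: f32,
    score: fn(&str, &str) -> f32,
) -> Vec<&'a str> {
    // Mirror the documented contract: an out-of-range cutoff panics.
    assert!((0.0..=1.0).contains(&cutoff), "cutoff must be in [0, 1]");

    let mut scored: Vec<(f32, &'a str)> = possibilities
        .iter()
        .map(|&candidate| (score(word, candidate), candidate))
        .filter(|&(s, _)| s >= cutoff)
        .collect();

    // Highest score first; the sort is stable, so ties keep their input order.
    scored.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap_or(std::cmp::Ordering::Equal));
    scored.into_iter().take(n).map(|(_, candidate)| candidate).collect()
}

/// Toy scorer for demonstration only: shared prefix length over the longer length.
fn prefix_score(a: &str, b: &str) -> f32 {
    let shared = a.chars().zip(b.chars()).take_while(|(x, y)| x == y).count();
    let longest = a.chars().count().max(b.chars().count());
    if longest == 0 {
        1.0
    } else {
        shared as f32 / longest as f32
    }
}
```

With the toy scorer, pick_close_matches("appel", &["ape", "apple", "peach", "puppy"], 3, 0.5, prefix_score) keeps only "apple", because no other candidate shares at least half of its length as a common prefix.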

Example

    // unified_diff
    let first_text = "one two three four".split(" ").collect::<Vec<&str>>();
    let second_text = "zero one tree four".split(" ").collect::<Vec<&str>>();
    let diff = difflib::unified_diff(&first_text, &second_text, "Original", "Current",
            "2005-01-26 23:30:50", "2010-04-02 10:20:52", 3);
    for line in &diff {
        println!("{:?}", line);
    }

    // context_diff
    let diff = difflib::context_diff(&first_text, &second_text, "Original", "Current",
            "2005-01-26 23:30:50", "2010-04-02 10:20:52", 3);
    for line in &diff {
        println!("{:?}", line);
    }

    // get_close_matches
    let words = vec!["ape", "apple", "peach", "puppy"];
    let result = difflib::get_close_matches("appel", words, 3, 0.6);
    println!("{:?}", result);

Differ struct

  • fn new() -> Differ

Differ constructor. The Differ struct has two optional public fields: line_junk: Option<fn(&str) -> bool> and char_junk: Option<fn(&str) -> bool>. Both are None by default, but you can set them to your own functions. line_junk is used to junk whole strings: when it returns true, the string passed to it is treated as junk. char_junk is used to junk one-element strings (single characters): when it returns true, the one-element string passed to it is treated as junk.
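As an illustration of what such predicates might look like, here is a small sketch modelled on the default junk filters in Python's difflib (blank or lone '#' lines, and space or tab characters). The function names are hypothetical; only the fn(&str) -> bool shape comes from the field types above.

```rust
/// Treats a line as junk when it is blank or holds nothing but a single '#'.
fn is_line_junk(line: &str) -> bool {
    let trimmed = line.trim();
    trimmed.is_empty() || trimmed == "#"
}

/// Treats a one-character string as junk when it is a space or a tab.
fn is_char_junk(ch: &str) -> bool {
    ch == " " || ch == "\t"
}

/// Plain functions coerce to the Option<fn(&str) -> bool> type of the two fields.
fn junk_filters() -> (Option<fn(&str) -> bool>, Option<fn(&str) -> bool>) {
    let line_junk: Option<fn(&str) -> bool> = Some(is_line_junk);
    let char_junk: Option<fn(&str) -> bool> = Some(is_char_junk);
    (line_junk, char_junk)
}
```

Since the fields are public, values like these could be assigned to line_junk and char_junk on a mutable Differ before calling compare.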

  • fn compare<T: ?Sized + Sequence>(&self, first_sequence: &T, second_sequence: &T) -> Vec<String>

Compare two sequences of lines, and generate the delta (a sequence of lines).

  • fn restore(delta: &Vec<String>, which: usize) -> Vec<String>

Return one of the two sequences that generated a delta. delta is the result of the compare method, and which is a number that selects the sequence you want to restore. which must be 1 or 2; any other number causes a panic.
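The idea behind restore can be sketched in a few lines, assuming the delta follows Python difflib's convention of a two-character prefix on every line ("- " for lines only in the first sequence, "+ " for lines only in the second, two spaces for common lines, "? " for hint lines). restore_sketch is a hypothetical name, not the crate's implementation.

```rust
/// Rebuild sequence 1 or 2 from a Differ-style delta.
/// Assumes every delta line starts with one of "- ", "+ ", "  " or "? ".
fn restore_sketch(delta: &[String], which: usize) -> Vec<String> {
    let tag = match which {
        1 => "- ",
        2 => "+ ",
        _ => panic!("which must be 1 or 2, got {}", which),
    };
    delta
        .iter()
        // Keep lines common to both sequences plus those unique to the chosen one.
        .filter(|line| line.starts_with("  ") || line.starts_with(tag))
        // Drop the two-character prefix.
        .map(|line| line[2..].to_string())
        .collect()
}
```

Hint lines starting with "? " belong to neither sequence, so they are dropped for both values of which.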

Example

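    use difflib::differ::Differ;

    // first_text and second_text are the vectors built in the previous example.
    let differ = Differ::new();
    let diff = differ.compare(&first_text, &second_text);
    for line in &diff {
        println!("{:?}", line);
    }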

SequenceMatcher

  • fn new(first_sequence: &'a T, second_sequence: &'a T) -> SequenceMatcher<'a, T>

SequenceMatcher constructor. Accepts the first and second sequences.

  • fn set_is_junk(&mut self, is_junk: Option<fn(&str) -> bool>)

Sets a user-defined function for the elements you want to ignore.

  • fn set_seqs(&mut self, first_sequence: &'a T, second_sequence: &'a T)
  • fn set_first_seq(&mut self, sequence: &'a T)
  • fn set_second_seq(&mut self, sequence: &'a T)

These methods replace one or both of the sequences being compared.

  • fn find_longest_match(&self, first_start: usize, first_end: usize, second_start: usize, second_end: usize) -> Match

Return a Match struct describing the longest common sequence in self.first_sequence[first_start..first_end] and self.second_sequence[second_start..second_end]. The Match struct has these fields: first_start, second_start, size.
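To make "longest common sequence" concrete, here is a minimal dynamic-programming sketch that finds the longest contiguous block shared by two character slices. MatchSketch and longest_match_sketch are hypothetical names; the sketch ignores junk handling and takes whole slices, so a sub-range is selected by slicing before the call rather than by passing start and end indices.

```rust
#[derive(Debug, PartialEq)]
struct MatchSketch {
    first_start: usize,
    second_start: usize,
    size: usize,
}

/// Longest contiguous block common to `first` and `second` (no junk handling).
fn longest_match_sketch(first: &[char], second: &[char]) -> MatchSketch {
    let mut best = MatchSketch { first_start: 0, second_start: 0, size: 0 };
    // run[j + 1] is the length of the common block ending at first[i - 1] and second[j].
    let mut run = vec![0usize; second.len() + 1];
    for i in 0..first.len() {
        let mut next = vec![0usize; second.len() + 1];
        for j in 0..second.len() {
            if first[i] == second[j] {
                let len = run[j] + 1;
                next[j + 1] = len;
                // Strict comparison keeps the earliest block among equally long ones.
                if len > best.size {
                    best = MatchSketch {
                        first_start: i + 1 - len,
                        second_start: j + 1 - len,
                        size: len,
                    };
                }
            }
        }
        run = next;
    }
    best
}
```

For "one two three four" and "zero one tree four" taken as character slices, this sketch finds the block "ree four": size 8, starting at index 10 in both strings.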

  • fn get_matching_blocks(&mut self) -> Vec<Match>

Return all common sequences as a vector of Match structs.

  • fn ratio(&mut self) -> f32

Return a measure of the sequences' similarity as a float in the range [0, 1].
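Python's difflib defines this measure as 2.0 * M / T, where M is the number of matched elements and T is the total number of elements in both sequences. The sketch below computes it over characters by repeatedly taking the longest common block and recursing on the pieces to its left and right. All names are hypothetical, junk handling is omitted, and it is not the crate's code.

```rust
/// Longest common block inside first[a_lo..a_hi] and second[b_lo..b_hi],
/// returned as (first_start, second_start, size).
fn longest_block(
    first: &[char], second: &[char],
    a_lo: usize, a_hi: usize, b_lo: usize, b_hi: usize,
) -> (usize, usize, usize) {
    let mut best = (a_lo, b_lo, 0);
    let width = b_hi - b_lo;
    let mut run = vec![0usize; width + 1];
    for i in a_lo..a_hi {
        let mut next = vec![0usize; width + 1];
        for j in b_lo..b_hi {
            if first[i] == second[j] {
                let len = run[j - b_lo] + 1;
                next[j - b_lo + 1] = len;
                if len > best.2 {
                    best = (i + 1 - len, j + 1 - len, len);
                }
            }
        }
        run = next;
    }
    best
}

/// Number of matched elements: the longest block plus matches on either side of it.
fn matched_len(
    first: &[char], second: &[char],
    a_lo: usize, a_hi: usize, b_lo: usize, b_hi: usize,
) -> usize {
    if a_lo >= a_hi || b_lo >= b_hi {
        return 0;
    }
    let (i, j, size) = longest_block(first, second, a_lo, a_hi, b_lo, b_hi);
    if size == 0 {
        return 0;
    }
    size + matched_len(first, second, a_lo, i, b_lo, j)
        + matched_len(first, second, i + size, a_hi, j + size, b_hi)
}

/// Similarity as 2 * M / T; two empty inputs count as identical.
fn ratio_sketch(first: &str, second: &str) -> f32 {
    let a: Vec<char> = first.chars().collect();
    let b: Vec<char> = second.chars().collect();
    let total = a.len() + b.len();
    if total == 0 {
        return 1.0;
    }
    2.0 * matched_len(&a, &b, 0, a.len(), 0, b.len()) as f32 / total as f32
}
```

For "aaaaa" and "aaaab", four characters match out of ten in total, so the sketch yields 2 * 4 / 10 = 0.8.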

  • fn get_opcodes(&mut self) -> Vec<Opcode>

Return a vector of Opcode structs describing how to turn self.first_sequence into self.second_sequence. The Opcode struct has these fields: tag, first_start, first_end, second_start, second_end.
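The relationship between matching blocks and opcodes can be sketched as follows: the gaps between consecutive matching blocks become replace, delete or insert operations, and the blocks themselves become equal operations. The tag names follow Python's difflib and are an assumption here, as are the OpcodeSketch type and the opcodes_from_blocks function.

```rust
#[derive(Debug, PartialEq)]
struct OpcodeSketch {
    tag: &'static str,
    first_start: usize,
    first_end: usize,
    second_start: usize,
    second_end: usize,
}

/// `blocks` holds (first_start, second_start, size) triples in ascending order.
fn opcodes_from_blocks(
    blocks: &[(usize, usize, usize)],
    first_len: usize,
    second_len: usize,
) -> Vec<OpcodeSketch> {
    let mut ops = Vec::new();
    let (mut i, mut j) = (0, 0);
    // A zero-size sentinel at the end flushes any trailing insert or delete.
    let sentinel = (first_len, second_len, 0);
    for &(ai, bj, size) in blocks.iter().chain(std::iter::once(&sentinel)) {
        // The gap before this block decides the kind of edit.
        let tag = match (i < ai, j < bj) {
            (true, true) => Some("replace"),
            (true, false) => Some("delete"),
            (false, true) => Some("insert"),
            (false, false) => None,
        };
        if let Some(tag) = tag {
            ops.push(OpcodeSketch { tag, first_start: i, first_end: ai, second_start: j, second_end: bj });
        }
        i = ai + size;
        j = bj + size;
        if size > 0 {
            ops.push(OpcodeSketch { tag: "equal", first_start: ai, first_end: i, second_start: bj, second_end: j });
        }
    }
    ops
}
```

For "qabxcd" and "abycdf", with matching blocks (1, 0, 2) and (4, 3, 2), this produces delete, equal, replace, equal, insert in that order.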

  • fn get_grouped_opcodes(&mut self, n: usize) -> Vec<Vec<Opcode>>

Return a vector of opcode groups, each with up to n lines of context.

Example

    use difflib::sequencematcher::SequenceMatcher;

    let mut matcher = SequenceMatcher::new("one two three four", "zero one tree four");
    // Both strings are 18 characters long, so 0..18 covers each of them entirely.
    let m = matcher.find_longest_match(0, 18, 0, 18);
    println!("{:?}", m);
    let all_matches = matcher.get_matching_blocks();
    println!("{:?}", all_matches);
    let opcode = matcher.get_opcodes();
    println!("{:?}", opcode);
    let grouped_opcodes = matcher.get_grouped_opcodes(2);
    println!("{:?}", grouped_opcodes);
    let ratio = matcher.ratio();
    println!("{:?}", ratio);
    matcher.set_seqs("aaaaa", "aaaab");
    println!("{:?}", matcher.ratio());