Skip to content
This repository has been archived by the owner on Aug 5, 2024. It is now read-only.

Commit

Permalink
Update patch for fixing (null) bug
Browse files Browse the repository at this point in the history
Older versions of the Objective C library would occasionally create
patches from invalid separations of the surrogate pairs.

It would start by finding the longest common prefix which split a
surrogate pair in half when inserting in between existing surrogates
sharing the same high surrogate:

`\ud83c\udd70\ud83c\udd71` -> `\ud83c\udd70\ud83c\udd72\ud83c\udd71`

`\ud83c\udd70\ud83c` + `\udd72\ud83c` + `\udd71`

Next it would try to create diff groups from these and fail because of
the unpaired surrogates. The middle group is entirely wrong and gets
replaced with `(null)` while the other surrogate halves stick around _as
counts_ since that's how `toDelta()` converts them.

`=3\t+(null)\t=1`

When the libraries receive this patch they end up reconstructing an
invalid Unicode sequence because of the numbers which now instruct it to
split those same surrogate pairs.

---

In this patch we're identifying this specific sequence in `fromDelta()`
and removing the additional breakage by eliminating the `(null)` group
and re-joining the split surrogate halves. You can see that this will
effectively undo the change operation because that `(null)` is where the
new character was inserted. _There is no way to avoid this_ as the
problem occurred in the past and we have lost information to `(null)`.

In this patch we are merely removing the vestige of a failure that would
lead to additional failures if we passed on the data.
  • Loading branch information
dmsnell committed Dec 20, 2019
1 parent fa122d3 commit bdc4740
Show file tree
Hide file tree
Showing 11 changed files with 188 additions and 65 deletions.
39 changes: 28 additions & 11 deletions java/src/name/fraser/neil/plaintext/diff_match_patch.java
Original file line number Diff line number Diff line change
Expand Up @@ -1576,16 +1576,7 @@ private String decodeURI(String text) throws IllegalArgumentException {
throw new IllegalArgumentException();
}

// some objective-c versions of the library produced patches with
// (null) in the place where surrogates were split across diff
// boundaries. if we leave those in we'll be stuck with a
// high-surrogate (null) low-surrogate pattern that will break
// deeper in the library or consuming application. we'll "fix"
// these by dropping the (null) and re-joining the surrogate halves
return decoded.toString().replaceAll(
"([\\uD800-\\uDBFF])\\(null\\)([\\uDC00-\\uDFFF])",
"$1$2"
);
return decoded.toString();
}

/**
Expand All @@ -1601,7 +1592,8 @@ public LinkedList<Diff> diff_fromDelta(String text1, String delta)
LinkedList<Diff> diffs = new LinkedList<Diff>();
int pointer = 0; // Cursor in text1
String[] tokens = delta.split("\t");
for (String token : tokens) {
for (int x = 0; x < tokens.length; x++) {
String token = tokens[x];
if (token.length() == 0) {
// Blank tokens are ok (from a trailing \t).
continue;
Expand Down Expand Up @@ -1637,6 +1629,31 @@ public LinkedList<Diff> diff_fromDelta(String text1, String delta)
"Negative number in diff_fromDelta: " + param);
}
String text;

// some objective-c versions of the library produced patches with
// (null) in the place where surrogates were split across diff
// boundaries. if we leave those in we'll be stuck with a
// high-surrogate (null) low-surrogate pattern that will break
// deeper in the library or consuming application. we'll "fix"
// these by dropping the (null) and re-joining the surrogate halves
if (
x + 2 < tokens.length &&
Character.isHighSurrogate(text1.charAt(pointer + n - 1)) &&
tokens[x + 1].substring(1).equals("(null)") &&
Character.isLowSurrogate(text1.charAt(pointer + n))
) {
n -= 1;
tokens[x + 1] = "+";
int m;
try {
m = Integer.parseInt(tokens[x + 2].substring(1));
} catch (NumberFormatException e) {
throw new IllegalArgumentException(
"Invalid number in diff_fromDelta: " + tokens[x + 2].substring(1), e);
}
tokens[x + 2] = tokens[x + 2].charAt(0) + String.valueOf(m + 1);
}

try {
text = text1.substring(pointer, pointer += n);
} catch (StringIndexOutOfBoundsException e) {
Expand Down
12 changes: 6 additions & 6 deletions java/tests/name/fraser/neil/plaintext/diff_match_patch_test.java
Original file line number Diff line number Diff line change
Expand Up @@ -376,6 +376,12 @@ public static void testDiffText() {
LinkedList<Diff> diffs = diffList(new Diff(EQUAL, "jump"), new Diff(DELETE, "s"), new Diff(INSERT, "ed"), new Diff(EQUAL, " over "), new Diff(DELETE, "the"), new Diff(INSERT, "a"), new Diff(EQUAL, " lazy"));
assertEquals("diff_text1:", "jumps over the lazy", dmp.diff_text1(diffs));
assertEquals("diff_text2:", "jumped over a lazy", dmp.diff_text2(diffs));

assertEquals(
"diff_text2: Objective-C (null) bug",
"🙂🙁",
dmp.diff_text2(dmp.diff_fromDelta("🙂🙁", "=3\t+(null)\t=1"))
);
}

public static void testDiffDelta() {
Expand Down Expand Up @@ -460,12 +466,6 @@ public static void testDiffDelta() {
dmp.diff_toDelta(dmp.diff_fromDelta("\ud83c\udd70", "=1\t-1\t+%ED%B5%B1"))
);

assertEquals(
"diff_fromDelta: Invalid diff from objective-c with (null) string",
diffList(new Diff(INSERT, "\ud83c\udd70")),
dmp.diff_fromDelta("", "+%ED%A0%BC%28null%29%ED%B5%B0")
);

// Verify pool of unchanged characters.
diffs = diffList(new Diff(INSERT, "A-Z a-z 0-9 - _ . ! ~ * ' ( ) ; / ? : @ & = + $ , # "));
String text2 = dmp.diff_text2(diffs);
Expand Down
15 changes: 8 additions & 7 deletions javascript/diff_match_patch.js

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

30 changes: 23 additions & 7 deletions javascript/diff_match_patch_uncompressed.js
Original file line number Diff line number Diff line change
Expand Up @@ -1512,13 +1512,7 @@ diff_match_patch.prototype.decodeURI = function(text) {
throw new URIError('URI malformed');
}

// some objective-c versions of the library produced patches with
// (null) in the place where surrogates were split across diff
// boundaries. if we leave those in we'll be stuck with a
// high-surrogate (null) low-surrogate pattern that will break
// deeper in the library or consuming application. we'll "fix"
// these by dropping the (null) and re-joining the surrogate halves
return decoded.replace(/([\uD800-\uDBFF])\(null\)([\uDC00-\uDFFF])/g, "$1$2");
return decoded;
}
};

Expand Down Expand Up @@ -1556,6 +1550,28 @@ diff_match_patch.prototype.diff_fromDelta = function(text1, delta) {
if (isNaN(n) || n < 0) {
throw new Error('Invalid number in diff_fromDelta: ' + param);
}

// some objective-c versions of the library produced patches with
// (null) in the place where surrogates were split across diff
// boundaries. if we leave those in we'll be stuck with a
// high-surrogate (null) low-surrogate pattern that will break
// deeper in the library or consuming application. we'll "fix"
// these by dropping the (null) and re-joining the surrogate halves
if (
x + 2 < tokens.length &&
this.isHighSurrogate(text1[pointer + n - 1]) &&
'(null)' === tokens[x + 1].substring(1) &&
this.isLowSurrogate(text1[pointer + n])
) {
n -= 1;
tokens[x + 1] = "+";
var m = parseInt(tokens[x + 2].substring(1),10);
if (isNaN(m) || m < 0) {
throw new Error('Invalid number in diff_fromDelta: ' + tokens[x + 2].substring(1));
}
tokens[x + 2] = tokens[x + 2][0] + (m + 1).toString();
}

var text = text1.substring(pointer, pointer += n);
if (tokens[x].charAt(0) == '=') {
diffs[diffsLength++] = new diff_match_patch.Diff(DIFF_EQUAL, text);
Expand Down
15 changes: 6 additions & 9 deletions javascript/tests/diff_match_patch_test.js
Original file line number Diff line number Diff line change
Expand Up @@ -443,6 +443,12 @@ function testDiffText() {
assertEquals('jumps over the lazy', dmp.diff_text1(diffs));

assertEquals('jumped over a lazy', dmp.diff_text2(diffs));

// diff_text2: Objective-C (null) bug
assertEquals(
'\ud83d\ude42\ud83d\ude41',
dmp.diff_text2(dmp.diff_fromDelta('\ud83d\ude42\ud83d\ude41', '=3\t+(null)\t=1'))
);
}

function testDiffDelta() {
Expand Down Expand Up @@ -605,15 +611,6 @@ function testDiffDelta() {
assertEquals('Swap surrogate pair', 'crashed');
}

try {
assertEquivalent(
dmp.diff_fromDelta('', '+%ED%A0%BC%28null%29%ED%B5%B0'),
[[DIFF_INSERT, '\ud83c\udd70']]
);
} catch ( e ) {
assertEquals('Invalid diff from objective-c with (null) string' );
}

// Empty diff groups
assertEquivalent(
dmp.diff_toDelta([[DIFF_EQUAL, 'abcdef'], [DIFF_DELETE, ''], [DIFF_INSERT, 'ghijk']]),
Expand Down
62 changes: 43 additions & 19 deletions objectivec/DiffMatchPatch.m
Original file line number Diff line number Diff line change
Expand Up @@ -1493,23 +1493,7 @@ - (NSString *)diff_decodeURIWithText:(NSString *)percentEncoded
return nil;
}

// some objective-c versions of the library produced patches with
// (null) in the place where surrogates were split across diff
// boundaries. if we leave those in we'll be stuck with a
// high-surrogate (null) low-surrogate pattern that will break
// deeper in the library or consuming application. we'll "fix"
// these by dropping the (null) and re-joining the surrogate halves
NSString *result = [NSString stringWithCharacters:decoded length:output];
NSRegularExpression *replacer = [NSRegularExpression
regularExpressionWithPattern:@"([\\x{D800}-\\x{DBFF}])\\(null\\)([\\x{DC00}-\\x{DFFF}])"
options:0
error:nil];

return [replacer
stringByReplacingMatchesInString:result
options:0
range:NSMakeRange(0, [result length])
withTemplate:@"$1$2"];
return [NSString stringWithCharacters:decoded length:output];
}

/**
Expand All @@ -1526,10 +1510,12 @@ - (NSMutableArray *)diff_fromDeltaWithText:(NSString *)text1
{
NSMutableArray *diffs = [NSMutableArray array];
NSUInteger thisPointer = 0; // Cursor in text1
NSArray *tokens = [delta componentsSeparatedByString:@"\t"];
NSMutableArray *tokens = [NSMutableArray arrayWithArray:([delta componentsSeparatedByString:@"\t"])];
NSInteger n;
NSDictionary *errorDetail = nil;
for (NSString *token in tokens) {
NSInteger tokenCount = [tokens count];
for (NSInteger x = 0; x < tokenCount; x++) {
NSString *token = tokens[x];
if (token.length == 0) {
// Blank tokens are ok (from a trailing \t).
continue;
Expand Down Expand Up @@ -1572,6 +1558,44 @@ - (NSMutableArray *)diff_fromDeltaWithText:(NSString *)text1
}
return nil;
}

// some objective-c versions of the library produced patches with
// (null) in the place where surrogates were split across diff
// boundaries. if we leave those in we'll be stuck with a
// high-surrogate (null) low-surrogate pattern that will break
// deeper in the library or consuming application. we'll "fix"
// these by dropping the (null) and re-joining the surrogate halves
if (x + 2 < tokenCount &&
CFStringIsSurrogateHighCharacter([text1 characterAtIndex:(thisPointer + n - 1)]) &&
[@"(null)" isEqualToString:([tokens[x + 1] substringFromIndex:1])] &&
CFStringIsSurrogateLowCharacter([text1 characterAtIndex:(thisPointer + n)])
) {
n -= 1;
tokens[x + 1] = @"+";

NSInteger m = [[tokens[x + 2] substringFromIndex:1] integerValue];

if (m == 0) {
if (error != NULL) {
errorDetail = [NSDictionary dictionaryWithObjectsAndKeys:
[NSString stringWithFormat:NSLocalizedString(@"Invalid number in diff_fromDelta: %@", @"Error"), param],
NSLocalizedDescriptionKey, nil];
*error = [NSError errorWithDomain:@"DiffMatchPatchErrorDomain" code:100 userInfo:errorDetail];
}
return nil;
} else if (m < 0) {
if (error != NULL) {
errorDetail = [NSDictionary dictionaryWithObjectsAndKeys:
[NSString stringWithFormat:NSLocalizedString(@"Negative number in diff_fromDelta: %@", @"Error"), param],
NSLocalizedDescriptionKey, nil];
*error = [NSError errorWithDomain:@"DiffMatchPatchErrorDomain" code:101 userInfo:errorDetail];
}
return nil;
}

tokens[x + 2] = [NSString stringWithFormat:@"=%ld", m + 1];
}

NSString *text;
@try {
text = [text1 substringWithRange:NSMakeRange(thisPointer, (NSUInteger)n)];
Expand Down
6 changes: 2 additions & 4 deletions objectivec/Tests/DiffMatchPatchTest.m
Original file line number Diff line number Diff line change
Expand Up @@ -686,6 +686,8 @@ - (void)test_diff_textTest {

XCTAssertEqualObjects(@"jumped over a lazy", [dmp diff_text2:diffs], @"Compute the source and destination texts #2");

XCTAssertEqualObjects(@"🙂🙁", [dmp diff_text2:([dmp diff_fromDeltaWithText:@"🙂🙁" andDelta:@"=3\t+(null)\t=1" error:nil])]);

[dmp release];
}

Expand Down Expand Up @@ -814,10 +816,6 @@ - (void)test_diff_deltaTest {
[Diff diffWithOperation:DIFF_INSERT andText:[NSString stringWithFormat:@"%C", 0xdd71]],
nil])]);

// Invalid diff from objective-c with (null) string
XCTAssertEqualObjects([dmp diff_fromDeltaWithText:@"" andDelta:@"+%ED%A0%BC%28null%29%ED%B5%B0" error:nil],
([NSMutableArray arrayWithObjects:[Diff diffWithOperation:DIFF_INSERT andText:@"🅰"],nil]));

// Verify pool of unchanged characters.
diffs = [NSMutableArray arrayWithObject:
[Diff diffWithOperation:DIFF_INSERT andText:@"A-Z a-z 0-9 - _ . ! ~ * ' ( ) ; / ? : @ & = + $ , # "]];
Expand Down
Loading

0 comments on commit bdc4740

Please sign in to comment.