Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix array diffs #45

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Fix array diffs #45

wants to merge 2 commits into from

Conversation

GGabriele
Copy link

@GGabriele GGabriele commented Jan 19, 2022

As highlighted in various issues (#24, #30), the library is currently failing at computing correct and reliable diffs for arrays, and more specifically whenever the size of the "left" array is less than the "right" one:

case 1: len(left) < len(right) (bogus diffs)

$ cat before.json
{
   "array": [
      "x-posting-ip"
   ]
 }

$ cat after.json
{
   "array": [
      "accept-encoding",
      "x-forwarded-for"
   ]
 }

$ go run main.go before.json after.json
 {
   "array": [
-    0: "x-posting-ip"
+    0: "accept-encoding"
+    0: "accept-encoding"
   ]
 }

case 1 (continuation): len(left) < len(right) (unreliable diffs)

$ cat before.json
{
   "array": [
      "blabla"
   ]
 }

$ cat after.json
{
   "array": [
      "accept-encoding",
      "x-forwarded-for"
   ]
 }

$ go run main.go before.json after.json
 {
   "array": [
+    0: "accept-encoding"
   ]
 }

case 2: len(left) == len(right) (OK)

$ cat before.json
{
   "array": [
      "blabla",
      "blabla"
   ]
 }

$ go run main.go before.json after.json
 {
   "array": [
-    0: "blabla",
+    0: "accept-encoding",
-    1: "blabla"
+    1: "x-forwarded-for"
   ]
 }

case 3: len(left) > len(right) (diffs are OK, but there is a lingering duplicate element at the bottom of the array)

$ cat before.json
{
   "array": [
      "blabla",
      "blabla",
      "blabla"
   ]
 }

$ go run main.go before.json after.json
 {
   "array": [
-    0: "blabla",
+    0: "accept-encoding",
-    0: "blabla",
-    1: "blabla"
+    1: "x-forwarded-for"
     2: "blabla"                   // shouldn't be here
   ]
 }

Both issues highlighted in case 1 are the most serious ones, while the one in case 3 is mostly visually confusing.

I believe the cause of these issues are two:

  1. when you loop through the maybe matrixes here, you are missing the last element of each row/column
  2. the way the similarity and its score is calculated for strings is not reliable

What I'm proposing here is a simple fix for the (1) and to use strutil to calculate the similarity for (2).

$ cat before.json
{
   "array": [
      "x-posting-ip"
   ]
 }

$ cat after.json
{
   "array": [
      "accept-encoding",
      "x-forwarded-for"
   ]
 }

$ go run main.go before.json after.json
 {
   "array": [
-    0: "x-posting-ip"
+    0: "accept-encoding"
+    1: "x-forwarded-for"
   ]
 }
$ cat before.json
{
   "array": [
      "blabla"
   ]
 }

$ go run main.go before.json after.json
 {
   "array": [
-    0: "blabla"
+    0: "accept-encoding"
+    1: "x-forwarded-for"
   ]
 }

Changes don't break current test cases:

$ go test -race ./...
# gojsondiff/jd
jd/main.go:42:4: Println arg list ends with redundant newline
# gojsondiff/jp
jp/main.go:24:4: Println arg list ends with redundant newline
ok  	gojsondiff	1.919s
ok  	gojsondiff/formatter	1.104s

The code is quite complex, so I may be missing some obvious things, so please let me know if that's the case!

I'm also introducing go.mod in the same PR, which is resulting in a giant bulk of changes. The actual code changes are very minimal and all under the gojsondiff.go file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant