
Relation to JSON Text Sequences standard #25

Closed
letmaik opened this issue Jun 22, 2015 · 9 comments

letmaik commented Jun 22, 2015

What is the relation to the JSON Text Sequences RFC (RFC 7464)? It seems to be getting picked up by some projects, e.g. GeoJSON.

Zorgatone commented Jan 17, 2017

I didn't know about that standard. It seems a lot of people are trying to define a standard for the same purpose: JSON objects over multiple lines.
I've also seen application/x-json-stream and application/x-jsonlines. We should stick to only one of these.

Zorgatone commented Jan 17, 2017

I haven't looked into any parser or tried to implement one. I'm wondering, though, how you'd handle JSON objects/arrays/strings containing the newline character. You certainly can't naively read line by line and feed each line to a normal JSON parser; that would break.

EDIT: never mind, I think they just handle minified JSON (i.e. without newlines in between), and in strings the newlines are escaped.

@Zorgatone

There's also this one: http://jsonlines.org/

kyeotic commented Apr 4, 2019

I see JSON Text Sequences as having one major advantage and one minor disadvantage.

The advantage is that you can safely stream JSON that has values containing newlines, which NDJSON cannot do. This is major because string values containing newlines are valid JSON, and fairly common on the web, so NDJSON clients need to first replace all the newlines with some pre-arranged character before sending and then replace again when parsing.

The disadvantage is that it's harder for a human to read the content. This is minor because it's still possible for a human to read, and also because I don't think the wire format needs to match the storage format.

I don't see why anyone would pick NDJSON over JSON text sequences given these.

clue commented Apr 5, 2019

> This is major because string values containing newlines are valid JSON.

According to RFC 7159, control characters (which include newlines) need to be escaped inside strings, i.e. a valid string would be "hello\nworld", whereas a literal newline character in a string would be invalid JSON.

In other words, strings containing escaped newlines are supported in JSON just like they're supported in NDJSON and JSON Text Sequences.
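
As a rough illustration of the escaping rule, here is a minimal sketch using Python's standard `json` module (Python is only an assumption for the example; the thread itself is language-agnostic). The helper name `parses` is made up for this sketch. It shows that the serializer writes a backslash followed by `n` rather than the newline character itself, and that the default strict parser rejects a literal newline inside a string.

```python
import json

# json.dumps escapes the newline: the output holds a backslash and the
# letter "n", not the 0x0A character itself.
encoded = json.dumps("hello\nworld")
assert "\n" not in encoded

# Parsing the escaped form restores the real newline character.
decoded = json.loads(encoded)


def parses(text: str) -> bool:
    """Hypothetical helper: True if `text` is accepted by json.loads."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# This Python literal contains a real newline between the quotes,
# i.e. an unescaped control character inside a JSON string.
literal_newline_is_valid = parses('"hello\nworld"')
```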

On top of this, JSON also allows insignificant whitespace between any structural elements (using it for layout is commonly referred to as "pretty printing"). While JSON Text Sequences allows actual newlines here as well, NDJSON does not.
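
To make the framing difference concrete, here is a small sketch of JSON Text Sequences as described in RFC 7464: each text is preceded by the record separator character (0x1E) and followed by a line feed. The function names are made up for this example, and the decoder is deliberately simplified (it does not attempt the RFC's recovery from truncated texts). Because records are delimited by 0x1E, pretty-printed texts with real newlines survive the round trip.

```python
import json
from typing import Any, Iterable, Iterator, Optional

RS = "\x1e"  # record separator that starts every text in the sequence


def encode_json_seq(values: Iterable[Any], indent: Optional[int] = None) -> str:
    """Frame each value as RS + JSON text + LF (hypothetical helper)."""
    return "".join(RS + json.dumps(v, indent=indent) + "\n" for v in values)


def decode_json_seq(data: str) -> Iterator[Any]:
    """Split on RS and parse each non-empty chunk (simplified, no recovery)."""
    for chunk in data.split(RS):
        if chunk.strip():
            yield json.loads(chunk)


records = [{"name": "Alice"}, {"name": "Bob"}]

# indent=4 puts real newlines inside every JSON text.
pretty = encode_json_seq(records, indent=4)
decoded = list(decode_json_seq(pretty))
```

Splitting on 0x1E is safe here because `json.dumps` escapes control characters inside strings, so the separator never appears unescaped in a JSON text.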

I understand where you're coming from and agree that this may not cover all possible use cases. That being said, from my personal, professional experience I would argue that this is actually much less of a problem than it might appear at first. Many of the applications where a streaming format like NDJSON makes sense use JSON values without any insignificant whitespace to reduce bandwidth.

I've been working with implementations of NDJSON and JSON Text Sequences for PHP and also did a quick comparison in my blog: https://www.lueck.tv/2018/introducing-reactphp-ndjson. My main takeaway is that they're interchangeable for the most part. JSON Text Sequences has the benefit of being a standard, but who knows if NDJSON will catch up (#21)…

kyeotic commented Apr 5, 2019

The front page of the repo says:

> The JSON texts MUST NOT contain newlines or carriage returns.

which contradicts what you said.

clue commented Apr 5, 2019

@tyrsius A "JSON text" is the serialized value, for example an object {"name":"Alice"}, but also just primitive values like 42, null and strings like "Bob". See also https://tools.ietf.org/html/rfc7159#section-2 for more details.

You're right that the NDJSON spec says that each JSON text MUST NOT contain newlines or carriage returns. However, JSON already mandates that newlines in strings MUST be escaped, as in the previous example ("hello\nworld"). This means that JSON supports newlines in strings just fine; they just need to be escaped. Likewise, NDJSON also supports escaped newlines in string values.

What NDJSON does not allow is insignificant whitespace containing newlines. For example, the following JSON text is valid JSON, but not valid NDJSON because it spans multiple lines:

{
    "name": "Alice"
}

To reiterate, the following is valid NDJSON:

{"name":"Alice","comment":"hello\nworld"}
{"name":"Bob","comment":"hello\nagain"}

kyeotic commented Apr 5, 2019

I guess I'm not clear on what the distinction is. The escape sequence is how an n becomes a "newline" in any string in JavaScript/JSON. If you have a string that contains newlines, those newlines are \n in the string. The fact that they might also be in a JSON value can only be determined if you parse the string as JSON first.

How could an NDJSON streaming parser tell the difference between a "newline that starts a new line" and a "newline in a JSON string value"?

In other words, how could an NDJSON client tell the difference between these two?
1.

{"name":"Alice","comment":"hello\nworld"}
{"name":"Bob","comment":"hello\nagain"}

2.

{"name":"Alice","comment":"hello
world"}
{"name":"Bob","comment":"hello
again"}

millette commented Apr 5, 2019

@tyrsius

{"name":"Alice","comment":"hello
world"}

is invalid JSON and invalid NDJSON. The newline must be escaped (written out as \n).

Whereas

{"name":"Alice",
"comment":"helloworld"}

is valid JSON, but invalid NDJSON.

Hope that helps.
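
One way to see the distinction at the byte level is the sketch below, a hypothetical Python helper rather than any particular NDJSON library. A line-based reader only splits on the 0x0A byte. An escaped newline is the two bytes backslash and `n`, so it never triggers a split, while a literal newline inside a string cuts the record into fragments that each fail to parse.

```python
import json
from typing import List


def line_results(raw: bytes) -> List[bool]:
    """Split on the 0x0A byte and report whether each line parses as JSON."""
    results = []
    for line in raw.split(b"\n"):
        if not line:
            continue  # skip the empty piece after the final newline
        try:
            json.loads(line)
            results.append(True)
        except json.JSONDecodeError:
            results.append(False)
    return results


# Escaped newlines: backslash + "n" inside the strings, 0x0A only between records.
escaped = (
    b'{"name":"Alice","comment":"hello\\nworld"}\n'
    b'{"name":"Bob","comment":"hello\\nagain"}\n'
)

# Literal newlines: a 0x0A byte sits inside each string value.
literal = (
    b'{"name":"Alice","comment":"hello\nworld"}\n'
    b'{"name":"Bob","comment":"hello\nagain"}\n'
)

escaped_results = line_results(escaped)
literal_results = line_results(literal)
```

The first input yields two lines that both parse. The second yields four fragments, none of which is a valid JSON text, so a reader never has to guess which kind of newline it is looking at.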
