Null character (i.e. "\0") terminates string, but should actually be escaped instead #212

Open
nicholaides opened this issue Sep 11, 2024 · 6 comments

@nicholaides

nicholaides commented Sep 11, 2024

My understanding of the JSON spec is that the null character (\0) is a perfectly cromulent character in a JSON string because JSON strings are UTF-8.

A null character in a string apparently terminates the string in jo:

% jo greeting=$'hello \0 world'
{"greeting":"hello "}

Other control characters get escaped correctly:

% jo greeting=$'hello \1 world'
{"greeting":"hello \u0001 world"}

It's not a problem with the shell handling \0 because this works as expected:

% echo $'hello \0 world'
hello  world
@jpmens
Owner

jpmens commented Sep 11, 2024

Which shell are you using? (I don't think it's csh)

I see:

$ echo $'hello \0 world' | od -cb
0000000    h   e   l   l   o      \n
          150 145 154 154 157 040 012
0000007

In both bash and sh.

Be that as it may, the json.[ch] we use doesn't handle zero bytes.

@nicholaides
Author

Oh, I forgot-- IIRC command line arguments on Mac/Linux are C-strings, so a \0 will terminate the string anyway.

jo still has this problem when the null byte comes from a file rather than a command-line argument, though. See:

% echo $'hello \0 world' > hw.txt
% od -cb hw.txt      
0000000    h   e   l   l   o      \0       w   o   r   l   d  \n        
          150 145 154 154 157 040 000 040 167 157 162 154 144 012        
0000016
% jo greeting=@hw.txt
{"greeting":"hello "}

In any case, if zero bytes aren't being handled correctly internally, then there's nothing that can be done, I guess.

It's a shame to not support UTF-8 correctly, though.

@jpmens
Owner

jpmens commented Sep 11, 2024

I'd still like to know which shell you're using:

$ echo $'hello \0 world' > hw.txt
$ od -cb hw.txt
0000000    h   e   l   l   o      \n
          150 145 154 154 157 040 012
0000007

@nicholaides
Author

nicholaides commented Sep 11, 2024

zsh on macOS Sonoma

% $SHELL --version
zsh 5.9 (x86_64-apple-darwin23.0)

@gromgit
Collaborator

gromgit commented Sep 28, 2024

> zsh

That explains it. From the zshoptions man page:

POSIX_STRINGS <K> <S>
This option affects processing of quoted strings. Currently it only affects the behaviour of null characters, i.e. character 0 in the portable character set corresponding to US ASCII.

When this option is not set, null characters embedded within strings of the form $'...' are treated as ordinary characters. The entire string is maintained within the shell and output to files where necessary, although owing to restrictions of the library interface the string is truncated at the null character in file names, environment variables, or in arguments to external programs.

When this option is set, the $'...' expression is truncated at the null character. Note that remaining parts of the same string beyond the termination of the quotes are not truncated.

For example, the command line argument a$'b\0c'd is treated with the option off as the characters a, b, null, c, d, and with the option on as the characters a, b, d.

As for dealing with embedded null characters, that's a massive change, involving adding the concept of counted strings all through the current C codebase. Given that zsh's default treatment of null characters seems to be an outlier rather than the norm among command-line shells, making the huge effort to support embedded nulls will inevitably lead to a different set of issues raised here, viz. "you can't get there (embedded nulls) from here (your current shell, except zsh)".
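
A sketch for reproducing the file-based case from any shell, assuming only a POSIX `printf`: since bash and sh truncate `$'...'` at the null character, the function below lets `printf` expand the `\000` escape itself, so the shell never has to hold the null byte in a string. The name `write_nul_sample` is made up for this example.

```shell
# Write "hello <NUL> world\n" to the file named by $1.
# printf expands the \000 octal escape itself, so the shell never stores
# a NUL byte in a string; this avoids the $'...' difference between shells.
write_nul_sample() {
    printf 'hello \000 world\n' > "$1"
}
```

Running `write_nul_sample hw.txt` followed by `od -cb hw.txt` should show the same 14 bytes as the zsh dump earlier in the thread, after which `jo greeting=@hw.txt` can be tried from bash or sh as well.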

@nicholaides
Author

> will inevitably lead to a different set of issues raised here, viz. "you can't get there (embedded nulls) from here (your current shell, except zsh)".

I get that.

I would make a few points, however:

  1. At the very least, jo should complain instead of failing silently if a file included via @ contains a null byte. I assume implementing this wouldn't require reworking all of the internal string representations.

  2. It would be nice if the following error message could specify that the error is because of a limitation of jo rather than the JSON being invalid.

% cat nb.json
{"nb": "\u0000"}

% jo fileContents=:nb.json
jo: Cannot decode JSON in file nb.json
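
Until jo reports this itself, the check in point 1 can be approximated on the calling side. The following is only a sketch: `file_has_nul` and `jo_text_file` are hypothetical helper names, not part of jo, and the approach assumes a POSIX `tr` and `wc`. It compares the file's byte count with and without null bytes, so no null ever has to be stored in a shell variable.

```shell
# Succeed (exit status 0) if the file named by $1 contains at least one NUL byte.
# The data is streamed through tr and wc; only the two byte counts are kept.
file_has_nul() {
    total=$(( $(wc -c < "$1") ))
    without_nul=$(( $(tr -d '\000' < "$1" | wc -c) ))
    [ "$total" -ne "$without_nul" ]
}

# Hypothetical wrapper: refuse to hand a NUL-containing file to jo via @,
# because jo would silently truncate the value at the first NUL.
# Usage: jo_text_file KEY FILE
jo_text_file() {
    if file_has_nul "$2"; then
        echo "jo_text_file: $2 contains a null byte; jo would truncate it" >&2
        return 1
    fi
    jo "$1=@$2"
}
```

With the hw.txt file from earlier in the thread, `jo_text_file greeting hw.txt` would print the warning and return a non-zero status instead of producing `{"greeting":"hello "}`.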
