I would like to donate to you or your favorite charity to help encourage a new feature: Unicode separated values (USV) which uses Unicode unit separator U+241F and Unicode record separator U+241E.
Unicode separated values (USV) are much like comma separated values (CSV), tab separated values (TSV) a.k.a. tab delimited format (TDF), and ASCII separated values (ASV) a.k.a. DEL (Delimited ASCII) a.k.a. ASCII 30-31.
The advantages of USV for me are that it handles text that happens to contain commas, tabs, or newlines, and that its separators have a visible character representation.
For example, USV is great for me within typical source code, such as Unix scripts, because the characters show up, are easy to copy and paste, and are easy to use within various kinds of editor search boxes.
When data are solely for machines, then for me the choice of characters doesn't matter. When data are potentially for reading or editing, such as by a programmer, then I prefer typically-visible characters (U+241F & U+241E) over typically-invisible control characters (ASCII 31 & 30).
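For example I can write code samples such as the following (this assumes a tr that handles multibyte UTF-8 characters; a byte-oriented tr will not give this output):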
$ echo 'a␟b␟c␞d␟e␟f␞g␟h␟i' | tr ␟␞ '\t\n'
a b c
d e f
g h i
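The same split can be sketched in Python, which treats ␟ and ␞ as single characters regardless of how the shell tools handle multibyte input. This is only a minimal sketch: the name `parse_usv` and the choice to tolerate one trailing record separator are my assumptions for illustration, not part of any existing library or specification.

```python
UNIT_SEPARATOR = "\u241f"    # ␟ SYMBOL FOR UNIT SEPARATOR
RECORD_SEPARATOR = "\u241e"  # ␞ SYMBOL FOR RECORD SEPARATOR


def parse_usv(text):
    """Split USV text into a list of records, each a list of unit strings.

    Hypothetical helper: assumes units never contain either separator.
    """
    records = text.split(RECORD_SEPARATOR)
    # Assumption: a single trailing record separator does not start a new record.
    if records and records[-1] == "":
        records.pop()
    return [record.split(UNIT_SEPARATOR) for record in records]


# Print each record as one tab-separated line.
for row in parse_usv("a␟b␟c␞d␟e␟f␞g␟h␟i"):
    print("\t".join(row))
```

Units that contain commas, tabs, or newlines pass through unchanged, because only ␟ and ␞ are treated as delimiters.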
In addition, Unicode U+241F & U+241E are semantically meaningful, and use an international standard, and are able to work well in any typical Unicode language and any typical Unicode font.
USV is akin to TSV in that the delimiter characters cannot appear in the content.
For comparison I am using the TSV standard by IANA here:
https://www.iana.org/assignments/media-types/text/tab-separated-values
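To illustrate the rule that delimiters cannot appear in the content, here is a rough sketch of a writer that refuses such fields instead of escaping them. The name `format_usv` and the decision to raise `ValueError` are assumptions made for this example only; a real implementation might handle the case differently.

```python
UNIT_SEPARATOR = "\u241f"    # ␟ SYMBOL FOR UNIT SEPARATOR
RECORD_SEPARATOR = "\u241e"  # ␞ SYMBOL FOR RECORD SEPARATOR


def format_usv(rows):
    """Join rows of strings into USV text.

    Hypothetical helper: rejects any field that contains a separator,
    since this sketch defines no escaping mechanism.
    """
    for row in rows:
        for field in row:
            if UNIT_SEPARATOR in field or RECORD_SEPARATOR in field:
                raise ValueError(f"field contains a USV separator: {field!r}")
    return RECORD_SEPARATOR.join(UNIT_SEPARATOR.join(row) for row in rows)


# Commas, tabs, and newlines are ordinary content here.
text = format_usv([["a", "b, c"], ["d\te", "f\ng"]])
```

Rejecting the input keeps the format free of quoting rules, which is the same trade-off the TSV definition makes for tabs.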
I'm offering similar donations to similar projects. If you know of ones that could be interested, I'm happy to connect with them.
Thank you for your consideration.