[brew completion] -<something> breaks on some commands #515

Open
devidw opened this issue Oct 6, 2024 · 41 comments
Labels
compatibility External Problem/Bug Problems/Bugs of other projects

Comments

@devidw

devidw commented Oct 6, 2024

ble version: 0.4.0-nightly+32f290d
Bash version: 5.2.37(1)-release (aarch64-apple-darwin23.4.0)

With some commands, I noticed that typing -- breaks the current input and leaks other output.

For example:

brew list --

As soon as I've typed the -- this happens:

$ brew list -/usr/bin/awk: towc: multibyte conversion failure on: 's API for new formulae or cask data every'
-- INSERT --
 input record number 4, file
 source line number 29
--/usr/bin/awk: towc: multibyte conversion failure on: 's API for new formulae or cask data every'

 input record number 4, file
 source line number 29
/usr/bin/awk: towc: multibyte conversion failure on: 's API for new formulae or cask data every'

 input record number 4, file
 source line number 29
/usr/bin/awk: towc: multibyte conversion failure on: 's API for new formulae or cask data every'

 input record number 4, file
 source line number 29
/usr/bin/awk: towc: multibyte conversion failure on: 's API for new formulae or cask data every'
@akinomyoga
Owner

akinomyoga commented Oct 6, 2024

Thanks for the report. This type of error message is specific to macOS awk, but I don't have macOS, so I cannot test it. macOS awk produces error messages when it sees data that is not valid in the current encoding. As far as I can tell from searching GitHub for the data in the error message, it seems to come from the man page of brew.
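As a rough way to check whether a stream is valid UTF-8 before handing it to awk, one could round-trip it through iconv. A minimal bash sketch, assuming an iconv that exits with a non-zero status on malformed input (GNU iconv does; macOS iconv is assumed to behave the same). The function name is hypothetical and not part of ble.sh:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ble.sh): succeed only if stdin is valid UTF-8.
# iconv decodes and re-encodes the stream; a malformed sequence makes it fail.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Example: the brew line contains a correctly encoded U+2019.
if printf 'Homebrew\xe2\x80\x99s API' | is_valid_utf8; then
  echo "valid UTF-8"
else
  echo "malformed input"
fi
```

Note that this only checks the encoding of the data; it says nothing about how a particular awk handles valid but non-ASCII input.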

  • Q1: What are the results of the following commands in your environment?
$ man brew | grep 'API for new formulae' | cat -v
$ man brew | grep 'API for new formulae' | /usr/bin/awk '{print NF;}'

@devidw
Author

devidw commented Oct 8, 2024

Thanks for the prompt follow-up, @akinomyoga, much appreciated. I also really appreciate the great work on this awesome project. ✨

As for your question, indeed I see some weird encoding in the first command output:

$ man brew | grep 'API for new formulae' | cat -v
Check HomebrewM-^@M-^Ys API for new formulae or cask data every

$ man brew | grep 'API for new formulae' | /usr/bin/awk '{print NF;}'
10

@akinomyoga
Owner

akinomyoga commented Oct 8, 2024

Thanks for the result.

indeed I see some weird encoding in the first command output:

The weird encoding you see is actually what is intended by cat -v.

$ man brew | grep 'API for new formulae' | cat -v
Check HomebrewM-^@M-^Ys API for new formulae or cask data every

I'd actually expect M-bM-^@M-^Y if the code point \u2019 is properly encoded, but the leading byte (which cat -v should render as M-b) seems to be missing from your output. This might be a characteristic of macOS cat, so I'd like to confirm the output of the following command:
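To see where M-bM-^@M-^Y comes from: the UTF-8 bytes of U+2019 are e2 80 99, and cat -v writes a byte with the high bit set as M- followed by the rendering of its low 7 bits, using caret notation for control characters. The bash sketch below models GNU cat -v (macOS cat may differ); the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of ble.sh or cat): render bytes in the
# M-/caret notation that GNU `cat -v` uses.

# Render a single byte value (0-255).
catv_byte() {
  local b=$1 out=
  if (( b >= 128 )); then            # high bit set: "M-" prefix, then the low 7 bits
    out='M-'
    b=$(( b - 128 ))
  fi
  if [[ -z $out ]] && (( b == 9 || b == 10 )); then
    printf '%b' "\\x$(printf '%02x' "$b")"   # plain TAB and LF pass through unchanged
    return
  fi
  if (( b == 127 )); then            # DEL
    printf '%s^?' "$out"
    return
  fi
  if (( b < 32 )); then              # control character: caret notation (0 -> ^@, 25 -> ^Y)
    out+='^'
    b=$(( b + 64 ))
  fi
  printf '%s%b' "$out" "\\x$(printf '%02x' "$b")"
}

# Render every byte read from stdin (-v stops od from collapsing repeated lines).
catv_string() {
  local hex
  for hex in $(od -An -v -tx1); do
    catv_byte $(( 16#$hex ))
  done
  echo
}

printf '\xe2\x80\x99' | catv_string   # U+2019 RIGHT SINGLE QUOTATION MARK
```

Here e2 becomes M-b (0xe2 - 0x80 = 0x62, "b"), 80 becomes M-^@, and 99 becomes M-^Y, so an output that starts with M-^@ is missing the first byte's rendering.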

  • Q2: What is the result of the following command in macOS?
$ echo $'\u2019' | cat -v
$ man brew | grep 'API for new formulae' | /usr/bin/awk '{print NF;}'
10

I thought macOS /usr/bin/awk would have trouble processing the line containing "s API for new formulae or cask data every", but the above result suggests that it actually works. Maybe it is related to the locale.

  • Q3: What are the results of the following commands?
$ ble/widget/display-shell-version
$ locale
$ man brew | grep 'API for new formulae' | LC_ALL=C /usr/bin/awk '{print NF;}'

@akinomyoga akinomyoga added the External Problem/Bug Problems/Bugs of other projects label Oct 19, 2024
@akinomyoga akinomyoga changed the title -- breaks on some commands [brew] -- breaks on some commands Oct 19, 2024
@akinomyoga akinomyoga changed the title [brew] -- breaks on some commands [brew completion] -- breaks on some commands Oct 19, 2024
@devidw
Author

devidw commented Oct 19, 2024

Thank you

Q2

$ echo $'\u2019' | cat -v
M-^@M-^Y

Q3

$ ble/widget/display-shell-version
-bash: `ble/bin/diff': not a valid identifier
-bash: ble/bin/diff: No such file or directory
GNU bash, version 5.2.37(1)-release (aarch64-apple-darwin23.4.0)
ble.sh, version 0.4.0-nightly+31f264a (noarch) [git 2.46.2, GNU Make 4.3, GNU Awk 5.1.0, API: 3.0 (GNU MPFR 4.1.0, GNU MP 6.2.1)]
locale: LANG=en_US.UTF-8
terminal: TERM=tmux-256color wcwidth=15.1-west/16.0-2+ri, tmux:0 (84;0;0)
options:
$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
$ man brew | grep 'API for new formulae' | LC_ALL=C /usr/bin/awk '{print NF;}'
 

This one does not seem to produce any output, and it's not exiting either.

@akinomyoga
Owner

akinomyoga commented Oct 19, 2024

$ man brew | grep 'API for new formulae' | LC_ALL=C /usr/bin/awk '{print NF;}'
 

This one does not seem to produce any output, and it's not exiting either.

Thank you. Hmm, this is interesting.

The above result seems to contain an empty line. Is that a part of the output of the command? (I.e., does the command output an empty line? or does it output nothing?)

Anyway, as long as the output of man brew | grep 'API for new formulae' has not changed since last time, this means that /usr/bin/awk doesn't function as expected when the locale is forced to C by LC_ALL=C.

  • Q4: Could you provide the results of the following commands?
$ echo $'\u2019' | od -t x1
$ man brew | grep 'API for new formulae' | od -t x1
$ echo $'\u2019' | /usr/bin/awk '{print NF;}'
$ echo $'\u2019' | LC_ALL=C /usr/bin/awk '{print NF;}'

$ ble/widget/display-shell-version
-bash: `ble/bin/diff': not a valid identifier
-bash: ble/bin/diff: No such file or directory

This suggests another problem with the test for the existence of the command diff.

  • Q5: Could you also check the output of the following command?
$ type -a diff
$ (set -x; ble/bin#freeze-utility-path diff; echo "exit status: $?"; type -a ble/bin/diff)

@devidw
Author

devidw commented Oct 19, 2024

Ah, actually I recently set nvim as my man pager; I guess this is why piping the man page does not produce output. Resetting it to less does produce output:

$ MANPAGER=less
$ man brew | grep 'API for new formulae' | LC_ALL=C /usr/bin/awk '{print NF;}'
10

Q4

$ echo $'\u2019' | od -t x1
0000000    e2  80  99  0a
0000004

$ MANPAGER=less
$ man brew | grep 'API for new formulae' | od -t x1
0000000    20  20  20  20  20  20  20  20  20  20  20  20  20  20  43  68
0000020    65  63  6b  20  48  6f  6d  65  62  72  65  77  e2  80  99  73
0000040    20  41  50  49  20  66  6f  72  20  6e  65  77  20  66  6f  72
0000060    6d  75  6c  61  65  20  6f  72  20  63  61  73  6b  20  64  61
0000100    74  61  20  65  76  65  72  79  0a
0000111

$ echo $'\u2019' | /usr/bin/awk '{print NF;}'
1


$ echo $'\u2019' | LC_ALL=C /usr/bin/awk '{print NF;}'
1

Q5

$ type -a diff
diff is /usr/bin/diff
diff is /opt/homebrew/bin/diff

$  (set -x; ble/bin#freeze-utility-path diff; echo "exit status: $?"; type -a ble/bin/diff)
>>>> ble/bin#freeze-utility-path diff
>>>> local cmd path 'q='\''' 'Q='\''\'\'''\''' fail= flags=
>>>> for cmd in "$@"
>>>> [[ diff == -n ]]
>>>> [[ '' == *n* ]]
>>>> ble/bin#has ble/bin/.frozen:diff
>>>> builtin type -t -- ble/bin/.frozen:diff
>>>> ble/bin#get-path diff
>>>> local cmd=diff
>>>> ble/util/assign path 'builtin type -P -- "$cmd" 2>/dev/null'
>>>> local _ble_local_tmpfile
>>>> ble/util/assign/mktmp
>>>> _ble_local_tmpfile=/tmp/blesh/502/82534.util.assign.tmp.0
>>>> (( BASH_SUBSHELL ))
>>>> _ble_local_tmpfile=/tmp/blesh/502/82534.util.assign.tmp.0.83911
>>>> builtin eval -- 'builtin type -P -- "$cmd" 2>/dev/null'
>>>>> builtin type -P -- diff
>>>> local _ble_local_ret=0 _ble_local_arr=
>>>> mapfile -t _ble_local_arr
>>>> ble/util/assign/rmtmp
>>>> (( _ble_util_assign_level-- ))
>>>> (( BASH_SUBSHELL ))
>>>> printf 'caller %s\n' ble/util/assign/rmtmp ble/util/assign ble/bin#get-path ble/bin#freeze-utility-path
>>>> IFS='
'
>>>> builtin eval 'path="${_ble_local_arr[*]}"'
>>>>> path=/usr/bin/diff
>>>> return 0
>>>> [[ -n /usr/bin/diff ]]
>>>> [[ /usr/bin/diff == ./* ]]
>>>> [[ /usr/bin/diff == ../* ]]
>>>> builtin eval 'function ble/bin/diff { '\''/usr/bin/diff'\'' "$@"; }'
-bash: `ble/bin/diff': not a valid identifier
>>>> (( !fail ))
>>>> echo 'exit status: 0'
exit status: 0
>>>> type -a ble/bin/diff
-bash: type: ble/bin/diff: not found

The last one exited with status 1.


In case it's useful, here is another example of the behavior:

$ lsof -i --/usr/bin/awk: towc: multibyte conversion failure on: '   AIX:'
-- INSERT --
 input record number 4, file
 source line number 29
/usr/bin/awk: towc: multibyte conversion failure on: '   AIX:'

 input record number 4, file
 source line number 29
/usr/bin/awk: towc: multibyte conversion failure on: '   AIX:'

 input record number 4, file
 source line number 29
/usr/bin/awk: towc: multibyte conversion failure on: '   AIX:'

 input record number 4, file
 source line number 29
--

Okay, it seems it's not necessarily breaking on --, but on -: on the next character the output slips through, and the same thing happens on every subsequent input.

In this video you can see me typing lsof -; as soon as I type a, the output is produced, and when I type a again, the behavior repeats.

sample.mov
$ type -a lsof
lsof is /usr/sbin/lsof

So it might be something that is not limited to brew.

@devidw devidw changed the title [brew completion] -- breaks on some commands [brew completion] -<something> breaks on some commands Oct 19, 2024
@akinomyoga
Owner

akinomyoga commented Oct 20, 2024

Thank you for the answers and additional information.

Ah, actually I recently set nvim as my man pager; I guess this is why piping the man page does not produce output. Resetting it to less does produce output:

OK, the new result seems consistent with the previous one, yet the original error message is not reproduced. Maybe there are other conditions for the error message to be output.

$ echo $'\u2019' | od -t x1
0000000    e2  80  99  0a

I suspected the possibility of an unexpected behavior of $'\u2019', but it seems OK according to the above result.

$ man brew | grep 'API for new formulae' | od -t x1
0000000    20  20  20  20  20  20  20  20  20  20  20  20  20  20  43  68
0000020    65  63  6b  20  48  6f  6d  65  62  72  65  77  e2  80  99  73
0000040    20  41  50  49  20  66  6f  72  20  6e  65  77  20  66  6f  72
0000060    6d  75  6c  61  65  20  6f  72  20  63  61  73  6b  20  64  61
0000100    74  61  20  65  76  65  72  79  0a

Seems fine. The only non-7-bit bytes are "e2 80 99" (the UTF-8 representation of \u2019); the "73" that follows is just the ASCII s. Anyway, /usr/bin/awk does not produce the error message with this data, so it may not contain a hint.
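Decoding those bytes by hand confirms the mapping. A small bash sketch (hypothetical helper, not part of ble.sh) that turns one UTF-8 sequence, given as the hex bytes printed by od -t x1, into its code point:

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode one UTF-8 sequence given as hex byte arguments
# (e.g. the "e2 80 99" printed by od -t x1) and print the code point as U+XXXX.
# It trusts the byte count and does not validate the continuation bytes.
utf8_decode() {
  local cp
  case $# in
    1) cp=$(( 16#$1 )) ;;                                    # 0xxxxxxx
    2) cp=$(( (16#$1 & 0x1F) << 6 | (16#$2 & 0x3F) )) ;;     # 110xxxxx 10xxxxxx
    3) cp=$(( (16#$1 & 0x0F) << 12 ))                        # 1110xxxx 10xxxxxx 10xxxxxx
       cp=$(( cp | (16#$2 & 0x3F) << 6 | (16#$3 & 0x3F) )) ;;
    4) cp=$(( (16#$1 & 0x07) << 18 | (16#$2 & 0x3F) << 12 )) # 11110xxx + 3 continuation bytes
       cp=$(( cp | (16#$3 & 0x3F) << 6 | (16#$4 & 0x3F) )) ;;
    *) return 1 ;;
  esac
  printf 'U+%04X\n' "$cp"
}

utf8_decode e2 80 99   # the apostrophe in "Homebrew's"
utf8_decode c2 a0      # the byte pair from the lsof man page (NO-BREAK SPACE)
```

For e2 80 99 this computes (0x2 << 12) | (0x00 << 6) | 0x19 = 0x2019; for c2 a0 it computes (0x02 << 6) | 0x20 = 0xA0.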

$ echo $'\u2019' | /usr/bin/awk '{print NF;}'
1
$ echo $'\u2019' | LC_ALL=C /usr/bin/awk '{print NF;}'
1

These are also consistent with the previous results.

Okay, it seems it's not necessarily breaking on --, but on -: on the next character the output slips through, and the same thing happens on every subsequent input.

I suspected that the man page of brew contains a misencoded character or some characters that /usr/bin/awk cannot handle, but yeah, other man pages can also contain similar data on which /usr/bin/awk would produce the error message.

In this video you can see me typing lsof -; as soon as I type a, the output is produced, and when I type a again, the behavior repeats.

Thank you for the additional explanation. I'm sorry I didn't explain the details earlier: I already knew that the reported error is related to the extraction of option descriptions from the man page in ble.sh's auto-complete feature, and that the error repeating is expected once the extraction is broken. The auto-complete feature attempts to generate candidate words based on the programmable completion settings on every character insertion, so the error message is repeated. When the current word starts with -, ble.sh tries to extract the options and their descriptions from the man page of the current command, which is why the error message appears with -<something>. ble.sh uses awk to process the man page content. The error message is output during this processing and seems to be related to specific content in the man pages of brew, lsof, and others.
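A toy version of that idea, for illustration only: this is not ble.sh's implementation (which, per the comment above, runs awk over the man page content and is far more involved). It just shows the shape of the step: pulling option-looking lines out of plain-text man output with awk. The function name and the "line starts with -" heuristic are assumptions:

```shell
#!/usr/bin/env bash
# Toy sketch (not ble.sh's code): read plain-text man page output on stdin and
# print "OPTION<TAB>DESCRIPTION" for lines whose first non-blank character is "-".
extract_options() {
  awk '
    /^[ \t]*-[^ \t]/ {
      sub(/^[ \t]+/, "")      # drop the indentation
      opt = $1                # first field is the option itself
      $1 = ""                 # the remaining fields form the description
      sub(/^ +/, "")          # remove the separator left by clearing $1
      print opt "\t" $0
    }'
}

# Made-up sample input: one option line and one ordinary line.
printf '     -a   example description\n     plain text\n' | extract_options
```

Because any awk step like this sees every byte of the rendered page, a single sequence the local awk cannot convert is enough to make it print errors on each completion attempt.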

I'd like to find out anything common in the affected parts of the man pages in brew and lsof.

  • Q6: What is the result of the following command?
$ MANPAGER=less
$ man lsof | grep '   AIX:' | cat -v

>>>> builtin eval 'function ble/bin/diff { '\''/usr/bin/diff'\'' "$@"; }'
-bash: `ble/bin/diff': not a valid identifier

Hmm, OK. Do you enable set -o posix? If so, I can add a workaround for this issue.
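For context, the failing step in the trace defines a function whose name contains slashes via eval, and here bash 5.2 rejected it as "not a valid identifier" while set -o posix was active. One possible shape for such a workaround, as a minimal bash sketch: leave POSIX mode around the definition and restore it afterwards. The helper name is hypothetical, and this is not necessarily how ble.sh implements the fix:

```shell
#!/usr/bin/env bash
# Hypothetical helper: define a function whose name is not a POSIX identifier
# (e.g. "demo/hello") even when the caller has `set -o posix` enabled.
define_function_nonposix() {
  local name=$1 body=$2 was_posix= ret
  if shopt -qo posix; then      # remember whether POSIX mode was on
    was_posix=1
    set +o posix
  fi
  builtin eval "function $name { $body; }"
  ret=$?
  if [[ $was_posix ]]; then     # restore the user's setting
    set -o posix
  fi
  return "$ret"
}

# Usage: define_function_nonposix 'demo/hello' 'echo hi'
```

Shell options are global, so toggling posix inside the helper affects the whole shell; restoring it before returning keeps the user's configuration intact.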

  • Q5a: What is the result of the following command?
$ declare -p POSIXLY_CORRECT

@akinomyoga
Owner

I think I need to ask you to run a patched version of ble.sh to record some internal data. I'll prepare it and attach .zip later.

@devidw
Author

devidw commented Oct 20, 2024

Appreciate the explanations very much. 🙏

I think I need to ask you to run a patched version of ble.sh to record some internal data. I'll prepare it and attach .zip later.

Sounds good, thank you.

Q6

$ MANPAGER=less
$ man lsof | grep '   AIX:' | cat -v
# nothing
$ man lsof | grep '   AIX:' | od -t x1
# nothing
$ man lsof | grep 'AIX:' | cat -v
           AIX:
$ man lsof | grep 'AIX:' | od -t x1
0000000    20  20  20  20  20  20  20  c2  a0  c2  a0  c2  a0  c2  a0  41
0000020    49  58  3a  0a
0000024

Q5a

Yes, set -o posix is part of my bashrc.

$ declare -p POSIXLY_CORRECT
declare -- POSIXLY_CORRECT="y"

@akinomyoga
Owner

  • Q7: Can you try core-complete.tar.gz?
    1. Download the tar archive from the above link.
    2. Extract the file core-complete.sh by running tar xzf core-complete.tar.gz.
    3. Overwrite lib/core-complete.sh in the directory where ble.sh is installed with the extracted file. You can run mv -f core-complete.sh "$_ble_base/lib/core-complete.sh" in a ble.sh session.
    4. Start a new interactive Bash session. For example, you can run bash or create another tmux pane. Or you can open a new terminal window.
    5. Input lsof - and press TAB to produce an error message.
    6. If some data that cannot be handled by /usr/bin/awk is detected, the data will be saved in ~/blesh-debug-mandb-lsof.tar.gz. Could you attach the file ~/blesh-debug-mandb-lsof.tar.gz if it is created?
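Steps 2 and 3 can also be run as one small bash function. This is only a sketch: the function name is hypothetical, the base directory is passed explicitly (inside a ble.sh session you would pass "$_ble_base"), and the backup copy is an addition that is not part of the steps, so the original file can be restored later:

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract core-complete.sh from the archive into the current
# directory (step 2) and move it over <base>/lib/core-complete.sh (step 3),
# keeping a backup of the file being replaced.
install_patched_core_complete() {
  local archive=$1 base=$2
  local target=$base/lib/core-complete.sh
  tar xzf "$archive" || return 1                 # step 2
  [[ -f core-complete.sh ]] || return 1          # the archive must contain this file
  if [[ -f $target ]]; then
    cp -p "$target" "$target.orig" || return 1   # backup (not part of the original steps)
  fi
  mv -f core-complete.sh "$target"               # step 3
}

# Usage inside a ble.sh session:
#   install_patched_core_complete core-complete.tar.gz "$_ble_base"
```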

@devidw
Author

devidw commented Oct 20, 2024

I actually seem to hit this case:

Error not found in the intermediate data of the man page analysis.

Screenshot 2024-10-19 at 7 44 27 PM

@akinomyoga
Owner

OK, thank you! I'm now preparing the next version to check another place.

@devidw
Author

devidw commented Oct 20, 2024

Cool, much appreciated!

@akinomyoga
Owner

  • Q7a: Could you try this: core-complete-v2.tar.gz? The test procedure is the same as Q7, but the created file will be ~/blesh-debug-mandb-lsof-v2.tar.gz.

@devidw
Author

devidw commented Oct 20, 2024

Hm I can't get it to produce the debug file for some reason.

Screen.Recording.2024-10-19.at.8.02.14.PM.mov

@akinomyoga
Owner

Hm I can't get it to produce the debug file for some reason.

Did you instead see the message debug (check2): Error not found.?

@devidw
Author

devidw commented Oct 20, 2024

I searched the output for that message but could not find any match. Including the full output here for reference:

$ lsof -aa/usr/bin/awk: towc: multibyte conversion failure on: '   AIX:'
-- INSERT --
 input record number 4, file
 source line number 35
Error not found in the intermediate data of the man page analysis.

-aaa/usr/bin/awk: towc: multibyte conversion failure on: '   AIX:'

 input record number 4, file
 source line number 35
Error not found in the intermediate data of the man page analysis.

/usr/bin/awk: towc: multibyte conversion failure on: '   AIX:'

 input record number 4, file
 source line number 35
Error not found in the intermediate data of the man page analysis.

/usr/bin/awk: towc: multibyte conversion failure on: '   AIX:'

 input record number 4, file
 source line number 35
Error not found in the intermediate data of the man page analysis.

-aaaa/usr/bin/awk: towc: multibyte conversion failure on: '   AIX:'

 input record number 4, file
 source line number 35
Error not found in the intermediate data of the man page analysis.

/usr/bin/awk: towc: multibyte conversion failure on: '   AIX:'

 input record number 4, file
 source line number 35
Error not found in the intermediate data of the man page analysis.

/usr/bin/awk: towc: multibyte conversion failure on: '   AIX:'

 input record number 4, file
 source line number 35
Error not found in the intermediate data of the man page analysis.

/usr/bin/awk: towc: multibyte conversion failure on: '   AIX:'

 input record number 4, file
 source line number 35
Error not found in the intermediate data of the man page analysis.

-aaaa

Also just checked the patch again to confirm I'm running the latest patch:

$ grep "check2" $_ble_base/lib/core-complete.sh
function __blesh_debug_mandb_check2__ {
    echo 'debug (check2): Error not found.' >/dev/tty
    __blesh_debug_mandb_check2__ "${subcaches[@]}"

@akinomyoga
Owner

akinomyoga commented Oct 20, 2024

Thank you.

  • Q7b: Could you try this? core-complete-v3.tar.gz. This time a new file will not be created. I'd like to narrow down the possible places where the error is produced. Could you copy the output in the terminal and paste it here?

@devidw
Author

devidw commented Oct 20, 2024

Sounds good, ty.

$ lsof -able/complete/mandb/generate-cache: step1 (caller ble/complete/mandb/load-cache ble/complete/source:option/generate-for-command ble/complete/source:option ble/complete/source:argument/generate ble/complete/source:argument ble/complete/candidates/generate-with-filter ble/complete/candidates/generate ble/complete/auto-complete/source:syntax ble/complete/auto-complete.impl ble/complete/auto-complete.idle ble/util/idle.do/.call-task ble/util/idle.do ble-edit/bind/.tail ble-decode/EPILOGUE _ble_decode_hook)
ble/complete/mandb/generate-cache: step3a (caller ble/complete/mandb/load-cache ble/complete/source:option/generate-for-command ble/complete/source:option ble/complete/source:argument/generate ble/complete/source:argument ble/complete/candidates/generate-with-filter ble/complete/candidates/generate ble/complete/auto-complete/source:syntax ble/complete/auto-complete.impl ble/complete/auto-complete.idle ble/util/idle.do/.call-task ble/util/idle.do ble-edit/bind/.tail ble-decode/EPILOGUE _ble_decode_hook)
awk: ble/function#advice/before:ble/bin/awk ble/function#try ble/function#advice/.proc ble/bin/awk ble/complete/mandb/.generate-cache-from-man ble/complete/mandb/generate-cache ble/complete/mandb/load-cache ble/complete/source:option/generate-for-command ble/complete/source:option ble/complete/source:argument/generate ble/complete/source:argument ble/complete/candidates/generate-with-filter ble/complete/candidates/generate ble/complete/auto-complete/source:syntax ble/complete/auto-complete.impl ble/complete/auto-complete.idle ble/util/idle.do/.call-task ble/util/idle.do ble-edit/bind/.tail ble-decode/EPILOGUE _ble_decode_hook / 1 5 12 1 16 52 3 9 19 35 13 21 18 15 16 20 1 38 3 10 70 1
awk: ble/function#advice/before:ble/bin/awk ble/function#try ble/function#advice/.proc ble/bin/awk ble/complete/mandb/.generate-cache-from-man ble/complete/mandb/generate-cache ble/complete/mandb/load-cache ble/complete/source:option/generate-for-command ble/complete/source:option ble/complete/source:argument/generate ble/complete/source:argument ble/complete/candidates/generate-with-filter ble/complete/candidates/generate ble/complete/auto-complete/source:syntax ble/complete/auto-complete.impl ble/complete/auto-complete.idle ble/util/idle.do/.call-task ble/util/idle.do ble-edit/bind/.tail ble-decode/EPILOGUE _ble_decode_hook / 1 5 12 1 341 52 3 9 19 35 13 21 18 15 16 20 1 38 3 10 70 1
/usr/bin/awk: towc: multibyte conversion failure on: '   AIX:'

 input record number 4, file
 source line number 35
Error not found in the intermediate data of the man page analysis.

ble/complete/mandb/generate-cache: step3b (caller ble/complete/mandb/load-cache ble/complete/source:option/generate-for-command ble/complete/source:option ble/complete/source:argument/generate ble/complete/source:argument ble/complete/candidates/generate-with-filter ble/complete/candidates/generate ble/complete/auto-complete/source:syntax ble/complete/auto-complete.impl ble/complete/auto-complete.idle ble/util/idle.do/.call-task ble/util/idle.do ble-edit/bind/.tail ble-decode/EPILOGUE _ble_decode_hook)
ble/complete/mandb/generate-cache: step4 (caller ble/complete/mandb/load-cache ble/complete/source:option/generate-for-command ble/complete/source:option ble/complete/source:argument/generate ble/complete/source:argument ble/complete/candidates/generate-with-filter ble/complete/candidates/generate ble/complete/auto-complete/source:syntax ble/complete/auto-complete.impl ble/complete/auto-complete.idle ble/util/idle.do/.call-task ble/util/idle.do ble-edit/bind/.tail ble-decode/EPILOGUE _ble_decode_hook)
-a

@akinomyoga
Owner

Thank you for your quick responses. So, the error seems to be produced inside ble/complete/mandb/.generate-cache-from-man for sure.

  • Q7c: Could you try this: core-complete-v4.tar.gz? This time, the file ~/blesh-debug-mandb-lsof.tar.gz will be unconditionally created.

@devidw
Author

devidw commented Oct 20, 2024

Nice, here you go:

blesh-debug-mandb-lsof.tar.gz

@akinomyoga
Owner

  • Q8: What is the result of the following command?
$ /usr/bin/awk -W version || /usr/bin/awk --version

@devidw
Author

devidw commented Oct 20, 2024

$ /usr/bin/awk -W version
/usr/bin/awk: unknown option -W ignored

$ /usr/bin/awk --version
awk version 20200816

@akinomyoga
Owner

  • Q8a: How about this?
$ path=/usr/bin/awk
$ ble/util/assign version '"$path" -W version || "$path" --version' 2>/dev/null </dev/null
$ declare -p version

@devidw
Author

devidw commented Oct 20, 2024

$ path=/usr/bin/awk
$ ble/util/assign version '"$path" -W version || "$path" --version' 2>/dev/null </dev/null
$ declare -p version
declare -- version=""

@akinomyoga
Owner

Strange...

  • Q8b: How about these?
$ { /usr/bin/awk -W version || /usr/bin/awk --version; } 2>/dev/null </dev/null
$ /usr/bin/awk --version 2>/dev/null
$ /usr/bin/awk --version >/dev/null

@devidw
Author

devidw commented Oct 20, 2024

$ { /usr/bin/awk -W version || /usr/bin/awk --version; } 2>/dev/null </dev/null
# nothing
$ /usr/bin/awk --version 2>/dev/null
awk version 20200816
$ /usr/bin/awk --version >/dev/null
# nothing

@akinomyoga
Owner

akinomyoga commented Oct 20, 2024

Thank you for the results. Ah, OK. /usr/bin/awk -W version </dev/null seems to succeed even though it produces an error?

  • Q8c: What are the results of the following commands?
$ /usr/bin/awk -W version < /dev/null; echo "$?"
$ /usr/bin/awk --version 2>/dev/null </dev/null

  • Q7d: Could you try this: core-complete-v5.tar.gz? The test procedure is the same as Q7. (edit: I want the new blesh-debug-mandb-lsof.tar.gz.)

@devidw
Author

devidw commented Oct 20, 2024

A8c

/usr/bin/awk -W version </dev/null seems to succeed even though it produces an error?

Exactly, kind of weird: it prints this error message, which seems to be more of a warning, since it doesn't exit with a non-zero exit code.

$ /usr/bin/awk -W version < /dev/null; echo "$?"
/usr/bin/awk: unknown option -W ignored

0

$ /usr/bin/awk --version 2>/dev/null </dev/null
awk version 20200816

A7d

blesh-debug-mandb-lsof.tar.gz

@akinomyoga
Owner

/usr/bin/awk -W version </dev/null seems to succeed even though it produces an error?

Exactly, kind of weird: it prints this error message, which seems to be more of a warning, since it doesn't exit with a non-zero exit code.

Thanks! I'll add a fix for this. However, this is probably an issue independent of the original error message (though I initially suspected a connection).
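The problem with the Q8a probe is that "$path" -W version || "$path" --version only falls back when the first command fails, and this awk exits with status 0 after ignoring -W (presumably it then takes version as the program text, which prints nothing for empty input). A minimal bash sketch of a probe that falls back on empty output instead of the exit status; the function name is hypothetical and this is not necessarily the fix ble.sh adopts:

```shell
#!/usr/bin/env bash
# Hypothetical probe: print the first line of an awk binary's version string
# without trusting the exit status of `-W version`.
awk_version() {
  local awk_path=$1 out
  out=$("$awk_path" -W version 2>/dev/null </dev/null)
  if [[ -z $out ]]; then
    # -W may have been ignored with status 0, so try the --version spelling
    out=$("$awk_path" --version 2>/dev/null </dev/null)
  fi
  printf '%s\n' "${out%%$'\n'*}"     # keep only the first line
}

# Usage: awk_version /usr/bin/awk
```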


blesh-debug-mandb-lsof.tar.gz

Thank you for this new version. Hmm, the error is not recorded somehow. I have a question.

  • Q7d': With the above version of core-complete.sh (the one in Q7d), do you see the error message towc: multibyte conversion failure on: '...'?

@devidw
Author

devidw commented Oct 20, 2024

Sounds good.

Yea, let me share the full output:

$ lsof -aable/complete/mandb/generate-cache: step3a (caller ble/complete/mandb/load-cache ble/complete/source:option/generate-for-command ble/complete/source:option ble/complete/source:argument/generate ble/complete/source:argument ble/complete/candidates/generate-with-filter ble/complete/candidates/generate ble/complete/auto-complete/source:syntax ble/complete/auto-complete.impl ble/complete/auto-complete.idle ble/util/idle.do/.call-task ble/util/idle.do ble-edit/bind/.tail ble-decode/EPILOGUE _ble_decode_hook)
/usr/bin/awk: towc: multibyte conversion failure on: '   AIX:'

 input record number 4, file
 source line number 35
ble/complete/mandb/generate-cache: step3b (caller ble/complete/mandb/load-cache ble/complete/source:option/generate-for-command ble/complete/source:option ble/complete/source:argument/generate ble/complete/source:argument ble/complete/candidates/generate-with-filter ble/complete/candidates/generate ble/complete/auto-complete/source:syntax ble/complete/auto-complete.impl ble/complete/auto-complete.idle ble/util/idle.do/.call-task ble/util/idle.do ble-edit/bind/.tail ble-decode/EPILOGUE _ble_decode_hook)

@akinomyoga
Owner

akinomyoga commented Oct 20, 2024

  • Q9: What are the results of the following commands?
$ echo '\ \ \ \ AIX:' | ble/complete/mandb/.preconv
$ echo '\ \ \ \ AIX:' | ble/complete/mandb/.preconv | groff -T utf8 -m man | od -t x1
$ echo '\ \ \ \ AIX:' | groff -T utf8 -m man | od -t x1
$ echo '\ \ \ \ AIX:' | groff -T utf8 -m man | /usr/bin/awk '{print NF;}'
$ printf '\x20\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0' | /usr/bin/awk '{print NF;}'

edit: Also, this:

$ type ble/complete/mandb/convert-mandoc
$ echo '\ \ \ \ AIX:' | ble/complete/mandb/convert-mandoc | od -t x1
$ printf '\x20\xc2\xa0\xc2\xa0' | /usr/bin/awk '{line = $0; gsub(/[\x00-\x1F]/, "", line); print line}' | od -t x1

@akinomyoga
Owner

  • Q7e: Can you try this: core-complete-v6.tar.gz? I'm interested in the "source line number" in the error message from awk. This time, the file will not be created.

@devidw
Author

devidw commented Oct 20, 2024

$ echo '\ \ \ \ AIX:' | ble/complete/mandb/.preconv
# Exit Code: 0
# stdout:
.lf 1 -
\ \ \ \ AIX:
# stderr:
$ echo '\ \ \ \ AIX:' | ble/complete/mandb/.preconv | groff -T utf8 -m man | od -t x1
# Exit Code: 0
# stdout:
0000000    20  20  20  20  41  49  58  3a  0a  0a  0a  0a  0a  0a  0a  0a
0000020    0a  0a  0a  0a  0a  0a  0a  0a  0a  0a  0a  0a  0a  0a  0a  0a
*
0000100    0a  0a  0a  0a  0a  0a  0a  0a  0a  0a
0000112
# stderr:
$ echo '\ \ \ \ AIX:' | groff -T utf8 -m man | od -t x1
# Exit Code: 0
# stdout:
0000000    20  20  20  20  41  49  58  3a  0a  0a  0a  0a  0a  0a  0a  0a
0000020    0a  0a  0a  0a  0a  0a  0a  0a  0a  0a  0a  0a  0a  0a  0a  0a
*
0000100    0a  0a  0a  0a  0a  0a  0a  0a  0a  0a
0000112
# stderr:
$ echo '\ \ \ \ AIX:' | groff -T utf8 -m man | /usr/bin/awk '{print NF;}'
# Exit Code: 0
# stdout:
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# stderr:
$ printf '\x20\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0' | /usr/bin/awk '{print NF;}'
# Exit Code: 0
# stdout:
1
# stderr:
$ type ble/complete/mandb/convert-mandoc
# Exit Code: 0
# stdout:
ble/complete/mandb/convert-mandoc is a function
ble/complete/mandb/convert-mandoc ()
{
    if [[ $_ble_util_locale_encoding == UTF-8 ]]; then
        ble/bin/groff -k -Tutf8 -man;
    else
        ble/bin/groff -Tascii -man;
    fi
}
# stderr:
$ echo '\ \ \ \ AIX:' | ble/complete/mandb/convert-mandoc | od -t x1
# Exit Code: 0
# stdout:
0000000    20  20  20  20  41  49  58  3a  0a  0a  0a  0a  0a  0a  0a  0a
0000020    0a  0a  0a  0a  0a  0a  0a  0a  0a  0a  0a  0a  0a  0a  0a  0a
*
0000100    0a  0a  0a  0a  0a  0a  0a  0a  0a  0a
0000112
# stderr:
$ printf '\x20\xc2\xa0\xc2\xa0' | /usr/bin/awk '{line = $0; gsub(/[\x00-\x1F]/, "", line); print line}' | od -t x1
# Exit Code: 0
# stdout:

# stderr:
/usr/bin/awk: towc: multibyte conversion failure on: '†¬†'

 input record number 1, file
 source line number 1

A7e

lsof -iible/complete/mandb/generate-cache: step3a (caller ble/complete/mandb/load-cache ble/complete/source:option/generate-for-command ble/complete/source:option ble/complete/source:argument/generate ble/complete/source:argument ble/complete/candidates/generate-with-filter ble/complete/candidates/generate ble/complete/auto-complete/source:syntax ble/complete/auto-complete.impl ble/complete/auto-complete.idle ble/util/idle.do/.call-task ble/util/idle.do ble-edit/bind/.tail ble-decode/EPILOGUE _ble_decode_hook)
ble/complete/mandb/generate-cache: step3b (caller ble/complete/mandb/load-cache ble/complete/source:option/generate-for-command ble/complete/source:option ble/complete/source:argument/generate ble/complete/source:argument ble/complete/candidates/generate-with-filter ble/complete/candidates/generate ble/complete/auto-complete/source:syntax ble/complete/auto-complete.impl ble/complete/auto-complete.idle ble/util/idle.do/.call-task ble/util/idle.do ble-edit/bind/.tail ble-decode/EPILOGUE _ble_decode_hook)
-ii

Got the above on the first run. I tried to replicate it again, but I can't reproduce the behavior anymore. Did the patch include a potential fix? It seems like I no longer get anything slipping through :)

@akinomyoga
Owner

akinomyoga commented Oct 20, 2024

Thank you so much for these results with the details of exit code/stdout/stderr!

$ printf '\x20\xc2\xa0\xc2\xa0' | /usr/bin/awk '{line = $0; gsub(/[\x00-\x1F]/, "", line); print line}' | od -t x1
# Exit Code: 0
# stdout:

# stderr:
/usr/bin/awk: towc: multibyte conversion failure on: '†¬†' input record number 1, file source line number 1

Excellent, this must be the core part of the issue!

  • Q9a: I'd like to minimize the test case.
Could you see if the same error appears with the following variations?
$ printf '\x20\xc2\xa0\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x1F]/, "")'
$ printf '\x20\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x1F]/, "")'
$ printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x1F]/, "")'
$ printf '\u00A0' | /usr/bin/awk 'gsub(/[\x00-\x1F]/, "")'
$ printf '\u00A0' | /usr/bin/awk 'gsub(/[\x01-\x1F]/, "")'
$ printf '\u00A0' | /usr/bin/awk 'gsub(/[\x02-\x1F]/, "")'
$ printf '\u00A0' | /usr/bin/awk 'gsub(/[\x03-\x1F]/, "")'

Instead of running the above commands, you can use this script: blesh-gh515-test-awk.sh.txt. Can you download it, run bash blesh-gh515-test-awk.sh.txt, and provide the output?

  • Q9b: Also, I'd like to check if setting LC_COLLATE would work around the issue. What are the results with the following versions?
$ printf '\x20\xc2\xa0\xc2\xa0' | LC_COLLATE=C /usr/bin/awk '{line = $0; gsub(/[\x00-\x1F]/, "", line); print line}'
$ printf '\x20\xc2\xa0\xc2\xa0' | LC_CTYPE=C LC_COLLATE=C /usr/bin/awk '{line = $0; gsub(/[\x00-\x1F]/, "", line); print line}'
$ printf '\x20\xc2\xa0\xc2\xa0' | LC_ALL=C /usr/bin/awk '{line = $0; gsub(/[\x00-\x1F]/, "", line); print line}'
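The contents of blesh-gh515-test-awk.sh.txt are not shown in this thread. Purely as a sketch of the idea, a harness that prints `<command>: OK` or `<command>: FAIL` in the same format as the A9a results could look like this. The function name `check`, the array `variants`, and the rule that any stderr output or nonzero exit status counts as FAIL are assumptions, not taken from the real script.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real blesh-gh515-test-awk.sh.txt.
# check: evaluate each argument as a command line, discard its stdout, and
# report FAIL if it exits nonzero or writes anything to stderr (assumed rule).
check() {
  local cmd err status
  for cmd in "$@"; do
    # 2>&1 first sends stderr into the capture, then stdout goes to /dev/null.
    err=$(eval "$cmd" 2>&1 >/dev/null)
    status=$?
    if (( status == 0 )) && [[ -z $err ]]; then
      printf '%s: OK\n' "$cmd"
    else
      printf '%s: FAIL\n' "$cmd"
    fi
  done
}

# Build the range variants [\x00-\x1F], [\x00-\x1E], ..., [\x00-\x00].
variants=()
for (( hi = 0x1F; hi >= 0; hi-- )); do
  printf -v upper '%02X' "$hi"
  variants+=("printf '\\xc2\\xa0' | /usr/bin/awk 'gsub(/[\\x00-\\x${upper}]/, \"\")'")
done

check "${variants[@]}"
```

Generating the variants in a loop keeps the bisection over the upper bound of the bracket range systematic instead of hand-editing each command.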

A7e

[...]

Got the above on the first run. I tried to replicate it again, but I can't reproduce the behavior anymore. Did the patch include a potential fix? It seems like I no longer get anything slipping through :)

Hmm, OK. Maybe you need to clear the cache to test it properly, but you can forget about this question now. Q7e was meant to identify the problematic line in the AWK source, but the result of the last command in the answer to Q9 already shows that the line gsub(/[\x00-\x1F]/, "", line) causes the problem.

@devidw
Author

devidw commented Oct 20, 2024

Sorry, fell asleep last night lol. Aha, that's wonderful!

Yea, I wrote this very basic utility to run the test commands:

dug

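function dug {
    # For each "$ command" line in file "$1": run the command once, then
    # print its exit status, stdout and stderr as a Markdown code block.
    local line cmd stdout stderr exit_code err_file
    err_file=$(mktemp) || return 1

    while IFS= read -r line; do
        cmd="${line/#\$ /}"

        # Run the command only once; </dev/null keeps it from consuming the
        # input file. $? is the status of the command substitution.
        stdout=$(eval "$cmd" 2>"$err_file" </dev/null)
        exit_code=$?
        stderr=$(<"$err_file")

        # Quote every expansion: an unquoted $stdout is subject to word
        # splitting and globbing (e.g. the "*" that od prints).
        echo '```bash'
        printf '%s\n' "$line"
        echo "# Exit Code: $exit_code"
        echo "# stdout:"
        printf '%s\n' "$stdout"
        echo "# stderr:"
        printf '%s\n' "$stderr"
        echo '```'
        echo ""
    done < "$1"

    rm -f "$err_file"
}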
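A general bash caveat for helpers like this: when captured output is echoed without quotes (`echo $var`), bash applies word splitting and pathname expansion to it. Since `od` prints a lone `*` to mark repeated lines, an unquoted echo can replace that `*` with the names of the files in the current directory and join all lines into one. A minimal demonstration in a throwaway directory (the file names `Desktop` and `Documents` are just samples):

```shell
#!/usr/bin/env bash
# Show how an unquoted expansion mangles captured od-style output.
orig_dir=$PWD
demo_dir=$(mktemp -d) && cd "$demo_dir" || exit 1
touch Desktop Documents   # sample files for the glob to match

# od collapses runs of identical lines into a single "*" line.
captured=$(printf '%s\n' '0000020 0a 0a' '*' '0000100 0a 0a')

unquoted=$(echo $captured)    # word splitting + pathname expansion
quoted=$(echo "$captured")    # printed verbatim

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"

cd "$orig_dir" && rm -rf "$demo_dir"
```

With default shell options, `unquoted` becomes `0000020 0a 0a Desktop Documents 0000100 0a 0a`, while `quoted` keeps the three original lines.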

A9a

$ bash blesh-gh515-test-awk.sh.txt
printf '\x20\xc2\xa0\xc2\xa0' | /usr/bin/awk '{line = $0; gsub(/[\x00-\x1F]/, "", line); print line}': FAIL
printf '\xc2\xa0\xc2\xa0' | /usr/bin/awk '{line = $0; gsub(/[\x00-\x1F]/, "", line); print line}': FAIL
printf '\xc2\xa0\u00A0' | /usr/bin/awk '{line = $0; gsub(/[\x00-\x1F]/, "", line); print line}': FAIL
printf '\xc2\xa0x' | /usr/bin/awk '{line = $0; gsub(/[\x00-\x1F]/, "", line); print line}': FAIL
printf '\xc2\xa0' | /usr/bin/awk '{line = $0; gsub(/[\x00-\x1F]/, "", line); print line}': FAIL
printf '\u00A0' | /usr/bin/awk '{line = $0; gsub(/[\x00-\x1F]/, "", line); print line}': OK
printf '\xc2\xa0' | /usr/bin/awk '{gsub(/[\x00-\x1F]/, "", $0); print $0}': FAIL
printf '\xc2\xa0' | /usr/bin/awk '{gsub(/[\x00-\x1F]/, ""); print $0}': FAIL
printf '\xc2\xa0' | /usr/bin/awk '{gsub(/[\x00-\x1F]/, "")}': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x1F]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x01-\x1F]/, "")': OK
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x1E]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x1D]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x1C]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x1B]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x1A]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x19]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x18]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x17]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x16]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x15]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x14]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x13]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x12]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x11]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x10]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x0F]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x0E]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x0D]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x0C]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x0B]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x0A]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x09]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x08]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x07]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x06]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x05]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x04]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x03]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x02]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x01]/, "")': FAIL
printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x00]/, "")': FAIL

A9b

$ printf '\x20\xc2\xa0\xc2\xa0' | LC_COLLATE=C /usr/bin/awk '{line = $0; gsub(/[\x00-\x1F]/, "", line); print line}'
# Exit Code: 2
# stdout:

# stderr:
/usr/bin/awk: towc: multibyte conversion failure on: '†¬†' input record number 1, file source line number 1
$ printf '\x20\xc2\xa0\xc2\xa0' | LC_CTYPE=C LC_COLLATE=C /usr/bin/awk '{line = $0; gsub(/[\x00-\x1F]/, "", line); print line}'
# Exit Code: 0
# stdout:
  
# stderr:
$ printf '\x20\xc2\xa0\xc2\xa0' | LC_ALL=C /usr/bin/awk '{line = $0; gsub(/[\x00-\x1F]/, "", line); print line}'
# Exit Code: 0
# stdout:
  
# stderr:

@akinomyoga
Copy link
Owner

akinomyoga commented Oct 20, 2024

Thank you! The minimal reproducer of the error would be this:

$ printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x00]/, "")'

Since '\xc2\xa0' is a valid UTF-8 representation of U+00A0, /usr/bin/awk should handle it properly. In particular, gsub(/[\x00-\x00]/, "") shouldn't change anything, since \x00 does not occur in the input string, and it certainly shouldn't cause an encoding error.

I think this is a bug in the UTF-8 support of macOS /usr/bin/awk. The original nawk (on which macOS awk is based) doesn't have this issue.
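To check a given awk binary for this bug, the minimal reproducer can be wrapped in a small function. This is only a sketch: the name `awk_has_towc_bug` is made up, the check assumes the error text `multibyte conversion failure` seen in this thread, and the outcome depends on the caller's locale (the failure was observed under the user's default UTF-8 setup, and `LC_CTYPE=C` avoided it). The two paths probed at the end are the system awk and the Homebrew awk discussed in this thread.

```shell
#!/usr/bin/env bash
# Sketch: return 0 if the given awk prints the macOS "towc" error on the
# minimal reproducer, 1 otherwise. Depends on the caller's locale settings.
awk_has_towc_bug() {
  local awk_bin=${1:-/usr/bin/awk} err
  # U+00A0 (bytes C2 A0) matched against a bracket expression containing \x00;
  # keep only stderr.
  err=$(printf '\xc2\xa0' | "$awk_bin" 'gsub(/[\x00-\x00]/, "")' 2>&1 >/dev/null)
  [[ $err == *'multibyte conversion failure'* ]]
}

for awk_bin in /usr/bin/awk /opt/homebrew/bin/awk; do
  [[ -x $awk_bin ]] || continue   # skip binaries that are not installed
  if awk_has_towc_bug "$awk_bin"; then
    echo "$awk_bin: affected"
  else
    echo "$awk_bin: not affected"
  fi
done
```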


A9b

$ printf '\x20\xc2\xa0\xc2\xa0' | LC_CTYPE=C LC_COLLATE=C /usr/bin/awk '{line = $0; gsub(/[\x00-\x1F]/, "", line); print line}'
# Exit Code: 0
# stdout:
  
# stderr:

Thank you, so LC_CTYPE=C LC_COLLATE=C seems to work around the issue. I'll add this workaround to ble.sh later.
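As a sketch of what such a workaround could look like (not ble.sh's actual code), awk can be run through a wrapper that sets `LC_CTYPE=C LC_COLLATE=C`. One caveat worth handling: a non-empty `LC_ALL` overrides every other `LC_*` variable, so the wrapper also clears it for the awk process. The function name `c_locale_awk` and the `AWK` variable used to pick the binary are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run awk with the C locale for character classification
# and collation, so bytes such as C2 A0 are handled as plain single bytes
# rather than decoded as multibyte UTF-8 characters.
# A non-empty LC_ALL would override LC_CTYPE/LC_COLLATE, so it is cleared;
# these prefix assignments affect only the awk process itself.
c_locale_awk() {
  LC_ALL= LC_CTYPE=C LC_COLLATE=C "${AWK:-awk}" "$@"
}

# The command from the thread, run through the wrapper.
printf '\x20\xc2\xa0\xc2\xa0\n' |
  c_locale_awk '{line = $0; gsub(/[\x00-\x1F]/, "", line); print line}' |
  od -t x1
```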

  • Q9c: If you have time, could you also check these?
$ printf '\xc2\xa0' | /usr/bin/awk 'gsub(/\x00/, "")'
$ printf '\xc2\xa0' | LC_CTYPE=C /usr/bin/awk 'gsub(/[\x00-\x00]/, "")'

@devidw
Author

devidw commented Oct 20, 2024

Aha, yeah, that makes sense. I can confirm that there's a difference between the default awk installation on macOS and the latest awk I installed with brew:

$ /usr/bin/awk --version
# Exit Code: 0
# stdout:
awk version 20200816
# stderr:
$ printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x00]/, "")'
# Exit Code: 2
# stdout:

# stderr:
/usr/bin/awk: towc: multibyte conversion failure on: '' input record number 1, file source line number 1
$ /opt/homebrew/bin/awk --version
# Exit Code: 0
# stdout:
awk version 20240728
# stderr:
$ printf '\xc2\xa0' | /opt/homebrew/bin/awk 'gsub(/[\x00-\x00]/, "")'
# Exit Code: 0
# stdout:
 
# stderr:

A9c

$ printf '\xc2\xa0' | /usr/bin/awk 'gsub(/\x00/, "")'
# Exit Code: 0
# stdout:
 
# stderr:
$ printf '\xc2\xa0' | LC_CTYPE=C /usr/bin/awk 'gsub(/[\x00-\x00]/, "")'
# Exit Code: 0
# stdout:
 
# stderr:

Thank you so much for digging deep into this!

@akinomyoga
Owner

akinomyoga commented Oct 20, 2024

The source code of macOS awk is provided here. It seems to be based on nawk-20200816, but it has diverged from the original version: macOS awk added its own multibyte support in apple-oss-distributions/awk@74f4968 (awk-32), independently of upstream nawk. Since this bug is in code that exists only in macOS awk, I don't think it would be fixed automatically by rebasing macOS awk onto the latest version of nawk.

I tried to find a way to report the bug to Apple. There seem to be several channels, but they all appear to require an Apple account and detailed information about the Mac in question. Could you report the bug? I think the following result you provided is sufficient to demonstrate the bug in the report:

$ printf '\xc2\xa0' | /usr/bin/awk 'gsub(/[\x00-\x00]/, "")'
# Exit Code: 2
# stdout:

# stderr:
/usr/bin/awk: towc: multibyte conversion failure on: '†' input record number 1, file source line number 1

It should also be mentioned that '\xc2\xa0' is a valid UTF-8 encoding of U+00A0.
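For reference, the arithmetic behind that claim: code points U+0080 through U+07FF use UTF-8's two-byte form `110xxxxx 10xxxxxx`, where the first byte carries the top 5 of the 11 payload bits and the second byte the low 6. A small bash sketch of the computation (the function name `utf8_2byte` is illustrative):

```shell
#!/usr/bin/env bash
# Encode a code point in U+0080..U+07FF with UTF-8's two-byte form
# 110xxxxx 10xxxxxx and print the two bytes in hex.
utf8_2byte() {
  local cp=$(( $1 ))
  if (( cp < 0x80 || cp > 0x7FF )); then
    echo "utf8_2byte: U+$(printf '%04X' "$cp") is outside U+0080..U+07FF" >&2
    return 1
  fi
  # First byte: 110 + top 5 bits; second byte: 10 + low 6 bits.
  printf '%02x %02x\n' $(( 0xC0 | (cp >> 6) )) $(( 0x80 | (cp & 0x3F) ))
}

utf8_2byte 0xA0   # U+00A0 NO-BREAK SPACE; prints: c2 a0
```

Here 0xA0 is 160: shifting right by 6 gives 2, so the first byte is 0xC0 | 2 = 0xC2, and 160 & 0x3F is 0x20, so the second byte is 0x80 | 0x20 = 0xA0.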


$ printf '\xc2\xa0' | LC_CTYPE=C /usr/bin/awk 'gsub(/[\x00-\x00]/, "")'
# Exit Code: 0
# stdout:

# stderr:

Thank you for these results. The above one means that LC_CTYPE=C is sufficient to work around the issue. Anyway, I'll also specify LC_COLLATE=C for consistency.

@devidw
Author

devidw commented Oct 20, 2024

Ah that's interesting.

Sure, I just filed a bug report via macOS Feedback Assistant, including a link to your response (FB15548158).

The above one means that LC_CTYPE=C is sufficient to work around the issue. Anyway, I'll also specify LC_COLLATE=C for consistency.

Awesome, ty.

@akinomyoga
Owner

Thanks!

Labels
compatibility External Problem/Bug Problems/Bugs of other projects