-
-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fzf on Windows assumes UTF8 stdin, but it should assume CP_ACP stdin (current system codepage) #4065
Comments
Thanks for the pointers. Here's the background.
Because I'm not a Windows user and have a limited understanding of the system, I don't think I'll be able to fix the problem myself. I'm hoping that someone will contribute. |
Yup, that'll only work when the console code page is 65001 (UTF8). But I don't know whether that can handle function keys, arrow keys, and other extended keys, etc. I've never tried doing file IO reads from CONIN$ when reading keyboard input, since Windows hasn't had VT input emulation until very recently (partway thru the Win10 lifecycle), and programs typically want to be backward compatible with older versions of Windows. The normal approach for reading keyboard input is to use ReadConsoleInputW on CONIN$, which returns a struct with details about a keyboard/mouse/resize/etc event (depending on console mode flags on the conin handle). On Win11 with the right console mode flag set on the console input handle, then I expect using ReadFile (like the patch currently does) would work. If you want to support older versions of Windows (which in my opinion is probably NOT necessary for fzf), then it would be necessary to use ReadConsoleInputW instead -- but I think that's probably not worth implementing.
As long as it's careful to intercept Ctrl-Break and Ctrl-C and restore the original code page, then that should be ok.
UPDATE: Yes, any console modes and the console code page must be restored before fzf exits. If fzf isn't already intercepting Ctrl-Break/Ctrl-C and restoring the mode(s) and CP, then let me know and I can share how to hook that up.
Understood. I don't know Go at all, but I have decades of deep knowledge of Win32 APIs and system design, and C/C++ experience. So at the very least, I can offer "consultation services". There are multiple ways to support Unicode keyboard input, so I think that can get solved soon, and probably using SetConsoleCP is sufficient, as long as the original CP is restored before fzf exits. But I'm even more interested in the stdin encoding issue. It's unrelated to keyboard input, and addressing it should just be a matter of parsing line breaks, and then for each line call MultiByteToWideChar once from CP_ACP, then call WideCharToMultiByte once using CP_UTF8. I'm willing to try to learn Go enough to at least write some pseudocode, or maybe even get it compiling and testable. Can someone share a code pointer to where stdin is read? Windows console (text mode) programs write 8-bit output encoded in ACP ( |
@junegunn just to be clear -- I opened this issue about stdin encoding, not about keyboard input encoding. They're separate things on Windows. |
Also, a caveat ... I said stdin/stdout use GetACP() encoding on Windows. That's an oversimplification and is a bit inaccurate. In reality console mode apps should emit stdout using the GetConsoleOutputCP() encoding. The OEM codepages are very similar to normal codepages, but have some different legacy characters. E.g. 1252 is the English codepage but 437 is the English OEM codepage (I'm oversimplifying, but GetConsoleOutputCP() reports the actual codepage, whatever it is). GetConsoleOutputCP() should be the right general default, especially since cmd.exe's So...
Does that sound like a reasonable design? |
In PowerShell 7.4.6 (not Windows PowerShell 😒) And, imho, stop feeding Microsoft legacy. |
I thought Powershell 7.4.6 is not installed by default, right?
That doesn't feel very empathetic to people who use Windows. Why can't we make it Just Work? |
Not a word about users from my side. Don't misinterpret. Intention - stop feeding legacy - sooner UTF-8 will be default in Win.
Because of this we more than 15 years struggling with different codepages rather one. And just |
I have users of Clink who install fzf. And then they're surprised that Fzf doesn't work with non-ASCII content by default, they have to either install extra programs or do a bunch of research and workarounds to be able to use fzf. Out of the box, fzf doesn't play very well with Windows. I'm offering to junegunn to help make fzf work better out of the box on Windows. Your suggestions are beside the point of the issue. I appreciate them, and may use some of them, but I'd like to keep the discussion focused on the stated issue.
I understand your viewpoint.
I understand that you disagree with making any change to fzf, and that changes should be external to fzf for Windows, to try to change how Windows works. That doesn't mean junegunn and I cannot make fzf work seamlessly on Windows. It only means you prefer not to. |
If junegunn is not willing, then I'll drop the topic. But I'd like to have the conversation about what it would take to do it. I think it's a pretty easy change to make. I'll grant that maybe the standalone Powershell using UTF8 by default means fzf should continue to default to expecting UTF8. But adding a flag to accept the system code page (or even arbitrary encodings) is not unreasonable. There are plenty of Linux programs that explicitly have flags to control both input and output encodings. The concept is very compatible with Linux. |
Thanks for the detailed comment. I need some time to process this, as the whole issue is quite new to me.
I agree. fzf should respect the current setting of the terminal.
I'm not sure if I follow you here. When would a user want to use dir | iconv -f utf-8 -t xxx | fzf |
Having said that, I can understand @skarasov's reluctance. We need to assess the amount of code needed to support non-UTF-8 settings and decide if it's worth the effort. |
I think the reason it hasn't been much of an issue so far is that But Clink hooks system APIs inside So, fzf integration is possible for It's mainly the intersection of non-English users + Clink users + fzf users who get hit with the quirky complexity and need for workarounds.
I agree. I expect it to be small code changes. I don't feel it would be reasonable to add larger code changes for it. The claim is that the cross-platform Powershell defaults to UTF8 stdout on Windows. I haven't verified that, but I'm happy to accept it. And then that implies that fzf would need to continue to assume UTF8 stdin to avoid breaking existing usage of fzf with the cross-platform Powershell (and any workarounds that users have already applied). So, maybe a
Windows doesn't have an I'm working on coming up with some kind of reliable workaround by default in the Clink fzf integration script, but the need to save/restore the original
The Something like |
I was able to make the Clink fzf integration Lua script apply a workaround that automatically applies to whatever customizations the user has made. So that users hopefully don't need to trouble themselves with how to construct safe/reliable custom workaround scripts.
The chances of failing to restore the codepage are low, so the workaround should generally be reasonably reliable/safe. So, arguably, this means fzf doesn't need any changes to play well on Windows, at least not when using the combination of |
Just wondering if instead of Fzf changing the codepage (for good!?) the Powershell integration script could not, like you Clink integration script, temporarily set and (reliably) restore the code page for legacy Powershell (< 6 !?) as well? I also wonder about the harm that |
Save/set/restore is definitely (much) safer than simply setting it and leaving it set. But it's still a hacky workaround with a wider blast radius than it might appear at first glance. See next...
Yes. If it could be restricted to a process, then the issues would be easy to solve. But This can have non-obvious side effects. For example, suppose a prompt customization script spawns a background console mode program (as an analogy consider async prompt updating like some zsh prompt customizations do, and which Clink also does). And suppose that while that's running in the background, the user invokes file completion which spawns Calling |
Checklist
man fzf
)Output of
fzf --version
0.55.0 (fc69308)
OS
Shell
Problem / Steps to reproduce
dir | fzf
when any file names are present with non-ASCII characters.Result = fzf interprets the stdin as UTF8, but the stdin is never UTF8 by default, and so fzf shows garbled text.
Expected = fzf interprets the stdin as being in the system codepage by default, like native Windows programs do.
Something in fzf is going out of its way to assume UTF8 stdin on Windows, but that isn't how Windows programs work.
Ideally fzf would assume system codepage by default, and could have something like a
--utf8
flag to select assuming UTF8 stdin.The text was updated successfully, but these errors were encountered: