Support paths with UTF characters on Windows #681

Open
carlescufi opened this issue Jun 2, 2023 · 13 comments
@carlescufi
Member

carlescufi commented Jun 2, 2023

C:\Users\carles\src\tmp\仄費羅斯\zephyr\zephyr>C:/Users/carles/bin/zephyr-sdk-0.16.1/aarch64-zephyr-elf/bin/aarch64-zephyr-elf-gcc.exe c:\Users\carles\src\tmp\仄費羅斯\zephyr\zephyr\kernel\banner.c
cc1.exe: fatal error: c:\Users\carles\src\tmp\????\zephyr\zephyr\kernel\banner.c: Invalid argument
compilation terminated.

Note that GNU Arm Embedded doesn't work either:

C:\Users\carles\src\tmp\仄費羅斯\zephyr\zephyr>"c:\Users\carles\bin\gnuarmemb\10 2021.10\bin\arm-none-eabi-gcc.exe" c:\Users\carles\src\tmp\仄費羅斯\zephyr\zephyr\kernel\banner.c
arm-none-eabi-gcc.exe: error: c:\Users\carles\src\tmp\????\zephyr\zephyr\kernel\banner.c: Invalid argument
arm-none-eabi-gcc.exe: fatal error: no input files
compilation terminated.
@carlescufi carlescufi changed the title utf-8 characters not working on Windows Paths with utf-8 characters not working on Windows Jun 2, 2023
@carlescufi carlescufi changed the title Paths with utf-8 characters not working on Windows Paths with UTF characters not working on Windows Jun 2, 2023
@stephanosio stephanosio self-assigned this Jun 13, 2023
@stephanosio stephanosio added platform: Windows Issues related to Zephyr SDK on Windows hosts enhancement labels Jun 13, 2023
@stephanosio stephanosio changed the title Paths with UTF characters not working on Windows Support paths with UTF characters on Windows Jun 13, 2023
@stephanosio
Member

Marking this as an enhancement since this is more of a general problem with MinGW/MSVCRT.

From https://www.msys2.org/docs/environments/:

MSVCRT (Microsoft Visual C++ Runtime) is available by default on all Microsoft Windows versions, but due to backwards compatibility issues is stuck in the past, not C99 compatible and is missing some features.
...

  • It doesn't support the UTF-8 locale

Also from https://blog.r-project.org/2022/11/07/issues-while-switching-r-to-utf-8-and-ucrt-on-windows/#why-utf-8-via-ucrt:

MSVCRT does not allow UTF-8 to be the encoding of the C runtime (as reported by setlocale() function and used by standard C functions). Applications linked to MSVCRT, in order to support Unicode, hence have either to use Windows-specific UTF-16LE API for anything that involves strings, or some third-party library, such as ICU.

The easiest way to fix this (i.e. without modifying the Binutils and GCC themselves to use the Windows UTF-16LE API) would be to build the Windows Zephyr SDK binaries against the UCRT instead of the MSVCRT; but, this requires more investigation and discussion on the potential side effects.
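The `????` in the error messages above is what a lossy conversion from Unicode to a legacy "ANSI" code page typically looks like. The following is a minimal sketch that emulates that conversion in Python; `cp1252` is an assumed code page chosen for illustration (the reporter's actual code page is not stated in this thread), and `to_ansi` is a hypothetical helper, not part of GCC or MinGW.

```python
# Illustrative only: emulates a lossy Unicode -> "ANSI" code page conversion.
# Assumption: cp1252 stands in for the system code page.
path = "c:\\Users\\carles\\src\\tmp\\仄費羅斯\\zephyr\\zephyr\\kernel\\banner.c"


def to_ansi(text: str, code_page: str = "cp1252") -> str:
    """Round-trip text through a legacy code page.

    Characters that the code page cannot represent are replaced by '?'.
    """
    return text.encode(code_page, errors="replace").decode(code_page)


mangled = to_ansi(path)
print(mangled)
```

The four CJK characters have no representation in the assumed code page, so each one collapses to `?`, and the resulting path no longer names an existing file.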

@piernov

piernov commented Jan 31, 2024

Could an alternative build linked with UCRT be provided? Currently, building Zephyr from a user directory name containing Unicode characters on Windows is broken due to this issue (among others), since absolute paths are used almost everywhere in Zephyr's build system.

As for the GNU Arm Embedded toolchain, it seems like the new ARM GNU Toolchain might have fixed this problem.

As a side note, in my case the issue is that the GCC preprocessor produces files with include paths in an "ANSI" character set instead of UTF-8. When the path contains characters that can be converted from Unicode to the compatibility "ANSI" character set (e.g., é in ISO-8859), GCC can actually read input files properly, but it does not seem to re-encode the paths to UTF-8. This causes the zephyr.dts.pre file to contain paths in an "ANSI" character set with characters outside the 7-bit ASCII range, which cannot be decoded as UTF-8. However, Zephyr's python-devicetree library tries to read the file as UTF-8 and fails. A workaround is to call the preprocessor with the -P argument to omit the paths from the output file.
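The decoding failure described here can be reproduced in isolation. This is a rough sketch, assuming `cp1252` as the "ANSI" code page; the linemarker path is made up for illustration, and `try_decode_utf8` is a hypothetical helper rather than anything from python-devicetree.

```python
# Assumption: cp1252 stands in for the "ANSI" code page.
# The linemarker below is a made-up example of preprocessor output.
line = '# 1 "C:/Users/José/zephyr/dts/common/skeleton.dtsi"\n'

ansi_bytes = line.encode("cp1252")  # what the thread says GCC writes
utf8_bytes = line.encode("utf-8")   # what a UTF-8 reader expects


def try_decode_utf8(data: bytes):
    """Return the decoded text, or None if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


print(try_decode_utf8(ansi_bytes))
print(try_decode_utf8(utf8_bytes))
```

The single byte that encodes é in the assumed code page is not a valid UTF-8 sequence on its own, which is why a strict UTF-8 reader rejects the whole file.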

@dpkristensen

dpkristensen commented Feb 5, 2024

I would like to point out that not supporting UNICODE in Windows API also leads to issues with path length restrictions. See https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation

Only the wide-character versions of the functions are affected by the registry value that overrides the length limitation; the narrow-character versions still have the maximum length restriction of 260, and there is no way to change it.

This has led to issues with path restrictions on Windows PCs that are not present on other systems because GCC can't open a file with a long path name.

So I would say this is more akin to a bug than an Enhancement.
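To see which generated files are at risk, a build script could flag paths that do not fit in the 260-character limit. This is only a rough sketch: `exceeds_max_path` is a hypothetical helper, the example paths are invented, and it counts characters including the terminating NUL, which is a simplification of how the narrow Windows API measures paths.

```python
MAX_PATH = 260  # limit cited in the Microsoft documentation linked above


def exceeds_max_path(path: str) -> bool:
    """Return True if the path plus its terminating NUL does not fit in MAX_PATH."""
    return len(path) + 1 > MAX_PATH


# Invented examples: a short source path and a deeply nested generated file.
short_path = "C:\\src\\zephyr\\kernel\\banner.c"
long_path = "C:\\build\\" + "\\".join(["modules_generated_dir"] * 15) + "\\out.c"

print(exceeds_max_path(short_path), len(short_path))
print(exceeds_max_path(long_path), len(long_path))
```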

@stephanosio
Member

Note that the path length limitation issue will not be fully solved even if we make the SDK use the Unicode functions, because there are other components in the build system (notoriously, Ninja) that do not support long paths due to the very same underlying problem.

@dpkristensen

Right but as long as the SDK is in a path accessible by Ninja, it will have no problem launching GCC. The source is passed as a string, so it will still work if only the source is in a long path. Some of the files generated by the build system have very long paths due to being added as relative to the build directory.

If GCC is able to accept such a path, then it would work in a lot of places. If there's an issue with CMake or Ninja, then maybe they should be built with UNICODE as well; but it shouldn't stop this from being supported here.

@piernov

piernov commented Apr 11, 2024

Turns out the issue wasn't just about UCRT, but also about the way the command line arguments are passed to the main function of GCC, as explained at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865 .
This has been fixed by the following commits now available in GCC 13.1:

Additionally, in order for the build with UCRT to be successful, this commit is also required (also available in GCC 13.1):

UCRT might not even be needed at all, not sure.

Zephyr's GCC hasn't been ported to GCC 13.1 yet sadly, so I merged these commits on top of Zephyr's GCC fork for SDK v0.16.5-1: https://github.com/piernov/gcc/commits/fix/ucrt-utf8/

I then rebuilt a MinGW-W64 toolchain with UCRT and with win32 threads (instead of posix/winpthreads) on ArchLinux:

  • mingw-w64-headers: configure […] --with-default-msvcrt=ucrt
  • mingw-w64-crt: configure […] --with-default-msvcrt=ucrt
  • mingw-w64-gcc: configure […] --enable-threads=win32 --disable-libgomp --disable-libssp
    This required rebuilding and reinstalling the packages roughly in that order: mingw-w64-headers mingw-w64-crt mingw-w64-gcc mingw-w64-crt mingw-w64-gcc (yes they need to be rebuilt twice, maybe more).

The next release of MinGW should default to UCRT: https://sourceforge.net/p/mingw-w64/mingw-w64/ci/82b8edc101d7f8fefd44e84d2e24a6edd01901f9/ . However, sdk-ng uses Ubuntu 20.04 packages, so it may take a very long time until we see a UCRT build used by default. I hope there is a way the process can be sped up.
I used win32 threads instead of posix threads to avoid the dependency on the external winpthreads library, which matches how the official SDK toolchains are built.

Finally I rebuilt Zephyr's SDK toolchain for ARM, and I obtained a GCC compiler that can take UTF-8 paths on the command line, and generates preprocessed files with UTF-8 paths as well.

I uploaded my build there: https://github.com/piernov/sdk-ng/releases/download/v0.16.5-1-ucrt-utf8/toolchain_windows-x86_64_arm-zephyr-eabi.7z

Lastly, in order to support running the toolchain from a path with Unicode characters in the Zephyr build system, I had to add ENCODING UTF-8 to the execute_process() call that sets LIBGCC_FILE_NAME in https://github.com/zephyrproject-rtos/zephyr/blob/f0212367dc033d152b1d3f08d0efc130400034dd/cmake/compiler/gcc/target.cmake#L100 .
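The fix itself is a CMake change, but the underlying idea is the same in any language that captures a child process's output: the bytes have to be decoded with the encoding the child actually wrote. The sketch below is a Python analogy only, not the CMake code; the child command, the example path, and the `cp1252` fallback are all assumptions made for illustration.

```python
import subprocess
import sys

# Hypothetical child process that prints a path as UTF-8 bytes.
# The non-ASCII character is written as an escape so the command line stays ASCII.
child = (
    "import sys; "
    "sys.stdout.buffer.write('C:/Users/Jos\\u00e9/libgcc.a'.encode('utf-8'))"
)
result = subprocess.run([sys.executable, "-c", child], capture_output=True, check=True)
raw = result.stdout

# Decoding with the encoding the child used recovers the path.
as_utf8 = raw.decode("utf-8")

# Decoding with an assumed legacy code page yields a different, wrong string.
as_ansi = raw.decode("cp1252", errors="replace")

print(as_utf8)
print(as_ansi)
```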

@dpkristensen

dpkristensen commented Apr 11, 2024

For this particular issue, supporting UTF-8 encoded paths may fix the problem mentioned; but it still does not solve the issue of Windows API path length being different based on the setting of UNICODE, which affects the Windows API call (e.g., CreateFileW vs CreateFileA).

If the arguments are passed in a separate file to GCC, then that would allow bypassing the path length restriction in a greater number of cases.
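GCC accepts a response file via `@file`, which is one way to pass arguments in a separate file. The sketch below writes such a file from Python; `quote_for_response_file` and `write_response_file` are hypothetical helpers, the file name `args.rsp` and the source path are invented, and the quoting follows the GCC manual's description of `@file` (whitespace-separated options, quotes around an option, backslash as the escape character). Whether this avoids the path length limit for files the compiler later opens is not established in this thread.

```python
import tempfile
from pathlib import Path


def quote_for_response_file(arg: str) -> str:
    """Quote one argument: escape backslashes and double quotes, then wrap in quotes."""
    escaped = arg.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def write_response_file(args, directory) -> Path:
    """Write one quoted argument per line to args.rsp inside the given directory."""
    rsp = Path(directory) / "args.rsp"
    body = "\n".join(quote_for_response_file(a) for a in args) + "\n"
    rsp.write_text(body, encoding="utf-8")
    return rsp


with tempfile.TemporaryDirectory() as tmp:
    # Invented arguments for illustration.
    rsp_path = write_response_file(
        ["-c", r"C:\very\long\path\banner.c", "-o", "banner.o"], tmp
    )
    rsp_text = rsp_path.read_text(encoding="utf-8")
    # The compiler would then be invoked with a single short argument.
    command = ["gcc", f"@{rsp_path}"]

print(rsp_text)
print(command)
```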

@piernov

piernov commented Apr 11, 2024

Does UCRT solve that or not? Can you confirm your bug still exists in my build?
Since the issue for the original bug report (Unicode characters in path) is already fixed in upstream GCC, it just needs to trickle down to Zephyr's branch. If your issue isn't solved the same way, I'd suggest opening another bug report.

@dpkristensen

Yes, it is technically a separate issue. I have solved it locally by building from a shorter path on my local filesystem, so it's not a blocker. Maybe a "nice-to-have".

@piernov

piernov commented Aug 19, 2024

MinGW 12.0.0 is released and defaults to UCRT https://sourceforge.net/p/mingw-w64/mailman/message/58776404/
and it looks like it will be part of Ubuntu 24.10 https://packages.ubuntu.com/oracular/mingw-w64-common (although I haven't checked if it is actually UCRT).

GCC 14.2 is also released https://lists.nongnu.org/archive/html/info-gnu/2024-08/msg00000.html so hopefully there'll be some progress on #740 .

That said I'm not sure the build environment targeting Windows will be updated to use Ubuntu 24.10 MinGW packages, so it may still take quite a long time before we see an official SDK build for Windows with these fixes.

@stephanosio
Member

@piernov Thanks for looking into this and providing a detailed explanation. I will see if I can get the latest MinGW-w64 release integrated into the Windows Zephyr SDK build process alongside #789.

@piernov

piernov commented Sep 23, 2024

@stephanosio thanks, I pushed a sdk-ng branch that uses the still in-development Ubuntu 24.10 image, and also backported some patches for gdb, gcc and crosstool-ng to build with the newer MinGW:
https://github.com/piernov/sdk-ng/tree/mingw-12
https://github.com/piernov/binutils-gdb/tree/fix/mingw-12
https://github.com/piernov/gcc/tree/fix/ucrt-utf8-2
https://github.com/piernov/crosstool-ng/tree/fix/ncurses-gcc-version-check

I also pushed a mingw-12-act branch for sdk-ng that uses nektos/act to run the build locally.

However, I haven't had the time to check that it works properly.

In the meantime I was debugging a CMake issue ( https://gitlab.kitware.com/cmake/cmake/-/issues/26262 ) which basically means that a bunch of execute_process() calls in the Zephyr build system need to have ENCODING UTF-8 (unlike what the CMake documentation said for several versions).
Additionally, the DTC Chocolatey package still needs to be updated to support UTF-8 as well for the Zephyr build to work ( carlescufi/chocolatey-packages#1 ).
Hopefully that should be enough to get a basic build of Zephyr running on Windows with Unicode characters in the path.
Then there's also the issue with whitespace ( zephyrproject-rtos/zephyr#43959 ), since it is common on Windows for a user's home folder to contain a space (separating first name from last name).
There's still a long road ahead of us…

@stephanosio
Member

A more up-to-date version of MinGW-w64 toolchain, which uses UCRT by default, has been added to the sdk-build Docker image. This will be integrated as part of the Clang/LLVM toolchain support in Zephyr SDK planned for 0.18.0 release.

For more details, see #830 (comment)
