Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bazel] Unset PYTHONSAFEPATH in env.bat and env.sh. #1423

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

allsey87
Copy link
Contributor

@allsey87 allsey87 commented Jul 9, 2024

In Python 3.11 and above PYTHONSAFEPATH prevents the containing directory of a script from being imported into sys.path. This results in emcc.py (and other programs) failing with ModuleNotFoundError: No module named 'tools'. By unsetting PYTHONSAFEPATH in env.bat and env.sh, the expected behavior of importing the containing directory of a script to sys.path is restored. Resolves #1420.

In Python 3.11 and above `PYTHONSAFEPATH` prevents the containing directory of a script from being imported into `sys.path`. This results in `emcc.py` (and other programs) failing with "ModuleNotFoundError: No module named 'tools'". By unsetting `PYTHONSAFEPATH` in env.bat and env.sh, the expected behavior of importing the containing directory of a script to `sys.path` is restored.
@allsey87
Copy link
Contributor Author

allsey87 commented Jul 9, 2024

@PiotrSikora what do you think? This is needed for using the meson rule from rules_foreign_cc.

Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

BTW, where is PYTHONSAFEPATH being set?

bazel/emscripten_toolchain/env.bat Show resolved Hide resolved
bazel/emscripten_toolchain/env.bat Outdated Show resolved Hide resolved
@allsey87
Copy link
Contributor Author

Thanks!

BTW, where is PYTHONSAFEPATH being set?

I believe it comes from python_bootstrap_template.txt and makes its way into py_binary. Since py_binary is used to run the Meson build system (which in turn calls emcc.sh) this variable is still set when we start emcc.py which prevents the script from importing the Emscripten tools module.

@allsey87 allsey87 requested a review from sbc100 July 10, 2024 07:40
@walkingeyerobot
Copy link
Collaborator

If Meson is setting this variable and calling out to Bazel, I'm inclined to think the appropriate solution is for Meson to unset PYTHONSAFEPATH - is that possible instead?

@@ -3,3 +3,7 @@
set ROOT_DIR=%CD%
set EMSCRIPTEN=%ROOT_DIR%\%EM_BIN_PATH%\emscripten
set EM_CONFIG=%ROOT_DIR%\%EM_CONFIG_PATH%

:: Ensure PYTHONSAFEPATH is not set so that modules in the same
:: directory as emcc.py and emar.py etc. can be found via sys.path
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe mention that this is needed because py_binary sets this?

Alternatively, perhaps there is a more canonical way to express this in a py_binary? i.e. perhaps there is an explicit way to add the emscripten root the PYTHONPATH?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

py_binary is the more idiomatic solution. Although I am blocked on this at the moment due to a bug in python_rules.

@allsey87
Copy link
Contributor Author

If Meson is setting this variable and calling out to Bazel, I'm inclined to think the appropriate solution is for Meson to unset PYTHONSAFEPATH - is that possible instead?

It is the other way around. Bazel is calling into Meson to build something whose outputs can then be used by other rules. py_binary is the more idiomatic solution for what we are trying to do here, but clearing PYTHONSAFEPATH is how we should do this for the current approach. Meson doesn't set PYTHONSAFEPATH, Bazel does.

@walkingeyerobot
Copy link
Collaborator

can you unset PYTHONSAFEPATH in bazel at the point it calls into Meson?

@allsey87
Copy link
Contributor Author

can you unset PYTHONSAFEPATH in bazel at the point it calls into Meson?

It would also be possible to change this in rules_foreign_cc, where the Meson py_binary is declared. But that would then apply to all compilers and linkers that Meson could invoke which seems incorrect to me.

@walkingeyerobot
Copy link
Collaborator

It seems that the emscripten bazel toolchain doesn't set it; python rules (I think?) set it. So changing how the python rules work makes sense as a solution, and changing where bazel is calling into Meson makes sense as well. But If it's something that emscripten doesn't set and doesn't read, I'm not sure it makes sense to change it in emscripten.

@walkingeyerobot
Copy link
Collaborator

It's entirely possible that I don't fully understand the situation, and I appreciate your patience with me if that's the case. Having bazel interact so heavily with another build system is quite foreign to me.

@sbc100
Copy link
Collaborator

sbc100 commented Jul 10, 2024

If bazel wants to set PYTHONSAFEPATH then I think it probably make sense to add the paths we need explicitly using whatever the offiical/recommended way of doing so is.

@allsey87
Copy link
Contributor Author

It's entirely possible that I don't fully understand the situation, and I appreciate your patience with me if that's the case. Having bazel interact so heavily with another build system is quite foreign to me.

Quick explainer: currently we have cc_binary in Bazel which uses our toolchain to cross compile to WebAssembly and wasm_cc_binary to "extract" the outputs of that compilation. But what if our project has several dependencies that need to be linked to the final application? Arguably this is where the power of Bazel really becomes relevant since it allows you to build not only your project but also all of its dependencies in a clean and reproducible manner.

So why not just compile the dependencies with cc_library and link them in? The problem there is that many software projects rely on the build system for code generation. So porting to cc_library is not as simple as just listing the C/C++ source files, one would need to port all the code generation logic to Starlark and then maintain it!

The solution to this problem is rules_foreign_cc, it provides rules that bridge the Bazel build system with Autoconf, CMake, Ninja, and Meson. It hooks into the current C/C++ toolchain and allows us to build our dependencies with code generation etc. and to then link the outputs of those build systems into our final application.

If bazel wants to set PYTHONSAFEPATH then I think it probably make sense to add the paths we need explicitly using whatever the offiical/recommended way of doing so is.

PYTHONSAFEPATH is being set as part of the implementation of the py_binary rule for (and only for) the Python code that it is wrapping (in my case, the Meson build system). The fact that it influences the Emscripten compiler is just a side effect of how environment variables propagate from parent to child processes. It is not intentionally set by Bazel nor rules_python for any reason that applies to Emscripten.

@sbc100
Copy link
Collaborator

sbc100 commented Jul 11, 2024

So meson itself it written in python and you use py_binary to run it? And py_binary (which is part of bazel) always sets PYTHONSAFEPATH?

Is meson itself happy to run in PYTHONSAFEPATH mode? does it not need to access modules relative to the main script?

The problem seems to be that because meson is a py_binary that all of it subprocesses now have to run in PYTHONSAFEPATH mode.. not just emcc.

This suggests to me that meson itself should unset PYTHONSAFEPATH on startup, so that it doesn't force this behaviour on all of its children.

@allsey87
Copy link
Contributor Author

allsey87 commented Jul 11, 2024

So meson itself it written in python and you use py_binary to run it? And py_binary (which is part of bazel) always sets PYTHONSAFEPATH?

Exactly

Is meson itself happy to run in PYTHONSAFEPATH mode? does it not need to access modules relative to the main script?

Yes! Meson needs access to other modules but that is part of the functionality that py_binary provides. It wraps up all dependencies and sets up the interpreter so the main script can find everything it needs in a reproducible manner.

The problem seems to be that because meson is a py_binary that all of it subprocesses now have to run in PYTHONSAFEPATH mode.. not just emcc.

This suggests to me that meson itself should unset PYTHONSAFEPATH on startup, so that it doesn't force this behaviour on all of its children.

I have some counter arguments there:

  1. The Meson codebase does not cause the problem, it is the inner workings of Bazel's py_binary and we are discussing the Emscripten toolchain for Bazel after all. In other words, shouldn't Bazel related code undo a Bazel related problem?
  2. Although I can't name any other examples right now, this issue would also be present for any py_binary that calls emcc as a child process, not just Meson
  3. Emscripten is the only toolchain (that I know of) where the compiler/linker involves invoking a Python script. No other tool that Meson would call would have this problem
  4. Ultimately we rely on PYTHONSAFEPATH not being set for our code to work correctly, so it seems that if this is a requirement for us we should deal with it in this repo

I still feel that unsetting PYTHONSAFEPATH in env.bat/sh (which are only used in the Bazel toolchain) is the least-bad solution to this problem.

@sbc100
Copy link
Collaborator

sbc100 commented Jul 11, 2024

  • Emscripten is the only toolchain (that I know of) where the compiler/linker involves invoking a Python script. No other tool that Meson would call would have this problem

Surely there are plenty of builds that run python scripts are part of the build. e.g. any project that does code generation using python. I'm pretty llvm does this, and wabt. Lots of build tools use python don't they? And they would all be effected by this problem I think.

@sbc100
Copy link
Collaborator

sbc100 commented Jul 11, 2024

I would argue that as long as Meson could conceivably run other python subprocesses then it should unset PYTHONSAFEPATH before it does so.

I imagine the PYTHONSAFEPATH setting in py_binary is designed for the python binary itself, not for the transitive set of all python subprocesses. Most likely the py_binary authors didn't imagine python code spawning sub-processes that could include other python programs.

@allsey87
Copy link
Contributor Author

allsey87 commented Jul 11, 2024

I imagine the PYTHONSAFEPATH setting in py_binary is designed for the python binary itself, not for the transitive set of all python subprocesses. Most likely the py_binary authors didn't imagine python code spawning sub-processes that could include other python programs.

This is exactly the problem.

I would argue that as long as Meson could conceivably run other python subprocesses then it should unset PYTHONSAFEPATH before it does so.

But how would you limit the scope of that logic? Beyond PYTHONSAFEPATH there are all sorts of environment variables that could be set which influence child processes - which ones should Meson choose to block and which ones should it allow?

@sbc100
Copy link
Collaborator

sbc100 commented Jul 11, 2024

I imagine the PYTHONSAFEPATH setting in py_binary is designed for the python binary itself, not for the transitive set of all python subprocesses. Most likely the py_binary authors didn't imagine python code spawning sub-processes that could include other python programs.

This is exactly the problem.

I would argue that as long as Meson could conceivably run other python subprocesses then it should unset PYTHONSAFEPATH before it does so.

But how would you limit the scope of that logic? Beyond PYTHONSAFEPATH there are all sorts of environment variables that could be set which influence child processes - which ones should Meson choose to block and which ones should it allow?

This would only be when meson is running as part of py_binary, in this case we know that py_binary has set an environment variable that prevents normal python processes from working as expected.

So it seems reasonable to explicitly unset this one? Are there other environment variable that effect python the py_binary also sets? We could consider those too if they also cause problems.

@allsey87
Copy link
Contributor Author

Potential alternative: bazelbuild/rules_python#2060

@walkingeyerobot
Copy link
Collaborator

It looks like the rules_python fix is moving along and should fix your issue. Let me know if that isn't the case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[bazel] [rules_foreign_cc] ModuleNotFoundError: No module named 'tools'
3 participants