Python NativeClient Sandbox (pynbox)

⚠️ NOTE: This project is no longer active, and the technology it is based on (NativeClient) is no longer supported. For alternatives, consider Pyodide or gVisor.

The project provides a version of Python that runs in the NaCl (NativeClient) sandbox. Most OS operations are unavailable, and access to the filesystem is limited to the specified directories (similar to chroot or docker mounts).

Security is a major focus of NativeClient. It allows safely executing untrusted code, in this case the Python interpreter, and all Python code running in it, including native Python modules.

The Pynbox project makes this functionality easy to set up and use, whether by installing pre-built packages or by building from source. It works on Mac, Windows, and Linux.

To use the pre-built sandbox, clone the project, pick a destination directory DEST_DIR, and run ./pynbox install:

git clone https://github.com/dsagal/pynbox.git

cd pynbox
./pynbox install DEST_DIR python tests

DEST_DIR/bin/test_pynbox

This installs the packages containing the sandbox, the Python 2.7 interpreter, and a test, and runs the test. It should work on Mac, Windows, and Linux. If the output ends with "All passed", then things are good. If not, please open an issue.

You can now run Python in a sandbox:

DEST_DIR/bin/run python -c 'import os; print os.listdir("/")'
>>> ['python', 'slib', 'test']

By default, the Python code only has read-only access to DEST_DIR/root. It can't see anything else.

On Windows, this process is tested with the Bash shell that comes with Git. Note that on Windows, the sandboxed code sees a POSIX-like filesystem.

To allow the sandboxed code to interact with the outside world, DEST_DIR/bin/run supports a number of options, which you can see by running DEST_DIR/bin/run with no arguments.

To give the script access to more of the filesystem, use the -m <host_dir>:<virt_dir>:<ro|rw> option. It mounts the directory <host_dir> (on your machine) under the virtual path <virt_dir>, where it will be seen by the sandboxed code. The suffix of ro or rw determines whether to mount the directory as read-only or read-write. (If you've used Docker, this option is similar to Docker's -v.)
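
If you launch the sandbox from a host-side script, the -m value can be assembled programmatically. This is a minimal sketch in host-side Python 3; `mount_arg` and `run_command` are hypothetical helpers, and two things are assumptions rather than documented behavior: that -m may be repeated once per mount, and that options go before the sandboxed program. Run DEST_DIR/bin/run with no arguments to confirm the actual usage.

```python
import os


def mount_arg(host_dir, virt_dir, mode="ro"):
    """Format one -m value as <host_dir>:<virt_dir>:<ro|rw>."""
    if mode not in ("ro", "rw"):
        raise ValueError("mode must be 'ro' or 'rw', got %r" % (mode,))
    return "%s:%s:%s" % (host_dir, virt_dir, mode)


def run_command(dest_dir, mounts, program_args):
    """Build the argv list for DEST_DIR/bin/run with zero or more mounts.

    Assumption: one -m per mount, placed before the sandboxed program.
    """
    argv = [os.path.join(dest_dir, "bin", "run")]
    for host_dir, virt_dir, mode in mounts:
        argv += ["-m", mount_arg(host_dir, virt_dir, mode)]
    return argv + list(program_args)
```

The resulting list can be passed to `subprocess.run`. Host paths that themselves contain a colon (for example Windows drive letters) are not handled by this sketch.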

Another connection to the outside world is the standard streams (stdin, stdout, stderr) and additional file descriptors, which you can redirect using the -h, -r, and -w options to DEST_DIR/bin/run.

What you do with those is up to you. For example, you can run code both inside and outside of the sandbox that sets up RPC over the forwarded file descriptors.
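
As an illustration of the RPC idea, here is a minimal sketch of the sandbox-side half of a line-delimited JSON protocol. Pynbox does not prescribe any protocol: the message shape, the `add` method, and the names `handle` and `serve` are all hypothetical. The code sticks to features available in both Python 2.7 (as used in the sandbox) and Python 3.

```python
import json


def handle(request):
    """Dispatch one request dict; only a toy "add" method is defined here."""
    if request.get("method") == "add":
        return {"id": request.get("id"), "result": sum(request.get("params", []))}
    return {"id": request.get("id"), "error": "unknown method"}


def serve(reader, writer):
    """Answer one JSON request per input line until the stream is closed."""
    # readline() is used instead of iterating over the file, so that each
    # request is handled as soon as its line arrives.
    for line in iter(reader.readline, ""):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        response = handle(json.loads(line))
        writer.write(json.dumps(response, sort_keys=True) + "\n")
        writer.flush()
```

Inside the sandbox, the script would finish by calling `serve(sys.stdin, sys.stdout)`, or pass file objects wrapping other forwarded descriptors. The host side writes one JSON object per line and reads one line back per request.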

Note

On Windows, the run script does not pass along open file descriptors to child processes. If you need to use -h, -r, or -w options, run VERBOSE=1 DEST_DIR/bin/run to get the underlying sel_ldr command line, and execute that command directly, bypassing the run script.

The pynbox install command allows installing several packages. Here's a brief description:

  • sandbox_outer is always required and automatically added when installing anything. It includes the run convenience script, and sel_ldr, which is the actual trusted loader: it is responsible for loading, validating, and running sandboxed code and enforcing the sandbox restrictions. Note that this package is OS-specific, so it comes with .win, .mac, and .linux suffixes, and it is the only OS-specific package.

  • sandbox_inner is also required and automatically added when installing anything. It contains some helper libraries that run inside the sandbox, in particular for dynamic library loading.

  • python package contains Python 2.7 built to run in the sandbox.

  • lxml package contains the Python lxml library, which includes a native module. It illustrates how native modules can be made to work in the sandbox.

Note that except for sandbox_outer, packages are not OS-specific. However, packages are specific to CPU architecture, and currently only x86-64 (aka AMD64) is supported.

Prerequisites:

  • x86-64 (aka AMD64) CPU architecture. This covers most modern machines, including laptops and desktops. NaCl supports more architectures, but you'll have to edit pynbox scripts to make that work.
  • Docker for building OS-independent packages (all internal packages, i.e. all other than sandbox_outer). These packages can be used with any OS.
  • Bash is required. On Windows, it is tested with Bash that comes with Git.
  • On Windows, Visual Studio 2013 (vs120) is required, available here (no need to select any options), because NaCl's scons build script fails to detect the presence of the newer VS 2015 version.

The following command builds all the packages we support at the moment.

./pynbox build sandbox_outer sandbox_inner python lxml tests

Note that lxml is an example of a Python module with native (binary) code. Most Python modules do not require building, but only need to be placed somewhere under DEST_DIR/root, to be visible inside the sandbox.
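
For a pure-Python package, "placing it under DEST_DIR/root" can be as simple as a copy. The sketch below is a host-side Python 3 helper; the function name `add_pure_python_package` and the `extra` subdirectory are hypothetical, and it assumes (consistent with the listing of "/" shown earlier) that DEST_DIR/root appears as "/" inside the sandbox.

```python
import os
import shutil


def add_pure_python_package(package_dir, dest_dir, virt_subdir="extra"):
    """Copy a pure-Python package under DEST_DIR/root/<virt_subdir>.

    Returns the directory, as seen from inside the sandbox, that contains
    the copied package.
    """
    name = os.path.basename(os.path.normpath(package_dir))
    target = os.path.join(dest_dir, "root", virt_subdir, name)
    # copytree creates missing parent directories and fails if target exists.
    shutil.copytree(package_dir, target)
    return "/" + virt_subdir
```

Sandboxed code would then need that directory on its module search path, for example with `sys.path.insert(0, "/extra")` before the import. Whether the bundled Python already searches some location under the root is not covered here.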

Managing the Docker build

Packages internal to the sandbox are OS-independent, and built using Docker. These are known as "webports". When running ./pynbox build to build them, it automatically creates a Docker image named pynbox-webports and runs it as a Docker container named pynbox-webports1. It then executes commands within that Docker container.

You can start and stop the Docker container without building anything using ./pynbox startdocker and ./pynbox stopdocker commands.

Pynbox modifications

For building from source, Pynbox uses cloned repositories for NativeClient (the sandbox runner and libraries), and for Webports (software packages that have been made to build under the sandbox).

Installing from built packages

If you've built your own packages, e.g. after modifying their source code, the built packages end up in ./build/packages/PACKAGE.VERSION.tgz2. If you then run ./pynbox install, they should be picked up and installed into your destination directory.

Note: if you already have the same version of a package installed in DEST_DIR, subsequent installations will skip it. To force a reinstall, remove the "install receipt" file DEST_DIR/packages/PACKAGE.installed first. Alternatively, you can update a package's version (e.g. by adding -dev1 suffix) in ./packages/PACKAGE.create.sh.
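
Removing the receipt can be scripted. A minimal sketch in host-side Python 3, using only the receipt location given above; the helper names `receipt_path` and `forget_install` are hypothetical.

```python
import os


def receipt_path(dest_dir, package):
    """Return the path of DEST_DIR/packages/PACKAGE.installed."""
    return os.path.join(dest_dir, "packages", package + ".installed")


def forget_install(dest_dir, package):
    """Delete the install receipt if present; return True if a file was removed."""
    path = receipt_path(dest_dir, package)
    if os.path.exists(path):
        os.remove(path)
        return True
    return False
```

After `forget_install(DEST_DIR, "python")` returns True, the next ./pynbox install run should no longer skip that package.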

You may place your built packages in a separate directory (or online at some URL), and you'll then be able to install from there by using the ./pynbox install --repo REPO option. This allows you to build webports packages on one OS and use the built packages on other OSes.

NativeClient (or NaCl, and a variant called PNaCl) is a sandbox for running native code in the Google Chrome browser. The approach involves building the C or C++ code using the NaCl suite of tools (compiler, linker, etc.), which produce binaries that verifiably access only certain APIs (not the OS directly), and then running them in an environment (typically the Chrome browser) which provides the necessary APIs.

The NaCl project is developed by Google, but is open source, as part of the Chromium project (the open-source version of Google Chrome).

There are pretty good resources for building native apps using NaCl to run in Chrome. What's not common is to run the NaCl sandbox without Chrome. The sandbox comes with a tool for just that, sel_ldr (for Secure ELF Loader). It's no longer used by Chrome itself, so is less well documented, and less well supported and maintained.

The NativeClient project includes a set of ports, known as webports, which are software packages that have been made to build under NaCl to run in Chrome.

One of the ports is Python, which makes the Python interactive shell work under Chrome. It can also run under sel_ldr.

The sel_ldr runner can enable access to the filesystem, including a restricted mode in which it limits access to a given directory (similarly to chroot). If populated with all the modules and libraries that Python needs, this offers a way to run Python with that directory as the filesystem root.

NaCl vs PNaCl

NativeClient encompasses PNaCl (portable native client) and just NaCl. These differ in toolchains used to build code, and produce .pexe and .nexe files respectively. The idea is that .nexe is architecture-specific, and .pexe is more portable: it can be translated to a suitable .nexe file on the fly.

There is a hitch, however: shared libraries are only supported by the glibc toolchain, which builds architecture-specific .nexe files directly. We need shared libraries, in particular, to allow Python to load C extension modules (including a number of standard modules).

Note: Loading shared libraries uses the "libdl.so" library. This library isn't part of the NativeClient source. It is downloaded as part of an architecture-specific tgz archive (one for each architecture). It seems to have some bugs (or super-weird behavior); in particular, opening "/lib/foo" translates to "/foo", while "/./lib/foo" works. This is special for the "/lib" path, so we avoid the bug in the pynbox setup by placing libraries in the sandbox under "/slib" instead of "/lib".

If you are considering sandboxing, then security is important to you.

NativeClient itself has a robust design to ensure security. Here's Chrome's brief security FAQ. Google has also published a paper about it: Native Client: A Sandbox for Portable, Untrusted x86 Code (PDF). Another analysis by Chris Rohlf is available here: Analysis Of A Secure Browser Plugin Sandbox (PDF).

Overall, the security approach of NativeClient relies on verifying instructions, preventing new unverified instructions from being created at runtime, verifying that all jumps land on verified addresses, and providing a suite of build tools that produce code that can pass these verifications. The build tools themselves aren't trusted: the verifications happen at load time and run time.

There is other trusted code that implements allowed system calls and other communication between the sandbox and the outside world.

The design is robust and powerful, but bugs will exist as anywhere, and these can cause vulnerabilities that allow untrusted code to escape the sandbox. This post includes a great discussion and lists some examples from a security contest in 2009: Security Implications.

So the biggest risk to NativeClient's security is if it is little-used, and nobody spends the time to discover and fix vulnerabilities.

Python layer

If you only use Pynbox to run Python code (and not to run untrusted native code), it mitigates many risks, since Python code doesn't have a direct way to execute CPU instructions or manipulate the format of the executable, which is the basis for various attempted exploits.

Modifications by Pynbox

Pynbox adds features to the trusted code that have seen less vetting than other code. Namely, it adds support for mounting multiple directories from the host system, including new support for Windows and for read-only mounts. The code is written with security in mind, but has been less tested than those parts of the NativeClient codebase that are used in Chrome.

With mounted directories, one area of concern is symlinks. In short, it is recommended to avoid symlinks in mounted directories. If you have symlinks in the mounted directories that point outside, the trusted code follows them and interprets them as inside the virtual filesystem (e.g. HOST_DIR/foo -> /etc/passwd would translate to HOST_DIR/etc/passwd). This is good. However, there is a race condition between this verification and actual operations on the file. If a new symlink is created along the resolved path between the resolution and the actual operation, it may allow an escape outside of the mounted directories. For this reason, creation of symlinks, and renames of symlinks or directories are disallowed for sandboxed code, so such an exploit is only possible with help from code running outside of the sandbox.
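
Since the recommendation is to avoid symlinks in mounted directories, it can help to scan a directory before mounting it. The following is a minimal host-side sketch in Python 3; `find_symlinks` is a hypothetical helper, not part of pynbox, and it only reports symlinks present at scan time, so it does not address the race condition described above.

```python
import os


def find_symlinks(host_dir):
    """Return sorted paths, relative to host_dir, of every symlink below it."""
    found = []
    # followlinks=False: symlinked directories are reported but not entered.
    for dirpath, dirnames, filenames in os.walk(host_dir, followlinks=False):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                found.append(os.path.relpath(full, host_dir))
    return sorted(found)
```

An empty result means no symlinks were found under `host_dir` at the time of the scan.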

We want to hear from you

If you discover vulnerabilities or have questions about security, please get in touch at dmitry at getgrist.com, or open an issue.