Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop explicit preopens #128

Open
badeend opened this issue Aug 11, 2023 · 16 comments
Open

Drop explicit preopens #128

badeend opened this issue Aug 11, 2023 · 16 comments

Comments

@badeend
Copy link
Contributor

badeend commented Aug 11, 2023

Given WASI's CloudABI heritage I understand why preopens currently are the way that they are.

But they feel like something that should be just an host implementation detail. Ie. could the following also work?:

Remove

get-directories: func() -> list<tuple<descriptor, string>>

in exchange for:

instance-root-directory: func() -> descriptor

which returns a single directory that represents the "default filesystem". In similar vein to:

  • wasi-clocks/instance-monotonic-clock
  • wasi-clocks/instance-wall-clock
  • wasi-network/instance-network

By default this is an empty directory. But the host can "mount" specific external directories/files into it. Exactly how the contents of this "filesystem" are populated (preopened or not), would be of concern only to the host, not the client application / libc.

@pchickey
Copy link
Contributor

I'm up for exploring this idea more sometime in the future, but for the preview2 timeframe I have advocated that we keep interfaces and concepts that existed in preview 1 as similar as we can possibly get away with.

@BentonEdmondson
Copy link

BentonEdmondson commented Aug 15, 2023

I think this is a good idea. This would solve WebAssembly/wasi-libc#414

@badeend
Copy link
Contributor Author

badeend commented Aug 16, 2023

for the preview2 timeframe I have advocated that we keep interfaces and concepts that existed in preview 1 as similar as we can possibly get away with.

Yeah, that's totally fine!

The main reason why I am asking this is because I'm trying to get a grip on where WASI wants to end up regarding "preopens", so I can take that into account for wasi-sockets.

@sunfishcode
Copy link
Member

As the person who initially brought preopens to WASI, based on similar ideas in CloudABI and Capsicum, I agree. I've also come to believe that migrating wasi-filesystem to have a rooted filesystem view would be beneficial to WASI. This would be from many people's perspective a much more "normal" filesystem API. We could have an open function where you can just pass it a string, and open a file. We could even add a builtin current working directory. Overall, this would eliminate a common source of friction when people look to port code to WASI.

This new namespace could still be expected to be sandboxed, but the sandbox could look more like a simple VFS with things mounted in it. Many use cases could just use that directly, rather than using preopens. I expect the VFS could even be populated by the existing --dir command-line flags. Overall, it seems like this is something we can migrate to incrementally, without breaking compatibility with existing code.

Why the change in stance? Preopens were added in the Preview 1 era, at a time when there wasn't even a component model proposal, and we had a lot of questions with no answers. But today:

  • We can be more confident that wasi-filesystem should just be a filesystem API, and specialize it accordingly. WASI will add many new kinds of resources, but not everything is a file, and we now have tools like an IDL, handles and resources, and more, so we don't need wasi-filesystem to be a general-purpose resource namespace.
  • We now have a more developed understanding of "link-time authority". In particular, one of the key properties of capability systems is that all authorities be interposable, and with link-time authority, we can still interpose them. We will still need to watch out for ghosts, strings being passed around that implicitly assume everyone has the same filesystem namespace, but we can help by building up tooling such as WASI-Virt to promote best practices.

(But what if you want an Everything Is A File experience? We can provide it, by teaching WASI-Virt how to implement features in terms of wasi-filesystem APIs!)

So, should we start moving in this direction? I don't see any objections to this overall issue so far, but I'll do some more asking around.

@yamt
Copy link

yamt commented Feb 28, 2024

This new namespace could still be expected to be sandboxed, but the sandbox could look more like a simple VFS with things mounted in it.

i concern the "simple VFS" is likely a bit more expensive than preopens, which is just a list.

@yoshuawuyts
Copy link
Member

yoshuawuyts commented Jul 9, 2024

Dan published a blog post about this today: Capabilities and Filesystems. In it he says the following about wasi:filesystem:

We could add a plain open function, which takes a path argument and resolves it relative to a filesystem root that comes with the instance that the function is imported from. That makes this open function use a link-time capability. It should support absolute paths, symlinks to absolute paths, and all the rest. And then, we could add similar non--at versions of all the -at functions. This new API could live in a new interface that could coexist with the current interface.

To me this seems like something worth experimenting with now that we've shipped Preview 2.

@badeend
Copy link
Contributor Author

badeend commented Aug 7, 2024

I've created a proof-of-concept of this over at: bytecodealliance/wasmtime@main...badeend:root-directory


We could add a plain open function, which takes a path argument and resolves it relative to a filesystem root that comes with the instance that the function is imported from. That makes this open function use a link-time capability. It should support absolute paths, symlinks to absolute paths, and all the rest. And then, we could add similar non--at versions of all the -at functions. This new API could live in a new interface that could coexist with the current interface.

I went with a simpler design, similar to the one outlined in the initial post. Instead of duplicating the entire path-based API-surface into non-_at versions, the POC only has one new WIT method:

  package wasi:[email protected];
  interface preopens {
+     @deprecated
      get-directories: func() -> list<tuple<descriptor, string>>;
  
+     /// Get the root directory of the file system.
+     /// In POSIX filesystems this is located at: `/`
+     @unstable
+     root-directory: func() -> descriptor;
  }

This feels like a sweet spot to me:

  • For guests: wasi-libc can remove its internal path-to-preopen mapping code and simply pass any absolute path straight into root_directory.open_at().
  • For hosts: there's only one version of each API to maintain.
  • Still easy enough to communicate to users: if all they want to do is get up-and-running fast and don't care about the capability model, they can just perform whatever path-based syscall they like to perform on the root directory handle. E.g. open(...) == rootdir.open_at(...), and: symlink(...) == rootdir.symlink_at(...), etc.

To test it all out I disabled the preview1-style preopens and every unit test still passes, except preview2_file_read_write. This is quite obvious when you look at its source code:

let preopens = wasi::filesystem::preopens::get_directories();
let (dir, _) = &preopens[0]; // <--- panics here

All other tests continue work because they go through the preview1-to-preview2 adapters, which I also modified:

On initialization, the adapter calls root-directory() and appends the fd to its internal preopens list with a hardcoded path of: /
Next, it attempts to get the initial-cwd and open that using root_directory.open_at(initial_cwd). If that succeeds, the fd is likewise appended to the preopens at path: .

This was enough to make the preview1 test suite pass without modification.
In terms of upgrade path; a future preview2-to-preview3 adapter could take a similar approach and return exactly 1 or 2 paths from its synthesized preopens::get-directories(). One for /, and optionally another one for .

Note: eagerly preopening initial_cwd as . was the easiest way to get the adapter to work. Don't think this is a fundamental requirement, though.


Some notes about the POC implementation:

  • The WasiCtx has gained an initial_cwd setting. This is exposed by wasi:cli/environment::initial-cwd().
  • Relative guest paths passed into WasiCtxBuilder::preopened_dir are converted to absolute paths based on that new initial_cwd setting.
  • Before, only directories could be preopened. Now, regular files can be preopened as well, although this feature is not exposed on the WasiCtxBuilder yet.
  • In various places the code that processes the guest path strings is barely working and only "good" enough to make the unit tests work.
  • Much of the FS implementation was moved from crates/wasi/src/host/filesystem.rs to crates/wasi/src/filesystem.rs. So the diff is on those files is not really helpful.

@badeend
Copy link
Contributor Author

badeend commented Aug 7, 2024

The POC also fixes WebAssembly/wasi-libc#414, btw

@yoshuawuyts
Copy link
Member

@badeend thanks for doing this work! - Would you be up for presenting these changes at either the WASI sub-group or the component implementers meeting? It doesn't have to be anything big - but more to get all implementers on the same page about this.

I went with a simpler design, similar to the one outlined in the initial post. Instead of duplicating the entire path-based API-surface into non-_at versions, the POC only has one new WIT method

Oh also, I was curious about this: for 0.3, do you think we should keep the at_ APIs? Or is this change specific to 0.2.x, and we should actually remove the at_ APIs in 0.3?

@badeend
Copy link
Contributor Author

badeend commented Aug 12, 2024

do you think we should keep the at_ APIs? Or is this change specific to 0.2.x, and we should actually remove the at_ APIs in 0.3?

Even for 0.3+, I think there's still merit in keeping them. Two reasons I can think of right now:

  1. Paths passed to the *_at methods are securely confined to the reference directory descriptor. This on its own is already a nice guarantee to build software on. Additionally, this simplifies the composition of components with increasingly tightened sandboxes; A parent component can open-at a specific subdirectory and expose that descriptor as-is, without further virtualization, to a child component as their root-directory.
  2. If we were to remove all *_at API's and replace them with only ambient authority equivalents, this would significantly increase the complexity of the preview1 adapters as they would need to rebuild the security model that exists today.

@sunfishcode
Copy link
Member

A subtlety here is that the current *-at functions perform this sandboxing where they prevent ".." and symlinks from escaping the given directory. In contrast, POSIX openat doesn't do this sandboxing.

@badeend
Copy link
Contributor Author

badeend commented Aug 19, 2024

That's correct. The current WASI open-at is most similar to Linux' openat2 with RESOLVE_BENEATH.

To get POSIX openat behavior with my POC, is for wasi-libc to prepend the path of the reference directory to the input path and use that to call open-at on the root directory, correct?

@sunfishcode
Copy link
Member

In a POSIX sense, prepending the path isn't sufficient, because according to POSIX, the main purpose of openat is to avoid race conditions:

The purpose of the openat() function is to enable opening files in directories other than the current working directory without exposure to race conditions. Any part of the path of a file could be changed in parallel to a call to open(), resulting in unspecified behavior. By opening a file descriptor for the target directory and using the openat() function it can be guaranteed that the opened file is located relative to the desired directory.

@badeend
Copy link
Contributor Author

badeend commented Aug 20, 2024

Ah right. The devil is always in the details.
Do you have any ideas on how to go about this in such a way that it can be implemented efficiently but also doesn't require exposing the entire host filesystem?

@bjorn3
Copy link

bjorn3 commented Aug 20, 2024

Do we actually need to support openat without RESOLVE_BENEATH?

@sunfishcode
Copy link
Member

Do we actually need to support openat without RESOLVE_BENEATH?

That's a good question. I've been thinking part of the goal here is to have as "normal" a filesystem API as we can get, which would include having an openat that works like POSIX's does. But I could also see the argument that this particular detail of it isn't as important as keeping the overall API surface simpler.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants