-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Request for multiple readers via access handles #34
Comments
(I believe I know the answer to this): Do you need the ability to seek within these files? I presume performance reading (from WASM) is important here. Adding an 'accessMode: read' to createSyncAccessHandle() would seem straightforward, and can leverage existing accessmode logic used currently for SyncAccessHandle and getWritableFileStream |
Yes, we do need the ability to seek within these files. Your assumption about the importance of the performance of the read operation is correct. :-) |
Is a read-only handle sufficient, or do you need a handle you can write to and read from (from multiple places) at the same time? To me, this becomes a more broad question around how we should handle file locking (#18). #19 lays out a more general proposal that gives developers more control over locking behavior, among other things. Taking this proposal a bit further makes me wonder whether we should allow things like multiple writers to the same file. An argument could be made that since this is intended to be a high-performance API, the browser should get out of the way and let the developer and underlying OS handle keeping file data consistent. |
We don't need to do read/write simultaneously. We typically write once and read many. |
From Emscripten's point of view, allowing arbitrarily many readers and writers would be ideal. We aim to emulate a normal POSIX environment to make bringing applications to the Web as easy as possible, and any differences imposed by Web APIs become problems our users (or we) have to work around. Our user's applications already handle the challenges of simultaneous file access on native platforms, so the Web platform trying to protect them from those challenges causes more problems than it solves. |
The web API needs to be interoperable, having the same behavior across different file systems or OS, so we probably cannot make this API the mirror of POSIX file system API, or any file system API. (We could provide interface that client needs, with clear definition of expected behavior.) |
Even if the browser had to do something like serialize all accesses to a file to ensure that no inconsistent state (i.e. partially applied write) could be observed, that would still be much better than the status quo of allowing only a single writer. |
Yes, I agree. We can probably get away with just relaxing the exclusive locking requirement (i.e. making it possible to create a SyncAccessHandle with a read-only lock) to support multiple readers, but supporting multiple writers without a way to coordinate writes could get hairy. As for whether we should support multiple writers in the first place... The current restriction of having one writer is not a problem for applications using this API, since single-paged applications can route all write requests through the same dedicated worker. But this restriction imposes architectural constraints that are significant hurdles to building libraries on top of this API, since all writes have to be proxied through the same worker, even when there are multiple tabs open for the same origin. Adding support for byte-range file locking would solve this problem, albeit while introducing quite a bit of complexity. It's unclear to me at this point whether this is something we want to pursue. |
Is it possible for multiple instances of the same application open in multiple tabs to route accesses through a shared worker? If so, that's news to me and would make things significantly simpler. If not, the single writer limit is still painful at the application level, since having multiple tabs open to the same application is extremely common. |
Shared workers exist, but my understanding is they're not a context where we'd want to call sync APIs. You can
Sorry, I should have been more clear that I was referring to just a single-page application. You are correct that this is an issue for any use case with multiple tabs |
Right, we need to figure out how to coordinate the sync writes from multiple writers first, which could be in different threads or different processes.
I'd prefer making analysis and adding support based on realworld use cases, like the Adobe example for multiple readers. |
I'm glad you asked! :) Developers want relational databases. A goal of the Access Handles project is to allow developers to bring their own database to the web. As mentioned in this recent article about WebSQL deprecation by @tomayac, Chrome is working with the SQLite team to bring SQLite to the web over WASM. In the shorter term this will allow Chromium-based browsers to finally remove WebSQL, but the vision (or my vision, at least) is that Access Handles are the primitive enabling an ecosystem of WASM-based database libraries to become the way sites store data on the web. Fulfilling this vision requires those libraries to support reading and writing data from multiple tabs. This API becomes significantly less useful if it's slowed down by bloat. Something like byte-range file locking is not worthwhile if it would degrade performance. |
I totally agree with:
From the Adobe perspective, the multiple-readers is the priority as it would allow us to share data and caches between tabs. In particular, this allows us to avoid downloads of large data files like documents and fonts when shared between tabs. While we see the value in multiple-writers for libs like SQLite, it is not a priority. I prefer we focus on performant APIs. |
The discussion at the Breakout session on OPFS seemed to end with consensus on allowing an optional lock-type parameter, which defaults to exclusive, but allows taking a shared read lock, or no lock at all. This is in reference to SyncAccessHandles. |
These three categories of locks SGTM (as discussed in the meeting). Now to bikeshed the API surface :) I think we need to be clear that the "no lock" mode is AT YOUR OWN RISK. I'd prefer something like
I'm not sure whether the locking mechanisms for WritableFileStream should be the same as for SyncAccessHandles. For example, a shared read lock (and any read lock) doesn't make sense in reference to WritableFileStream. And perhaps more importantly, it's not possible for the browser to enforce an exclusive locking requirement for files outside of the OPFS, which are accessible by other applications on the machine (or even from other machines, if the file lives on a shared file system. See WICG/file-system-access#286). Another question here: I think there's an argument to be made that locking should be exclusive across concepts. For example, if a site has a number of SyncAccessHandles in the no-restrictions ("unsafe") mode, should we still block a |
The spec currently requires a shared lock for WritableFileStream (which blocks a SyncAccessHandle while we have a WritableFileStream) I'm not going to address interactions with the non-OPFS File System API. Being able to overwrite a file that has open AcessHandles would in fact be the point of "unsafe" access; the requirement to deal with any collisions/races falls upon the application, as we discussed yesterday. |
As an application developer, I prefer protective behaviors from the file system. I want to have the operation rejected and to receive an error code if I am performing something that will result in data loss or corruption. Maybe I'd like an override mechanism for some scenario, but nothing is coming to mind as to a scenario. I think a file should not be movable if it is open for read or write. This is based on my experience over the decades of writing code for Windows, Mac and Linux. |
To explain this another way, I'm wondering whether "shared" and "exclusive" locks is the wrong model here, but if all locks should be exclusive with respect to other concepts (or "operations"). So opening any SyncAccessHandle would exclude performing any other sort of operation that could modify the file (such as opening a WritableFileStream or calling Realistically, any operation which modifies the file ( This also makes specifying locking literally exponentially easier, since we won't to specify the expected behavior between every possible locking state for every possible concept; only within concepts (i.e. when there are multiple WritableFileStreams or SyncAccessHandles) Thoughts? |
I like the idea of defining the behavior for each operation and not using lock in interface. If we are not going to offer lock API (e.g. let client acquire or release lock explicitly or check locking state), I think "readonly" and "readwrite" might be more straightforward than "exclusive" and "shared". (I am not quite sure about "unsafe" due to the completely undefined behavior and uncertain benefits). To describe the concurrency model which allows multiple readers in OPFS, I think it might be something like:
|
WritableFileStream right now takes a shared lock (which also allows for all sorts of oddities depending on who closes it last). It could also take a lock-type option for an exclusive lock (and even an unsafe mode open). |
Before SyncAccessHandles, the API had no concept of locking. WPTs asserted that you could create multiple WritableFileStreams, and this seemed reasonable given the triviality of creating multiple swap files. Arguably this makes less sense now, especially with the proposed This decision has had some strange side effects. For one, it makes supporting multiple read-only SyncAccessHandles tricky. But also, what about modifying operations, such as
Thank you for the proposal! I agree that this pattern is more straightforward and maybe easier for users to wrap their heads around, but I'm skeptical for a few reasons...
I personally prefer the model where locks are exclusive across primitives that could modify the file (I wish I'd used the word "primitives" instead of "concepts" above). @jesup do you have any opinions? |
We've updated (locally at least) the WPTs to not allow remove/etc if there's a WFS - we require an exclusive lock to remove or move a file. Do current applications depend on multiple WFSs to the same file? Note that if we allow the "Unsafe" locks for SyncAccessHandles (or WFSs), as discussed at TPAC, we'll need to still deal with how that interacts with move/removeEntry... We could still take an "Unsafe" lock, and specify that move/removeEntry require that there are no locks on the file (as opposed to stating that they take an exclusive lock). We also would have to specify the interaction between Unsafe locks and Exclusive/Shared locks - can you take an Unsafe lock if there are existing Exclusive/shared locks? Can you take an Exclusive/Shared lock if there are Unsafe locks? The discussion at TPAC implied that Unsafe was the equivalent to not locking, which would mean that Exclusive/Shared and Unsafe wouldn't notice each other. (and that move/RemoveEntry would happily take an exclusive lock and move/remove). I agree getFile() can cause interesting issues if the file is modified/moved/deleted while the blob exists. We can't reasonably lock, due to it being a GC'd object with no close(), unless we wrap it with an API with close(). |
For For multiple writer support, or if developers need finer-grained control in the future (e.g. implementing byte-range lock), it might be easier to introduce an "unsafe" mode on the file, not on the handle, where no lock is required for operations (including |
I guess I'm puzzled about the use cases for multiple writers. The only case I can think of is if there is range locking, so that one context to lock a range and modify it (or append or truncate) while another context has locked a non-overlapping range. |
In sqlite3, for example, any number of clients may open a given db file in read/write mode, but that mode doesn't mean much until sqlite3 actually has to write to the file. When sqlite3 needs to lock, it locks the whole file, but only for the time required for the given operation (which may be just a few small I/O ops or "forever," depending on what the client is doing). Its public API which clients use to plug in new storage backends (as we've just done with OPFS) does not model byte-range locking. In the case of sqlite3's OPFS support, we open the FileSystemFileHandle at the earliest opportunity, use |
The argument for multiple writers would be that it allows a simpler (and perhaps more efficient) emulation of POSIX file operations by e.g. emscripten/etc - POSIX doesn't disallow multiple opens. If emscripten emulates POSIX file locking, then operations like @sgbeal describes would simply work. Also any application which wants to implement it's own locking/range-locks could do so. The argument against it would be that multiple writers is generally error-prone, and racy if done from multiple tabs (though with care that would be ok, much as it is in POSIX applications), and that most applications don't try to write from multiple handles. Unlike native apps, however, it's easier in a browser for multiple instances of an app to be "running" at once (in separate tabs), which increases the changes of apps hitting open() (createSyncAccessHandle) failures, which they may not deal with well. If we have shared read-only mode (not just a shared lock, but a true read-only mode), that may reduce the likelihood of this sort of failure, though apps may open files in read/write mode by default in case they may need to write to it. Note that POSIX/linux by default has no locking on file operations, and applications deal with that, and either use the advisory locking primitives or their own synchronization methods. The primary argument comes down to flexibility & ease of posix emulation compared to blocking potentially error-prone usages. There are arguments for both. |
Thanks for summarizing the pros and cons of multiple writers, @jesup. All valid points, and given that it is OPFS, an app/site should be able to manage error-prone scenarios pretty well. I can see that the flexibility here for POSIX-like file operations can be quite beneficial here. For example, in SQLite, being able to keep multiple files open via multiple writers could help the overhead caused by async open, once it bears the one-time cost of opening all SAHs in the beginning. (The recent changes for all sync methods within SyncAccessHandle has helped SQLite performance but opening SAH is still asynchronous.)
These are great questions. I think having an unsafe sync access handle take “no lock” could cause unexpected data read by other sync access handles in safe mode. So, my inclination here to your questions above are, no to both. Unfortunately, the current lock types {“exclusive”, “shared”} won’t be able to support some scenarios. For example, Writable with a shared lock and read-only sync access handle with a shared lock can coexist, which we probably don’t want. Another example is if we do not want read-only sync access handle and unsafe sync access handle to coexist, using shared locks for both won’t work; lock will need to know about the “mode”. As @a-sully mentioned above, there are two dimensions here:
Perhaps, we will need something like lock type and lock mode to handle (1) and (2). What about something like this?
To acquire a lock of type A and mode of B, the file’s associated lock must:
I propose we should also lock ancestor handles, in order to avoid removing a directory while there is an open writable or SAH. This lock needs to be a “shared-read-only” mode so that more than one child handle can be granted locks at a time. So, putting it all together... Not all type/mode combinations would be valid. The valid ones could be:
On the user-facing side, the new SyncAccessHandleLockMode (and possibly WritableFileStreamLockMode) can map to the lock modes in the spec. That way, users do not need to worry about the “lock” concept; they will just need to specify what behavior they want.
In summary:
Thoughts? |
Exclusively locking to a single type of file access primitive then having various locking schemes available within each primitive makes a lot of sense to me.
I'm not sure I understood this part correctly, but I think you mean that the ancestor directories up to the root above any file with an open SAH should be locked for shared reading. I don't think that would be a good idea, since it would unnecessarily inhibit directory operations like creating a new sibling or deleting a sibling of an open file. Preventing open files from being deleted seems reasonable, but ideally that would be implemented as a check in the file deleting procedure so it doesn't affect any other operations. Note that POSIX allows open files to be unlinked and further allows reading and writing to unlinked files, so in a perfect world it would be great if the file system APIs allowed that as well, but I understand that this is probably infeasible. FWIW, a very large user ran into a bug related to this difference in behavior, but I think they were able to work around it. |
Hmm yeah I can see how that that would confusing. API locks are specified to correspond to exactly one file entry. The intention here is that the "shared-read-only" lock on the ancestors would only prevent the directory entry itself from being modified (so no renaming/moving/deleting the directory), but modifications within the directory, such as creating a new sibling or deleting a sibling of an open file, would be fine since those operations will take a lock corresponding to a different entry.
It also seems reasonable to specify this as part of the deletion (and possibly move - see below) procedure. There are a few reasons why an explicit
Outside of the OPFS, we do not have the luxury of mapping files ourselves. Currently we don't support SyncAccessHandles outside of the OPFS either, but presumably this could be an issue for a future async alternative (#41). Our ability to support move() outside of the OPFS would be reliant on the latter point being true, which simply may not be the case on all file systems. All that aside, it seems brittle for a cross-platform web API to rely on an implementation quirk like this?
Curious to hear other thoughts here! |
I have not been able to follow the full conversation. Our codebase runs on Windows, Mac, Linux, iOS as well as WebAssembly. The file system and file operation abstraction layers in our code attempt to smooth over the differences. Most of our code uses the posix apis. We frequently use the boost::filesystem apis. We avoid multiple writers in our codebase. Given our large code base, we prefer that the default behavior be mapped as close as possible to the posix apis and the underlying operating system. |
I think we generally should pick a target - IMO posix APIs. Implementing posix on top of anything is pretty well defined. |
Describes a new locking scheme to address #34
There is now a dev trial for new File System Access locking modes. See the announcement here. This is targeted for milestone 122. |
I have been using this for three weeks, I believe since the day it reached Chrome Canary. I'm only using the "readwrite-unsafe" mode; it's exactly what I wanted (along with JSPI) for building SQLite virtual file systems. Supporting access handles that can always both read and write is a critical feature for performant SQLite with multiple connections. Using the new mode I have built a prototype SQLite VFS that supports write-ahead logging, which should provide better performance and concurrency than currently possible with OPFS. It is described here and an online demo is here. I have written more than 10 SQLite VFS implementations for the web platform, using both OPFS and IndexedDB. FWIW I am highly in favor of this proposal, particularly "readwrite-unsafe". I appreciate the work that has gone into getting it this far! |
This is great news! What is the best way to feature-detect whether these options are supported? It would be great to experiment with the new modes in Emscripten, but we'll need to be able to gracefully fall back to the current behavior if the new modes are not supported. |
This looks great! Keep us up to date with any feedback or questions you have.
The best way would be to check for the const FSALockingModesEnabled =
FileSystemSyncAccessHandle.prototype.hasOwnProperty('mode');
// OR
const FSALockingModesEnabled =
FileSystemWritableFileStream.prototype.hasOwnProperty('mode'); |
Our web app downloads large application specific static data files. They range in the 10kb to several hundred kb size. We use a variety of approaches to cache this data. We would prefer to store them in OPFS, and, to be able to share them between multiple tabs of our app. For example, if the user opens two tabs to our app, we want to have a single copy of the data. We would like to download the data once, and then be able to read using the OPFS access handles from multiple tabs. We prefer access handles for their good performance.
The text was updated successfully, but these errors were encountered: