Emphasising the importance of in-place writes #260
I agree that not supporting in-place writes limits the use cases this API can serve. On the other hand, the model of writing to a temporary file that is atomically moved into place once writing finishes is a long-established method of saving files, employed by many native applications to avoid data loss. As such, I disagree that this means there is (almost) nil value in this API. For use cases that really do require in-place modification (like databases, as you mention) we hope that Native IO will be a better fit. Since that API will be limited to website-private storage it will of course not fit all use cases either, but I think the two APIs together should cover the majority of use cases. (Some cases, like append-only logs/streams, sit somewhere in between; that seems like something that could plausibly be supported by this API in some form, but as you point out it is pretty much the worst-case scenario for this API today.)
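For reference, a minimal sketch of that safe-write model as it surfaces through the current API (names are illustrative): nothing reaches the real file until close() resolves, at which point the temporary is swapped into place.

```js
// Illustrative use of the current safe-write model.
const [fileHandle] = await window.showOpenFilePicker();

// All writes go to a browser-managed temporary (swap) file...
const writable = await fileHandle.createWritable({ keepExistingData: true });
await writable.write({ type: 'write', position: 0, data: new Uint8Array([1, 2, 3]) });

// ...and only become visible when the stream is closed, at which point the
// temporary file is atomically moved into place over the original.
await writable.close();
```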
Writing whole files and atomically moving them into place does have benefits, but forcing it invalidates the case for having write/seek calls at all, since they can then only achieve the same things as the existing ArrayBuffer API. The real value this API could introduce is the ability to incrementally get data out of the browser onto the filesystem, something that's impossible right now. Serious multimedia use is mentioned, but that is going to need to append to files in O(1) time and update a header.
I think this gap risks a tug-of-war between the two. This API makes progress on granting filesystem access, but will always be pushing for more I/O capability. That API (Native IO) will face a morass of fingerprinting and space-allocation issues as it progresses, leaving it pushing for the capabilities of this API.
I really, really would like to see this. It's been hell without it.
I could not disagree more. As other users have pointed out, we are trying to build apps in the browser that interop with other apps a user might have (ya know... like... real applications). The current behavior makes file watching useless. What if Adobe wanted to write Photoshop as a PWA and let you edit files? Their PSD files can be huge. They would not like having to churn whole files on the filesystem so frequently, and if THEY wanted to avoid data loss by using a temp file... THEY could do that. Why are you taking away the choice? Do you really have a good reason? I don't think we need to force people one way or the other... but let us choose to write in place.
No. There is actually almost nil value in the API as it stands compared to the existing way of downloading Blobs.
Here's a Chrome bug for the issue.
I want something like inPlace too. I do not wish to see yet another storage layer like Native IO. I think we can all agree that something like inPlace is important to have and that it should be added & supported.

It would be better if they could scan the data that is being written to the file like a man in the middle, rather than replacing the entire file. Kind of:

    readable
      .pipeThrough(fsAccess)
      .pipeThrough(malwareScanner)
      .pipeTo(destination)
I didn't intend to just rally support for having in-place writes (though that is welcome); I hoped to spell out the clear choice here. If the spec limits itself to writing whole files, then the entire write/seek surface adds little over what already exists.

Of course, like the others who have contributed, I would find a good use for in-place writes to incrementally get large data out to the filesystem.

A good API would decouple policy (such as virus checking); here it has become intertwined. Policy like this differs over time, between platforms, etc., whereas APIs are much more difficult to change, since by definition there are more users of an API than implementors of it. In the specific case of virus checking, there could be solutions which do not involve copying the file, such as quarantining the existing file during the in-place writes. IMO it is important to get this as far out of an API spec as possible, and then design, holistically, an API which benefits from the last 40+ years of work in that area.

Or, simplify this API to a way to sync Blobs to files on the user's filesystem. That's a high-level operation which gives the browser incredible scope to optimise the implementation.
Adding something like inPlace would help here.
On-topic

I wonder if it's possible to take a page from the filesystem playbook and use journaled transactions to provide crash-resilient file integrity? In so-called journaling filesystems, writes are recorded in a journal before being committed to the main file. Commits would be expensive, sure, but so is copying the entire file to a temporary just to make it writable. And journaled writes would immensely improve the performance of small updates to large files, which sounds like a common use case for in-place writes. It wouldn't be at all helpful for random-access, mixed read/write use cases, but maybe Native IO is a better fit there. Not sure how this would interact with other concerns like malware scanning, though.
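As a rough userland approximation of that idea (not a spec proposal): small updates could be appended to a separate journal file and only occasionally compacted into the large main file. The sketch below assumes the OPFS sync access handle as it later shipped; the file name and record layout are made up for illustration.

```js
// Rough userland approximation of journaled updates, confined to the origin
// private file system where in-place writes exist (dedicated worker only).
const root = await navigator.storage.getDirectory();
const journalFile = await root.getFileHandle('main.db.journal', { create: true });
const journal = await journalFile.createSyncAccessHandle();

function appendRecord(record /* Uint8Array describing one small update */) {
  // Append at the current end of the journal, then flush, so a crash loses
  // at most the record currently being written.
  const offset = journal.getSize();
  journal.write(record, { at: offset });
  journal.flush();
}

// A periodic "commit" would replay journal records into the large main file
// and truncate the journal back to zero (omitted here).
```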
Off-topic

That's actually a great example, really, because the PSD format is not merely huge — it's ancient, famously convoluted [1], and completely unsuited to incremental updates. Photoshop on the desktop doesn't update PSD files incrementally; in fact that would be antithetical to Photoshop's non-destructive editing paradigm. Photoshop uses a scratch file for live updates, and only serializes to PSD on save — which writes a new file, rather than updating the original.

What's more, Adobe did want to write Photoshop as a web app (they have), and to make it work they had to admit to PSD's failings as a format (on occasion it was defended [2][3] somewhat quixotically) and introduce a completely new document format, the Photoshop cloud document, just to add incremental saving to Photoshop. Meanwhile, MS Office introduced an incremental-write version of the Word document format back in Office 2003, and they ended up having to patch it out again due to concerns about deleted information being retained after save. Many data-interchange formats geared towards the end user sacrifice the ability to make speedy incremental updates in favor of data-integrity assurances.

My point here isn't that in-place edits aren't potentially useful to app developers — of course they are. But they're rarely what users actually want for their files (nor should users be required to possess the technical savvy to make that decision themselves), and the problems solved by incremental writes can often be addressed in other ways, without resorting to live-modifying user data.

And to @jimmywarting's point about Google driving the conversation about in-place writes: agreed, they shouldn't be the only voice in the discussion. But neither should app developers' desire to do what's most expedient for them be the overriding concern. It doesn't feel unreasonable or inappropriate, to me, that the browser vendors' overriding concern for local-file access would be ensuring that when app developers push code into their browser, they aren't provided with APIs that expose potentially destructive operations, even if the app's developers "want" to have that option. After all, it's not their own files we're talking about, it's the user's files.

Notes
In these discussions, there is a regular assertion that developers and users are somehow in opposition in what they want. I strongly challenge this. The whole point of software is to make a machine do something of value for the user. It is also asserted that the user agent/spec knows better than the users and developers as to what the user wants to do. This of course is impossible in general, as the user agent is a general platform, not a purpose-built solution. If the user wants to do something with their files, they should be able to. That requires applications to access those files, in whatever mode is required for the task at hand.
I want to touch on a related issue I faced earlier:

    onunload = () => writable.close()

The scenario is that the user is downloading something large, using say something like WebTorrent, and in the middle of it all the user decides to close the tab/browser.
For this particular use case, I think something like the autoClose option proposed in #236 would also work? Having said that, we realize the importance of in-place writes. Specifically for files in the origin private file system, the proposed access handle API in #310 aims to solve that use case. We have no current plans to extend that to also allow in-place writes outside of the origin private file system, though.
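For context, a rough sketch of what that access-handle path looks like for a file inside the origin private file system, as it later shipped (it must run in a dedicated worker; the file name is illustrative):

```js
// In-place writes, currently only for files inside the origin private file system.
const root = await navigator.storage.getDirectory();
const handle = await root.getFileHandle('data.bin', { create: true });
const access = await handle.createSyncAccessHandle();

// Bytes land directly at the given offset in the (sandboxed) file.
access.write(new TextEncoder().encode('patched'), { at: 128 });
access.flush();
access.close();
```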
@mkruisselbrink Why? Why intentionally limit the user from using files on their own filesystem?
☝️ I second this. Why limit it to just the sandboxed origin private file system?

Auto close would be good... but I also see it as unnecessary if we could just write in place as we go. It's good that we even get a duplex stream / AccessHandle at all; it's unfortunate that it's limited to the sandboxed origin...
This has now become a specification within a specification, and the one specification that actually offers performant, low-level writes is also the one confined to the invisible origin private file system.
This seems extremely backwards. The Extensible Web Manifesto starts off by saying, "Browser vendors should provide new low-level capabilities that expose the possibilities of the underlying platform as closely as possible. They should seed the discussion of high-level APIs through JavaScript implementations of new features." That sounds like the opposite of what is happening here: there is a high-level API that abstracts & restricts file-system access, and the low-level tools are only being made available in a very small & limited capacity.

It'd probably be better for everyone to drop the classic API altogether & only spec & ship Access Handles to start, as high-level techniques like atomic writes &c could be implemented performantly atop the low-level capabilities, were they exposed. Short of that radical rewrite, it would be horrible to only have performant write access to these concealed, non-user-facing origin-private sandboxes. Users need ways to write big PSD files, big torrent files, big sqlite files. We can't tell them no.
A horrible workaround would be to write to the sandboxed origin first and then move the data to the HDD (either with write access or a save dialog), only so you can have a duplex stream... This is a costly operation that requires copying the data and maintaining state about which copy is newest.
Please bring accessHandle support to things outside of the OPFS.
I wanted to second some of the comments above: I'm curious why in-place writes aren't planned to be supported outside OPFS? To motivate the value, here's an example use case where it could be valuable to add support outside of OPFS.

I'm developing Riffle, a relational persistence framework based on SQLite. Riffle is available in both desktop apps and web apps. On desktop, we write to a SQLite file visible on the user's filesystem, which has a number of benefits. The user can see the files, back them up, share them with others, and even edit the files through other apps. On web, we currently use absurd-sql and persist to IndexedDB. This approach lacks those benefits of having the SQLite files be visible to the user.

We're very excited about the ongoing work by the SQLite team to create a WASM build that persists to OPFS. But this work would be much more valuable to us if it could also persist to user-visible files. Would appreciate any insight you can offer, @mkruisselbrink
(Reflecting what I said on Twitter: “There’s nothing that would stop you from bridging the OPFS and the regular file system as a one-off. At a convenient time and when writes are committed, you could ask the user if they want to back up the database to a regular file, or let them initiate the process when they want.” This addresses the immediate use case that motivated the feature request in #260 (comment).) |
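A minimal sketch of that one-off bridge, assuming a hypothetical riffle.sqlite file in the OPFS: read the committed database out of the sandbox and write it to a user-chosen location when the user asks for a backup.

```js
// One-off backup: copy a file from the origin private file system
// to a user-visible file on the real file system.
const root = await navigator.storage.getDirectory();
const opfsHandle = await root.getFileHandle('riffle.sqlite'); // hypothetical name
const snapshot = await opfsHandle.getFile();

const target = await window.showSaveFilePicker({ suggestedName: 'riffle-backup.sqlite' });
const writable = await target.createWritable();
await writable.write(snapshot);
await writable.close();
```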
There are always going to be certain times when both users and developers want to write directly to a folder instead of OPFS. If you have a database and it's being used by something else, like a server, then you are almost always going to want it to be in sync with the real database while you use the browser as a simple GUI tool.
Right, this is definitely not supposed to meet any and all use cases. I was mostly just pointing out that it's perfectly feasible to cross the boundaries between the two file systems.
I saw #236 was closed in favor of whatwg/fs#19, then whatwg/fs#19 was closed in favor of a really unrelated issue. Now it seems in-place writes outside the OPFS are still not planned.

I'm building a web app that allows users to record their Android phone screens to WebM files. If the user hits the stop button on my page, I need to finalize the WebM file by seeking and writing to multiple positions, so the File System Access API is the only option. But if they don't (for example they closed the tab, the browser crashed, the system crashed, or power failed), I need the file to exist on their file system. The file is playable, only without the duration field updated. Like OBS: when OBS is killed while recording, the output file is on my file system, not in some invisible OPFS!
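For what it's worth, the finalize step itself is already expressible with the current API; the problem is purely that nothing persists unless close() runs. A sketch, with the function name, offset, and bytes as placeholders:

```js
// Patch the WebM duration field "in place" (really: in the swap file), then commit.
// If the tab dies before close(), neither this patch nor the recording itself
// reaches the user's disk.
async function finalizeRecording(fileHandle, durationOffset, durationBytes) {
  const writable = await fileHandle.createWritable({ keepExistingData: true });
  await writable.write({ type: 'write', position: durationOffset, data: durationBytes });
  await writable.close();
}
```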
Ah, thank you for bringing this up! whatwg/fs#19 had two parts: some proposals around file locking, and in-place writes.

As for this issue, it's related to whatwg/fs#41, which proposes an async alternative to SyncAccessHandles, though it's not quite a duplicate because presumably we'd still issue (much more performant!) writes to a swap file. I still don't expect we'll support in-place writes for files outside of the Origin Private File System anytime soon (for all the reasons @mkruisselbrink mentions above).

That being said, there's a lot of room for optimization even with that constraint. The most straightforward optimization would be to implement the swap file as a copy-on-write file (as I mentioned in a recent presentation with other browsers), so that specifying keepExistingData: true does not require copying the entire file up front.
I can't see us (Chromium) prioritizing this in the near future, especially since it doesn't help Windows at all.
Best I can tell, this refers to this quote from @mkruisselbrink:
So, the justification for not having moderately performant file access to actual files (as opposed to the occluded, invisible non-user files in OPFS) is that some use cases preferred avoiding data loss? And avoiding data loss continues to be seen as of such arch & vital importance that we won't even consider giving web apps moderately high-speed access to files? What do I have wrong here? Are we okay saying that image editors, databases, git clients, & other would-be web apps must be confined to non-user files, to being site-only tools? Why would limiting user agency to only unperformant regimes ever be acceptable? I feel like @mkruisselbrink was using a sometimes-appropriate safe practice for non-performance-critical cases to override the ask of those who wanted a moderately performant general solution.

I must say, this feels like an impressively awful conclusion for this specification to have, seemingly, ended at. We deeply mis-serve the user, & the asks of the community, by limiting ourselves to these ends, and it is extremely unclear why working with actual files users can see must have such incredibly poor speeds via entirely different APIs. I understand the limitations of Windows & other filesystems not offering snapshots, and why a copy-on-write swap file doesn't help there.
I haven't fully digested the thread here, but one thing stands out:
This might be true for small files (say, <<100 MB?), but it certainly is not beyond that. Blobs are in-memory only, and doing anything at 100 MB+ or even GB scale is completely infeasible with Blobs; it will OOM the tab/window/system. That is the primary use case for me: downloading large files from a remote server, e.g. an FTP/SFTP client. Uploading isn't an issue because the existing input/File APIs make it easy to chunk up and not require mmapping the entire thing. The lack of in-place appends is annoying when resuming large downloads, but it's livable.
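For that download case, the response can at least be piped to disk without ever holding one giant in-memory Blob, though with today's spec the bytes still accumulate in a swap file until close(). A sketch with an illustrative URL and file name:

```js
// Stream a large download to a user-chosen file without building one big Blob.
const target = await window.showSaveFilePicker({ suggestedName: 'large-download.bin' });
const writable = await target.createWritable();
const response = await fetch('https://example.com/big-file'); // illustrative URL
await response.body.pipeTo(writable); // closes the writable when the stream ends
```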
Your assertions here aren't correct in practice; Blobs are very usable well beyond that size. The Blob API is opaque, which has allowed implementations to evolve: in practice Blobs aren't tied to memory. That's the benefit of a high-level API. Low-level, flexible APIs also have their benefits, but there's no place for complex APIs that in practice only allow high-level things to happen.
From what I have heard / read / understood, if you create small Blobs a little bit at a time, the implementer can offload some of that data to disk and then combine them. So something like this could work:

    const chunks = []
    for await (const uint8_chunk of stream) {
      chunks.push(new Blob([uint8_chunk]))
    }
    new Blob(chunks) // final blob

This is just a theory (just don't create too many small Blobs at a time). Though I can verify that if you write small Blob chunks to IndexedDB and then later join them, you can create huge Blobs that are GBs in size.
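A small sketch of that chunk-and-join idea (database and store names are made up). The browser is free to spill the stored Blobs to disk, so the final join can be a metadata operation rather than an in-memory copy.

```js
// Hypothetical chunk store: persist Blob chunks in IndexedDB, join them later.
function openChunkStore() {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open('chunk-store', 1);
    req.onupgradeneeded = () => req.result.createObjectStore('chunks', { autoIncrement: true });
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

function appendChunk(db, uint8) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction('chunks', 'readwrite');
    tx.objectStore('chunks').add(new Blob([uint8]));
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}

function joinChunks(db) {
  return new Promise((resolve, reject) => {
    const req = db.transaction('chunks').objectStore('chunks').getAll();
    req.onsuccess = () => resolve(new Blob(req.result)); // may be GBs, backed by disk
    req.onerror = () => reject(req.error);
  });
}
```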
This feature is very important to my simple large-file transfer service.
I'm a bit late to the party, and for some time have been looking forward to porting an application which logs large media files to this API.
But I'm finding the spirit of the API to be undermined in practical use.
Writes go to a copy of the original file which is then moved into place, effectively mandated by a spec which demands atomicity. This reduces the value of the API in practice to almost nil, because the resulting behaviour is almost as constrained as using Blobs and downloading them. Performance for repeatedly appending to a file is O(n^2).
I'd like to emphasise the importance of in-place writes -- it seems that "ISSUE 6" is pivotal for this API?
Because it defines whether the API falls on one side of this watershed or the other:

- a high-level API, for navigating the file system and synchronising whole files between JavaScript objects and disk, abstracting away as much as possible to fit a clear (and limited) use case mainly operating out of RAM, like text editors.
- a conventional (POSIX-like) API, embodying decades of prior art and capable of almost any practical use, up to databases, logs/streams and more; held behind a permissions model suitable for the web.
Since I can see elements of both, a lengthy period of iterative development is likely to ensue, with case-by-case discussion that often leads to APIs becoming a series of patches. My concern is that it'll be a long time before I can really make use of this (if at all).
Whereas a decision about in-place writes (or expressly omitting them) is an essential principle for understanding the future of this API, and would accelerate the design.
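To make the O(n^2) appending point concrete, here is a sketch of what a naive append looks like under the current model (the helper name is illustrative): every append has to copy the whole existing file into the swap file before adding the new bytes.

```js
// Naive append under the safe-write model: O(n) copy per append, O(n^2) overall.
async function appendChunk(fileHandle, chunk) {
  const size = (await fileHandle.getFile()).size;                               // current length
  const writable = await fileHandle.createWritable({ keepExistingData: true }); // copies n bytes
  await writable.write({ type: 'write', position: size, data: chunk });
  await writable.close();                                                       // atomic swap into place
}
```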