
feat:async implemented async (for COG) #245

Closed
wants to merge 5 commits

Conversation

feefladder

YES, I WILL FEATURE-GATE

Not sure if this belongs in this crate, but I just finished working on it and at least wanted to offer it here :). Feature-gating will be based on the initial response.

The use case is reading Cloud-Optimized GeoTIFFs in Rust. When I tried this in Bevy using the georaster crate, it issued a lot of requests. In src/decoder_async/readme.md I explain that this is because we read in all tags (including offsets). ChatGPT estimates that an SRTM 30 m resolution world map would have ~66 MB of chunk offsets, which would force a naive reader to clear its buffer multiple times while reading.

This is a start, but I'm not entirely sure whether you want the maintenance burden of also maintaining corresponding async versions of the crate. Implementation inspiration came from parquet2.

Notable design decisions/ugliness:

  • All IFD data reading is done asynchronously. For reading chunks, the full compressed chunk is loaded into a Vec<u8>, wrapped in a Cursor, and handed to the decompressor ({LZWReader, etc.}). I added a trait with a blanket implementation for this, so web readers (doing range requests) can implement it properly and don't have to clear their (IFD/value-reading) buffer.
  • I made everything public that the normal decoder doesn't already expose, so I could use it. This introduced naming collisions, where some AsyncImage items have an Async prefix and others don't; things got mixed up sometimes and other times not. I think the following folder structure should work:
    src
    |-decoding
    | |-mod.rs <- holds common functionality
    | |-sync
    | | |-mod.rs
    | | |-ifd.rs
    | | |-...
    | |-async
    | | |-mod.rs
    | | |-...
    
    however, it's a bit finicky, since the reader (and its sync-ness) sits quite low in the stack.
  • For example, I didn't reuse the EndianReader trait, since all of its functions are sync.
  • There is still commented-out code floating around. This PR is mainly to get a bit of insight into whether this is the right place for it.
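The trait-with-blanket-implementation idea from the first bullet can be sketched roughly like this. This is an illustrative, synchronous stand-in, not the PR's actual API: `RangeRead` and `read_range` are invented names, and the point is only that any `Read + Seek` gets range reads for free while a web reader could override the method with a single HTTP range request.

```rust
use std::io::{Cursor, Read, Result, Seek, SeekFrom};

/// Read an arbitrary byte range. The default implementation goes through
/// Seek + Read; a remote reader can instead translate a call into a single
/// HTTP range request and avoid clearing its buffer. (Hypothetical sketch.)
trait RangeRead: Read + Seek {
    fn read_range(&mut self, offset: u64, len: usize) -> Result<Vec<u8>> {
        self.seek(SeekFrom::Start(offset))?;
        let mut buf = vec![0u8; len];
        self.read_exact(&mut buf)?;
        Ok(buf)
    }
}

/// Blanket implementation: every Read + Seek type gets range reads for free.
impl<T: Read + Seek> RangeRead for T {}

fn main() -> Result<()> {
    // A whole compressed chunk is fetched into a Vec<u8> and wrapped in a
    // Cursor before being handed to the decompressor (LZWReader, etc.).
    let mut file = Cursor::new(vec![0u8, 1, 2, 3, 4, 5, 6, 7]);
    let chunk = file.read_range(2, 3)?;
    assert_eq!(chunk, vec![2, 3, 4]);
    println!("{:?}", chunk);
    Ok(())
}
```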

wishlist for faster COG-ness

This mainly explains the floating comments. It also shows the direction I want to go, and raises whether such a thing would be its own crate or whether these convenience functions would (in future) be merged here.

  • allow reading from the first image:

    • all tags whose values aren't stored at offsets further into the tiff
    • geotags (in a parent crate) <- this would want Image to be public, so we can easily query its tags.
  • allow skipping over the rest of an IFD and over further IFDs (in a COG, to read the nth overview)

  • read only the offsets/bytes that are required. I did try a ChunkData holding these, but then found that the current implementation keeps zero references to the decoder/reader, since all data is read directly into the image. This is a bit ugly to implement, since we cannot call get(chunk_index) on a ChunkData: in one of the two cases we'd need to read into the tiff.

  • Maybe even add some extra caching in a ChunkData, or add functions to RangeReader with blanket implementations, so implementers can optimize the caching strategy. This access pattern would thwart a "normal" caching reader, but if we tell the reader it is reading chunk_offsets/bytes, it can cache them (since those will be frequently accessed).

  • For reading chunks, it would actually be beneficial for the reader to cache n*request_length around a chunk, assuming we are in a mapping application and will move around slowly.
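The last bullet's over-fetching idea can be sketched as follows. Everything here is illustrative (a `CachingReader` with a local `Vec` standing in for a remote object): when a range is requested, a padded window around it is "fetched" and cached, so nearby chunk reads cost no extra request.

```rust
/// Hypothetical caching reader: fetches a padded window around each
/// requested range so adjacent reads are served from cache.
struct CachingReader {
    /// Backing bytes; stands in for a remote object read via range requests.
    data: Vec<u8>,
    /// Extra bytes to fetch on each side of a requested range.
    pad: usize,
    /// (start offset, bytes) of the last fetched window.
    cache: Option<(usize, Vec<u8>)>,
    /// Number of simulated range requests issued, to show the saving.
    requests: usize,
}

impl CachingReader {
    fn read_range(&mut self, start: usize, len: usize) -> Vec<u8> {
        // Serve from cache if the whole range lies inside the cached window.
        if let Some((c_start, ref buf)) = self.cache {
            if start >= c_start && start + len <= c_start + buf.len() {
                return buf[start - c_start..start - c_start + len].to_vec();
            }
        }
        // Cache miss: "fetch" a padded window around the requested range.
        let w_start = start.saturating_sub(self.pad);
        let w_end = (start + len + self.pad).min(self.data.len());
        self.requests += 1;
        self.cache = Some((w_start, self.data[w_start..w_end].to_vec()));
        self.read_range(start, len)
    }
}

fn main() {
    let mut r = CachingReader { data: (0..=255).collect(), pad: 16, cache: None, requests: 0 };
    let a = r.read_range(100, 8); // miss: fetches window [84, 124)
    let b = r.read_range(110, 8); // adjacent chunk: served from cache
    assert_eq!(a[0], 100);
    assert_eq!(b[0], 110);
    assert_eq!(r.requests, 1); // only one simulated range request issued
    println!("requests: {}", r.requests);
}
```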

unsure:

  • allow holding on to multiple images? <- allows quickly changing zoom levels / LODded terrain, as in bevy_terrain.

HeroicKatora (Member) commented Oct 1, 2024

I think the implementation is confusing asynchronicity and multi-threading, quite significantly. The main solution to the bottleneck for cloud images would be overlapping IO to different parts of the file. That would imply that operations are re-ordered in such a way that file regions of interest can be calculated ahead of time, long before the data in those regions becomes required for progress. That's not what the implementation is achieving here: we still have almost the same long dependency chain for walking IFDs and their buffers, almost sequentially. Reading independent tags, for instance, should definitely happen concurrently in the async reader.

I'd expect the (currently documented as optional) idea of storing required and past offset data in a separate data structure, before it is used, to be the main difference between the async and synchronous decoders. It should then also be possible to share code much more tightly between the two. For instance, read_chunk_to_buffer_async shares much of its core computation with its synchronous counterpart and isn't doing any asynchronous work apart from determining and fetching the offset range.

Speaking of which, efficient async often leads to an inversion of control. It's no longer the decoder itself doing IO; rather, it 'just' communicates its requirements for progress and then yields, being re-triggered as soon as there is a reasonable expectation of allowing further progress. The dependency on ehttp is a sign that this inversion has not informed the design enough. It should be a pure dev-dependency; it's just a transport. Without the inversion, I'd rather just spawn a thread that presents a virtual Read + Seek object and relays the actual tasks, but do so completely outside the library.
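The inversion of control described above is sometimes called a "sans-IO" design. A minimal sketch, with entirely invented names (`Need`, `feed`, a toy two-step state machine) rather than anything from the tiff crate: the decoder never performs IO itself, it only announces which byte range it needs next, and the caller fetches those bytes over whatever transport it likes and feeds them back.

```rust
/// What the decoder asks the caller to do next. (Hypothetical sketch.)
enum Need {
    /// Fetch this byte range (offset, length) and call `feed`.
    Bytes(u64, usize),
    /// Decoding finished with this payload.
    Done(Vec<u8>),
}

struct Decoder {
    state: u8,
    buf: Vec<u8>,
}

impl Decoder {
    fn new() -> Self {
        Decoder { state: 0, buf: Vec::new() }
    }

    /// Report the next requirement; the decoder does no IO of its own.
    fn next_need(&self) -> Need {
        match self.state {
            0 => Need::Bytes(0, 8),  // e.g. the header
            1 => Need::Bytes(8, 16), // e.g. the first IFD
            _ => Need::Done(self.buf.clone()),
        }
    }

    /// The caller fetched the requested bytes (HTTP, file, test vector...);
    /// hand them back so the decoder can make progress.
    fn feed(&mut self, bytes: &[u8]) {
        self.buf.extend_from_slice(bytes);
        self.state += 1;
    }
}

fn main() {
    // The transport lives entirely outside the decoder; here it is a Vec.
    let file: Vec<u8> = (0..64).collect();
    let mut dec = Decoder::new();
    loop {
        match dec.next_need() {
            Need::Bytes(off, len) => {
                let off = off as usize;
                dec.feed(&file[off..off + len]);
            }
            Need::Done(payload) => {
                assert_eq!(payload.len(), 24);
                println!("decoded {} bytes", payload.len());
                break;
            }
        }
    }
}
```

Because the decoder only emits byte-range requirements, a caller is also free to look ahead and issue several fetches concurrently, which is the overlapping-IO point made in the previous comment.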

fintelia (Contributor) commented Oct 6, 2024

A bigger problem is that this is a multi-thousand-line PR, without any prior buy-in from maintainers on the design or on whether this is a feature we even want. The proper way to add a big feature would be to start with an issue discussing the proposed design, then, if it is received favorably by maintainers, to make a series of moderately sized PRs implementing it. Though that also assumes sufficient maintainer bandwidth, which at the moment is stretched very thin.

fintelia closed this Oct 6, 2024
feefladder (Author)

> The dependency on ehttp is a sign that this inversion has not informed the design enough.

ehttp is only used for the example (reading from a COG), not even for testing or further development purposes.

feefladder (Author)

As I understand it, a COG looks like:

|-IFD1-|-IFD2-|-IFD3-|--IFD1Tags--|--IFD2Tags--|--IFD3Tags--|----Image3Data---|---Image2Data---|---Image1Data---|

Reading IFD1-IFD3 is quick: it can be fetched in a single range request. The main problem is that when the decoder requests IFD1Tags, the buffer (outside this crate) gets cleared to make room for tag data, and then the IFDs are requested again.

That is not solved in this PR; otherwise it'd become too big (which it already is). This PR allows non-blocking requests for image data, for when loading the tiff is a background job in the actual application. Should I open an issue about this, then?
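The "single range request" idea in the layout above can be made concrete with a small range-coalescing helper. This is a hedged sketch: the offsets are made up for illustration, and `coalesce` is not a function in the tiff crate. Byte ranges for the IFDs and their tag data, which sit near the start of a COG, are merged into one request instead of one per structure.

```rust
/// Merge sorted (start, end) byte ranges that are within `gap` bytes of each
/// other, so they can be fetched with a single range request. (Hypothetical.)
fn coalesce(mut ranges: Vec<(u64, u64)>, gap: u64) -> Vec<(u64, u64)> {
    ranges.sort();
    let mut out: Vec<(u64, u64)> = Vec::new();
    for (s, e) in ranges {
        match out.last_mut() {
            // Close enough to the previous range: extend it.
            Some(last) if s <= last.1 + gap => last.1 = last.1.max(e),
            // Otherwise start a new request.
            _ => out.push((s, e)),
        }
    }
    out
}

fn main() {
    // IFD1..IFD3 plus their tag data, laid out front-to-back as in a COG.
    // Offsets are invented for the example.
    let ranges = vec![(8, 200), (200, 400), (400, 600), (600, 2000)];
    let merged = coalesce(ranges, 64);
    // One contiguous request covers the whole header region.
    assert_eq!(merged, vec![(8, 2000)]);
    println!("{:?}", merged);
}
```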
