Asynchronous GeoTIFF reader #13

Open
weiji14 opened this issue Jul 19, 2024 · 7 comments

@weiji14
Member

weiji14 commented Jul 19, 2024

Rewrite https://github.com/geospatial-jeff/aiocogeo in Rust!

We'll probably want to leave all the non-filesystem I/O to object_store, and focus on decoding IFDs asynchronously.
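
As a rough illustration of what "decoding IFDs" involves once the bytes have been fetched, here is a minimal sketch of parsing one classic (non-BigTIFF) Image File Directory from an in-memory buffer. It is not code from aiocogeo or any existing crate; `IfdEntry`, `decode_ifd` and the helper names are hypothetical, and the byte order is assumed to be known from the file header already.

```rust
/// One 12-byte entry of a classic (non-BigTIFF) Image File Directory.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct IfdEntry {
    pub tag: u16,
    pub field_type: u16,
    pub count: u32,
    /// Either the value itself (if it fits in 4 bytes) or an offset to it.
    pub value_or_offset: [u8; 4],
}

// Read a u16 at `pos`, returning None if the buffer is too short.
fn u16_at(buf: &[u8], pos: usize, little_endian: bool) -> Option<u16> {
    let b = buf.get(pos..pos.checked_add(2)?)?;
    let raw = [b[0], b[1]];
    Some(if little_endian { u16::from_le_bytes(raw) } else { u16::from_be_bytes(raw) })
}

// Read a u32 at `pos`, returning None if the buffer is too short.
fn u32_at(buf: &[u8], pos: usize, little_endian: bool) -> Option<u32> {
    let b = buf.get(pos..pos.checked_add(4)?)?;
    let raw = [b[0], b[1], b[2], b[3]];
    Some(if little_endian { u32::from_le_bytes(raw) } else { u32::from_be_bytes(raw) })
}

/// Decode the IFD that starts at `offset` in an already-fetched buffer.
/// Returns the entries plus the offset of the next IFD (0 means "no more").
pub fn decode_ifd(buf: &[u8], offset: usize, little_endian: bool) -> Option<(Vec<IfdEntry>, u32)> {
    // An IFD begins with a 2-byte entry count.
    let n = u16_at(buf, offset, little_endian)? as usize;
    let mut entries = Vec::with_capacity(n);
    for i in 0..n {
        let pos = offset.checked_add(2 + i * 12)?;
        let tag = u16_at(buf, pos, little_endian)?;
        let field_type = u16_at(buf, pos + 2, little_endian)?;
        let count = u32_at(buf, pos + 4, little_endian)?;
        let v = buf.get(pos + 8..pos + 12)?;
        entries.push(IfdEntry { tag, field_type, count, value_or_offset: [v[0], v[1], v[2], v[3]] });
    }
    // The entries are followed by a 4-byte offset to the next IFD.
    let next = u32_at(buf, offset.checked_add(2 + n * 12)?, little_endian)?;
    Some((entries, next))
}
```

Because the function only touches a byte slice, the buffer could come from a local file or from an object store range request; a `None` result would signal that more bytes need to be fetched.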

Help is most appreciated.

@kylebarron
Member

I started a prototype here: https://github.com/developmentseed/aiocogeo-rs

It's able to read all TIFF and GeoTIFF metadata and decode a JPEG tile, though general read/decompressor support is not yet implemented. I definitely plan to implement full-tile-decoding support, so that we can use it from Python efficiently.

It's not at all reliant on the tiff crate (it does use some enums that I could later copy over).

One question is how to marry both synchronous and async read APIs. One good reference is the parquet crate, which has both sync and async APIs to read Parquet.

@feefladder

feefladder commented Sep 17, 2024

Hmmm.... I wanted to use TIFFs in bevy_terrain, so I modified the image/tiff crate to do async.
Some nice discussions on how to merge:

Basically: Put all non-io-related code (working on [u8]s) in a top-level module and then put (async)-io-related code in a submodule. What I did in image/tiff is to create a cursor over the byte array so I modified as little code as possible. I admit it's somewhat arcane, because I didn't know if the image/tiff folks actually wanted to maintain async support 😁. But now I see (busstoptaktik/geodesy#86) that dev efforts are concentrated here rather than georaster, I'll see if I can give it a shot....
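
A minimal sketch of that layering, assuming a classic TIFF header and using only the standard library. This is not the code from the image/tiff fork or from aiocogeo-rs; `decode_header`, `read_header_sync`, `read_header_async` and the `fetch` callback are hypothetical names. The byte-level function is shared, and the sync and async front ends only differ in how they obtain the bytes. The tiny `block_on` executor is there only so the sketch runs without an async runtime.

```rust
use std::future::Future;
use std::io::{self, Read, Seek, SeekFrom};
use std::ops::Range;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// I/O-free core: works on bytes only, so both front ends below can share it.
/// Returns (is_little_endian, offset_of_first_ifd).
pub fn decode_header(bytes: &[u8]) -> Option<(bool, u32)> {
    if bytes.len() < 8 {
        return None;
    }
    let little = match &bytes[0..2] {
        b"II" => true,
        b"MM" => false,
        _ => return None,
    };
    let m = [bytes[2], bytes[3]];
    let magic = if little { u16::from_le_bytes(m) } else { u16::from_be_bytes(m) };
    if magic != 42 {
        return None; // 42 marks classic TIFF; BigTIFF is out of scope here
    }
    let raw = [bytes[4], bytes[5], bytes[6], bytes[7]];
    Some((little, if little { u32::from_le_bytes(raw) } else { u32::from_be_bytes(raw) }))
}

fn invalid() -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, "not a classic TIFF header")
}

/// Synchronous front end: blocking reads from anything that is Read + Seek.
pub fn read_header_sync<R: Read + Seek>(reader: &mut R) -> io::Result<(bool, u32)> {
    let mut buf = [0u8; 8];
    reader.seek(SeekFrom::Start(0))?;
    reader.read_exact(&mut buf)?;
    decode_header(&buf).ok_or_else(invalid)
}

/// Asynchronous front end: `fetch` is a hypothetical byte-range getter
/// (for example a thin wrapper around an object store client).
pub async fn read_header_async<F, Fut>(fetch: F) -> io::Result<(bool, u32)>
where
    F: Fn(Range<u64>) -> Fut,
    Fut: Future<Output = io::Result<Vec<u8>>>,
{
    let bytes = fetch(0..8).await?;
    decode_header(&bytes).ok_or_else(invalid)
}

// Waker that does nothing; enough for futures that never actually suspend.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Busy-polling executor for demonstration only; use a real runtime in practice.
pub fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = Box::pin(future);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = future.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

The same split would apply to IFD and tile decoding: anything that takes `&[u8]` stays runtime-agnostic, and only the thin fetch layer needs an async variant.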

@weiji14
Member Author

weiji14 commented Sep 17, 2024

Oh wow, thanks @feefladder for all that work (and the PR at image-rs/image-tiff#245)!

> I didn't know if the image/tiff folks actually wanted to maintain async support 😁. But now I see (busstoptaktik/geodesy#86) that dev efforts are concentrated here rather than georaster, I'll see if I can give it a shot....

Not sure either, but I'd definitely like it if most of the async code could be put in image/tiff, especially if there are no 'geo'-specific parts. If I'm not mistaken, image/tiff only supports full reads and not partial reads (e.g. it can't read specific tiles within a Cloud-Optimized GeoTIFF based on a bounding box) right now? There's some experimentation going on at aiocogeo-rs that might support partial reads at some point (based on original Python implementation at https://github.com/geospatial-jeff/aiocogeo/tree/0.3.0?tab=readme-ov-file#partial-read).
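
To make "partial read" concrete, here is a minimal sketch of the index arithmetic involved, assuming a single-plane tiled TIFF whose tiles are numbered left to right, top to bottom, and a window that is already expressed in pixel coordinates (converting a geographic bounding box to pixels would need the geotransform and is left out). `TileLayout` and `tiles_in_window` are hypothetical names, not part of image/tiff or aiocogeo.

```rust
use std::ops::Range;

/// Grid geometry of a tiled image. Tile sizes are assumed to be non-zero,
/// and dimensions small enough that the rounding below cannot overflow.
pub struct TileLayout {
    pub image_width: u32,
    pub image_height: u32,
    pub tile_width: u32,
    pub tile_height: u32,
}

impl TileLayout {
    /// Number of tile columns, rounding up for a partial last column.
    pub fn tiles_across(&self) -> u32 {
        (self.image_width + self.tile_width - 1) / self.tile_width
    }

    /// Row-major indices of every tile that overlaps the half-open pixel
    /// window `xs` x `ys`. The window is clipped to the image extent.
    pub fn tiles_in_window(&self, xs: Range<u32>, ys: Range<u32>) -> Vec<u32> {
        let x_end = xs.end.min(self.image_width);
        let y_end = ys.end.min(self.image_height);
        if xs.start >= x_end || ys.start >= y_end {
            return Vec::new(); // window is empty or entirely outside the image
        }
        let (col0, col1) = (xs.start / self.tile_width, (x_end - 1) / self.tile_width);
        let (row0, row1) = (ys.start / self.tile_height, (y_end - 1) / self.tile_height);
        let mut indices = Vec::new();
        for row in row0..=row1 {
            for col in col0..=col1 {
                indices.push(row * self.tiles_across() + col);
            }
        }
        indices
    }
}
```

Only the tiles whose indices are returned would need to be fetched and decoded, which is what keeps a partial read of a Cloud-Optimized GeoTIFF cheap compared to a full read.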

Eventually though, it would be great to have less fragmentation on different ways of reading GeoTIFFs in Rust, and have all the core implementations live upstream in image/tiff or here in georust/geotiff. But there might be a period where we'll need to try different styles and see what sticks 🙂

@feefladder

feefladder commented Sep 18, 2024

Actually, I did some work on georaster based on the async stuff... lemme see if I can make that into a PR... EDIT: ah, it turns out my files didn't save properly...
image-tiff does support partial reads: it has a load_chunk(chunk_index: u32) function. Not sure if that is in a release though 😁

@feefladder

My PR over at image-rs/image-tiff#245 got rejected because it was too big and I didn't create an issue first. Then, for a proposal I had for more control (image-rs/image-tiff#250), there is no maintainer bandwidth to support my proposed changes (also here). So I don't know if it is feasible in the near future to have async on top of image-tiff.

Anyone here have ideas for what is a good way forward?

@weiji14
Member Author

weiji14 commented Oct 12, 2024

@feefladder, first off, I really wish I had your energy to do such deep technical work in the (Geo)TIFF Rust ecosystem. I can understand your frustration as an open source contributor wanting to make meaningful changes, but I also feel for the image-tiff maintainer who has to review a 1000+ line PR for a crate used by 47k+ downstream projects...

Personally, I'd prefer not to fragment the TIFF ecosystem by having X different implementations in X repos, which is why I've suggested pushing things to image-tiff. That said, my colleagues at DevSeed and I are aware that the image-tiff maintainer is stretched rather thin, which is why the aiocogeo-rs repo Kyle mentioned above was created. The idea is to have an independent implementation of an async GeoTIFF reader in that aiocogeo-rs repo (one that doesn't rely on image-tiff), and once that experimental work is stabilized, we will slowly upstream parts of it to image-tiff or the geotiff crate as needed. This means a lot more duplicate work, but it can be good to have a proof of concept to show and get people excited about (and hopefully the image-tiff maintainers will be more receptive to a contribution that has been battle-tested in the wild).

Of course, I don't expect anyone to trust my obviously biased intentions on getting aiocogeo-rs working since it is currently hosted under the organization I'm working for. I'm aware @feefladder that you'd maybe prefer a different arrangement/license based on georust/meta#34 (comment) and some Discord discussions, and respect it if you'd like to carve out a different space for async GeoTIFF. Keen to hear your ideas on a way forward.

@kylebarron
Member

To add to this, I would love to figure out a good architecture such that we could use some of the decoding parts of image-tiff while managing our own data fetching. This is hard, however, because of image-tiff's architecture and the fact that it lazily reads TIFF tags on access instead of up front (the reading and the decoding of the tags are combined).

But maybe there's a way to use aiocogeo-rs for the data fetching and tag decoding, but reuse image-tiff for the image decoding? That's what I was heading towards before I ran out of time (I'm doing a lot of work on open source Rust vector projects).
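
One way such a split could look, as a minimal sketch under stated assumptions: the tag-decoding stage has already produced the TileOffsets (tag 324) and TileByteCounts (tag 325) arrays, fetching is a caller-supplied function, and pixel decoding sits behind a trait so that it could later delegate to another crate. `TileDecoder`, `Uncompressed`, `tile_range` and `read_tile` are hypothetical names; nothing here is the actual API of aiocogeo-rs or image-tiff, and the fetch step is shown synchronously to keep the example short.

```rust
use std::ops::Range;

/// Pluggable pixel-decoding stage, kept separate from fetching and tag parsing.
pub trait TileDecoder {
    fn decode(&self, compressed: &[u8]) -> Result<Vec<u8>, String>;
}

/// Decoder for TIFF Compression = 1, where tile bytes are stored uncompressed.
pub struct Uncompressed;

impl TileDecoder for Uncompressed {
    fn decode(&self, compressed: &[u8]) -> Result<Vec<u8>, String> {
        Ok(compressed.to_vec())
    }
}

/// Byte range of one tile, given already-decoded TileOffsets and TileByteCounts.
pub fn tile_range(offsets: &[u64], byte_counts: &[u64], index: usize) -> Option<Range<u64>> {
    let start = *offsets.get(index)?;
    let len = *byte_counts.get(index)?;
    Some(start..start.checked_add(len)?)
}

/// Fetch one tile with the caller's `fetch` function, then hand the bytes
/// to whichever decoder was plugged in.
pub fn read_tile<F, D>(
    fetch: F,
    decoder: &D,
    offsets: &[u64],
    byte_counts: &[u64],
    index: usize,
) -> Result<Vec<u8>, String>
where
    F: Fn(Range<u64>) -> Result<Vec<u8>, String>,
    D: TileDecoder,
{
    let range = tile_range(offsets, byte_counts, index)
        .ok_or_else(|| format!("no tile with index {}", index))?;
    let compressed = fetch(range)?;
    decoder.decode(&compressed)
}
```

With this shape, the fetching side never needs to know about compression schemes, and a JPEG or LZW decoder would only have to implement the one-method trait.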
