Exif support attempt number 2 #242

spoutn1k · 2024-08-23T20:33:56Z

Rebase @seaeagle1's PR and try to pull together multiple definitions of IFDs and entries. A lot has changed from the use of common structures.

… (ie. decode it's own Exif output)

feefladder · 2024-10-02T00:20:13Z

Hey, I saw your comment on #243 and had a look around in your code, where I saw you added generics to Image. In my pr #245, I also had a deep look at the Image struct and how it works with the decoder. Also, @fintelia said in multiple places that the encoding API needs an overhaul. Also in #222, someone agreed with me that private fields in the Image make inspection difficult. The main point is that I think that overhaul would mean using a more config/property type of design, rather than a generics design e.g. #205. I don't know what that would mean for this PR, and it should belong in its own PR, let's say we have these "problems":

Decoder/Encoder harmonization: The API is different, the structs are different, everything is different.
Extensibility: This library is made to build things on top of, and sort of provides API at different levels, but imho not as clearly as inspired by "libtiff: provides interfaces to image data at several layers of abstraction (and cost). At the highest level image data can be read [...] without regard for the underlying data organization, colorspace, or compression scheme. Below this high-level interface the library provides scanline-, strip-, and tile-oriented interfaces that return data decompressed but otherwise untransformed. [...] At the lowest level the library provides access to the raw uncompressed strips or tiles, returning the data exactly as it appears in the file."

Now, I think it would be a good idea to have three levels of wrapping structs that hold data:

/// Hold all Images, possibly decoded. Also provides convenience functions
/// Just use u64, since we're not targetting super memory-constrained things
struct TIFF {
   // ifd_offsets: Option<Vec<u64>> // was seen_ifds in Decoder, but I would like a scan_ifds() function
   images: Vec<enum {Image(Image), Offset(u64)}> // indices should match, or enum 
   bigtiff: bool,
   byte_order: ByteOrder,
}

/// IFD that contains an image as opposed to a sub-ifd (which also may contain an image, but we don't necessarily check there
struct Image<'a> {
  // useful fast-access tags
  // _not_ necessarily bigtiff or endianness
  sub_ifds: Vec<IFD>
}

/// IFD that doesn't necessarily hold all tags, or even image data
struct<'a> IFD {
  sub_ifds: Vec<IFD>, // vec ?is? a pointer, so ?no? infinity otherwise Vec<Box/Arc<IFD>>
  inner: BTreeMap<Tag, Entry>
}

Additionally, I think the Decoder could use static methods for reading IFDs, like so:

impl Decoder {
  /// static method for reading in an IFD, are these on Image? If they are, I think they should be exposed there
  pub fn read_ifd<R: Read + Seek>(reader: R, offset: u64) -> IFD {todo!;}
  pub fn read_ifd_async<R: AsyncRead + AsyncSeek + Unpin> (reader: R, offset: u64) -> IFD {todo!().await}
  /// scans over the linked list of ifds without reading them
  pub fn scan_images(&mut self) {
    // same as next_ifd, but doesn't read and goes on completely
    self.ifd_offsets = todo!();
  }
  /// read in all ifds of all images
  pub fn read_image_ifds<R: Read + Seek>(&self) {
    if self.ifd_offsets.is_none() {self.scan_ifds();} //should be unnecessary
    let Some(offsets) = self.ifd_offsets else { unreachable!(); }
    let self.images = offsets.map(|offset| self.read_image_ifd(offset)).collect();
  }
}

Document high-level vs mid/low-level api, which - I think - would
Publish Image for debugging(Private fields and interfaces make file inspection difficult #222)/wrapping purposes(Expose Image and friends to allow for Multithreading implementations #246)
- For this, it should be clear what Image is. Now, there are multiple use-cases for image:
  - (partially) decoding a tiff <- the Directory<Tag, Entry> holds offsets and/or Values.
  - encoding a tiff or further processing <- the Directory<Tag, DecodedEntry> holds only Values
    - How to handle sub-IFDs in this case? Should we have something like:
      pub struct DecodedIfd<'a> { ifd: BTreeMap<Tag, DecodedEntry>; sub_ifds: vec<&'a DecodedIfd> } /// An Image that has retrieved all IFD info when decoding, or for encoding pub struct DecodedImage<'a> { sub_ifds: Vec<&'a DecodedIfd> // or a more "basic" sub-ifd type }
      Then DecodedEntry holds a Value::IFD(sub_ifds.find(the_ifd)) or Value::IFD8(vec_index). Allows for recursion? Or Box something?
      I would say: "the [Image] struct holds all necessary metadata for decoding an image from an (async) decoder" <- tying Image and Decoder closer together, basically I'm trying to avoid a big maintenance
More control over the decoding process: Allow reading an image without directly reading all tags (Cloud compatibility #85) or read them all (Provide a way to retrieve all tags defined within a file #243 Exif support attempt number 2 #242 yay tiff magic).
- This would also allow reading just an IFD, without needing to read in all things

It's getting all a bit messy now and I can't save drafts. Also should this be its separate issue/discussion? (big refactor + API overhaul)

spoutn1k · 2024-10-02T05:11:19Z

Decoder/Encoder harmonization: The API is different, the structs are different, everything is different.

That's essentially what is being attempted here. The Directory type is now at the root and used in both encoder and decoder. However it expects to be tagged with the tiff format and that is wy generics are being passed down. I would sleep well knowing them gone, but this is a draft PR to see where this can be taken.

I appreciate the amount of research that shows through your comment, but I am swamped at the moment and cannot reciprocate. Please do create a discussion and we can discuss the implementation there.

feefladder · 2024-10-02T17:29:10Z

Hey, thanks for your reply, I'll see if I can make a discussion tomorrow. I think harmonization is a bit different than exif and this PR addresses both and my PR also overhauls internal structures. So I made this comment to separate that, after gathering what is needed for exif support. As I understand, it's also not entirely clear, but at least the following:

smooth suppport for sub-ifds. (As I could see in your PR now, it writes to the same offset it came from?)
- reading/writing of said sub-ifds
- Currently, you use the decoder to read in the EXIF sub-ifd. In my async implementation, asyncness function signatures bubble up all the way, so the general idea is to have generic "decoder::read_ifd()" functions that work on buffers (or memreaders for ergonomics) also on the decoding side. Then if that is useful for encoding exif, that's great. <- the synergy 🌈

The Directory type is now at the root and used in both encoder and decoder.

Yes, and I'm suggesting 2 "types", images and "raw" ifds. Ill see if I can polish my post more better tomorrow.

However it expects to be tagged with the tiff format and that is wy generics are being passed down.

In the decoder, this information is a boolean flag, rather than a generic. I think essentially bigtiffness should only have to be used when reading/writing tags or headers, after which this information is redundant (if we store everything in u64 in our structures). Also ""TileByteLength"" tag in bigtiff doesn't have to be u64 per se (I think it doesn't make sense, since you'd have a very weird tiled tiff if it has tiles so big they don't fit into u32, you've done something wrong with the tiling then), which the boolean-flag-design allows for more easily.

fintelia · 2024-10-06T01:03:37Z

This PR is far too large, with for too much all rolled up into it to meaningfully review. The proper way to add a big feature would be starting with an issue to discuss the proposed design, then if there is favorable reception for maintainers, making a series of moderately sized PRs to implement it. Though that also assumes sufficient maintainer bandwidth, which at the moment is stretched very thin.

spoutn1k · 2024-10-06T05:36:14Z

Why is #221 not closed then ?

fintelia · 2024-10-06T18:47:26Z

I don't fully recall the state of that other PR, but it is more than 5x smaller than this one.

TBH I haven't figured out how to handle multi-thousand line PRs. Seems to be lose-lose regardless. Essentially every time I've tried to review such a large PR, it stalls out after several rounds of feedback, either because the author gets tired of making more and more changes (understandable!) or due to design disagreements that cannot be reconciled. Ends up being a waste of everyone's time and very demoralizing.

On the other hand, other options aren't much better. Closing such a huge PR comes across as harsh given the work that went into it, but not closing is often mistaken as an encouragement to do more work. Requesting a PR to be broken into several pieces comes across as agreeing to all the feature changes. Leaving some PR feedback at the same time as one of these other options leaves very mixed messages.

spoutn1k · 2024-10-07T05:33:23Z

I get your point. It did seem unfair and preferrential, especially as this one is a draft for exposure on an experiment and not submitted for review. I want to add that the numbers are inflated by moving code around, but that does not change the ultimate point you rightfully make. @fintelia one of the first things I did in this PR is centralize code implemented in both the encoder and decoder. Would that be a worthwhile PR to look at ?

fintelia · 2024-10-07T06:52:16Z

Do you mean the "Created global ifd definition" change? Haven't looked in detail, but from skimming:

Making Decoder object to be generic over normal vs bigtiff: I don't think we should do this. It makes the API more inconvenient for callers, who now have to know in advance whether they've got a normal or bigtiff file
Switches Encoder to working on typed IFD entries rather than bytes: Not sure I understand the motivation here. Is this to make it easier to encode nested IFDs later?

(Also reopening the PR, so we can continue in the same PR thread if we figure out how to pivot this to a sufficiently self contained change)

seaeagle1 and others added 21 commits January 19, 2024 16:59

Support Exif data export

8b15380

Clean up decoder interface to allow decoding of Exif-only TIFF blocks…

f449d51

… (ie. decode it's own Exif output)

Exif encoding support

44f2da6

Make tests actually test output integrity

fd30240

Improve comments

ec8452b

Merge remote-tracking branch 'upstream/master' into exif

d844ebd

Created global ifd definition

fd64bb1

Add type aliases to avoid generics mess

3f4b59c

Avoid moving symbols for no reason

ace1b18

Added ProcessedEntry

9e3d000

Avoid panic on missing ifd

c7a8bff

Remove hardcoded exif tag list

4248d9d

Fix endianness using crate methods

b884111

Impl From<BufferedEntry> for ProcessedEntry

3d95771

Added more errors, automatic formatting of defined tags

ed14fb0

Add tags and tag formats

530fe1f

Refactor formatting to access entire entry

5bc9394

Interchangeable Buffered/Processed entries

290ba08

Refactor DirectoryEncoder

13a11ee

Refactor

b3d9853

DirectoryEncoder accepts u16 internally

9746037

spoutn1k mentioned this pull request Sep 18, 2024

Provide a way to retrieve all tags defined within a file #243

Open

feefladder mentioned this pull request Sep 30, 2024

Partial tag #249

Draft

feefladder mentioned this pull request Oct 3, 2024

Extensibility: mid-level API and harmonizing encoder/decoder structs. #250

Open

fintelia closed this Oct 6, 2024

fintelia reopened this Oct 7, 2024

feefladder mentioned this pull request Oct 11, 2024

Asynchronous GeoTIFF reader georust/geotiff#13

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Exif support attempt number 2 #242

Exif support attempt number 2 #242

spoutn1k commented Aug 23, 2024 •

edited

Loading

feefladder commented Oct 2, 2024

spoutn1k commented Oct 2, 2024

feefladder commented Oct 2, 2024

fintelia commented Oct 6, 2024

spoutn1k commented Oct 6, 2024

fintelia commented Oct 6, 2024

spoutn1k commented Oct 7, 2024

fintelia commented Oct 7, 2024

Exif support attempt number 2 #242

Are you sure you want to change the base?

Exif support attempt number 2 #242

Conversation

spoutn1k commented Aug 23, 2024 • edited Loading

feefladder commented Oct 2, 2024

spoutn1k commented Oct 2, 2024

feefladder commented Oct 2, 2024

fintelia commented Oct 6, 2024

spoutn1k commented Oct 6, 2024

fintelia commented Oct 6, 2024

spoutn1k commented Oct 7, 2024

fintelia commented Oct 7, 2024

spoutn1k commented Aug 23, 2024 •

edited

Loading