
additional dimensions (excitation and emission wavelength) #14

Open

MaartenBransen opened this issue Jan 29, 2021 · 14 comments

@MaartenBransen commented Jan 29, 2021

In addition to the currently supported dimensions, data may be recorded with varying detection and excitation wavelengths. These are denoted by DimID 5 and 9 in the XML metadata. It would be nice, and I think fairly straightforward, to implement these. I don't know what DimIDs 6-8 are used for, but it may be documented somewhere.

DimID   dimension                            unit
1       x-axis                               m
2       y-axis                               m
3       z-axis                               m
4       time                                 s
5       detection wavelength                 m
6-8     ?
9       illumination/excitation wavelength   m
10      mosaic tile

I can provide example files for wavelength sweep data, if this would be helpful.

@nimne (Contributor) commented Jan 29, 2021

An example with additional dimensions would be great! I'll see if I can find information on what 6-8 are used for. I haven't been able to find much documentation about the format, so I welcome any help.

In terms of implementation - where do you think these units should go, and what should they be called?

At the moment, I'm in favor of leaving the attributes relating to the image data block in LifImage.dims (x,y,c,z,t,m) and having the 'capture settings' / etc in a new attribute with more descriptive names?

@MaartenBransen (Author)

Thanks for the very quick response! I recorded some example data today: one file where I imaged in the x-z plane rather than x-y, and another where I used the 'spectroscopy' option of imaging as a function of excitation or emission wavelength.

At the moment, I'm in favor of leaving the attributes relating to the image data block in LifImage.dims (x,y,c,z,t,m) and having the 'capture settings' / etc in a new attribute with more descriptive names?

I fully agree with this, but to clarify: in the case of the excitation/emission wavelength I referred to, an image series is recorded as a function of wavelength, in exactly the same way that an xyz dataset is xy images as a function of z position. These are treated in an identical manner in the metadata as x, y, time, etc., which is why I think it makes the most sense for them to go in LifImage.dims. This is different from the part of the XML data that stores, for example, which laser line was used in a 'normal' xy image.

As for the name: the Leica software we use lists these options with the lowercase and capital Greek letter lambda for the detection and illumination wavelengths, respectively. In the XML data they are written out as lambda and Lambda. I don't think these are particularly clear labels, and using Greek characters is probably not a good idea either.

@nimne (Contributor) commented Feb 1, 2021

Thanks for the clarification. It makes sense that the data in the file might be organized in the same order as the dimensions listed above. The confusing thing is where 'channel' fits into the organization system. From the previous images I've built this from, it's right after X and Y.

I think it will be fairly straightforward to figure out the offsets for these dimensions, but I'm not sure exactly how to adapt the API to handle all of the dimensions. I don't know if having all of the options in get_frame makes sense? It would be something long like get_frame(z=0, t=0, c=0, em=0, ex=0, m=0). I haven't looked through your files yet, but is excitation/emission a 'dimension' (in that it goes 0, 1, 2, etc) or is it the actual wavelength (or wavelength range)?

Would it be necessary to make the respective iterators for each dimension?

@MaartenBransen (Author)

I'm not sure exactly how to adapt the API to handle all of the dimensions. I don't know if having all of the options in get_frame makes sense?

I assume that at least DimIDs 6-8 also exist as valid dimensions, and there may be more from 11 onwards that we don't know about, so this would potentially give a very long list of optional arguments to get_frame and a very large number of iterators. Also, the concept of a frame becomes muddier when x and y are not guaranteed to be present and/or the fastest axes. I wrote some code ages ago that worked on these data (albeit exported as .tif image series), and there I ended up with a sort of get_stack function that returns a numpy.ndarray with however many dimensions the data series has, but I don't think that was a more convenient solution (except for my own specific use cases). For that I took the channel as the 0th dimension.

What I personally did like about that is that it automatically checked which dimensions were present, and all functions worked with an arbitrary number and combination of the possible dimensions (but always in a fixed order, following the table above). Instead of optional arguments for each possible dimension (like get_frame), it took a tuple with indices or slice objects for each dimension, if that makes sense. So for a 2D time series (xyt), time would be the 3rd dimension, but for a 3D time series (xyzt), time would be the 4th. Something along those lines may work, maybe with the first two dimensions (in byte order) defining the plane a 'frame' lies in?
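
To illustrate that tuple-of-slices idea, here's a minimal numpy sketch (purely hypothetical, not the readlif API):

import numpy as np

# Hypothetical stack for a 2-channel xyt series, with channel as the 0th
# dimension and the rest in the fixed DimID order from the table above.
stack = np.zeros((2, 64, 64, 10), dtype=np.uint8)  # (c, x, y, t)

# A single tuple of indices/slices selects a plane:
# channel 0, the full xy frame, at time index 3.
frame = stack[(0, slice(None), slice(None), 3)]  # same as stack[0, :, :, 3]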

I haven't looked through your files yet, but is excitation/emission a 'dimension' (in that it goes 0, 1, 2, etc) or is it the actual wavelength (or wavelength range)?

Basically, what we can do is take a series of images where, e.g., the detection wavelength is systematically varied in steps, like in a spectroscope. You then get something like a video where wavelength is varied rather than time, so this is really a different kind of 'experiment' than just setting up the detection wavelength and then recording a time series. When taking a spectroscopy series, it is treated as a dimension in the same way the others (time, z-position, etc.) are, and is listed in the part of the XML data with dimensions where your code finds logicalSize and such. For one of the measurements (nx=64, ny=64, 20 detection wavelengths), for example, the dimension metadata looks like this:

{'DimID': '1', 'NumberOfElements': '64', 'Origin': '1.376765e-020', 'Length': '6.603309e-006', 'Unit': 'm', 'BitInc': '0', 'BytesInc': '1'},
{'DimID': '2', 'NumberOfElements': '64', 'Origin': '1.376765e-020', 'Length': '6.603309e-006', 'Unit': 'm', 'BitInc': '0', 'BytesInc': '64'},
{'DimID': '5', 'NumberOfElements': '20', 'Origin': '5.500000e-007', 'Length': '1.900000e-007', 'Unit': 'm', 'BitInc': '0', 'BytesInc': '4096'}
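
If it helps, here is a minimal sketch of how a pixel offset could be computed from these entries, assuming the offset is simply the sum of index times BytesInc over all dimensions (which is what the values above suggest):

dims = [
    {'DimID': 1, 'NumberOfElements': 64, 'BytesInc': 1},
    {'DimID': 2, 'NumberOfElements': 64, 'BytesInc': 64},
    {'DimID': 5, 'NumberOfElements': 20, 'BytesInc': 4096},
]

def pixel_offset(indices):
    # indices: dict mapping DimID -> index along that dimension (missing -> 0)
    return sum(indices.get(d['DimID'], 0) * d['BytesInc'] for d in dims)

pixel_offset({1: 3, 2: 5, 5: 2})  # 3*1 + 5*64 + 2*4096 = 8515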

Would it be necessary to make the respective iterators for each dimension?

I haven't really looked at the iterators yet, but depending on what you think about the stuff above, there could be a single iterator function where you can specify which dimension to iterate over. I don't know how doable that would be to implement, but API-wise it could be nice.

Thanks again for your time and for putting this code up; it's very useful for me and some of my colleagues to be able to work with lif files in Python directly. If you want, I'd also be happy to get involved and help out where I can with some of these things.

@nimne (Contributor) commented Feb 5, 2021

Thanks for taking the time to provide the detailed feedback and information. I've thought about this for a bit and I think I know how to begin to approach this problem in a few parts.

  1. Generalize the reader to read through the image data in the order of the dimensions described in the metadata. (This is the simple part)
  2. Expand the API / reader to read from arbitrary dimensions. (This is the complicated part)

Without the benefit of consistently located sliders in a UI, I'm having a hard time coming up with a consistent API to access (up to) 10 dimensions plus channels.

I think I'll leave the current API alone because it covers the most common scenarios: imaging XY planes at some number of Z positions and M tiles. This also preserves compatibility for people/packages using the current setup.

The less (but perhaps increasingly) common scenarios include your example of an XZ image, or spectroscopic images. For these scenarios, I think creating a generalizable function makes sense. A design choice/limitation is that the function will always need to return a 2-dimensional 'image' (even if that is an n x 1 image) in order to use PIL. This is similar to how ImageJ/Bioformats seems to work.
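
For example, a minimal PIL sketch of what 'always 2D' means here (the values are made up):

from PIL import Image

# Even a 1-pixel-wide slice is built as a 2D image; 'L' is 8-bit grayscale,
# and size is given as (width, height).
data = bytes(range(20))  # a hypothetical 20-pixel slice
img = Image.frombytes('L', (1, 20), data)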

This function may be something like get_arbitrary_frame() (or some other name) that accepts a few options:

  • display_dims, accepting a two-value tuple setting which dimensions to return, the default being whichever two dimensions are listed first in the metadata. The downside here is that the behavior of the code changes depending on the input data - but given the nature of microscopy, that might not really be a problem.
  • fetch_dims, accepting a dictionary of dimensions to return, allowing one value per dimension: {1: 0, 2: 0, 5: 1}, etc., with a default of 0 if not specified. I'm not entirely sure what to do about the dimensions set as display_dims here, but maybe this would be a good chance to specify an arbitrary range for a dimension, for example {1: (0,200), 2: (10,50)} - but because the output is limited to 2D, only the range of the display_dims could be specified.
  • c for channel? I don't think this fits in the dims dictionary. While I could put it in the '0' spot, this seems non-standard.

This might also need a new numbered tuple for the dimensions, because the order of these might change. Perhaps dims_n.

An example usage might be:

from readlif.reader import LifFile

new = LifFile('./path/to/file.lif')
img_0 = new.get_image(0)
# show every frame along dimension 3, displaying the first two dimensions
for i in range(img_0.dims_n[3]):
    img_0.get_arbitrary_frame(display_dims=(0, 1), c=0, fetch_dims={3: i}).show()

For most experiments, I can't imagine that all of the dimensions would be changing / iterated - but the code above isn't super readable either.

Perhaps this strategy is getting too complicated, but I'll see if I can make a prototype in the dev branch. These ideas are still a work in progress, and I'd be happy to get feedback.

Another (possible) option is to just make a new LifImage-like class that returns everything in an n-dimensional numpy array, but there are some other tricky decisions about how to keep the memory footprint low. Time-lapse imaging can easily end up being hundreds of GB, which is why I made this whole thing in the first place!

@MaartenBransen (Author)

A design choice/limitation is that the function will always need to return a 2-dimensional 'image' (even if that is an n x 1 image) in order to use PIL. This is similar to how ImageJ/Bioformats seems to work.

I was thinking about this after my last comment as well, and I think the easiest thing to do is to always return a 2D image but take whatever the first two dimensions are in the dimension metadata, or byte order, for these images (kind of like what you suggested for the default of get_arbitrary_frame). This will be xy in 99% of cases and not change with respect to the current API, but it also fixes my (less common) case of xz-plane imaging, and it seems to be how Bioformats does it as well. I guess it is safe to assume there are always at least 2 dimensions?

I guess that would also cover the majority of use cases, and I like the idea of a function for arbitrarily oriented images that covers the rest. Finding a form of fetch_dims that is intuitive is the hardest part...

c for channel? I don't think this fits in the dims dictionary. While I could put it in the '0' spot, this seems non-standard.

For what it is worth, the lif metadata treats channels as a separate thing from the dimensions, so it would be more consistent to keep it as a separate parameter.

I'm not sure a function that returns a multidimensional numpy array would really add much on top of these. While that is what I often use for further processing, it is easy to construct by iterating over a get_frame-like function if you need it. Also, as you pointed out, this can get very memory-hungry when you don't think about what you're loading in.

@nimne (Contributor) commented Mar 3, 2021

I haven't forgotten about this! I have a working example up in the dev branch. This branch isn't current with main; it's based on 0.4.1.

Instead of calling the function get_arbitrary_frame, I called it get_extended_frame, although I'm not sure I like that any better. Right now, this is really slow: I calculate the offset position of each pixel, seek to that position, and read each pixel in a big nested loop. The benefit of this approach is that all of the features described above work! The downside is that it is slow.

A couple of new attributes were added to LifImage:

  • dims_n, a dictionary of the numbered dims (following your table above)
  • display_dims, essentially the first two dimensions (what would be displayed by ImageJ)

To use this, get_extended_frame() has two options: display_dims, a tuple of length two describing the two dimensions to show (defaulting to the ones ImageJ would display), which allows fun things like displaying slices; and dims_dict, which is a bit more complicated. Essentially, if you want to display the first frame on dim 4, set dims_dict={4: 0}. Based on my previous comment, I should probably rename this to fetch_dims.
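
A usage sketch against the dev branch as described above (the argument values are illustrative and may change):

from readlif.reader import LifFile

img = LifFile('./path/to/file.lif').get_image(0)

# Show the first frame along dim 4 (time), using the default display plane:
img.get_extended_frame(dims_dict={4: 0}).show()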

Right now, it doesn't support channels - do you happen to be able to make a multi 'channel' (detector?) image similar to the one you made before? I can probably make some inferences from the other images I have.

This still needs code cleanup, documentation, and speed optimizations. Things will change, but if you have a chance to check this on your use case, I would appreciate it!

@MaartenBransen (Author)

Hi Nick,

This looks great already. I'll have a look and see if I can test this on some of my data tomorrow morning.

I called it get_extended_frame, although I'm not sure I like that any better.

If you end up changing it, maybe something like get_slice or get_arbitrary_slice? I suppose it depends on who you ask, but to me the word frame implies something recorded by the camera directly, while slice can be any orientation in a multidimensional dataset.

Right now, it doesn't support channels - do you happen to be able to make a multi 'channel' (detector?) image similar to the one you made before?

I can record some multichannel test data tomorrow, but was that for the wavelength sweep measurements, or for the xz plane imaging?

@nimne (Contributor) commented Mar 4, 2021

If you end up changing it, maybe something like get_slice or get_arbitrary_slice? I suppose it depends on who you ask, but to me the word frame implies something recorded by the camera directly, while slice can be any orientation in a multidimensional dataset.

Excellent idea! Perhaps get_plane()? I think that suggests that the result will be 2D, whereas 'slicing' has more flexible meanings for other Python datatypes.

I can record some multichannel test data tomorrow, but was that for the wavelength sweep measurements, or for the xz plane imaging?

The type of image doesn't really matter. I'm mostly interested in testing that the data is organized in the same way, where 'channel' acts like dimension 11. I think the X-Z-Y image would be sufficient.

@MaartenBransen (Author)

Good point that a slice isn't necessarily 2D; get_plane sounds pretty clear to me. Here's some 2-channel test data, with an xz image, an xzt series, and an xzy series.

@nimne (Contributor) commented Mar 4, 2021

Thanks for the test data! I'll see if I can get channels working this weekend. Before I forget to ask, would I be able to include all of these test files in the GitHub repo / project as part of the unit tests?

The other challenge here will be solving the IO problem. Right now, the code sets the offset for each pixel, reads that pixel, and moves to the next pixel. This is simply a bad approach for efficiency. This is relatively simple to speed up when loading the first two dimensions by just doing a long sequential read.

This gets harder when arbitrary dimensions are requested - there is no way to make this sequential (as far as I can tell?). This is starting to get outside of my current Python knowledge, so there are probably approaches I'm overlooking. The common theme in what I'm finding is that every approach will trade memory for speed. Perhaps using mmap would be a simple way to improve the seek problem? I may also need to figure out the minimum and maximum bounds for the requested data, do a sequential read on that, and pick out the parts that are needed. Solving this may take a while.
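
For example, a minimal mmap sketch of that idea (the offset here is a placeholder, not the real .lif block layout):

import mmap

with open('./path/to/file.lif', 'rb') as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        offset = 4096  # hypothetical byte offset of a pixel in an image block
        pixel = mm[offset:offset + 1]  # one 8-bit pixel, no explicit seek/read calls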

To accommodate large data, I think I would be able to add an option to get_slice(), perhaps something like on_disk with the default being False to keep with the "fast" part in the readme, while preserving the option to spend IO and save memory.

@MaartenBransen (Author)

Before I forget to ask, would I be able to include all of these test files in the GitHub repo / project as part of the unit tests?

Yes, absolutely.

Right now, the code sets the offset for each pixel, reads that pixel, and moves to the next pixel.

When one of the display_dims is the first dimension in the data, it could be loaded as a block at least along that dimension, but that only helps a little and only in those cases. Ultimately, though, I think it is always going to be a trade-off between memory and speed, and having the option is great. I regularly work with very large datasets (hundreds of GBs), and in that use case memory is everything, while the loading time of the data is completely negligible compared to the time it takes to subsequently process it. I think it is okay to trade some speed for the flexibility of a (niche) function that can load arbitrary planes.

@nimne (Contributor) commented Mar 15, 2021

In the latest commit to the dev branch, I've handled the situation where the image is ultimately read in one big block - that really helped with the speed.
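
Roughly, the fast path looks like this (a simplified sketch; the offset and sizes are placeholders):

# When the two display dimensions are the first two in byte order,
# the whole plane is contiguous in the file and one read suffices.
plane_offset = 0  # hypothetical start of the requested plane
nx, ny, bytes_per_pixel = 64, 64, 1

with open('./path/to/file.lif', 'rb') as f:
    f.seek(plane_offset)
    block = f.read(nx * ny * bytes_per_pixel)  # one sequential read per plane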

Try as I might, I can't figure out a good way to make the 'arbitrary access' faster without involving numpy. It might be worthwhile to do, given that most of the downstream processing of this data would involve numpy, but I'll need to think about that.

Reading in different channels also works in this update, thanks for the example files! I've also renamed the new function to get_plane(display_dims=None, c=0, req_dims_dict=None) and modified the names of the arguments.

This is probably close to done feature-wise. After some code cleanup (this is all pretty ugly), updates to the documentation, and tests, this will probably make it onto the main branch.

@nimne (Contributor) commented Apr 20, 2021

Version 0.6.1 is up with get_plane(display_dims=None, c=0, requested_dims=None). There is an issue with specifying arbitrary display_dims that needs to be solved, but I think it is related to #19. The functionality is mostly there, and is probably similar to what is possible with the Leica software. Arbitrary slices can still be made after the image is loaded, but can't be directly read from disk yet.
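
Example usage (dimension numbering assumed to follow the DimID table above):

from readlif.reader import LifFile

img = LifFile('./path/to/file.lif').get_image(0)

# xy plane (dims 1 and 2) at the third detection wavelength (dim 5), channel 0:
plane = img.get_plane(display_dims=(1, 2), c=0, requested_dims={5: 2})
plane.show()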
