Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

relabel block #664

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Conversation

ArneDefauw
Copy link
Contributor

Follow up PR for #588, see discussion #597 for relabeling in case output data is a labels layer (i.e. dims does not contain c) via use of a bit shift.

Copy link

codecov bot commented Aug 7, 2024

Codecov Report

Attention: Patch coverage is 95.34884% with 2 lines in your changes missing coverage. Please review.

Project coverage is 91.85%. Comparing base (b80b364) to head (cad5ce3).

Files with missing lines Patch % Lines
src/spatialdata/_core/operations/map.py 97.56% 1 Missing ⚠️
src/spatialdata/_types.py 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #664      +/-   ##
==========================================
+ Coverage   91.83%   91.85%   +0.02%     
==========================================
  Files          44       44              
  Lines        6781     6824      +43     
==========================================
+ Hits         6227     6268      +41     
- Misses        554      556       +2     
Files with missing lines Coverage Δ
src/spatialdata/_core/operations/map.py 97.70% <97.56%> (-0.13%) ⬇️
src/spatialdata/_types.py 75.00% <50.00%> (-3.58%) ⬇️

@ArneDefauw ArneDefauw marked this pull request as draft August 7, 2024 14:05
@ArneDefauw
Copy link
Contributor Author

Now also added helper function _relabel_sequential, see https://github.com/ArneDefauw/spatialdata/blob/633a132ddfb6353925bc099114b893df43ad3f2e/src/spatialdata/_core/operations/map.py#L211, as discussed here #597 (comment)

I think the PR is ready for review now.

@ArneDefauw ArneDefauw marked this pull request as ready for review August 8, 2024 09:18
Copy link
Member

@giovp giovp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @ArneDefauw , thanks a lot, super useful. I went back to this function in squidpy, and remember that we also did an additional step to harmonize the labels with dask image

from dask_image.ndmeasure._utils._label import (
            connected_components_delayed,
            label_adjacency_graph,
            relabel_blocks,
        )

# max because labels are not continuous (and won't be continuous)
label_groups = label_adjacency_graph(img, None, img.max())
new_labeling = connected_components_delayed(label_groups)
relabeled = relabel_blocks(img, new_labeling)

any reason for leaving this out? Otherwise for me it is good to go!

@ArneDefauw
Copy link
Contributor Author

ArneDefauw commented Aug 26, 2024

hi @ArneDefauw , thanks a lot, super useful. I went back to this function in squidpy, and remember that we also did an additional step to harmonize the labels with dask image

from dask_image.ndmeasure._utils._label import (
            connected_components_delayed,
            label_adjacency_graph,
            relabel_blocks,
        )

# max because labels are not continuous (and won't be continuous)
label_groups = label_adjacency_graph(img, None, img.max())
new_labeling = connected_components_delayed(label_groups)
relabeled = relabel_blocks(img, new_labeling)

any reason for leaving this out? Otherwise for me it is good to go!

Hi @giovp ,

thanks for the review!

Yes, I did not include the step to harmonize the labels across chunks due to some issues I observed when cell density is high, see also my previous comments.
#597 (comment)
#597 (comment)

I make it a little bit more concrete with an example:

So this is map_raster on a labels layer without harmonizing labels:

#I use the harpy package (https://github.com/saeyslab/harpy) for downloading of some test data
from sparrow.datasets import mibi_example
sdata = mibi_example()
def copy_labels( arr ):
    return arr

results=map_raster( sdata[ "masks_whole" ].chunk( 212 ), func=copy_labels, blockwise=True, relabel=True, )
from spatialdata import read_zarr

sdata[ "masks_whole_copied_relabeled_no_merging" ] = results
sdata.write_element( "masks_whole_copied_relabeled_no_merging" )
sdata = read_zarr( sdata.path )

Then without using the harmonizing of labels I get:
Screenshot 2024-08-26 at 11 35 57

and with harmonizing of the labels using

from dask_image.ndmeasure._utils._label import (
            connected_components_delayed,
            label_adjacency_graph,
            relabel_blocks,
        )

# max because labels are not continuous (and won't be continuous)
label_groups = label_adjacency_graph(img, None, img.max())
new_labeling = connected_components_delayed(label_groups)
relabeled = relabel_blocks(img, new_labeling)

I get:

Screenshot 2024-08-26 at 11 30 33

Which is not great, cells on the boundary of the chunks are merged together.

A solution for this, is using this approach:
https://github.com/gletort/NeubiasPasteur2023_AdvancedCellPose/blob/main/distributed_segmentation.py?rgh-link-date=2024-07-08T08%3A01%3A09Z for merging chunks back together.
Also implemented here https://github.com/saeyslab/harpy/blob/e4966aaf12ff0c36e301ff27df675c2c96f87ed2/src/sparrow/image/segmentation/_segmentation.py#L494,
which fixes much of these issues.

But I do not know if you want this to be included in this PR, see also comment of @LucaMarconato #597 (comment).
I would be happy to prepare a follow up PR.

@giovp
Copy link
Member

giovp commented Sep 2, 2024

hi @ArneDefauw , sorry for late reply.

I took a look at

gletort/NeubiasPasteur2023_AdvancedCellPose@main/distributed_segmentation.py?rgh-link-date=2024-07-08T08%3A01%3A09Z

it's a nice approach I think, and I wonder if it should be included already here. Meaning, I am not sure how useful it is the output of relabelling without the across-block label harmonization. Yes, relabelling fix one step of the chunked segmentation, but the result is not really actionable/useful right? Or am I missing something?

also @LucaMarconato wdyt? you can take a look at the output of the excellent post from @ArneDefauw before to understand visually what is the problem

@ArneDefauw
Copy link
Contributor Author

ArneDefauw commented Sep 17, 2024

hi @ArneDefauw , sorry for late reply.

I took a look at

gletort/NeubiasPasteur2023_AdvancedCellPose@main/distributed_segmentation.py?rgh-link-date=2024-07-08T08%3A01%3A09Z

it's a nice approach I think, and I wonder if it should be included already here. Meaning, I am not sure how useful it is the output of relabelling without the across-block label harmonization. Yes, relabelling fix one step of the chunked segmentation, but the result is not really actionable/useful right? Or am I missing something?

also @LucaMarconato wdyt? you can take a look at the output of the excellent post from @ArneDefauw before to understand visually what is the problem

Yes, indeed, if you do not know how to do the across-block label harmonization, the result will not be very useful. On the other hand, there are probably many ways to do it.
Maybe spatialdata should at least implement one method to do this label harmonization.
But I think it is up to you guys to decide :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants