Draft SpatialData.filter() #626

Draft
wants to merge 1 commit into
base: main
from

Conversation

aeisenbarth
Contributor

(In reference to #620)

This PR implements more advanced filtering options than subset, allowing users to create a new SpatialData object that contains only specific tables, layers, obs keys, and var keys.

Use cases

  • From a concatenated SpatialData, one can extract parts of it.
  • When testing an operation that adds elements or table columns, one can extract from an expected reference dataset the input data and pass it to the operation, then compare the processed data against the reference.
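
The second use case can be sketched with plain pandas, without assuming anything about the final filter() signature. In this minimal sketch, a DataFrame stands in for a table's obs, and `add_size_class` and all column names are hypothetical; none of them are part of this PR.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical reference: the obs of a table after the operation has run.
reference_obs = pd.DataFrame(
    {
        "region": ["cells", "cells", "cells"],
        "instance_id": [1, 2, 3],
        "area": [10.0, 55.0, 120.0],
        "size_class": ["small", "medium", "large"],  # column added by the operation
    }
)


def add_size_class(obs: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical operation under test: derive `size_class` from `area`."""
    out = obs.copy()
    out["size_class"] = [
        "small" if a <= 50 else "medium" if a <= 100 else "large"
        for a in out["area"]
    ]
    return out


# Step 1: derive the input by keeping only the obs keys that exist
# before the operation runs.
input_keys = ["region", "instance_id", "area"]
input_obs = reference_obs[input_keys].copy()

# Step 2: run the operation and compare the result against the reference.
processed_obs = add_size_class(input_obs)
assert_frame_equal(processed_obs, reference_obs)
```

With a filter function that accepts obs keys, step 1 would presumably become a single call on the reference SpatialData object instead of manual column selection.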

Closes #280
Closes #284
Closes #556


codecov bot commented Jul 8, 2024

Codecov Report

Attention: Patch coverage is 10.71429% with 25 lines in your changes missing coverage. Please review.

Project coverage is 91.59%. Comparing base (95d69ff) to head (d9c1e0e).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #626      +/-   ##
==========================================
- Coverage   91.93%   91.59%   -0.35%     
==========================================
  Files          44       44              
  Lines        6661     6688      +27     
==========================================
+ Hits         6124     6126       +2     
- Misses        537      562      +25     
Files Coverage Δ
src/spatialdata/_core/spatialdata.py 88.51% <10.71%> (-2.47%) ⬇️

@aeisenbarth
Contributor Author

In its current state, this PR does not yet fully resolve the issues it was meant to close.

  • Subset spatialdata by list of cell ids #556:
    • A parameter instances could be added. If provided, only the rows of these instances are selected in the table; if not provided, all instances are returned. It should only allow a single region/element, though (otherwise it gets complicated).
    • Shapes/points elements should also be filtered (easy)
    • Still unanswered is what effect it should have on the labels. Shall we create a new labels image with the instances eliminated (0 = background), or leave the labels unchanged and just have no reference to them in the table?
  • Filter spatialData  #280:
    • This involves a condition. In my opinion, implementing a parameter that takes a condition as a function or query expression is out of scope due to its complexity. I would do this in two steps: users first use pandas to get a list of instances, then pass the instances to the SpatialData filter function.
    • The user also asked about adjusting shapes to match the filtered instances in the table.
  • Feature request: spatial cropping from select table rows #284:
    • Filtering labels/shapes/points elements
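
The two-step approach proposed for #280 could look roughly like the following sketch. It uses only pandas and NumPy: the DataFrames and the array are toy stand-ins for a table's obs, a shapes element, and a labels element, and the column names (`region`, `instance_id`, `area`, `radius`) are assumptions. The last step illustrates only the first option raised for labels (removed instances become 0 = background), which is still an open question here.

```python
import numpy as np
import pandas as pd

# Toy stand-in for table.obs annotating a single region/element.
obs = pd.DataFrame(
    {
        "region": ["cells"] * 4,
        "instance_id": [1, 2, 3, 4],
        "area": [12.0, 80.0, 45.0, 150.0],
    }
)

# Toy stand-in for a shapes element, indexed by instance id.
shapes = pd.DataFrame({"radius": [2.0, 5.0, 3.5, 7.0]}, index=[1, 2, 3, 4])

# Toy stand-in for a labels element (0 = background).
labels = np.array(
    [
        [0, 1, 1, 2],
        [3, 3, 0, 2],
        [3, 4, 4, 4],
    ]
)

# Step 1: express the condition with pandas and collect the matching instances.
instances = obs.loc[obs["area"] > 40, "instance_id"].to_list()

# Step 2: select rows by instance in the table and in the shapes.
filtered_obs = obs[obs["instance_id"].isin(instances)]
filtered_shapes = shapes.loc[shapes.index.isin(instances)]

# Open question, first option: build a new labels array in which
# instances that were filtered out become background.
filtered_labels = np.where(np.isin(labels, instances), labels, 0)
```

Keeping the condition in user code means the filter function itself would only need to accept a list of instances.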

@LucaMarconato
Member

Thanks @aeisenbarth. After discussing with @melonora, we are going to first turn the code from #627 into an internal function, merge it, and then continue working on your PR. The idea is to provide a single entry point for filtering, filter(), and to use, for instance, subset() or the function from Wouter internally.

LucaMarconato marked this pull request as draft July 12, 2024 17:55
2 participants