Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for iterative pruning based on user-specified discard conditions #214

Draft
wants to merge 28 commits into
base: main
Choose a base branch
from

Conversation

ns-rse
Copy link
Contributor

@ns-rse ns-rse commented Oct 16, 2023

First stab at iterative pruning.

ns-rse and others added 8 commits July 26, 2023 16:47
Introduces the function `iteratively_prune_paths()` which will prune paths of type 1 and 3 until there are a user
specified (via `min_skeleton`) number of paths left.

* Documentation on pruning skeleton included.
* Retains the `keep_images` option by adding it as an attribute if supplied. Required because the `.prune_paths()`
  method when applied returns the pruned skeleton but didn't retain this option (as far as I could work out).
* Tests on dummy skeletons are included
Refines the steps taken in pruning such that large loops are retained as the target if present or if skeletons are
linear then the longest path is retained.

Tests are included for a range of dummy skeletons.
Refines the steps taken in pruning such that large loops are retained as the target if present or if skeletons are
linear then the longest path is retained.

Tests are included for a range of dummy skeletons.
@jni jni changed the title Ns rse/iterative pruning Add support for iterative pruning based on user-specified discard conditions Oct 16, 2023
@jni jni marked this pull request as draft October 16, 2023 13:55
@jni
Copy link
Owner

jni commented Oct 16, 2023

I've made some changes to the code, still needs work but hopefully enough to get the gist.

Ultimately, I think we need to be able to go from Skeleton to Graph and back losslessly. Like sparse array formats, each representation has strengths and weaknesses and by "allowing" either format we enable different algorithms.

I was looking to convert NetworkX objects back to Skeletons and stumbled on the `nx.to_numpy_array()` function and the
related `nx.from_numpy_array()`.

Since `csr.skeleton_image` are typically Numpy Arrays I figured this would be a convenient and efficient method of
converting between the two formats so have attempted to replace `skeleton_to_nx()` with this.

I had written some tests for `skeleton_to_nx()` first before making undertaking refactoring.

I found that `nx.from_numpy_array()` doesn't handle non-square arrays, ok not a problem we can pad them with
zeros (wrote a small function to do this along with tests), but on applying `nx.from_numpy_array()` to these padded
arrays the tests I wrote to originally failed, which surprised me as, in my naivety, I wouldn't expect padding with
zero's to adjacency graphs to affect the resulting network representation in terms of the number of nodes and the number
of edges.
@jni
Copy link
Owner

jni commented Nov 4, 2023

@ns-rse looking at the description and implementation (not tests) of 0ad90df, I don't see how it could ever work, because a skeleton image is not what to_numpy_array is producing: it produces an adjacency matrix — and loses all our metadata (such as the path between junctions and tips) in the process.

The naive way to produce the image is to make an empty image, then iterate over all the edges and set the edge coordinates to 1 in the image, before passing that to Skeleton. I have vague hopes of finding something faster by going directly from the edges to the csr_matrix for paths, but that sounds complicated — and can be left to a later optimisation.

@ns-rse
Copy link
Contributor Author

ns-rse commented Nov 8, 2023

Yes, I'm realising I have deficiency in my udnerstanding of graph theory NetworkX and how Skan should be interacting with it. The tests are hopefully going to be useful and your code to go from Skeleton to NetworkX graph work but I need to rethink how to get back from NetworkX to Skeletons whilst preserving the metadata (and also be mindful that we might prune the graph under NetworkX). Hope to spend some more time on it later this week.

Introduces conversion of Numpy arrays (_not_ adjacency matrices) to NetworkX graphs.

This is done by assuming that each element of a 2-D boolean array[1] is a node connected by edges, we look at what
values the neighbours are and > 0 the two points form an edge which is added to a NetworkX graph.

The shape of the original Numpy array is saved within the `Networkx.graph` object which makes
reconstructing the original Numpy array straight-forward (as we may be pruning some of the original co-ordinates may be
removed).

Thanks to @MaxGamill-Sheffield for guiding me to this approach and as noted a solution found on SO for how to find the
co-ordinates of neighbouring cells.
@ns-rse
Copy link
Contributor Author

ns-rse commented Nov 8, 2023

@jni Spent some time on this today and have a method of going back and forth from Numpy <--> NetworkX (thanks to my colleague @MaxGamill-Sheffield who shared some code that I refactored abit).

One of the problems I had cognitively was reconciling what the TopoStats work calls a skeleton with the skan / NetworkX representation. The solution in fbc0988 treats every element of a 2-D array as a node and makes going back and forth between the two straight-forward.

Let me know what you think.

@jni
Copy link
Owner

jni commented Nov 23, 2023

The solution in fbc0988 treats every element of a 2-D array as a node and makes going back and forth between the two straight-forward.

Ah, sorry, I didn't have time to review this at the time.

I don't think that's going to fly unfortunately. Actually, a bit of lore: all the main implementations of skan are in a file called csr.py because the very earliest implementations were in networkx. But we found that was suuuuper slow so I went about methodically converting everything to scipy.sparse.csr_matrix and writing out skan as it is today. But while I was working on that it was siloed in this little csr.py file, until that file became the whole thing. 😊

So, I think that the 2D pixel graph is too low level to try to do a round trip — we will be swapping one performance issue for another. The correct level of abstraction for the Skeleton, imho, is the junction-junction graph, which is what you get when you make use the node id pairs (src, dst) from the summarize table as an edge list. Then it's a matter of putting the right metadata on the networkx graph. For example, each row of the summarize table has a row index, which corresponds to path_coordinates(idx) on the Skeleton object — so we can store the coordinates (or maybe just the pixel IDs) as an edge attribute. Then, when you prune, you need to merge two edges, and the path coordinates of the new edge is just np.concatenate([edge1['coords'], edge2['coords'][1:]], axis=0), for example. All of this gives you enough information to reconstruct a new Skeleton object, in the worst case by writing out the pixels and recomputing a csr skeleton graph — but maybe faster, I need to think about it.

@ns-rse
Copy link
Contributor Author

ns-rse commented Dec 9, 2023

I've been having a play around with this on ns-rse/networkx and had a go at passing the coordinates as edge attributes to NetworkX and reonconstructing paths between co-ordinates using Bresnham's Line algorithm.

It works in some instances but in writing tests I discovered there is insufficient data retained. For example skeleton1 in src/skan/_testdata.py has a loop in it. This results in two nodes with the same co-ordinates but there is no information about what the points are connecting them.

from skan import Skeleton, summarize
from skan._testdata import skeleton1

summarize(Skeleton(skeleton1))[['skeleton-id', 'coord-src-0', 'coord-src-1', 'coord-dst-0', 'coord-dst-1']]

   skeleton-id  coord-src-0  coord-src-1  coord-dst-0  coord-dst-1
0            0            2            1            3            3
1            0            2            1            3            3
2            0            2            1            4            0
3            0            3            3            4            6

...and so we've no idea how those two points differ in terms of the path taken between them. This is going to be true of any non-linear edge and so whilst we could recombine the path co-ordinates of edges after pruning in Networkx as you suggested we would have no idea the path taken between those points.

The only solution I can think of so far that doesn't involve using every point as a node is to pass the co-ordinates of the points that make up an edge as an attribute to NetworkX and concatenate those after pruning. Not tried this yet and have been asked to work on something else in the coming weeks but will think about it and try and find some time to try this out.

@jni
Copy link
Owner

jni commented Jan 4, 2024

@ns-rse sorry about the long wait here, as it has been very busy trying to get napari 0.4.19 out the door (still not done) and also family stuff. But I haven't forgotten this!

Quick thoughts:

reonconstructing paths between co-ordinates using Bresnham's Line algorithm.

I don't think this should be needed because I think we should keep the path coordinates and they should be pixel-perfect — there should be no need to infer any pixels. You need to keep the info from both the summary and the skeleton using path_coordinates(i) where i is the path ID of each row in the summary. iirc I got those matched up exactly at some point.

The only solution I can think of so far that doesn't involve using every point as a node is to pass the co-ordinates of the points that make up an edge as an attribute to NetworkX and concatenate those after pruning. Not tried this yet and have been asked to work on something else in the coming weeks but will think about it and try and find some time to try this out.

Yes like that. But that still vastly reduces the number of nodes in the graph.

@jni
Copy link
Owner

jni commented Jan 4, 2024

I will try to come back to this in the next couple of weeks but didn't want to leave you hanging any longer, again, sorry about the wait!

@ns-rse
Copy link
Contributor Author

ns-rse commented Jan 4, 2024

Don't worry at all @jni we all have lives. Hope you managed some time away from the computer.

That is however super-useful I hadn't clocked path_coordinates() method and think that would be the perfect tool to for adding the co-ordinates as an attribute.

I'll have a tinker next week and see how I get on.

Thank you and Happy New Year

@jni
Copy link
Owner

jni commented May 1, 2024

@ns-rse would you mind rebasing this on main? I can do it if you are too busy right now though.

Adapting to NetworkX graphs.

Currently tests fail for src/skan/test/test_csr.py::test_iteratively_prune_[paths.multiple_paths] as there is no
'discard' argument. Will revisit and try and work this out, been so long I don't understand how this argument evaluates
to boolean and how to construct the tests.
@ns-rse
Copy link
Contributor Author

ns-rse commented May 1, 2024

@jni I merged main in and solved the conflicts, tests failed and so I tried to sort those out locally. Fixed the failing ones but there were more that failed and I haven't got them all sorted yet though as I don't understand how the discard option evaluates to boolean for the itteratively_prune_paths() function. Will try and take a look over the coming days and work that out.

jni added 5 commits June 29, 2024 10:05
Even when there is only one edge, for consistency, every edge in a multi-graph also contains a key (starting at 0). We use the key everywhere, which makes the code a bit more complicated if you don't expect multigraph conventions, but more consistent overall.
@jni
Copy link
Owner

jni commented Jun 29, 2024

I'm almost there, but I think my current approach is too gung-ho... 😂

Screenshot 2024-06-29 at 11 57 46 AM

@jni
Copy link
Owner

jni commented Jun 29, 2024

Ah, the hardest problem in computer science — off by one errors. This is one of the intermediate steps:

Screenshot 2024-06-29 at 5 19 26 PM Screenshot 2024-06-29 at 5 18 30 PM

So when removing some of the branches we are breaking up the inner loop, at which point we're already doomed. 😂 Otherwise it's looking pretty good though. Just need to figure out how to avoid cutting open the middle branches...

@jni
Copy link
Owner

jni commented Jun 29, 2024

btw @ns-rse it's a cool data sample — can you tell me what the xy pixel spacing/resolution is? We can incorporate that into the example.

@ns-rse
Copy link
Contributor Author

ns-rse commented Jul 1, 2024

Hi @jni

Ah, the hardest problem in computer science — off by one errors. This is one of the intermediate steps:

😆

I encountered this with the linear molecules though, it just kept on pruning everything back. I think I solved it by looking for the longest path, but not sure how that transfers to loops/closed structures where a break is then introduced.

btw @ns-rse it's a cool data sample — can you tell me what the xy pixel spacing/resolution is? We can incorporate that into the example.

The pixel to nanometre scaling on these images varies depending on the resolution used in the scanning. I think this image is from minicircle.spm where the scaling is 0.4940029296875

Thanks for continuing to hack away at this 👍

Let me know if there is anything I can help with.

@jni
Copy link
Owner

jni commented Jul 1, 2024

0.4940029296875

... nm/pixel? ... That's cool/crazy! 😃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants