Add support for iterative pruning based on user-specified discard conditions #214

ns-rse · 2023-10-16T06:44:22Z

First stab at iterative pruning.

Introduces the function `iteratively_prune_paths()` which will prune paths of type 1 and 3 until there are a user specified (via `min_skeleton`) number of paths left. * Documentation on pruning skeleton included. * Retains the `keep_images` option by adding it as an attribute if supplied. Required because the `.prune_paths()` method when applied returns the pruned skeleton but didn't retain this option (as far as I could work out). * Tests on dummy skeletons are included

Refines the steps taken in pruning such that large loops are retained as the target if present or if skeletons are linear then the longest path is retained. Tests are included for a range of dummy skeletons.

jni · 2023-10-16T13:57:06Z

I've made some changes to the code, still needs work but hopefully enough to get the gist.

Ultimately, I think we need to be able to go from Skeleton to Graph and back losslessly. Like sparse array formats, each representation has strengths and weaknesses and by "allowing" either format we enable different algorithms.

I was looking to convert NetworkX objects back to Skeletons and stumbled on the `nx.to_numpy_array()` function and the related `nx.from_numpy_array()`. Since `csr.skeleton_image` are typically Numpy Arrays I figured this would be a convenient and efficient method of converting between the two formats so have attempted to replace `skeleton_to_nx()` with this. I had written some tests for `skeleton_to_nx()` first before making undertaking refactoring. I found that `nx.from_numpy_array()` doesn't handle non-square arrays, ok not a problem we can pad them with zeros (wrote a small function to do this along with tests), but on applying `nx.from_numpy_array()` to these padded arrays the tests I wrote to originally failed, which surprised me as, in my naivety, I wouldn't expect padding with zero's to adjacency graphs to affect the resulting network representation in terms of the number of nodes and the number of edges.

jni · 2023-11-04T05:56:30Z

@ns-rse looking at the description and implementation (not tests) of 0ad90df, I don't see how it could ever work, because a skeleton image is not what to_numpy_array is producing: it produces an adjacency matrix — and loses all our metadata (such as the path between junctions and tips) in the process.

The naive way to produce the image is to make an empty image, then iterate over all the edges and set the edge coordinates to 1 in the image, before passing that to Skeleton. I have vague hopes of finding something faster by going directly from the edges to the csr_matrix for paths, but that sounds complicated — and can be left to a later optimisation.

ns-rse · 2023-11-08T09:57:54Z

Yes, I'm realising I have deficiency in my udnerstanding of graph theory NetworkX and how Skan should be interacting with it. The tests are hopefully going to be useful and your code to go from Skeleton to NetworkX graph work but I need to rethink how to get back from NetworkX to Skeletons whilst preserving the metadata (and also be mindful that we might prune the graph under NetworkX). Hope to spend some more time on it later this week.

@MaxGamill-Sheffield

Introduces conversion of Numpy arrays (_not_ adjacency matrices) to NetworkX graphs. This is done by assuming that each element of a 2-D boolean array[1] is a node connected by edges, we look at what values the neighbours are and > 0 the two points form an edge which is added to a NetworkX graph. The shape of the original Numpy array is saved within the `Networkx.graph` object which makes reconstructing the original Numpy array straight-forward (as we may be pruning some of the original co-ordinates may be removed). Thanks to @MaxGamill-Sheffield for guiding me to this approach and as noted a solution found on SO for how to find the co-ordinates of neighbouring cells.

ns-rse · 2023-11-08T17:15:31Z

@jni Spent some time on this today and have a method of going back and forth from Numpy <--> NetworkX (thanks to my colleague @MaxGamill-Sheffield who shared some code that I refactored abit).

One of the problems I had cognitively was reconciling what the TopoStats work calls a skeleton with the skan / NetworkX representation. The solution in fbc0988 treats every element of a 2-D array as a node and makes going back and forth between the two straight-forward.

Let me know what you think.

jni · 2023-11-23T03:32:20Z

The solution in fbc0988 treats every element of a 2-D array as a node and makes going back and forth between the two straight-forward.

Ah, sorry, I didn't have time to review this at the time.

I don't think that's going to fly unfortunately. Actually, a bit of lore: all the main implementations of skan are in a file called csr.py because the very earliest implementations were in networkx. But we found that was suuuuper slow so I went about methodically converting everything to scipy.sparse.csr_matrix and writing out skan as it is today. But while I was working on that it was siloed in this little csr.py file, until that file became the whole thing. 😊

So, I think that the 2D pixel graph is too low level to try to do a round trip — we will be swapping one performance issue for another. The correct level of abstraction for the Skeleton, imho, is the junction-junction graph, which is what you get when you make use the node id pairs (src, dst) from the summarize table as an edge list. Then it's a matter of putting the right metadata on the networkx graph. For example, each row of the summarize table has a row index, which corresponds to path_coordinates(idx) on the Skeleton object — so we can store the coordinates (or maybe just the pixel IDs) as an edge attribute. Then, when you prune, you need to merge two edges, and the path coordinates of the new edge is just np.concatenate([edge1['coords'], edge2['coords'][1:]], axis=0), for example. All of this gives you enough information to reconstruct a new Skeleton object, in the worst case by writing out the pixels and recomputing a csr skeleton graph — but maybe faster, I need to think about it.

ns-rse · 2023-12-09T09:57:23Z

I've been having a play around with this on ns-rse/networkx and had a go at passing the coordinates as edge attributes to NetworkX and reonconstructing paths between co-ordinates using Bresnham's Line algorithm.

It works in some instances but in writing tests I discovered there is insufficient data retained. For example skeleton1 in src/skan/_testdata.py has a loop in it. This results in two nodes with the same co-ordinates but there is no information about what the points are connecting them.

from skan import Skeleton, summarize
from skan._testdata import skeleton1

summarize(Skeleton(skeleton1))[['skeleton-id', 'coord-src-0', 'coord-src-1', 'coord-dst-0', 'coord-dst-1']]

   skeleton-id  coord-src-0  coord-src-1  coord-dst-0  coord-dst-1
0            0            2            1            3            3
1            0            2            1            3            3
2            0            2            1            4            0
3            0            3            3            4            6

...and so we've no idea how those two points differ in terms of the path taken between them. This is going to be true of any non-linear edge and so whilst we could recombine the path co-ordinates of edges after pruning in Networkx as you suggested we would have no idea the path taken between those points.

The only solution I can think of so far that doesn't involve using every point as a node is to pass the co-ordinates of the points that make up an edge as an attribute to NetworkX and concatenate those after pruning. Not tried this yet and have been asked to work on something else in the coming weeks but will think about it and try and find some time to try this out.

jni · 2024-01-04T13:21:15Z

@ns-rse sorry about the long wait here, as it has been very busy trying to get napari 0.4.19 out the door (still not done) and also family stuff. But I haven't forgotten this!

Quick thoughts:

reonconstructing paths between co-ordinates using Bresnham's Line algorithm.

I don't think this should be needed because I think we should keep the path coordinates and they should be pixel-perfect — there should be no need to infer any pixels. You need to keep the info from both the summary and the skeleton using path_coordinates(i) where i is the path ID of each row in the summary. iirc I got those matched up exactly at some point.

The only solution I can think of so far that doesn't involve using every point as a node is to pass the co-ordinates of the points that make up an edge as an attribute to NetworkX and concatenate those after pruning. Not tried this yet and have been asked to work on something else in the coming weeks but will think about it and try and find some time to try this out.

Yes like that. But that still vastly reduces the number of nodes in the graph.

jni · 2024-01-04T13:21:46Z

I will try to come back to this in the next couple of weeks but didn't want to leave you hanging any longer, again, sorry about the wait!

ns-rse · 2024-01-04T13:45:00Z

Don't worry at all @jni we all have lives. Hope you managed some time away from the computer.

That is however super-useful I hadn't clocked path_coordinates() method and think that would be the perfect tool to for adding the co-ordinates as an attribute.

I'll have a tinker next week and see how I get on.

Thank you and Happy New Year

jni · 2024-05-01T06:07:50Z

@ns-rse would you mind rebasing this on main? I can do it if you are too busy right now though.

Adapting to NetworkX graphs. Currently tests fail for src/skan/test/test_csr.py::test_iteratively_prune_[paths.multiple_paths] as there is no 'discard' argument. Will revisit and try and work this out, been so long I don't understand how this argument evaluates to boolean and how to construct the tests.

ns-rse · 2024-05-01T21:20:51Z

@jni I merged main in and solved the conflicts, tests failed and so I tried to sort those out locally. Fixed the failing ones but there were more that failed and I haven't got them all sorted yet though as I don't understand how the discard option evaluates to boolean for the itteratively_prune_paths() function. Will try and take a look over the coming days and work that out.

Even when there is only one edge, for consistency, every edge in a multi-graph also contains a key (starting at 0). We use the key everywhere, which makes the code a bit more complicated if you don't expect multigraph conventions, but more consistent overall.

jni · 2024-06-29T02:02:27Z

I'm almost there, but I think my current approach is too gung-ho... 😂

jni · 2024-06-29T07:21:37Z

Ah, the hardest problem in computer science — off by one errors. This is one of the intermediate steps:

So when removing some of the branches we are breaking up the inner loop, at which point we're already doomed. 😂 Otherwise it's looking pretty good though. Just need to figure out how to avoid cutting open the middle branches...

jni · 2024-06-29T07:49:56Z

btw @ns-rse it's a cool data sample — can you tell me what the xy pixel spacing/resolution is? We can incorporate that into the example.

This is useful for debugging - can make a parameter later

ns-rse · 2024-07-01T08:26:50Z

Hi @jni

Ah, the hardest problem in computer science — off by one errors. This is one of the intermediate steps:

😆

I encountered this with the linear molecules though, it just kept on pruning everything back. I think I solved it by looking for the longest path, but not sure how that transfers to loops/closed structures where a break is then introduced.

btw @ns-rse it's a cool data sample — can you tell me what the xy pixel spacing/resolution is? We can incorporate that into the example.

The pixel to nanometre scaling on these images varies depending on the resolution used in the scanning. I think this image is from minicircle.spm where the scaling is 0.4940029296875

Thanks for continuing to hack away at this 👍

Let me know if there is anything I can help with.

jni · 2024-07-01T09:31:36Z

0.4940029296875

... nm/pixel? ... That's cool/crazy! 😃

ns-rse and others added 8 commits July 26, 2023 16:47

Pruning loops and linear skeletons to retain longest path

64e8593

Refines the steps taken in pruning such that large loops are retained as the target if present or if skeletons are linear then the longest path is retained. Tests are included for a range of dummy skeletons.

Loops and linear skeletons retain longest path

fd02995

Refines the steps taken in pruning such that large loops are retained as the target if present or if skeletons are linear then the longest path is retained. Tests are included for a range of dummy skeletons.

Adding examples for pruning

ae0bf51

Adding examples for pruning

b2fb6cc

Merge branch 'main' into ns-rse/iterative-pruning

010bda3

Add function to make nx Graph from Skeleton

5b6d5a3

Add function to compute main branches in nx directly

482d2d1

jni changed the title ~~Ns rse/iterative pruning~~ Add support for iterative pruning based on user-specified discard conditions Oct 16, 2023

Initial work on networkx-based pruning

eac5da3

jni marked this pull request as draft October 16, 2023 13:55

ns-rse added 2 commits October 17, 2023 13:10

Tweak to text

19a61cc

ns-rse added 4 commits November 8, 2023 22:12

Correcting doc string

e2d5110

Merge branch 'main' into ns-rse/iterative-pruning

9eb836b

Correcting import order

db48a41

Correcting test, had chopped part of key fixture resolving conflicts

b1c4127

ns-rse mentioned this pull request Jan 30, 2024

Add function to make nx Graph from Skeleton #224

Merged

Merge branch 'main' into ns-rse/iterative-pruning

a44dfbd

jni added 5 commits June 29, 2024 10:05

Add full data dict to graph attributes

643cea7

Add example predicate with is_endpoint

4c2f37f

Handle self-edges / redundant edges

8969838

Updates to iterative pruning example

20d6470

Merge branch 'main' into ns-rse/iterative-pruning

cfa9d36

jni added 4 commits June 29, 2024 17:57

Use minimum buffer size of 1 to support empty ims

b642c29

Use keep_images=True when creating skeletons

01f338f

This is useful for debugging - can make a parameter later

Include values an indices in merged edges

141ccb4

Add todo note for remaining attributes

669eeba

jni mentioned this pull request Oct 18, 2024

numpy 2.0-compatible skan package not on PyPI #230

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for iterative pruning based on user-specified discard conditions #214

Add support for iterative pruning based on user-specified discard conditions #214

ns-rse commented Oct 16, 2023

jni commented Oct 16, 2023

jni commented Nov 4, 2023

ns-rse commented Nov 8, 2023

ns-rse commented Nov 8, 2023

jni commented Nov 23, 2023

ns-rse commented Dec 9, 2023

jni commented Jan 4, 2024

jni commented Jan 4, 2024

ns-rse commented Jan 4, 2024

jni commented May 1, 2024

ns-rse commented May 1, 2024

jni commented Jun 29, 2024

jni commented Jun 29, 2024

jni commented Jun 29, 2024

ns-rse commented Jul 1, 2024

jni commented Jul 1, 2024

Add support for iterative pruning based on user-specified discard conditions #214

Are you sure you want to change the base?

Add support for iterative pruning based on user-specified discard conditions #214

Conversation

ns-rse commented Oct 16, 2023

jni commented Oct 16, 2023

jni commented Nov 4, 2023

ns-rse commented Nov 8, 2023

ns-rse commented Nov 8, 2023

jni commented Nov 23, 2023

ns-rse commented Dec 9, 2023

jni commented Jan 4, 2024

jni commented Jan 4, 2024

ns-rse commented Jan 4, 2024

jni commented May 1, 2024

ns-rse commented May 1, 2024

jni commented Jun 29, 2024

jni commented Jun 29, 2024

jni commented Jun 29, 2024

ns-rse commented Jul 1, 2024

jni commented Jul 1, 2024