Implications of compression algorithms #17
I agree that the issue you're describing is somewhat agnostic to the TRX format. But given that there are potential consequences for TRX, it's probably a reasonable place to have the discussion. There are a few discrete concepts in here, so I'm going to try to tease them apart and enumerate them so that we don't all immediately start talking at cross-purposes. I don't think that "degree of lossiness" is the right dimension of classification to be using.
So my personal conclusion would be:
Hi Robert, thanks for the quick, thorough, and thoughtful reply.

As for my somewhat arbitrary designation of "lossiness" categories, the aim there was to divide the various forms of compression into categories that I would intuitively expect not to impact segmentation outcomes (i.e. "Insignificantly to mildly lossy") and compression methods which I have good reason to believe would impact segmentation outcomes ("Moderately to appreciably lossy"). As such, I don't really have concerns about the "Insignificantly to mildly lossy" methods (e.g. the QFib-like approaches, which, as you note, result in a distinct source of error).

Also, my apologies for the terminological sloppiness in my use of streamlines vs. vertices (or ordered sequences thereof). Though it involves a bit of speculation on my part, I think my (former) lack of a well-defined delineation between these concepts stems from the manner in which I am used to interacting with these representations/data objects and the toolboxes I typically use. As I will note later, much of the tractography segmentation code I am familiar with doesn't really operate in voxel space (insofar as I can tell), and I think there may be some non-trivial implications of this.

In any case, I am glad to see that there are methods for ensuring the acquisition of volumetrically stored quantifications for tractography. That being said, I'm not sure that the upsampling you describe can ensure that compressed tractography is robust to some of the ROI-based segmentation issues that may arise (particularly in the case of planar ROIs). I'm almost certainly stepping beyond my understanding here (I don't program in C), but from the code provided it appears that the upsampling is performed based on a check of the tractography header (though it may be that individual streamlines' step sizes are being checked by "step_size && std::isfinite (step_size)"). If so, it seems to me that:
I may be misunderstanding some of this, though, as the relation to image voxel size seems as though it may be a separate issue. For the segmentation implementations I am familiar with (described later), I'm not certain that the "streamline to voxel" mapping is really a factor in tractography segmentation (i.e. ROI-vertex intersection assessment).

While I would certainly agree that it would be ideal if downstream software were sufficiently robust as to be agnostic to this issue, I'm not sure that, in practice, they are. A quick scan of some segmentation (i.e. ROI-vertex/streamline intersection assessment) methods suggests to me that they are all implicitly dependent on an assumption of at least semi-regular spacing (i.e. not dropping a multitude of adjacent/collinear vertices), and are therefore vulnerable to violations of that assumption and/or to particularly widely spaced vertices (again, apologies if I am misunderstanding these code instances or the overarching issues at hand):
I agree that there are even further nuanced issues relating to the equidistant spacing of vertices, and appreciate your consideration of these (under "Vertices may or may not be equi-distant"). However, from my reading of the existing downstream consumer packages of tractography (at least insofar as ROI-based tractography segmentation is concerned), it seems that there's a pervasive (and preexisting) susceptibility to this issue. My concern is that, if this tractography format makes the relevant forms of compression prominent or easily accessible, then either (1) these consumer packages are going to begin to encounter a lot of difficulties (which they may not be encountering now due to the relatively infrequent use of these compression tools), and/or (2) they may need to be rewritten in such a way that their mechanisms become opaque and more difficult for less experienced users to build upon (cdist may be inelegant, but it is interpretable and intuitive).
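To make my concern concrete, here is a minimal sketch (not taken from any particular package) of the kind of cdist-based ROI-vertex intersection test I have in mind, and how it can miss a genuine crossing once vertices become widely spaced:

```python
import numpy as np
from scipy.spatial.distance import cdist

def streamline_intersects_roi(vertices, roi_coords, tol):
    """Return True if any streamline vertex lies within `tol` (mm) of any ROI voxel centre."""
    return bool(np.any(cdist(vertices, roi_coords) < tol))

# A streamline that genuinely crosses the plane z = 0, stored densely vs. with only its endpoints
dense = np.column_stack([np.zeros(11), np.zeros(11), np.linspace(-5.0, 5.0, 11)])  # ~1 mm spacing
sparse = dense[[0, -1]]                                                            # endpoints only

roi = np.array([[0.0, 0.0, 0.0]])  # a single ROI voxel centre lying on that plane

print(streamline_intersects_roi(dense, roi, tol=0.5))   # True: a vertex lands within the ROI
print(streamline_intersects_roi(sparse, roi, tol=0.5))  # False: the trajectory crosses, but no vertex is near it
```

The sparse streamline still crosses the ROI geometrically, but because no vertex happens to fall near it, a purely vertex-based test reports no intersection.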
I think that what I'm trying to point at (admittedly trying to maximise accuracy and maybe not being up-front about it) is that, to me, the important distinction is not between a small vs. large change in vertex position, but between changes in vertex position along the streamline vs. changes in vertex position orthogonal to the local streamline trajectory. The influence of the former can be mitigated almost to the point of disappearing through good software design, whereas the latter is genuine "noise".
Agreed that mapping streamlines to voxels is only one specific use case. Part of the reasoning for my argument above, however, is that the distinction is agnostic to application: displacing vertex locations along the streamline trajectory should ideally not result in changes in outcomes, no matter what analysis you are performing (as long as it is "designed well"), whereas displacements orthogonal to this can be expected to have some consequence regardless of the analysis being performed.

An example that comes to mind is streamline clustering, where you have two streamline trajectories and want to compute some measure of "distance" between them. There are a lot of ways to do this, and it can be expensive depending on how you do it. One solution is to simply resample each streamline to some fixed number of (approximately) equidistant points, then compute the distance between vertex 1 of the two streamlines, the distance between vertex 2 of the two streamlines, etc., and then take e.g. the mean or maximum of these distances across vertices. So there's a resampling operation within the algorithm itself, and the consequences of such are "deferred" to the algorithm rather than placing stringent requirements on the input data.
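A rough sketch of that idea (not code from any particular package; the fixed point count and the mean/max reduction are arbitrary choices for illustration):

```python
import numpy as np

def resample(vertices, n_points):
    """Resample a streamline to `n_points` approximately equidistant vertices
    by linear interpolation along its arc length."""
    seg = np.linalg.norm(np.diff(vertices, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg)])
    targets = np.linspace(0.0, arc[-1], n_points)
    return np.column_stack([np.interp(targets, arc, vertices[:, d]) for d in range(3)])

def streamline_distance(a, b, n_points=20, reduce=np.mean):
    """Resample both streamlines, then reduce the vertex-wise distances
    (mean gives an MDF-like distance, max a Hausdorff-like one)."""
    ra, rb = resample(a, n_points), resample(b, n_points)
    d_direct = np.linalg.norm(ra - rb, axis=1)
    d_flipped = np.linalg.norm(ra - rb[::-1], axis=1)  # streamlines have no inherent orientation
    return min(reduce(d_direct), reduce(d_flipped))

# Two parallel trajectories 1 mm apart, stored with different vertex counts
a = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0]], dtype=float)
b = np.array([[0, 1, 0], [2, 1, 0]], dtype=float)
print(streamline_distance(a, b))  # ~1.0, regardless of how each input was sampled
```

Because both streamlines are resampled inside the distance function, along-track differences in vertex placement (including those introduced by compression) largely wash out.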
I'm not sure if it can ever "ensure" such; but we can at least "mitigate" it to a known degree. E.g. in the linked code, we look at the ratio between the voxel size and the inter-vertex distance, and perform upsampling to what's deemed an adequate degree. For instance, if we want the distance between two vertices to be no more than 1/3 the length of a voxel, and the tractography step size is 1/2 the length of a voxel, we will upsample the streamline by a factor of 2 prior to mapping to the voxel grid. So if an ROI is genuinely a single plane of selected voxels, it's mathematically impossible for the streamline to cross that plane without having at least one vertex assigned to a voxel within that plane. It may however be possible, if you have an ROI consisting of a single voxel, for a streamline to cut across the corner of that one voxel but not have a vertex assigned to that voxel; this is where the "precise" mapping mechanism is advantageous, because it explicitly catches those cases.
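In Python-ish pseudocode (the actual implementation is C++; the 1/3-voxel target here is just the number from the example above, not necessarily the value hard-coded in the software):

```python
import math

def upsample_ratio(step_size, voxel_size, max_fraction=1.0 / 3.0):
    """How many pieces each inter-vertex segment must be split into so that the
    resulting inter-vertex distance is no more than `max_fraction` of a voxel."""
    if not (step_size and math.isfinite(step_size)):
        return 1  # unknown / ill-defined step size: the caller has to handle this case differently
    return max(1, int(math.ceil(step_size / (max_fraction * voxel_size))))

print(upsample_ratio(step_size=0.5, voxel_size=1.0))  # 2, as in the example above
```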
No, there's no "step size per streamline" information; but e.g. if you concatenate two tractography files with different step sizes, "the step size" is no longer a finite number, so sometimes that requires special handling.
I attempted to deal with this case by having separate header entries for "
We might need to flesh this detail out fully to make sure we're not talking at cross-purposes. One interpretation of your description here is that you have an image, let's say with dimensions 128x128x1, where the value of all voxels is 1. If the image header transform is approximately an identity matrix, this is an axial slice. But testing streamline vertices against such an image is not all that different to a 128x128x64 image where only those voxels within a single slice are 1: the only difference is whether vertices outside of that slice are classified as "within the FoV but outside of the ROI" vs. "outside of the FoV". The consequences should be the same.

The reason I want to be clear is that one could reasonably interpret "a 2D planar ROI" as being an infinitesimally thin plane, defined by e.g. an origin position and a normal vector. A streamline vertex cannot be "within the ROI" in this case; it can only be "above or below" the plane. Internal handling of such a criterion is quite substantially different.
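To spell out the difference (a hypothetical sketch, not the internals of any existing package): the voxelised case is a membership test per vertex, whereas the infinitesimal-plane case has to be a crossing test per segment:

```python
import numpy as np

def vertex_in_voxel_roi(vertex, roi_mask, inv_affine):
    """Voxelised ROI: map the vertex into voxel indices via the inverse of the image
    affine and look up the mask (False if the vertex falls outside the FoV)."""
    ijk = np.rint(inv_affine[:3, :3] @ vertex + inv_affine[:3, 3]).astype(int)
    if np.any(ijk < 0) or np.any(ijk >= np.array(roi_mask.shape)):
        return False
    return bool(roi_mask[tuple(ijk)])

def streamline_crosses_plane(vertices, origin, normal):
    """Infinitesimally thin plane: no vertex can be 'within' it, so instead look for
    a sign change in the signed distance between consecutive vertices."""
    signed = (vertices - origin) @ normal
    return bool(np.any(np.sign(signed[:-1]) * np.sign(signed[1:]) <= 0))
```

The first test only ever interrogates individual vertices; the second necessarily reasons about the segments between them, which is exactly why sparse vertex spacing matters less for it.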
Even if that is the case, I do not think that that should be used as justification for TRX to enforce equi-distant vertices. Indeed it may be the case that some streamlines tractography algorithms don't operate based on equi-distant vertices, and such a proposal would make such algorithms incompatible with TRX.

I think that we can consider different ways in which TRX can encode relevant information about vertex spacing, based on streamline generation / resampling / compression / etc., and any software performing conversion or import of such data can then issue a warning if considered relevant. For instance, writing a converter to take TRX data into a format readable by some other package, knowing that that package possesses this weakness in the presence of gross downsampling, and failing to issue a warning to the user if the TRX data has undergone such, would be poor software engineering.

But protocols around this would need to be considered pretty carefully. E.g. in an example I gave earlier, streamlines data had a fixed & known step size, but once these underwent a non-linear spatial transformation this was no longer true. So at least having some centralised document detailing the problem & possible solutions is warranted.
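Purely as a sketch of the kind of converter-side check being described (the metadata keys used here are hypothetical placeholders, not fields defined by any current TRX specification):

```python
import math
import warnings

def check_vertex_spacing(trx_header: dict, target_package: str) -> None:
    """Warn if data being converted may violate the target package's vertex-spacing assumptions."""
    step = trx_header.get("step_size")           # hypothetical field
    compression = trx_header.get("compression")  # hypothetical field
    if compression not in (None, "none"):
        warnings.warn(
            f"Input data were compressed ('{compression}'); {target_package} assumes "
            "semi-regularly spaced vertices and may mis-segment these streamlines.")
    elif step is None or not math.isfinite(step):
        warnings.warn(
            "Vertex spacing of the input data is unknown or non-uniform; "
            f"verify that {target_package} handles irregular spacing.")
```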
Yeah, our code is not the most accessible. As linked previously, there are checker functions here for deciding whether a streamline needs to be upsampled prior to mapping to a voxel grid; the actual upsampling is done by this class; as far as mapping vertices to a voxel grid, it can occur in a lot of places simply via image interpolation, so it is hard to link; but for a complete streamline, in particular for the
Hi all,
My apologies for getting to this conversation late, but I had some questions about the implications of the compression/linearization capabilities afforded by this new implementation. There is an existing discussion here, but to my reading it largely deals with the proximal details as they relate to implementation and immediate consequences. My considerations relate to implications that are a bit further downstream.
This may be an arbitrary delineation, but to me, it seems like the discussed streamline-focused, lossy compression methods can be divided into two tiers of "lossiness":
To me, these two categories are distinct in that the former corresponds to a slight loss in precision (and thus the introduction of a slight bit of noise), while the latter actually modifies the data representation itself through the removal of nodes/vertices.
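As a toy illustration of that distinction (assuming, purely for the sake of example, coordinate quantisation as an instance of the former tier and greedy removal of near-collinear vertices as an instance of the latter; neither stands in for any specific published algorithm):

```python
import numpy as np

# A simple synthetic streamline: 50 vertices along a gentle arc, roughly 1 mm apart
theta = np.linspace(0.0, np.pi / 4, 50)
streamline = np.column_stack([60.0 * np.cos(theta), 60.0 * np.sin(theta), np.zeros_like(theta)]).astype(np.float32)

# First tier: reduced numerical precision (vertex count unchanged, positions perturbed slightly)
quantised = streamline.astype(np.float16).astype(np.float32)
print(quantised.shape, float(np.abs(quantised - streamline).max()))

# Second tier: linearisation (vertices removed wherever the trajectory is locally near-straight)
def linearize(v, tol=0.2):
    """Greedily drop vertices whose omission displaces the polyline by less than `tol` (mm)."""
    keep, anchor = [0], 0
    for i in range(1, len(v) - 1):
        chord = v[i + 1] - v[anchor]
        # maximum perpendicular distance of the so-far-skipped vertices from the chord anchor -> i+1
        dev = np.linalg.norm(np.cross(v[anchor + 1:i + 1] - v[anchor], chord), axis=1) / np.linalg.norm(chord)
        if dev.max() > tol:
            keep.append(i)
            anchor = i
    keep.append(len(v) - 1)
    return v[keep]

print(linearize(streamline).shape)  # far fewer vertices: the data representation itself has changed
```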
The implications of these latter strategies are partially determined by how “decompression” is implemented (if at all). For example:
I understand that there are significant efficiencies to be gained through some of these methods, and that for some of us (e.g. those doing connectomics) the quantifications are unaltered. Moreover, I understand that some of us have gone so far as to quantify the impact of such approaches to varying degrees (e.g. here and here). That being said, I have some concerns about the implications of such compression methods for contemporary tractogram segmentation methods themselves (explored to some extent in the latter of the aforementioned papers), as opposed to tractometry or derived measurements.
Specifically, there seem to be plenty of cases wherein these methods would non-trivially impact the outcome of a segmentation (as discussed in the caveats at the bottom of this page). Namely:
To some extent, I recognize that these concerns aren't actually arising from the compression methods being proposed here, but rather from downstream consequences of their application. As such, they may be beyond the scope of the concerns we intend to address with this format implementation. That being said, if the goal is to implement a tractography standard that can be used across the community, then it stands to reason that we are designing for a range of potential users, some non-trivial proportion of whom won't be familiar with the nuanced implications discussed above. As such, it is quite possible that such users could unintentionally run afoul of these issues and/or encounter difficulty replicating results using "the same data set".
Are we at all concerned about these possibilities, or are we adopting a more laissez-faire approach to use of this format and its features? If we are concerned about these possibilities, how do we make them salient, or how do we attempt to shape users’ default dispositions?