Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize cross correlation histogram function #597

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

morales-gregorio
Copy link
Collaborator

Hi,

I was using the CCH function for some analysis and was very limited by the fact that it only accepted BinnedSpiketrains, because I also wanted to use it to compute CCH of non-spike data.

I went ahead and started modifying the function to enable using numpy arrays and encountered many many issues that I think made the code more difficult to understand and prone to unexpected behaviour. Here is the list of the things I found confusing and changed in this PR, we can roll some back if you still think they are worth having:

  1. A lot of the code is there to deal with cross correlating arrays of different shape. But I cannot think of any use case where anyone would want to do that. The neurons should have been recorded simultaneously and it makes no sense to me to correlate one time series with another after one of them has finished. I removed all functionality relating to this, which already simplified the code greatly.
  2. Another big issue is the use of spiketrains that have non-aligned t_start. I think the resulting statistics from spikes jumping around bins is not necessarily linearly predictable. The best practice here should be to slice the spiketrains before binning to the same t_start, hence I removed the functionality and included the strict criterion of having the same t_start.
  3. Why include a kernel function in this function? Seems like a postprocessing step that people should do to their result if they want to. I suggest removing it, if someone wants to use a kernel, they can do so with the output of the function.
  4. Same goes to binarize. This is a preprocessing step, and an option of BinnedSpikeTrain, the user should apply it before feeding the data to this function, seems ad hoc and dangerous to have this kind of functionality in multiple places.
  5. Finally, it was unclear to me what the border correction was doing. It would only kick in for the different length spiketrains case (which I don't allow anymore), so I removed it.

Stylistically, there was a class with helper functions. But the class was only instantiated once and the helper functions therein only used once. I see no point in scattering the code around in this way for functions that are only used once. Since all the options I removed have significantly shortened the code I put it all into the main function, hoping that it is clear what the code does at any given time without having to jump back and forth between helpers and the main function.

Also there were examples interleaved in the code comments, which made the code more confusing to me and I have never seen in any scientific programming language, so I also removed those. I added comments where I saw fit to clarify what certain code blocks are supposed to do.

I hope all these changes improve the usability and clarity of the cross_correlation_histogram function. I know you worked hard to include some of this functions, but to me they made the function more difficult to understand and use, while introducing a lot of uncertainty about what really was happening with some of the pre/post-processing steps.

Let me know what you think, I will work on the tests once the main function is agreed upon.

Probably of interest to @Kleinjohann

Best,
Aitor

@pep8speaks
Copy link

pep8speaks commented Oct 16, 2023

Hello @morales-gregorio! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2024-02-05 14:53:44 UTC

@Moritz-Alexander-Kern
Copy link
Member

Moritz-Alexander-Kern commented Oct 17, 2023

Hey Aitor,

thank you for contributing to Elephant.

This is what I understand from your description and my "first glance review":

  1. Simplification of Code:

    • Removed code related to cross-correlating arrays of different shapes, as this scenario is not expected.
    • Eliminated functionality for handling spiketrains with non-aligned t_start, emphasizing the need for the same t_start.
    • Suggested removing the kernel function and binarization, as they should be post-processing or preprocessing steps respectively.
  2. Border Correction:

    • Removed the border correction feature, as it only applied to different-length spiketrains (which are no longer allowed).
  3. Code Organization:

    • Integrated helper functions into the main function for improved code clarity.
    • Removed interleaved examples from code comments for better readability.
    • Added comments where needed to clarify code blocks.

I'll need some time to conduct a more thorough review of this PR, and, importantly, I believe it's essential for @Kleinjohann and @mdenker to also examine it.

Several of these points are not directly related to software development or Python programming; instead, they pose questions of a neuroscientific nature e.g. A lot of the code is there to deal with cross correlating arrays of different shape. But I cannot think of any use case where anyone would want to do that. Consequently, I'm hesitant to make comments or decisions regarding these particular aspects.

todo:

  • change unit tests accordingly

@Moritz-Alexander-Kern Moritz-Alexander-Kern added the enhancement Editing an existing module, improving something label Oct 17, 2023
@morales-gregorio
Copy link
Collaborator Author

Hi, any updates? I would wait for feedback and approval of changes before putting work on updating the unit tests

@Moritz-Alexander-Kern Moritz-Alexander-Kern modified the milestone: v1.1.0 Jan 17, 2024
@mdenker mdenker added this to the v1.1.0 milestone Jan 17, 2024
@morales-gregorio
Copy link
Collaborator Author

We discussed this in the meeting on 2024.02.05

  • It is not clear if there is a good use case for differently shaped arrays.
    • Numpy allows this in its function
    • @mdenker will think about this use case and whether we really need it.
  • There exists both cross_correlation_histogram() and cross_correlation_function(); cohesive framework merging these two functions and possibly also spike triggered average. @mdenker supports merging these functions in the new elephant version.

@mdenker mdenker self-requested a review February 8, 2024 14:30
@Moritz-Alexander-Kern Moritz-Alexander-Kern modified the milestones: v1.1.0, v1.2.0 Mar 18, 2024
@Moritz-Alexander-Kern
Copy link
Member

When refactoring the cross-correlation functionalities, also inspect TSPE (Total Spiking Probability Estimation) implementations in elephant.functional_connectivity.

More specifically in elephant/functional_connectivity_src/total_spiking_probability_edges.py:

def normalized_cross_correlation(
spike_trains: BinnedSpikeTrain,
delay_times: Union[int, List[int], Iterable[int]] = 0,
) -> np.ndarray:
r"""
Normalized cross correlation using std deviation
Computes the normalized_cross_correlation between all
Spiketrains inside a BinnedSpikeTrain-Object at a given delay_time
The underlying formula is:
.. math::
NCC_{X\arrY(d)} = \frac{1}{N_{bins}}\sum_{i=-\inf}^{\inf}{
\frac{(y_{(i)} - \bar{y}) \cdot (x_{(i-d) - \bar{x})}{\sigma_x
\cdot \sigma_y}}}
The subtraction of mean-values is omitted, since it offers little added
accuracy but increases the compute-time immensely.
"""

@Moritz-Alexander-Kern Moritz-Alexander-Kern modified the milestones: v1.2.0, v1.3.0 Oct 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Editing an existing module, improving something
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants