Fixing Restrict Corpus #65

Closed

alyosama
Contributor

@alyosama alyosama commented Nov 5, 2024

Hi Jean,

After further investigation, I found the cause of this code failing with Nanostring DSP data.

The issue arises because this type of data has a high proportion of important genes present in all cells (or spots), and a relatively low number of spots (around 59).

In my Python code, I use scanpy with the following filter:

sc.pp.filter_genes(adata_spatial, max_cells=int(removeAbove * len(adata_spatial)), inplace=True)

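With this filter, a gene detected in exactly max_cells cells is kept; only genes detected in more than max_cells cells are discarded. So with removeAbove=1, genes present in every spot survive.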

In your function, however, the "greater than or equal" condition discards genes in this edge case. To address this, I modified the function so that removeAbove=1 and removeBelow=0 will not remove any genes.
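
To make the edge case concrete, here is a minimal sketch of the two comparisons on toy data. The helper `genes_to_remove` and the gene names are hypothetical and only illustrate the logic; this is not the package's actual function.

```python
# Toy illustration only: genes_to_remove is a hypothetical helper,
# not the function discussed in this pull request.

def genes_to_remove(detected_counts, n_spots, remove_above, inclusive):
    """Return the genes whose detection count crosses the upper threshold.

    detected_counts: dict mapping gene name -> number of spots with the gene.
    inclusive=True uses ">=", inclusive=False uses ">".
    """
    threshold = remove_above * n_spots
    if inclusive:
        return [g for g, n in detected_counts.items() if n >= threshold]
    return [g for g, n in detected_counts.items() if n > threshold]


n_spots = 59  # roughly the number of spots mentioned above
detected = {"geneA": 59, "geneB": 58, "geneC": 3}  # made-up counts

# ">=" removes geneA, which is detected in every spot.
removed_inclusive = genes_to_remove(detected, n_spots, 1.0, inclusive=True)

# ">" removes nothing when remove_above is 1.
removed_strict = genes_to_remove(detected, n_spots, 1.0, inclusive=False)

print(removed_inclusive, removed_strict)
```

With few spots, a gene is more likely to be detected in all of them, which is why the choice of comparison matters for this kind of data.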

Let me know if you agree with this approach. If you’re okay with it, feel free to merge!

Best,
Aly

@alyosama
Contributor Author

alyosama commented Nov 5, 2024

Hi Jean,

It seems that the recent change caused all test cases to fail, as it alters the number of output genes in the MOB dataset.
I understand that implementing this change would require extensive updates, so I'll go ahead and close the pull request.

@alyosama alyosama closed this Nov 5, 2024
@JEFworks
Collaborator

JEFworks commented Nov 5, 2024

Dear Aly,

Great investigation! I'm glad you found the source of the issue for the Nanostring DSP data.

However, in this case, we actually do want the >= comparison, so that removeAbove=1 removes genes present in 100% of spots. With a strict > comparison, the threshold would only remove genes present in more than 100% of spots, which would be none. If we wanted to avoid filtering out genes present in all spots (due to a low number of spots, for example), we would use removeAbove=Inf. I'm sure there is an equivalent in Python.
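
As a rough Python sketch of that suggestion, an infinite threshold disables the upper bound while keeping the inclusive comparison. The helper `keep_mask` is hypothetical and the counts are made up; with scanpy, the simplest equivalent is probably to skip the max_cells filter altogether.

```python
import math

# Hypothetical helper for illustration; not part of any library.
def keep_mask(detected_counts, n_spots, remove_above):
    """Return True for each gene kept under an inclusive (>=) upper bound."""
    threshold = remove_above * n_spots
    # A gene is removed when count >= threshold, so it is kept when count < threshold.
    return [n < threshold for n in detected_counts]


counts = [59, 58, 3]  # made-up detection counts across 59 spots

# remove_above = 1 drops the gene detected in every spot.
keep_at_one = keep_mask(counts, 59, 1.0)

# remove_above = infinity keeps every gene, since no count reaches the threshold.
keep_at_inf = keep_mask(counts, 59, math.inf)

print(keep_at_one, keep_at_inf)
```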

Best,
Jean
