Issues with recent scRNA-seq Pseudo-bulk workflow merged #603

pcm32 · 2024-11-18T17:57:03Z

Hi guys, I think it is great that a variation of the Persist-seq pseudo-bulk workflow was merged. It took me quite a long time (nearly a year, and it is still being improved) to develop and test it, so I would appreciate it if this could be reflected in the README in some reasonable way.

Also, that workflow, like the original version I shared with Diana and others, has an important issue (which I resolved in later versions): it doesn't filter out genes per contrast when those genes are seldom expressed (that is, expressed in only very few cells). The filtering option in edgeR does not take care of this either, because some genes are seldom expressed in one contrast but reasonably expressed in others. As a result, edgeR will treat them as active, but they will still pollute the contrasts where they are seldom expressed. You can find the solution in the more up-to-date version I uploaded to usegalaxy.eu (search for Persist-seq among the shared workflows; it should be a 0.5.x workflow).
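To make the idea concrete, here is a minimal sketch of a per-contrast filter. It is not the implementation used in the Persist-seq workflow. The function names, the `min_fraction` parameter, and the rule itself (keep a gene for a contrast only if it is detected in at least `min_fraction` of the cells in at least one of the two groups being compared) are assumptions made for illustration.

```python
from collections import defaultdict


def fraction_expressing(counts, cell_groups):
    """Return {gene: {group: fraction of cells in that group with count > 0}}.

    counts: {gene: [count per cell]}, cell_groups: [group label per cell].
    (Hypothetical data layout, chosen only for this sketch.)
    """
    group_sizes = defaultdict(int)
    for group in cell_groups:
        group_sizes[group] += 1

    fractions = {}
    for gene, values in counts.items():
        detected = defaultdict(int)
        for value, group in zip(values, cell_groups):
            if value > 0:
                detected[group] += 1
        fractions[gene] = {g: detected[g] / n for g, n in group_sizes.items()}
    return fractions


def genes_kept_per_contrast(counts, cell_groups, contrasts, min_fraction=0.1):
    """Return {(group_a, group_b): sorted list of genes kept for that contrast}.

    Assumed rule: a gene is kept for a contrast if it is detected in at least
    min_fraction of the cells of either group in that contrast.
    """
    fractions = fraction_expressing(counts, cell_groups)
    kept = {}
    for a, b in contrasts:
        kept[(a, b)] = sorted(
            gene
            for gene, frac in fractions.items()
            if max(frac[a], frac[b]) >= min_fraction
        )
    return kept


# Toy data: GeneX is well expressed in ctrl but nearly absent in treatA/treatB.
cell_groups = ["ctrl"] * 4 + ["treatA"] * 4 + ["treatB"] * 4
counts = {
    "GeneX": [5, 3, 4, 6, 0, 0, 0, 0, 0, 0, 0, 1],
    "GeneY": [2, 1, 3, 2, 4, 5, 3, 6, 1, 2, 2, 3],
}
contrasts = [("treatA", "ctrl"), ("treatA", "treatB")]
kept = genes_kept_per_contrast(counts, cell_groups, contrasts, min_fraction=0.5)
```

In this toy example GeneX stays in the treatA vs ctrl contrast, where it is clearly expressed in ctrl, but is dropped from treatA vs treatB, where it is detected in only one cell. A global expression filter would keep it in both.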

dianichj · 2024-11-18T22:12:34Z

Hi Pablo,

Thank you for sharing your concerns here and in our earlier private discussion on Element. To summarize:

Acknowledgment: This workflow was based on Persist-seq and significantly adapted to fit the needs of most Galaxy users and the PBMC clustering tutorial. Your contributions to the original Persist-seq workflow are acknowledged, and you are listed as a co-author.
Citation Request: In the original PR, you mentioned that citing Persist-seq in either the README or tutorial "would be fine," leaving the choice open. However, no specific details or preferred wording for the citation were provided at the time. If you’d like to propose exact wording or submit a pull request with the preferred citation, we’d be happy to review and incorporate it.
Technical Feedback: Regarding the filtering of genes that are seldom expressed in certain contrasts but reasonably expressed in others, I understand your concern about how this could introduce noise into the analysis. However, adding extra filtering steps also risks removing genes that might still be biologically relevant, even if their expression is low in some contrasts. Striking the right balance is important, as over-filtering could impact both the results and their biological interpretation.

While this is an important point, it might not be a critical issue in every case, especially since the workflow is designed to be user-friendly and adaptable for various use cases. I’d be interested in exploring how the updated Persist-seq workflow addresses this concern. If there are specific changes or solutions you’d like to propose for this workflow, I’d encourage you to submit a pull request or share additional details here.

I’d like to keep this process collaborative and focused on ensuring the workflow meets user needs while properly acknowledging its origins. Please let me or the rest of the team know if there’s anything specific you’d like to address or propose moving forward.

pcm32 · 2024-11-19T12:14:02Z

Hi Diana,

Could you clarify where this is happening?

Acknowledgment: This workflow was based on Persist-seq and significantly adapted to fit the needs of most Galaxy users and the PBMC clustering tutorial. Your contributions to the original Persist-seq workflow are acknowledged, and you are listed as a co-author.

I ask because I don't see it in the current version of the README. Should I be looking elsewhere? Also, your wording above seems to imply that the "significant adaptations" are more relevant than the scientific part of the workflow, with which I would disagree. I hope that we can agree on a fairer wording.

Citation Request: In the original PR, you mentioned that citing Persist-seq in either the README or tutorial "would be fine," leaving the choice open. However, no specific details or preferred wording for the citation were provided at the time. If you’d like to propose exact wording or submit a pull request with the preferred citation, we’d be happy to review and incorporate it.

I think it would be best if this appeared in both places. I can open a PR for that; it is not a problem.

Technical Feedback: Regarding the filtering of genes that are seldom expressed in certain contrasts but reasonably expressed in others, I understand your concern about how this could introduce noise into the analysis. However, adding extra filtering steps also risks removing genes that might still be biologically relevant, even if their expression is low in some contrasts. Striking the right balance is important, as over-filtering could impact both the results and their biological interpretation.

I would suggest that you try it and see the difference first hand to understand why this is essential. You will notice that in most cases GSEA signals improve markedly (most likely the same holds for other, less powerful methods) and you avoid a lot of highly DE calls that, on closer inspection, really shouldn't be there. These realisations of course only come after using the workflow again and again on multiple datasets over a long period of time. Genes are not deleted from contrasts where they are relevant, as you imply, and that is why this is a parameter, so that you can strike the balance. When I open the PR for the desired credits, I feel compelled to note that this is a deficiency in the current IWC workflow, and people using it should be aware of it so that they can interpret their results correctly.

pavanvidem · 2024-11-19T13:38:35Z

@pcm32 The primary reason for this workflow is to enhance and build upon the existing Clustering PBMC 3k tutorial. We included only the steps of your original workflow that are relevant to the tutorial and worked on those.

Of course we agree that Diana's workflow was inspired by your workflow and that you are a co-author of this workflow.
Please check the dockstore.yml and the workflow .ga file, and let us know if something else should be added. I don't know what the IWC guidelines are about adding author details in the README file.

Here we want to work together as a community. The original PR was open for a month for review by experts like you. You may have been busy during that time, which is no problem. For the gene filtering that you suggested, please create a PR so that we can review and include that too.

pcm32 · 2024-11-19T14:21:00Z

I have added what I think are the adequate credits to the README through a PR. Months ago I suggested to Diana that we could collaborate on a version that aligned more closely with what I had, but she decided to go for a simplified version. That is all good, but I don't have the bandwidth to maintain both the workflow I need for everyday work and the workflow that Diana wanted to use here. Having my name here, though, I have the obligation to say that if you're using the pipeline in its current form you might get some undesirable results.

I'm only asking for credit to be visible, because to get to a point where Diana could modify it, it took a lot of my time.

pavanvidem · 2024-11-19T15:35:43Z

Sorry, we really did not know what to cite back then (for the same reason that we do not want to cite a shared but unversioned Galaxy workflow).
The maintenance of Diana's workflow is not solely your responsibility. We are also co-authors and I believe the community supports the maintenance. Similar to the maintenance of Galaxy tools on tools-iuc.
Regarding your authorship of this workflow, I can propose the following options:

1. Include the optional filtering step that you suggested and keep you as a co-author, if that is ok with you.
2. If you do not want to be on the authors' list, we can remove you from it and extend the acknowledgment with your name.

Either of these is fine with us. We truly respect your contribution.

pcm32 · 2024-11-19T16:38:02Z

I'm happy to stay as an author as long as the note is made (as I wrote on the PR) that the missing filtering might affect results and should be taken into consideration. If someone wants to add that filtering (it's not that straightforward, unfortunately), by all means, and we can then remove the note. Unfortunately, I have very limited time myself and I can't be making changes to both pipelines for every new feature or correction.


mvdbeek · 2024-12-10T11:19:04Z

Is this resolved now?
