Correct method for performing Differential expression across samples #6127

Tommy0398 · 2022-06-28T14:04:07Z

Hi,

I'd like to get thoughts on a fairly recent paper that compared statistical methods for single-cell differential expression (DE) analysis: "Confronting false discoveries in single-cell differential expression". I'm not sure whether this kind of discussion is appropriate here, but understanding the paper's results could be important for choosing how to perform DE in Seurat, and could affect the results of a study.

I'm asking here because the paper suggests that the common method for single-cell DE analysis performs worse than alternative methods (more false positives and more false negatives). I want to better understand what these results mean so I can decide which DE approach to use for single-cell data.

Some of the statements in their Results:

"Findings imply a systematic tendency of single cell methods to identify highly expressed genes as differentially expressed, even when their expression remained unchanged"
- Pseudobulk methods appeared to avoid that bias
The advantage of the pseudobulk methods appears to be that they account for variance between biological replicates: when pseudobulk methods were run without aggregation, they were no longer "superior" to the single-cell methods.
- "Accounting for this variability allows pseudobulk methods to correctly identify changes in gene expression caused by a biological perturbation. In contrast, failing to account for biological replicates causes single-cell methods to systematically underestimate the variance of gene expression."
"These experiments demonstrated that the variability between biological replicates can confound the identification of genes affected by a biological perturbation"

This suggests that the DE methods that performed better did so because they accounted for variance between biological replicates. (1) Is this flaw still present in the SCTransform DE pipeline, and does it mean that the default Wilcoxon test would perform worse for DE when each condition being compared has multiple biological replicates?
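
To make the "underestimating the variance" point concrete, here is a minimal sketch, not taken from the paper or from Seurat. It assumes Gaussian expression values and made-up parameters (`replicate_sd`, `cell_sd`, 6 replicates of 500 cells), and compares the standard error of a group mean when every cell is treated as an independent observation with the standard error when each replicate contributes one mean.

```python
import random
import statistics


def simulate_group(n_replicates, cells_per_replicate, replicate_sd, cell_sd, rng):
    """Return one list of cell-level expression values per biological replicate."""
    group = []
    for _ in range(n_replicates):
        # Each replicate gets its own baseline shift (between-replicate variability).
        replicate_shift = rng.gauss(0.0, replicate_sd)
        group.append([rng.gauss(replicate_shift, cell_sd)
                      for _ in range(cells_per_replicate)])
    return group


def cell_level_se(group):
    """Standard error of the group mean if every cell is treated as independent."""
    cells = [value for replicate in group for value in replicate]
    return statistics.stdev(cells) / len(cells) ** 0.5


def replicate_level_se(group):
    """Standard error of the group mean using one mean per replicate."""
    replicate_means = [statistics.fmean(replicate) for replicate in group]
    return statistics.stdev(replicate_means) / len(replicate_means) ** 0.5


# Hypothetical settings chosen only for illustration.
rng = random.Random(0)
group = simulate_group(n_replicates=6, cells_per_replicate=500,
                       replicate_sd=0.5, cell_sd=1.0, rng=rng)
print("cell-level SE:     ", round(cell_level_se(group), 4))
print("replicate-level SE:", round(replicate_level_se(group), 4))
```

Under these assumptions the cell-level standard error shrinks with the number of cells, while the replicate-level one shrinks only with the number of replicates, so a test built on the former will look far more certain than the data justify.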

This paper performed the analysis with an older version of Seurat and performed DE on counts normalized using NormalizeData rather than the newer SCTransform approach, which may give different results.

The paper used bulk RNA-seq data from the same experiments as the scRNA-seq data as ground truth for comparing DE methods. (2) Could this have biased the comparison in favor of the pseudobulk approaches, since they use the same methods that produced the bulk DE results?

In the "False discoveries in single-cell DE" results, single-cell data were simulated with different degrees of heterogeneity between replicates and no difference between groups. Single-cell methods still identified DE genes between groups despite there being "no perturbation", and the most highly expressed genes were the ones falsely called DE. (3) Could this result arise because DE genes are being detected between specific replicates of the two groups, even though there would be no difference in expression if each group were aggregated? That would suggest the DE genes found in this case simply reflect biological and technical variability across replicates.

This suggests that single-cell methods produce an excess of false-positive DE genes because of this biological and technical variability.
These false positives were abolished by the pseudobulk methods.
Does Seurat not account for this variability when performing DE analysis?
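
The null-simulation argument can be sketched in a few lines. This is a simplified stand-in, not the paper's simulation: it assumes Gaussian values instead of counts, uses a t-type statistic instead of the Wilcoxon test or edgeR/DESeq2/limma, and all parameters (4 replicates per group, 100 cells each, `replicate_sd=0.5`) are made up. Both groups are drawn from the same distribution, so every gene called DE is a false positive.

```python
import random
import statistics

Z_CRIT = 1.96   # two-sided 5% cutoff, normal approximation (cell-level test)
T_CRIT = 2.447  # two-sided 5% cutoff, Student's t with 6 degrees of freedom


def simulate_group(rng, n_replicates=4, cells_per_replicate=100,
                   replicate_sd=0.5, cell_sd=1.0):
    """One list of cell values per replicate; each replicate has its own shift."""
    group = []
    for _ in range(n_replicates):
        shift = rng.gauss(0.0, replicate_sd)
        group.append([rng.gauss(shift, cell_sd) for _ in range(cells_per_replicate)])
    return group


def two_sample_statistic(x, y):
    """Difference in means divided by its estimated standard error."""
    se = (statistics.variance(x) / len(x) + statistics.variance(y) / len(y)) ** 0.5
    return (statistics.fmean(x) - statistics.fmean(y)) / se


def false_positive_rates(n_genes=200, seed=1):
    """Fraction of null genes called DE by a cell-level and a replicate-level test."""
    rng = random.Random(seed)
    cell_hits = 0
    pseudobulk_hits = 0
    for _ in range(n_genes):
        # Same distribution for both groups: no gene is truly DE.
        group_a, group_b = simulate_group(rng), simulate_group(rng)

        # Cell-level test: pool all cells and ignore which replicate they came from.
        cells_a = [v for rep in group_a for v in rep]
        cells_b = [v for rep in group_b for v in rep]
        if abs(two_sample_statistic(cells_a, cells_b)) > Z_CRIT:
            cell_hits += 1

        # Replicate-level test: one aggregated value per replicate. With equal
        # group sizes the statistic equals the pooled-variance t statistic,
        # so the degrees of freedom are 4 + 4 - 2 = 6.
        means_a = [statistics.fmean(rep) for rep in group_a]
        means_b = [statistics.fmean(rep) for rep in group_b]
        if abs(two_sample_statistic(means_a, means_b)) > T_CRIT:
            pseudobulk_hits += 1
    return cell_hits / n_genes, pseudobulk_hits / n_genes


cell_fpr, pseudobulk_fpr = false_positive_rates()
print("cell-level false positive rate:     ", cell_fpr)
print("replicate-level false positive rate:", pseudobulk_fpr)
```

In this toy setting the cell-level test rejects far more often than the nominal 5%, while the replicate-level test stays close to it. That matches the direction of the paper's claim, but it says nothing about how large the effect is in real data.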

So my main question is: which method is considered best for performing differential expression across conditions? (4*) The paper suggests that pseudobulk methods outperform the default Wilcoxon test used by Seurat's FindMarkers. Does anyone have experience with these methods that supports or refutes that claim?
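
For anyone unfamiliar with the term, the aggregation step behind "pseudobulk" can be sketched as below. This is a generic illustration in plain Python, not Seurat's implementation; the function name `pseudobulk`, the dictionary layout, and the toy counts are all hypothetical. Raw counts are summed over the cells of each sample (typically within one cell type), and the resulting sample-by-gene table is then passed to a bulk RNA-seq tool such as edgeR, DESeq2, or limma.

```python
from collections import defaultdict


def pseudobulk(cell_counts, cell_to_sample):
    """Sum raw counts over all cells belonging to each sample.

    cell_counts:    {cell_id: {gene: count}}
    cell_to_sample: {cell_id: sample_id}
    returns:        {sample_id: {gene: summed_count}}
    """
    totals = defaultdict(lambda: defaultdict(int))
    for cell_id, counts in cell_counts.items():
        sample_id = cell_to_sample[cell_id]
        for gene, count in counts.items():
            totals[sample_id][gene] += count
    return {sample: dict(genes) for sample, genes in totals.items()}


# Toy data, made up for illustration only.
cell_counts = {
    "cell1": {"GeneA": 3, "GeneB": 0},
    "cell2": {"GeneA": 1, "GeneB": 2},
    "cell3": {"GeneA": 0, "GeneB": 5},
}
cell_to_sample = {"cell1": "ctrl_rep1", "cell2": "ctrl_rep1", "cell3": "stim_rep1"}

print(pseudobulk(cell_counts, cell_to_sample))
```

The key design point is that the unit of observation becomes the biological replicate rather than the cell, so the downstream test sees between-replicate variability directly.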

Regards,

denvercal1234GitHub · 2024-08-06T18:29:01Z

This might help: https://nbisweden.github.io/excelerate-scRNAseq/session-de/session-de-methods.html, in addition to some benchmarking papers.
