clusterProfiler: BgRatio N value does not equal set gene universe size #723

fafaris39 · 2024-09-13T13:22:24Z

I am performing a GO enrichment analysis using genes from the transcriptome as the reference universe, consisting of 23,761 genes. I then conducted an enrichGO analysis on my differentially expressed genes (DEGs).

However, I noticed that only 17,800 genes from my universe are annotated in the org.Mm.eg.db database, as shown in the results below:

ID	Description	GeneRatio	BgRatio	pvalue	p.adjust	qvalue
GO:0001505 GO:0001505	regulation of neurotransmitter levels	25/207	208/17800	2.039483e-18	4.729560e-15	3.896485e-15
GO:0006836 GO:0006836	neurotransmitter transport	25/207	218/17800	6.415275e-18	7.438512e-15	6.128276e-15
GO:0007269 GO:0007269	neurotransmitter secretion	21/207	166/17800	4.292311e-16	2.488467e-13	2.050143e-13

To understand this difference, I checked how many genes from my universe are properly annotated in the org.Mm.eg.db database using the following code:

# list of gene symbols from the dataframe, representing the universe of genes.
universe <- df$mgi_symbol

# list of all mouse gene symbols from the org.Mm.eg.db database.
gene_symbols <- keys(org.Mm.eg.db, keytype = "SYMBOL")

# Retrieve the associated GO terms for each of the mouse gene symbols.
go_terms <- mapIds(org.Mm.eg.db,
                   keys = gene_symbols,
                   column = "GO",
                   keytype = "SYMBOL",
                   multiVals = "list")

# Filter genes that do not have valid or non-empty GO terms.
empty_or_na_go_terms <- go_terms[sapply(go_terms, is_empty_or_na)]

# Function to check if a vector of GO terms is valid (not empty, non-null, and non-NA).
has_valid_go_terms <- function(go_terms_vector) {
  !is.null(go_terms_vector) && !all(is.na(go_terms_vector)) && length(go_terms_vector) > 0
}

# Filter genes with valid GO terms by applying the function to the list of GO terms.
valid_go_terms <- go_terms[sapply(go_terms, has_valid_go_terms)]

# Extract the names of genes that have valid GO terms from the db.
genes_with_go_terms <- names(valid_go_terms)
print(length(genes_with_go_terms))  #29,891.

# Find the intersection between the genes in the universe and the genes that have valid GO terms in db.
common_genes <- intersect(universe, genes_with_go_terms)
print(length(common_genes))  #18,359.

I found 18,359 genes from my universe that are properly annotated in the org.Mm.eg.db database. However, the enrichGO results indicate only 17,800 annotated genes in the analysis.
Where could this difference come from?

The text was updated successfully, but these errors were encountered:

guidohooiveld · 2024-09-13T14:18:30Z

Cross-posted: https://support.bioconductor.org/p/9159850/

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

clusterProfiler: BgRatio N value does not equal set gene universe size #723

clusterProfiler: BgRatio N value does not equal set gene universe size #723

fafaris39 commented Sep 13, 2024

guidohooiveld commented Sep 13, 2024

clusterProfiler: BgRatio N value does not equal set gene universe size #723

clusterProfiler: BgRatio N value does not equal set gene universe size #723

Comments

fafaris39 commented Sep 13, 2024

guidohooiveld commented Sep 13, 2024