Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SelectGenes use other genome #17

Open
liyang24 opened this issue Mar 13, 2023 · 2 comments
Open

SelectGenes use other genome #17

liyang24 opened this issue Mar 13, 2023 · 2 comments

Comments

@liyang24
Copy link

Hi!
i noticed that SelectGenes Available genome are: hg19, hg38, mm9, and mm10
can i use the genomes of other species?
If so, how exactly should I do it

@Dalhte
Copy link

Dalhte commented Jul 28, 2023

Hello @liyang24
I am not a specialist but here is what I did to implement the rat genome (rn6)

create a gene annotation for RN6

#get the library

library(org.Rn.eg.db)
library(TxDb.Rnorvegicus.UCSC.rn6.refGene)

geneAnnotationRN6 <- createGeneAnnotation(TxDb = TxDb.Rnorvegicus.UCSC.rn6.refGene, OrgDb = org.Rn.eg.db)

I had a "chr" prefix problem (chromsome are named "chr1" in the annotation but only "1" in my data) to remove the "chr" and obtain a file with adequate formate, I did, quite laboriously :

#separation of the gene, exon and tss info
geneAnno <- geneAnnotationRN6$gene
exonAnno <- geneAnnotationRN6$exon
TSSAnno <- geneAnnotationRN6$TSS

#cleaning the format

geneAnno <- keepStandardChromosomes(geneAnno, pruning.mode="coarse")
exonAnno <- keepStandardChromosomes(exonAnno, pruning.mode="coarse")
TSSAnno <- keepStandardChromosomes(TSSAnno, pruning.mode="coarse")

geneAnno <- renameSeqlevels(geneAnno, c(chr1="1", chr2="2", chr3="3", chr4="4", chr5="5", chr6="6", chr7="7", chr8="8", chr9="9", chr10="10", chr11="11", chr12="12", chr13="13", chr14="14", chr15="15", chr16="16", chr17="17", chr18="18", chr19="19", chr20="20", chrX="X", chrY="Y"))
exonAnno <- renameSeqlevels(exonAnno, c(chr1="1", chr2="2", chr3="3", chr4="4", chr5="5", chr6="6", chr7="7", chr8="8", chr9="9", chr10="10", chr11="11", chr12="12", chr13="13", chr14="14", chr15="15", chr16="16", chr17="17", chr18="18", chr19="19", chr20="20", chrX="X", chrY="Y"))
TSSAnno <- renameSeqlevels(TSSAnno, c(chr1="1", chr2="2", chr3="3", chr4="4", chr5="5", chr6="6", chr7="7", chr8="8", chr9="9", chr10="10", chr11="11", chr12="12", chr13="13", chr14="14", chr15="15", chr16="16", chr17="17", chr18="18", chr19="19", chr20="20", chrX="X", chrY="Y"))

#creation of a new annotation file

geneAnnotationRN6.2 <- createGeneAnnotation(
genome = NULL,
TxDb = NULL,
OrgDb = NULL,
genes = geneAnno,
exons = exonAnno,
TSS = TSSAnno,
annoStyle = NULL
)

then i copied the peaktogene to run it locally (because I don't known how to modify it directly) with some modifications:
-I had "rn6" to the genome list and the annotation :
if (genome == "rn6") {
gene_anno <- geneAnnotationRN6.2

then I run SelectGenes also locally, with some modifications :
I replaced RNA assy with SoupXRNA because I had to run SoupX to clean the data
I added genome = "rn6" in the selectgenes function and in the calling of the peaktogene function

And I have to admit that, to my surprise, it did worked :)

PS I had to implement the motif names call solution proposed by @AmandaKedaigle in the issue " Motif matching when using mouse data #18 " every time needed (in the GetTFGeneCorrelation function, in the colnames(motif.matching)) if not the different format of the motif names blocked the analysis.

Moreover, I tried to used the jaspar2022 instead of the Jasper2020 and there is a problem with mofit name redundancy

I hope it will help and also that I did not do something stupid, but the results seems consistent with my prior knownledge of the biological system so....

Best
David

@wangb1119
Copy link

Hello, according to your method, I also created a gene annotation for macaques, and SelectGenes by using the self-defined gene annotation file. However, after that, I don't know how to calculate and define the rest of this function, especially how to use the gene annotation information. Only the selection logic of genome annotation is written, but the operation of selecting genes is not carried out.

            sel.tfs <- SelectTFs(object = pbmc.t.cells,
             return.heatmap = TRUE,
            cor.cutoff = 0.4)

df.cor <- sel.tfs$tfs
ht <- sel.tfs$heatmap
pdf("./heatmap_SelectTFs.pdf")
t = draw(ht)
dev.off()

annotation <- genes(EDB, return.type = "GRanges")
exon_annotation <- exons(EDB, return.type = "GRanges")
TSS_annotation <- promoters(EDB, upstream=0, downstream=1, return.type = "GRanges")

annotation <- genes(EDB, return.type = "GRanges")
exon_annotation <- exons(EDB, return.type = "GRanges")
TSS_annotation <- promoters(EDB, upstream=0, downstream=1, return.type = "GRanges")

geneAnno <- annotation
exonAnno <- exon_annotation
TSSAnno <- TSS_annotation

ucsc.levels <- str_replace(string = paste("chr", seqlevels(annotation), sep = ""), pattern = "chrMT", replacement = "chrM")
seqlevels(annotation) <- ucsc.levels
seqlevels(geneAnno) <- ucsc.levels
seqlevels(exonAnno) <- ucsc.levels
seqlevels(TSSAnno) <- ucsc.levels

geneAnnotationMmu <- createGeneAnnotation(
genome = NULL,
TxDb = NULL,
OrgDb = NULL,
genes = geneAnno,
exons = exonAnno,
TSS = TSSAnno,
annoStyle = NULL
)
SelectGenes <- function(object,
atac.assay = "ATAC",
rna.assay = "RNA",
var.cutoff.gene = 0.9,
trajectory.name = "Trajectory",
distance.cutoff = 2000,
groupEvery = 1,
cor.cutoff = 0,
fdr.cutoff = 1e-04,
return.heatmap = TRUE,
labelTop1 = 10,
labelTop2 = 10,
genome = "mihou") {
if (is.character(genome)) {
if (genome == "mihou") {
gene_anno <- geneAnnotationMmu
} else {
stop("Unsupported genome annotation")
}
} else if (is(genome, "GRanges")) {
gene_anno <- genome
} else {
stop("Unsupported genome annotation format")
}
#Based on the selection logic of genome annotation, I want to calculate something for gene selection. I don't know if it is correct. I want to ask your suggestion. How should I do it? tfs <- data.frame(
gene = c(),
cor = c(0.5, 0.6, 0.7)
)
heatmap_data <- matrix(rnorm(100), nrow = 10, ncol = 10)
rownames(heatmap_data) <- paste0("Gene", 1:10)
colnames(heatmap_data) <- paste0("Cell", 1:10)

heatmap <- Heatmap(heatmap_data,
name = "Gene Expression",
show_row_names = TRUE,
show_column_names = TRUE)
return(list(p2g = tfs, heatmap = heatmap))
}
#Next, I selected the genes, but I didn't get p2g and heatmap. sel.genes <- SelectGenes(
object = pbmc.t.cells,
atac.assay = "ATAC",
rna.assay = "RNA",
var.cutoff.gene = 0.9,
trajectory.name = "Trajectory",
distance.cutoff = 2000,
groupEvery = 1,
cor.cutoff = 0,
fdr.cutoff = 1e-04,
return.heatmap = TRUE,
labelTop1 = 10,
labelTop2 = 10,
genome = "mihou"
)

df.p2g <- sel.genes$p2g
ht <- sel.genes$heatmap
In addition, I also want to ask, when I am executing GetTFGeneCorrelation:
tf.gene.cor <- GetTFGeneCorrelation(object = pbmc.t.cells,
tf.use = df.cor$tfs,
gene.use = unique(df.p2g$gene),
tf.assay = "chromvar",
gene.assay = "RNA",
trajectory.name = "Trajectory")
there will be an error:
Find 553 shared features!
Creating Trajectory Group Matrix..
Some values are below 0, this could be the Motif activity matrix in which scaleTo should be set = NULL.
Continuing without depth normalization!
Smoothing...
Creating Trajectory Group Matrix..
Smoothing...
Error in tf_activity[tf.use, ] : subscript out of bounds
Calls: GetTFGeneCorrelation
My question may be stupid, and I look forward to your answer.Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants