You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
First of all, thank you for providing such a nice tool! I am currently using the tool to analyze ATAC-seq and it works beautifully. I have some questions on the results from genomicElementDistribution.
In "genomicEelementDistribution", there are options named as "promoterRegion" and "geneDownstream". I wonder if promoterRegion is calculated based on the transcription start site (5'end of gene) of gene. Second, I wonder if either promoterRegion and geneDownstream option consider strand (+ or - strand) of gene.
so for example, let's say there is a gene name as "Tubulin" whose coordinates of start and end are 1000 and 2000 (start and end coordinates specified in GTF file), respectively, and strand is "-" and options for promoterRegion and geneDownstream are set as below
If the strandness of gene is considered, promoter region would be 1500-2500 (5'end of Tubulin is 2000 considering that the strand is "-") and downstream region would be 800-1000 (3'end of Tubulin in 1000 considering that the strand is "-").
So it would be great if you can confirm that genomicElementDistribution consider the strandness of gene or not.
Lastly, as far as I understand, If the peaks overlap multiple features (exon, intron, etc), single feature is assigned to the peak based on the pre-determined priority. It would be great if you can tell me how the features are prioritized.
In my R code, I make txdb from gencode gene annotation using the makeTxdbFromGFF as shown below
Thank you for trying ChIPpeakAnno to annotate your data.
Yes, the promoter region and downstream region will consider the strand information. The ignore.strand=TRUE is designed for the peaks to be annotated.
About the priority, it is determined by the order of element in labels parameter.
for example:
Exons=c(utr5="5' UTR", utr3="3' UTR", CDS="CDS", otherExon="Other exon")
will behavior different from
Exons=c(CDS="CDS", utr5="5' UTR", utr3="3' UTR", otherExon="Other exon")
hi @jianhong ,
First of all, thank you for providing such a nice tool! I am currently using the tool to analyze ATAC-seq and it works beautifully. I have some questions on the results from genomicElementDistribution.
In "genomicEelementDistribution", there are options named as "promoterRegion" and "geneDownstream". I wonder if promoterRegion is calculated based on the transcription start site (5'end of gene) of gene. Second, I wonder if either promoterRegion and geneDownstream option consider strand (+ or - strand) of gene.
so for example, let's say there is a gene name as "Tubulin" whose coordinates of start and end are 1000 and 2000 (start and end coordinates specified in GTF file), respectively, and strand is "-" and options for promoterRegion and geneDownstream are set as below
promoterRegion=c(upstream=500, downstream=500),
geneDownstream=c(upstream=0, downstream=200))
If the strandness of gene is considered, promoter region would be 1500-2500 (5'end of Tubulin is 2000 considering that the strand is "-") and downstream region would be 800-1000 (3'end of Tubulin in 1000 considering that the strand is "-").
So it would be great if you can confirm that genomicElementDistribution consider the strandness of gene or not.
Lastly, as far as I understand, If the peaks overlap multiple features (exon, intron, etc), single feature is assigned to the peak based on the pre-determined priority. It would be great if you can tell me how the features are prioritized.
In my R code, I make txdb from gencode gene annotation using the makeTxdbFromGFF as shown below
R code
setwd(dir)
library(ChIPpeakAnno)
library(GenomicFeatures)
gtf <- "gencode.v29.annotation.gtf" ## genocde annoataion
txdb <- makeTxDbFromGFF('gencode.v29.annotation.gtf')
test_peak_file <- "ATAC-seq_peaks.bed"
test_peak <- toGRanges(test_peak_file, format="BED")
out<- genomicElementDistribution(test_peak,
TxDb = txdb,
promoterRegion=c(upstream=5000, downstream=5000),
geneDownstream=c(upstream=0, downstream=2000))
Please let me know if there is any flaw in the code.
Thank you and have a nice day!
The text was updated successfully, but these errors were encountered: