Specification for GTF file #30
Comments
Hey!
Glad you enjoy the tool.
Another user ran into a similar issue with an Arabidopsis annotation.
It would be nice to have some safety checks to deal with these issues.
Did you manage to fix the problem in the end?
If so, would you share what you did so that we can integrate it?
On Mon, Apr 23, 2018, 14:50 Dylan Kotliar wrote:
Hi There, thanks for putting all of this together!
I am just putting a few comments together as I seek to run this pipeline
on a Rhesus Macaque sample. I'm downloading the GTF from ENSEMBL:
ftp://ftp.ensembl.org/pub/release-92/gtf/macaca_mulatta
Initially this file didn't include a gene_name in the attribute column.
This was leading to a NullPointerException in
org.broadinstitute.dropseqrna.annotation.ReduceGTF
I manually added a gene_name attribute but now I'm getting an exception
"Missing transcript_name"
Problems:
Missing transcript_name
at org.broadinstitute.dropseqrna.annotation.GTFParser.next(GTFParser.java:97)
at org.broadinstitute.dropseqrna.annotation.GTFParser.next(GTFParser.java:39)
at htsjdk.samtools.util.PeekableIterator.advance(PeekableIterator.java:71)
at htsjdk.samtools.util.PeekableIterator.next(PeekableIterator.java:57)
at org.broadinstitute.dropseqrna.utils.FilteredIterator.next(FilteredIterator.java:71)
at org.broadinstitute.dropseqrna.annotation.ReduceGTF.writeRecords(ReduceGTF.java:166)
at org.broadinstitute.dropseqrna.annotation.ReduceGTF.doWork(ReduceGTF.java:112)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
at org.broadinstitute.dropseqrna.cmdline.DropSeqMain.main(DropSeqMain.java:42)
It would be helpful if there was a similar error message for gene_name
missing and if there was documentation in the "reference files" section
indicating that the transcript_name and gene_name attributes are required
and may not be included directly in the ensembl download.
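The kind of up-front check requested above could look roughly like the following. This is only a sketch, not part of dropSeqPipe or Drop-seq tools: the function names `parse_attributes` and `find_missing` are hypothetical, and it assumes attribute values are double-quoted (as in Ensembl GTFs) and that gene-level rows are not expected to carry transcript_name.

```python
import re
from collections import Counter

# Attributes the thread reports as required by the pipeline.
REQUIRED = ("gene_name", "transcript_name")

# Matches attribute pairs of the form: key "value"
# (assumes every value is double-quoted).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Return the key/value pairs of a GTF attribute column as a dict."""
    return dict(ATTR_RE.findall(field))


def find_missing(lines, required=REQUIRED):
    """Count, per required attribute, how many records lack it.

    transcript_name is not checked on rows whose feature type is "gene"
    (an assumption of this sketch, since such rows describe no transcript).
    """
    missing = Counter()
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed 9-column GTF record
        feature, attrs = cols[2], parse_attributes(cols[8])
        for key in required:
            if key == "transcript_name" and feature == "gene":
                continue
            if key not in attrs:
                missing[key] += 1
    return missing
```

Running `find_missing(open("annotation.gtf"))` (the file name is a placeholder) before the pipeline starts and failing with a clear message when the result is non-empty would surface the problem earlier than the NullPointerException does.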
Hi. Yep, I fixed it by manually appending gene_name and transcript_name entries to the rows that were missing them in the GTF, using the gene_id and transcript_id as the values. I then found that gene_name (which the pipeline uses as the gene identifier) was not unique in the Macaque annotation, so I overwrote the gene_name entry in the attribute column in the format geneid__genename so that both pieces of information propagate through the pipeline. I'm not sure which of these features you would want to add to the package, or whether you would rather just document that gene_name and transcript_name are required attributes. In any case, I'm copying some relevant code below:
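The snippet referred to above is not preserved in this copy of the thread. Purely as an illustration of the steps described (fall back to gene_id and transcript_id, then rewrite gene_name as geneid__genename), a minimal sketch might look like this. It is not the commenter's code: `fix_attributes` and `fix_gtf` are hypothetical names, and it assumes every attribute value is double-quoted, as in Ensembl GTFs.

```python
import re

# Matches attribute pairs of the form: key "value"
# (assumption: all values are double-quoted; unquoted ones would be dropped).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def fix_attributes(line):
    """Return a GTF line with gene_name and transcript_name filled in.

    - gene_name becomes <gene_id>__<gene_name>; if gene_name is absent,
      gene_id is used for both halves.
    - A missing transcript_name falls back to transcript_id when present.
    """
    if line.startswith("#") or not line.strip():
        return line  # leave header/comment and blank lines untouched
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 9:
        return line  # not a 9-column record; pass through unchanged
    # Keep a list of pairs so repeated keys (e.g. several "tag" entries) survive.
    pairs = ATTR_RE.findall(cols[8])
    lookup = dict(pairs)
    gene_id = lookup.get("gene_id")
    if gene_id is not None:
        new_name = f'{gene_id}__{lookup.get("gene_name", gene_id)}'
        if "gene_name" in lookup:
            pairs = [(k, new_name if k == "gene_name" else v) for k, v in pairs]
        else:
            pairs.append(("gene_name", new_name))
    if "transcript_id" in lookup and "transcript_name" not in lookup:
        pairs.append(("transcript_name", lookup["transcript_id"]))
    cols[8] = " ".join(f'{key} "{value}";' for key, value in pairs)
    return "\t".join(cols) + "\n"


def fix_gtf(in_path, out_path):
    """Stream in_path to out_path, fixing one record at a time."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(fix_attributes(line))
```

Note that the rewrite is not idempotent: running it twice on the same file would prefix the gene_id a second time, so it should be applied once to the original download.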
Hey @dylkot! Would you be able to help integrate this once the new version is out?
Sure! I'd be happy to help with integration for Rhesus Macaque. By 'new version', are you referring to incorporating Drop-seq tools 2.0? I was just starting to wonder how I will use that going forward, because I would especially like to have quantitation of the intronic reads in my data.
Version 0.4 and many more things. Which functionality would you specifically need?
Exciting! Really just the ability to generate count matrices for reads falling in introns in addition to reads falling in the coding sequence.
Hey! If you want to try out the new feature for automatic mixed species download, merging and generation, you can test out the new 0.4 version!
Awesome, I just got a new load of data and so I'll run it with the new version! I'll let you know how it goes.
Is there documentation anywhere on how to set up the binaries to run the new version? When I try to run it with the environment that was working for the previous version of dropSeqPipe, I get the error "/bin/bash: cutadapt: command not found". I can install cutadapt, but I imagine there might be several changes to the environment, including downloading the next version of Drop-seq_tools, and I just want to make sure I do all of that before I try to run things.
are you adding
Nope, I wasn't. Sorry I missed that! Will give it a try.