extracting cds or merged exons plus upstream and downstream regions #418
Hi, I have a genome annotation for a non-model organism where the UTRs are not yet annotated, i.e. the transcript/mRNA features correspond to the CDS for most genes. For ribosome footprint profiling, I need to look at reads mapping just before and after the start and stop codons (to confirm that the footprints map primarily to the CDS). To do this, I need a transcript FASTA file that has the CDS but with e.g. 100 bp added at both ends (thereby "faking" the UTRs). Just to clarify, the easiest approach would have been to map to the genome, but of the many tools for Ribo-seq QC, the simplest one (that I can also get to work on my system) takes alignments to the transcriptome and not the genome. I have looked at the AGAT options for extracting sequences ( My suggested command:
Cheers
It seems to work as intended :-), except for a few transcripts where AGAT throws a warning that the extracted sequence differs from what was requested, most likely when a transcript lies closer than 100 nt to the boundary of its scaffold.
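For anyone who wants to check which transcripts are affected, here is a minimal sketch of the arithmetic behind that situation. It is not AGAT's code, only an illustration under my own assumptions: 1-based inclusive coordinates as in GFF, a flank of 100 nt on each side, and a hypothetical helper name `flanked_interval`. Strand is ignored, so "left" and "right" refer to scaffold coordinates rather than upstream and downstream.

```python
def flanked_interval(start, end, scaffold_len, flank=100):
    """Extend a feature by `flank` nt on both sides, clamped to the scaffold.

    Coordinates are 1-based and inclusive (GFF convention).
    Returns (new_start, new_end, left_added, right_added).
    """
    new_start = max(1, start - flank)            # cannot go before position 1
    new_end = min(scaffold_len, end + flank)     # cannot go past the scaffold end
    return new_start, new_end, start - new_start, new_end - end


# Hypothetical transcripts on a 10,000 nt scaffold: (name, start, end)
transcripts = [
    ("tx_internal", 501, 800),
    ("tx_near_start", 41, 400),
    ("tx_near_end", 9701, 9950),
]

for name, start, end in transcripts:
    new_start, new_end, left, right = flanked_interval(start, end, 10_000)
    truncated = left < 100 or right < 100
    print(name, new_start, new_end, left, right, "TRUNCATED" if truncated else "ok")
```

Transcripts flagged as truncated get a shorter flank on one side than requested, which would explain an extracted sequence that is shorter than expected.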
Your command and interpretation sound correct.