update help
Juke34 committed Mar 16, 2020
1 parent ff26d3f commit 8a6352b
Showing 1 changed file with 19 additions and 12 deletions.
31 changes: 19 additions & 12 deletions bin/agat_convert_sp_gff2gtf.pl
@@ -475,21 +475,20 @@ =head1 DESCRIPTION
The script aims to convert any GTF/GFF file into a proper GTF file.
Full information about the format can be found here: https://github.com/NBISweden/GAAS/blob/master/annotation/knowledge/gxf.md
The latest description of the format specifies only 9 accepted feature types (3rd column):
gene, transcript, exon, CDS, Selenocysteine, start_codon, stop_codon, three_prime_utr and five_prime_utr
Nevertheless, if your file contains other types of features, they will not be removed,
as long as the parser can deal with them.
To be fully GTF compliant, all features need to have a gene_id and a transcript_id attribute.
You can choose among 6 different GTF types (1, 2, 2.1, 2.2, 2.5, 3).
Depending on the version selected, the script will filter out the features that are not accepted.
For GTF2.5 and 3, every level1 feature (e.g. nc_gene, pseudogene) will be converted into
a gene feature and every level2 feature (e.g. mRNA, ncRNA) will be converted into
a transcript feature.
You can even produce a GFF-like GTF using the --relax option. It allows you to keep all
original feature types (3rd column).
To be fully GTF compliant, all features must have a gene_id and a transcript_id attribute.
The gene_id is a unique identifier for the genomic source of the transcript, which is
used to group transcripts into genes.
The transcript_id is a unique identifier for the predicted transcript,
which is used to group features into transcripts.
Keep in mind that some BioPerl versions omit the header (##gff-version 2) in the output.
Check the output and add the header if it is missing; this will spare you trouble in downstream analyses.
=head1 SYNOPSIS
agat_convert_sp_gff2gtf.pl --gff infile.gtf [ -o outfile ]
@@ -509,18 +508,26 @@ =head1 OPTIONS
=item B<--gtf_version>
version of the GTF output. Default 3 (for GTF3)
GTF3 (9 feature types accepted): gene, transcript, exon, CDS, Selenocysteine, start_codon, stop_codon, three_prime_utr and five_prime_utr
GTF2.5 (8 feature types accepted): gene, transcript, exon, CDS, UTR, start_codon, stop_codon, Selenocysteine
GTF2.2 (9 feature types accepted): CDS, start_codon, stop_codon, 5UTR, 3UTR, inter, inter_CNS, intron_CNS and exon
GTF2.1 (6 feature types accepted): CDS, start_codon, stop_codon, exon, 5UTR, 3UTR
GTF2 (4 feature types accepted): CDS, start_codon, stop_codon, exon
GTF1 (5 feature types accepted): CDS, start_codon, stop_codon, exon, intron
=item B<--relax>
The relax option allows you to not follow the strict GTF format rules. All feature types will be kept.
No modification, e.g. mRNA to transcript.
The relax option avoids applying the strict GTF format specification. All feature types will be kept.
No modification, e.g. mRNA to transcript.
No filtering, i.e. feature types not accepted by the GTF format are kept.
gene_id and transcript_id attributes will be added, and the attributes will follow the
GTF formatting.
=item B<-o> , B<--output> , B<--out> , B<--outfile> or B<--gtf>
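
The --gtf_version and --relax help text in this diff can be read as a lookup table plus one override. The following is a minimal Python sketch of that reading, not the script's actual Perl implementation. The names GTF_FEATURE_TYPES and is_kept are hypothetical; the feature lists are copied from the help text. The sketch deliberately ignores the level1/level2 renaming (e.g. mRNA to transcript) that the description says happens for GTF2.5 and 3.

```python
# Accepted feature types (3rd column) per GTF version, copied from the
# --gtf_version help text. The dictionary name is hypothetical.
GTF_FEATURE_TYPES = {
    "3": ["gene", "transcript", "exon", "CDS", "Selenocysteine",
          "start_codon", "stop_codon", "three_prime_utr", "five_prime_utr"],
    "2.5": ["gene", "transcript", "exon", "CDS", "UTR",
            "start_codon", "stop_codon", "Selenocysteine"],
    "2.2": ["CDS", "start_codon", "stop_codon", "5UTR", "3UTR",
            "inter", "inter_CNS", "intron_CNS", "exon"],
    "2.1": ["CDS", "start_codon", "stop_codon", "exon", "5UTR", "3UTR"],
    "2": ["CDS", "start_codon", "stop_codon", "exon"],
    "1": ["CDS", "start_codon", "stop_codon", "exon", "intron"],
}


def is_kept(feature_type, gtf_version="3", relax=False):
    """Return True if a feature of this type would survive filtering.

    Assumption: with relax=True nothing is filtered, as the --relax help
    text states; otherwise only the types listed for the version are kept.
    Renaming of level1/level2 features is not modelled here.
    """
    if relax:
        return True
    return feature_type in GTF_FEATURE_TYPES[gtf_version]
```

Under these assumptions, is_kept("intron", "1") is true while is_kept("intron", "2") is false, and any feature type is kept when relax=True.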