Scripts and intermediate files used to annotate TEs in Jiao et al. 2016
LTR Retrotransposons
Scripts in ltr/
Software needed:
- ncbi blast+
- genometools: pass 64bit=yes with-hmmer=yes threads=yes to both make and make install so that ltrdigest can run its HMM searches in parallel. I also had to pass cairo=no because I didn't have the right cairo libraries and it wouldn't compile otherwise.
- silix: compile with --enable-mpi and --enable-verbose
- hmmer (genometools downloads and compiles HMMER2 itself if you run make with-hmmer=yes)
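A minimal build sketch for the two compiled tools, assuming the genometools and silix source trees are already downloaded and unpacked. The function names and directory arguments are hypothetical; only the make and configure options come from the notes above. The genometools options are passed to both make and make install, as the notes describe.

```shell
# Hypothetical helpers: source this file, then call each function with the
# path to an already-unpacked source tree.

# Build genometools with the options from the notes above. The same options
# go to both "make" and "make install".
#   64bit=yes       64-bit build
#   with-hmmer=yes  HMMER2 support for ltrdigest (genometools fetches and
#                   compiles HMMER2 itself)
#   threads=yes     multithreading, so HMM searches can run in parallel
#   cairo=no        skip the cairo graphics code if its libraries are missing
build_genometools() {
  local gt_dir="$1"
  local opts="64bit=yes with-hmmer=yes threads=yes cairo=no"
  # Subshell so the caller's working directory is unchanged.
  # $opts is deliberately unquoted so it splits into separate arguments.
  (cd "$gt_dir" && make $opts && make $opts install)
}

# Build silix with MPI support and verbose output enabled.
build_silix() {
  local silix_dir="$1"
  (cd "$silix_dir" && ./configure --enable-mpi --enable-verbose && make && make install)
}
```

For example, build_genometools "$HOME/src/genometools" and build_silix "$HOME/src/silix" (hypothetical paths). The make install steps may need root permissions depending on where the tools are installed.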
Files needed (can be downloaded with get_tRNA_hmm_dbs.sh in the ltr/ directory):
- HMMs of TE protein-coding domains from GyDB, downloaded into the directory gydb_hmms. These are used to identify the protein-coding domains of TE models. One HMM is named ty1/copia, and because ltrdigest uses the HMM name as a filename, the forward slash has to be removed:
sed -i "s#ty1/copia#ty1-copia#g" gydb_hmms/GyDB_collection/profiles/AP_ty1copia.hmm
- tRNAs of all eukaryotes
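Since ltrdigest builds filenames from HMM names, any other profile with a slash in its name would presumably hit the same problem. A small check, assuming the GyDB files are HMMER-format text with a NAME header line, lists any remaining offenders; the function name is hypothetical. Note that sed -i as written is GNU sed syntax; BSD/macOS sed needs sed -i '' instead.

```shell
# Hypothetical helper: list HMM profiles whose NAME line contains a "/".
# Assumes HMMER-format text files, where each profile has a header line
# "NAME  <name>". Prints file:line for each hit; exit status is 0 if any
# slash remains and 1 if none do (grep's convention).
find_slash_hmm_names() {
  local hmm_dir="$1"
  # -r recurse into subdirectories, -H always print the file name,
  # -E extended regex: NAME, whitespace, then a "/" somewhere after it.
  grep -rHE '^NAME[[:space:]]+.*/' "$hmm_dir"
}
```

Run find_slash_hmm_names gydb_hmms after the sed fix; no output means no NAME field still contains a slash.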
SINEs
Scripts in sine/
Software needed:
- SINE-Finder, available as a supplemental file at The Plant Cell; make it executable and rename it to sine_finder.py.
- I can't get SINE-Finder to work on reverse-strand sequences, so I report SINEs only on the forward strand here and pick up reverse-strand copies with RepeatMasker.
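The SINE-Finder install step amounts to copying the downloaded supplemental file under the expected name and marking it executable. This sketch assumes you pass the downloaded file's path (its original name is not given in these notes) and a destination directory, for example one on your PATH; install_sine_finder is a hypothetical helper.

```shell
# Hypothetical helper: install the SINE-Finder supplemental script under the
# name sine_finder.py and make it executable.
#   $1  path to the downloaded supplemental file (its original name is not
#       given in these notes)
#   $2  destination directory, e.g. one already on your PATH
install_sine_finder() {
  local src="$1" dest_dir="$2"
  mkdir -p "$dest_dir"
  cp "$src" "$dest_dir/sine_finder.py"
  chmod +x "$dest_dir/sine_finder.py"
}
```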
LINEs
Scripts in line/
Software needed:
- MGEScanNonLTR (I use the version generated for Galaxy)
TIRs, including MITEs
Scripts in tir/
Software needed:
- mTEA and genometools (see above; already installed for LTR annotation)
- mTEA needs fasta36 (specifically ggsearch36), BioPerl, BLAST, and MUSCLE, and its supplied blogo directories must be added to PERL5LIB and PATH
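A sketch of the mTEA environment setup, assuming you supply the blogo path yourself: add_blogo_to_env prepends the blogo directory to PERL5LIB and PATH, and check_tools reports which required executables cannot be found. Both function names are hypothetical, and the exact BLAST executables mTEA calls are not stated here, so adjust the tool list to your mTEA version.

```shell
# Hypothetical helpers for the mTEA prerequisites.

# Prepend mTEA's supplied blogo directory to PERL5LIB and PATH.
# $1 is the path to that directory on your system.
add_blogo_to_env() {
  local blogo_dir="$1"
  # ${PERL5LIB:+:$PERL5LIB} appends ":old value" only if PERL5LIB was set,
  # avoiding a trailing empty entry.
  export PERL5LIB="$blogo_dir${PERL5LIB:+:$PERL5LIB}"
  export PATH="$blogo_dir:$PATH"
}

# Print "missing: <tool>" for every argument not found on PATH and return
# the number of missing tools (0 means everything was found).
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" > /dev/null 2>&1; then
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

For example: add_blogo_to_env "$HOME/src/mTEA/blogo" (hypothetical path), then check_tools ggsearch36 muscle, and perl -MBio::SeqIO -e 1 to confirm BioPerl loads. Put the export lines in your shell startup file if they should persist across sessions.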
Helitrons
Scripts in helitron/
Software needed:
- RepeatMasker, with its prerequisites