abims-sbr/drap

DRAP : De novo RNA-seq Assembly Pipeline

This is a fork of DRAP v1.91 (http://www.sigenae.org/drap), adapted to the cluster of the Station Biologique de Roscoff.

The modifications are quick and dirty and have not been intensively tested. At the moment, only runMeta works correctly with sge set in the cfg/drap.cfg file. Everything else must use the local option (but can still be submitted as a job with qsub).

License: GNU GPLv3

Description

Short-read RNA-Seq de novo assembly is a well-established method to study transcription in organisms lacking a reference genome sequence. Available software packages such as Trinity and Oases have proven able to build high-quality contigs from short reads, but there is still room for improvement on several points:

  • compactness: they often produce different contigs which are included in one another or overlap one another,
  • chimerism: the contigs contain different kinds of chimeras, such as duplicated open reading frames,
  • substitution, insertion and deletion errors: the consensus sequences built by the assembler contain errors which can be partly corrected using the read alignments.

DRAP includes three modules:

  • runDrap chains an Oases or Trinity assembly of the reads from a given sample with several compaction and correction steps. It produces several assembly files with different FPKM thresholds, for all contigs or for contigs comprising an open reading frame. A report file presents the resulting assembly and alignment metrics.
  • runMeta gathers all the sample assemblies and merges the results into a single representative contig set. It also removes the redundancy between sets and produces a general report including assembly and alignment metrics.
  • runAssessment processes different contig sets built from the same read sets to generate assembly and alignment metrics, which are collected in a report. It helps to choose the best assembly.

Docker install

Go to the original install page: http://www.sigenae.org/drap/install.html

Local install

Dependencies:

Details on how these tools are used can be found in doc/third_party_tools.html

Configure & Test

See the doc/install.html and doc/quick_start.html pages.

Hacks

Run runMeta without having used runDrap before.

You must use a command similar to the following:

#!/bin/bash

DRAP_PATH="/usr/local/genome2/drap"
WORKING_DIR="$(pwd)"
OUT_FOLDER="$WORKING_DIR"

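# Merge the per-sample folders listed in --drap-dirs into one contig set.
# Paths are quoted so that directories containing spaces do not break the call.
"$DRAP_PATH/runMeta" \
  --cfg-file "$WORKING_DIR/cfg/drap.cfg" \
  --drap-dirs "$OUT_FOLDER/trinity_splA,$OUT_FOLDER/trinity_splB" \
  --ref "$DRAP_PATH/test/data/Danio_rerio.pep.fasta" \
  --outdir "$OUT_FOLDER/meta_trinity"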

where --drap-dirs lists the folders normally obtained from runDrap. Each of these folders must contain at least the following for runMeta to run successfully:

  • .drap_conf.json (used in steps 06-meta_index.sh, 07-meta_rmbt.sh and 09-meta_postprocess.sh of runMeta): a JSON file containing at least the following elements:
{
  "alignR1" : [
     "/path/to/sampleA_R1.fastq.gz"
  ],
  "alignR2" : [
     "/path/to/sampleA_R2.fastq.gz"
  ],
  "coverages" : [
     "1",
     "3",
     "5",
     "10"
  ],
  "paired" : 1,
  "strand" : null
}
  • transcripts_fpkm_X.fa (used in 01-meta_merge.sh): the file of transcripts to be mapped. The X in the file name must be the smallest value in the list associated with the "coverages" key in .drap_conf.json. X is the coverage cutoff used by eXpress when the transcripts (from transcripts_fpkm.fa) are filtered.
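
The two requirements above can be checked before launching runMeta. The following is a minimal sketch, not part of DRAP: min_coverage and check_drap_dir are hypothetical helper names, and the "coverages" list is extracted with plain text tools on the assumption that the file follows the simple layout shown above (it is not a real JSON parser).

```bash
#!/bin/bash

# Print the smallest value of the "coverages" list in a .drap_conf.json.
# Assumption: simple layout as in the example above; naive text extraction.
min_coverage() {
  tr -d '\n' < "$1" \
    | sed -n 's/.*"coverages"[[:space:]]*:[[:space:]]*\[\([^]]*\)].*/\1/p' \
    | tr -d '" \t' \
    | tr ',' '\n' \
    | sort -n \
    | head -n 1
}

# Hypothetical helper: return 0 if the folder holds what runMeta needs.
check_drap_dir() {
  local dir="$1" conf="$1/.drap_conf.json" min
  if [ ! -f "$conf" ]; then
    echo "missing $conf" >&2
    return 1
  fi
  min="$(min_coverage "$conf")"
  if [ -z "$min" ]; then
    echo "no coverages list found in $conf" >&2
    return 1
  fi
  # X in transcripts_fpkm_X.fa must be the smallest coverage value.
  if [ ! -f "$dir/transcripts_fpkm_${min}.fa" ]; then
    echo "missing $dir/transcripts_fpkm_${min}.fa" >&2
    return 1
  fi
  echo "$dir: ok (minimal coverage $min)"
}
```

For instance, check_drap_dir "$OUT_FOLDER/trinity_splA" could be called for each folder passed to --drap-dirs, stopping at the first failure.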

Notes:

  • The alignR2 key can be omitted when the paired key is set to 0.
  • If you set, for example, "coverages" : ["2"] and provide transcripts_fpkm_2.fa, the .drap_conf.json file produced by runMeta will revert to "coverages" : ["1", "3", "5", "10"].
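
When the assemblies were not produced by runDrap, the folder has to be prepared by hand. The sketch below shows one possible way to generate the minimal .drap_conf.json described above; write_drap_conf is a hypothetical helper, not a DRAP command. It writes the default coverage list and leaves out alignR2 (with "paired" set to 0) when no second read file is given. It assumes the read paths contain no double quotes or backslashes, since no JSON escaping is done.

```bash
#!/bin/bash

# Hypothetical helper: write a minimal .drap_conf.json into a sample folder.
# Usage: write_drap_conf DIR R1.fastq.gz [R2.fastq.gz]
write_drap_conf() {
  local dir="$1" r1="$2" r2="${3:-}"
  mkdir -p "$dir"
  {
    echo '{'
    printf '  "alignR1" : [\n     "%s"\n  ],\n' "$r1"
    # alignR2 is only written for paired-end data.
    if [ -n "$r2" ]; then
      printf '  "alignR2" : [\n     "%s"\n  ],\n' "$r2"
    fi
    printf '  "coverages" : [\n     "1",\n     "3",\n     "5",\n     "10"\n  ],\n'
    if [ -n "$r2" ]; then
      printf '  "paired" : 1,\n'
    else
      printf '  "paired" : 0,\n'
    fi
    printf '  "strand" : null\n'
    echo '}'
  } > "$dir/.drap_conf.json"
}
```

A call such as write_drap_conf trinity_splA /path/to/sampleA_R1.fastq.gz /path/to/sampleA_R2.fastq.gz would then be followed by copying the assembly into trinity_splA/transcripts_fpkm_1.fa, since 1 is the smallest value in the coverage list written here.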