GitHub - akramdi/ASAP: ASAP -- ATAC-Seq Analysis Pipeline

README for ASAP v2.1

Introduction

ASAP is a flexible bioinformatic pipeline for ATAC-seq data analysis. Starting from raw ATAC-seq sequencing reads, ASAP outputs raw and filtered mapping files, coverage files (read coverage and Tn5 insertion event coverage), the fragment length distribution, reads extracted by fragment length, and peak calling results.

Overview of major steps

Mapping
Post-mapping processing and filtering:
- Filter (or not) reads that fall into user-defined blacklisted regions
- Select reads that carry no more than the maximum allowed number of mismatches (maxMis) and filter by minimum mapping quality (MAPQ)
- Mark duplicated pairs
- Select concordant, non-duplicated pairs.
- Shift reads by 4bp toward the center of the transposition event, as described in Schep et al., 2015.
Compute read coverage
Compute insertion events coverage
Fragment length distribution
Extract read pairs based on a fragment length range and compute arcs between fragment extremities (protection visualization)
Peak calling
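
The selection criteria of the post-mapping step can be illustrated on SAM text records. The following is a minimal sketch, not ASAP's own implementation: the function name select_reads and the thresholds in the example are hypothetical, and reading the mismatch count from Bowtie2's XM:i tag is an assumption. It keeps records that are properly paired (flag bit 0x2), not marked as duplicates (flag bit 0x400), and that pass the MAPQ and mismatch thresholds.

```shell
# Sketch only: keep SAM records that are properly paired, not duplicated,
# with MAPQ >= min_mapq and at most max_mis mismatches (XM:i tag, assumed).
select_reads() {
    awk -v min_mapq="$1" -v max_mis="$2" '
        BEGIN { FS = OFS = "\t" }
        /^@/ { print; next }                      # pass header lines through
        {
            flag   = $2 + 0
            proper = int(flag / 2) % 2            # bit 0x2: properly paired
            dup    = int(flag / 1024) % 2         # bit 0x400: duplicate
            mis = 0
            for (i = 12; i <= NF; i++)            # optional tags start at column 12
                if ($i ~ /^XM:i:/) mis = substr($i, 6) + 0
            if (proper && !dup && ($5 + 0) >= (min_mapq + 0) && mis <= (max_mis + 0))
                print
        }'
}

# Example (hypothetical file name and thresholds):
# samtools view -h sample.bam | select_reads 10 2
```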

ASAP is:

User-friendly: requires a single configuration file, so only one option is needed on the command line (see Usage of ASAP).

Flexible: any step can be skipped, and specific post-processing steps can be targeted.

Dependencies

Bowtie2 version >= 2.2.9
MACS2
Samtools version >= 1.2
GenomeCoverageBed (bedtools) version >= 2.20.1
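
Before a run, it can help to confirm that the tools above are reachable. The sketch below only checks presence on $PATH, not versions; the helper name missing_tools is hypothetical, and the executable names (bowtie2, macs2, samtools, genomeCoverageBed) are the usual ones but may differ on your installation.

```shell
# Print the name of every tool given as argument that is not found on $PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Lists the missing dependencies, or prints nothing if all are found.
missing_tools bowtie2 macs2 samtools genomeCoverageBed
```

If a tool is reported as missing, either add its directory to $PATH or fill in the path parameter of the configuration file.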

Usage of ASAP

A configuration file is required to execute the pipeline.

bash ASAP.sh [-h] [-v] [-c CONFIGFILE]

Options

-c CONFIGFILE

This is the only REQUIRED parameter for ASAP. The configuration file is a text file that gathers the full set of parameters required to execute the pipeline (see the example ASAP_configFile_example.conf in the distribution).

-h/-v

Print the help message (-h) or the current version (-v).

About the configuration file:

The configuration file gathers the parameters of each step. Note that, when running the pipeline, only the "turned on" steps will be performed. A step is turned on or off with a yes/no argument.

The sets of parameters to fill in the configuration file are listed below:

General parameters

General information about the run. These parameters must always be filled in.

OUTDIR:	Main output directory where results are written. OUTDIR is created if it does not exist
sampleName:	Name of the processed sample. Spaces are not allowed: use _ or - instead if needed
CHRLEN:	Chromosome info file (tab-delimited format: <Chr name><chr length>)
path:	Full path to the different dependencies, if not already added to $PATH
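
One way to produce the chromosome info file is from a FASTA index written by samtools faidx, whose first two tab-separated columns are the sequence name and its length. This is a sketch under that assumption; the helper name make_chrlen and the file names are hypothetical, and whether ASAP expects exactly this two-column layout should be checked against the example configuration in the distribution.

```shell
# Keep column 1 (sequence name) and column 2 (sequence length) of a .fai index.
make_chrlen() {
    cut -f1,2 "$1"
}

# Example (hypothetical file names):
# samtools faidx genome.fa            # writes genome.fa.fai
# make_chrlen genome.fa.fai > genome.chrlen.txt
```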

Mapping step parameters

map:	Set to "yes/no". If mapping is skipped (map=no), a BAM file must be provided to proceed (see Post mapping steps).
FASTQ1:	fastq file (R1). File can be gzipped
FASTQ2:	fastq file (R2). File can be gzipped
bowtieIndex:	Prefix of bowtie2 indexes
mappingParameters:	Bowtie2 mapping parameters. Default: --very-sensitive -X 2000 -p 10

Post mapping steps

It is possible to skip the mapping step (map=no) and perform any of the post-mapping steps. To do so, aligned reads must be provided in a BAM file. If map=yes, the "turned on" post-mapping steps will be performed on the internal mapping results.

BAM:	Alignment file in BAM format
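
As an illustration, the two entries involved when starting from an existing alignment might look as follows. The key=value syntax is an assumption based on the notation used in this README (map=no), and the path is a placeholder; the example configuration file in the distribution remains the reference.

```shell
# Skip mapping and run the post-mapping steps on a provided alignment.
map=no
BAM=/path/to/existing_alignment.bam
```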

Filtering parameters

filter:	Set to "yes/no". If map=yes, filtering is performed on the internal mapping results. If map=no, filtering is performed on the alignment file provided in the BAM option.
maxMis:	Maximum number of mismatches allowed per read.
blacklist:	Set to "yes/no" if reads should be filtered based on a list of blacklisted regions. If blacklist=yes, blacklisted regions must be provided in the next parameter.
blacklistedRegions:	Regions used to filter reads (tab-delimited format: <Chr name><start><end>)
shift:	Set to "yes/no". If shift=yes, reads are shifted by 4bp so that read starts reflect the center of the Tn5 transposition event
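
The effect of the shift can be sketched on BED6 records (chrom, start, end, name, score, strand). This is an illustration only: ASAP works on BAM files, and the helper name shift_tn5 and the use of BED input are assumptions. The 5' end of each read is moved by 4bp toward the center of the insertion event: the start coordinate for forward reads, the end coordinate for reverse reads.

```shell
# Shift the 5-prime end of each BED6 record by 4bp toward the insertion center.
shift_tn5() {
    awk -v s=4 'BEGIN { FS = OFS = "\t" }
        $6 == "+" { $2 += s }   # forward read: 5-prime end is the start coordinate
        $6 == "-" { $3 -= s }   # reverse read: 5-prime end is the end coordinate
        { print }'
}

# Example (hypothetical file names):
# shift_tn5 < reads.bed > reads.shifted.bed
```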

Coverage

readCoverage:	Set to "yes/no" if read coverage should be computed or not
ieventsCoverage:	Set to "yes/no" if Tn5 insertion events coverage should be computed or not

Read extraction

extractReads:	Set to "yes/no" if read pairs should be extracted based on a given range of fragment length
lowBoundary:	Lower boundary of the range: [lowBoundary,upBoundary]. Default=100
upBoundary:	Upper boundary of the range: [lowBoundary,upBoundary]. Default=250
arcs:	Set to "yes/no" if extracted fragments should be represented as arcs (linked extremities)
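
The range selection can be sketched on SAM text records, where column 9 (TLEN) holds the signed fragment length. This is a minimal illustration rather than ASAP's implementation; the helper name extract_by_length is hypothetical, and its defaults mirror the lowBoundary and upBoundary defaults listed above.

```shell
# Keep SAM records whose absolute fragment length lies in [low, up].
extract_by_length() {
    awk -v low="${1:-100}" -v up="${2:-250}" '
        BEGIN { FS = "\t" }
        /^@/ { print; next }                       # pass header lines through
        {
            len = ($9 + 0 < 0) ? -$9 : $9 + 0      # absolute value of TLEN
            if (len >= low + 0 && len <= up + 0) print
        }'
}

# Example (hypothetical file name):
# samtools view -h sample.bam | extract_by_length 100 250
```

Both mates of a pair carry the same absolute TLEN, so a pair is kept or dropped as a whole.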

Fragment length

fragDist:	Set to "yes/no" if fragment length distribution should be computed or not
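
A fragment length table of the kind described here can be sketched from SAM text records. The helper name fragment_length_table is hypothetical, and counting only records with a positive TLEN (so that each pair contributes once) is an assumption, not necessarily how ASAP selects reads. The output has three tab-separated columns: length, count, and frequency.

```shell
# Tabulate fragment lengths from SAM records read on standard input.
fragment_length_table() {
    awk 'BEGIN { FS = "\t" }
        !/^@/ && $9 + 0 > 0 { n[$9]++; total++ }   # one mate per pair has TLEN > 0
        END {
            for (len in n)
                printf "%d\t%d\t%.4f\n", len, n[len], n[len] / total
        }' |
    sort -n
}

# Example (hypothetical file names):
# samtools view sample.bam | fragment_length_table > sample.fragment_lengths.txt
```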

Peak calling

callpeak:	Set to "yes/no" if peak calling should be computed or not.
control:	Control BAM file. Peak calling can be performed without a control; however, a control such as ATAC-seq on genomic DNA can be provided. Leave the option empty if no control is used.
MODE:	Peak calling mode: <broad/narrow>. Default=broad
modelParameters:	MACS2 shifting options
fdr:	Cutoff for peak detection. Default=0.01
gsize:	Effective genome size. Example for TAIR10: gsize=10e7
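
Putting the parameters above together, a configuration file could be written as in the sketch below. The parameter names come from this README, but every value is a placeholder, and the key=value syntax, the quoting of mappingParameters, the value of maxMis, and the empty entries are assumptions; ASAP_configFile_example.conf in the distribution is the authoritative template.

```shell
# Write a hypothetical configuration file; all values are placeholders.
cat > my_sample.conf <<'EOF'
OUTDIR=/path/to/results
sampleName=my_sample
CHRLEN=/path/to/genome.chrlen.txt
path=
map=yes
FASTQ1=/path/to/my_sample_R1.fastq.gz
FASTQ2=/path/to/my_sample_R2.fastq.gz
bowtieIndex=/path/to/bowtie2_index/genome
mappingParameters="--very-sensitive -X 2000 -p 10"
BAM=
filter=yes
maxMis=2
blacklist=no
blacklistedRegions=
shift=yes
readCoverage=yes
ieventsCoverage=yes
extractReads=yes
lowBoundary=100
upBoundary=250
arcs=yes
fragDist=yes
callpeak=yes
control=
MODE=broad
modelParameters=
fdr=0.01
gsize=10e7
EOF

# Then run the pipeline with its single required option:
# bash ASAP.sh -c my_sample.conf
```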

Output files

ASAP outputs mapping files, coverage files, a fragment length distribution table and plot, and MACS2 peak calling results.

Mapping output

*.mapped.sorted.bam:	Contains mapped reads (bowtie2 raw mapping results)

Filtering/post-processing outputs

*.(un)masked.(un)shifted.bam:	Contains the selected set of reads after filtering. Ideally, accessible peaks are called using this file.
*.csv:	Summary of the filtering step in CSV format

Coverage outputs

*.(un)masked.(un)shifted.ievent.bam:	Contains Tn5 insertion events. Instead of full reads, only the positions corresponding to Tn5 insertion events are shown
*.(un)masked.(un)shifted.bw:	Genome-wide coverage of ATAC reads
*.(un)masked.(un)shifted.ievent.bw:	Genome-wide coverage of Tn5 insertion events

Read extraction

.subReads.f3.frag.bam:	Contains the set of reads extracted based on the given range of fragment length
.subReads.f3.frag.bw:	Genome-wide coverage of the set of reads extracted based on the given range of fragment length
.subReads.f3.frag.arcs.bed:	Arcs between fragment extremities. This file can be visualized in IGV

Fragment length distribution

*.TLEN.f3F16.txt:	Counts/frequencies of fragments length
*.TLEN.f3F16.png:	Plot of fragment length distribution

Peak calling outputs

Outputs are stored in the directory peak_calling_<sampleName>. See the MACS2 documentation for the list of output files.
