Calculate necessary raw read depth and complexity of FASTQ files

guertinlab/fqComplexity

fqComplexity

The input FASTQ file should already have its adapters removed, but the UMI should still be present in each read's DNA sequence. fqComplexity subsamples the FASTQ file into deciles, deduplicates each intermediate file, and logs the number of input and output reads. An asymptotic regression model is then fit to these data, and the total number of unique reads at a read depth of 10 million is printed on the resulting PDF plot.
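The subsample, deduplicate, and log steps described above can be sketched roughly as follows. This is an illustration rather than the fqComplexity source: the function names, the seed, the temporary file names, and the comma-separated log layout are all assumptions, and it needs seqtk and fqdedup on your $PATH.

```shell
#!/usr/bin/env bash
# Illustrative sketch, not the fqComplexity source.
# Assumes seqtk and fqdedup are on $PATH and the FASTQ is uncompressed.

# Count reads in a FASTQ file (four lines per record).
count_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Subsample at 10%, 20%, ..., 100%, deduplicate each subsample, and
# write "input_reads,unique_reads" lines to the log file.
subsample_and_dedup() {
    local fastq="$1" log="$2" tmp pct frac
    tmp=$(mktemp -d)
    : > "$log"
    for pct in 10 20 30 40 50 60 70 80 90 100; do
        if [ "$pct" -eq 100 ]; then
            # Use the whole file for the last step instead of passing 1.0,
            # which seqtk may interpret as a read count.
            cp "$fastq" "$tmp/sub.fastq"
        else
            frac=$(awk -v p="$pct" 'BEGIN { printf "%.1f", p / 100 }')
            seqtk sample -s 100 "$fastq" "$frac" > "$tmp/sub.fastq"
        fi
        fqdedup -i "$tmp/sub.fastq" -o "$tmp/dedup.fastq"
        echo "$(count_reads "$tmp/sub.fastq"),$(count_reads "$tmp/dedup.fastq")" >> "$log"
    done
    rm -rf "$tmp"
}
```

Calling `subsample_and_dedup input.fastq complexity.log` (the log name is hypothetical) would produce ten input/unique pairs, which is the kind of data an asymptotic regression can be fit to.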

The flags -x and -y are optional and each defaults to 1. If the raw files were preprocessed in some way, -x should be set to the total number of raw reads divided by the number of reads remaining after preprocessing. If your pipeline has filtering steps after FASTQ deduplication, -y should be set to the final desired output (usually aligned reads) divided by the total number of deduplicated reads. For example, -x would be 2 if half of the reads are removed in a prefiltering step that discards adapter/adapter ligation products, and -y would be 0.5 if half of the deduplicated reads align to the genome. If fqComplexity is first run without the -x and -y options and then run again with them, the log file from the first run is reused to save time.
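As a worked example of the two ratios, the sketch below derives -x and -y from read counts. The counts and the `ratio` helper are hypothetical and chosen to reproduce the values 2 and 0.5 used above; substitute the numbers from your own pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print NUM / DEN to four significant digits.
ratio() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.4g\n", n / d }'
}

# Hypothetical read counts at each stage of a pipeline.
raw_reads=20000000            # reads straight off the sequencer
preprocessed_reads=10000000   # reads left after adapter/adapter removal
deduplicated_reads=8000000    # reads left after deduplication
aligned_reads=4000000         # deduplicated reads that align to the genome

# -x: raw reads divided by reads remaining after preprocessing
x_factor=$(ratio "$raw_reads" "$preprocessed_reads")
# -y: final desired output divided by deduplicated reads
y_factor=$(ratio "$aligned_reads" "$deduplicated_reads")

echo "fqComplexity -i input.fastq -x $x_factor -y $y_factor"
# prints: fqComplexity -i input.fastq -x 2 -y 0.5
```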

The complexity_pro.R script and the fqComplexity program are both in this repository. Download them with:

wget https://raw.githubusercontent.com/guertinlab/fqComplexity/main/fqComplexity
wget https://raw.githubusercontent.com/guertinlab/fqComplexity/main/complexity_pro.R

Navigate to the directory containing fqComplexity and complexity_pro.R, then make both files executable with:

chmod +x fqComplexity
chmod +x complexity_pro.R

Before running fqComplexity, install the following dependencies and move them to a directory in your $PATH:

seqtk https://github.com/lh3/seqtk

fqdedup https://github.com/guertinlab/fqdedup

complexity_pro.R https://github.com/guertinlab/fqComplexity/blob/main/complexity_pro.R
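To confirm that the dependencies are visible before running, a quick check along these lines may help. It is a minimal sketch: the `missing_tools` helper is hypothetical and only tests whether each name resolves on your $PATH, not whether the tool works.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each named tool that is not found on $PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Prints nothing when all three dependencies are available.
missing_tools seqtk fqdedup complexity_pro.R
```

Any name that is printed still needs to be installed or moved to a directory listed in $PATH.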

Usage:

fqComplexity -i input.fastq 

    -i input FASTQ file, with adapter/adapter ligation products removed
    -x factor for any preprocessing that occurred, such as removing adapter/adapter ligation products (optional, default 1)
    -y factor for any postprocessing that occurred (optional, default 1)
