Library statistics plots

Buys de Barbanson edited this page Sep 1, 2020 · 4 revisions

SingleCellMultiOmics can generate quality control plots for your libraries.

First, demultiplex, map and tag your files as described for NlaIII or scCHIC.

Make sure to end up with a folder structure where every library is a folder which contains the demultiplexed and mapped reads:

Libraries
 ├─ LibraryA
 │  ├─ demultiplexedR1.fastq.gz
 │  ├─ demultiplexedR2.fastq.gz
 │  ├─ rejectsR1.fastq.gz
 │  ├─ rejectsR2.fastq.gz
 │  └─ /tagged/
 │     ├── sorted.bam
 │     └── sorted.bai
 │  
 ├─ LibraryB
 │  ├─ demultiplexedR1.fastq.gz
 │  ├─ demultiplexedR2.fastq.gz
 │  ├─ rejectsR1.fastq.gz
 │  ├─ rejectsR2.fastq.gz
 │  └─ /tagged/
 │     ├── sorted.bam
 │     └── sorted.bai
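
Before running the statistics it can help to confirm that every library folder matches the layout above. The following is a minimal sketch, not part of SingleCellMultiOmics: the function name `find_incomplete_libraries` is hypothetical, and it only assumes that each library should contain the tagged BAM file at the given relative path.

```python
from pathlib import Path


def find_incomplete_libraries(root, tagged_bam="tagged/sorted.bam"):
    """Return the names of library folders under `root` that lack the tagged BAM file.

    `find_incomplete_libraries` is a hypothetical helper for illustration only.
    """
    # Accept paths written with a leading slash (e.g. "/tagged/sorted.bam")
    # and treat them as relative to the library folder.
    relative_bam = tagged_bam.lstrip("/")
    missing = []
    for library in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        if not (library / relative_bam).is_file():
            missing.append(library.name)
    return missing
```

Calling `find_incomplete_libraries("Libraries")` would list the folders that still need to be mapped and tagged; an empty list means every library has its tagged BAM file in place.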

Then change directory to the root folder (Libraries) and run the libraryStatistics.py script:

libraryStatistics.py LibraryA LibraryB

If your BAM file is not called sorted.bam, or the subfolder is not called tagged, use the -tagged_bam parameter and supply the relative path to the tagged file.

For example:

Libraries
 ├─ LibraryA
 │  ├─ demultiplexedR1.fastq.gz
 │  ├─ demultiplexedR2.fastq.gz
 │  ├─ rejectsR1.fastq.gz
 │  ├─ rejectsR2.fastq.gz
 │  └─ /nlatagged/
 │     ├── mapped.bam
 │     └── mapped.bai

For this structure use the command:

libraryStatistics.py LibraryA -tagged_bam /nlatagged/mapped.bam
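
If the script is launched from a pipeline rather than typed by hand, the invocations shown so far could be assembled as in the sketch below. The helper names `build_command` and `run_statistics` are hypothetical; the only parts taken from this page are the script name, the positional library folders and the -tagged_bam parameter.

```python
import subprocess


def build_command(libraries, tagged_bam=None):
    """Assemble the libraryStatistics.py argument list (hypothetical helper)."""
    command = ["libraryStatistics.py", *libraries]
    if tagged_bam is not None:
        # Path of the tagged BAM file relative to each library folder.
        command += ["-tagged_bam", tagged_bam]
    return command


def run_statistics(libraries, tagged_bam=None, root="."):
    """Run the script from the libraries root folder.

    Raises subprocess.CalledProcessError if the script exits with an error.
    """
    subprocess.run(build_command(libraries, tagged_bam), cwd=root, check=True)
```

For example, `run_statistics(["LibraryA"], "/nlatagged/mapped.bam", root="Libraries")` would issue the same command as above from inside the Libraries folder.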

The script adds two directories to every library: ./plots and ./tables. The plots directory contains plots of the various statistics, and the tables directory contains the data behind those plots in CSV format.
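
The tables can be read back for further analysis with the Python standard library. This is a sketch under assumptions: it assumes the files in the tables directory carry a .csv extension, and it makes no assumption about their names or columns, which are not specified on this page. The function name `load_tables` is hypothetical.

```python
import csv
from pathlib import Path


def load_tables(library):
    """Map each CSV file name in <library>/tables to its rows (lists of strings).

    Hypothetical helper; assumes the table files end in ".csv".
    """
    tables = {}
    for path in sorted(Path(library, "tables").glob("*.csv")):
        with path.open(newline="") as handle:
            tables[path.name] = list(csv.reader(handle))
    return tables
```

`load_tables("LibraryA")` would return a dictionary keyed by file name, so the available tables can be inspected before deciding how to parse each one.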

The following statistics are calculated:

MethylationContextHistogram
MappingQualityHistogram
OversequencingHistogram
FragmentSizeHistogram
TrimmingStats
AlleleHistogram
RejectionReasonHistogram
DataTypeHistogram
TagHistogram
PlateStatistic
ScCHICLigation

The code for these statistics is defined in singlecellmultiomics/statistic.


If you don't care how many raw reads were lost during mapping and tagging, you can supply a tagged BAM file directly to libraryStatistics.py:

libraryStatistics.py ./LibraryA/nlatagged/mapped.bam