Commit

updated citations and README
lianov committed Mar 8, 2024
1 parent a788dda commit 43600c0
Showing 2 changed files with 13 additions and 32 deletions.
12 changes: 0 additions & 12 deletions CITATIONS.md
@@ -10,10 +10,6 @@
## Pipeline tools

- [BEDTools](https://pubmed.ncbi.nlm.nih.gov/20110278/)

> Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar 15;26(6):841-2. doi: 10.1093/bioinformatics/btq033. Epub 2010 Jan 28. PubMed PMID: 20110278; PubMed Central PMCID: PMC2832824.
- [BLAZE](https://www.biorxiv.org/content/10.1101/2022.08.16.504056v1)

> You Y, Prawer Y D, De Paoli-Iseppi R, Hunt C P, Parish C L, Shim H, Clark M B. Identification of cell barcodes from long-read single-cell RNA-seq with BLAZE. bioRxiv 2022 Aug .08.16.504056; doi: 10.1101/2022.08.16.504056.
@@ -48,10 +44,6 @@
- [pigz](https://zlib.net/pigz/)

- [Prowler](https://pubmed.ncbi.nlm.nih.gov/34473226/)

> Lee S, Nguyen LT, Hayes BJ, Ross E. Prowler: A novel trimming algorithm for Oxford Nanopore sequence data. Bioinformatics 2021 Sep 2 doi:10.1093/bioinformatics/btab630. PubMed PMID: 34473226
- [SAMtools](https://pubmed.ncbi.nlm.nih.gov/19505943/)

> Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8. PubMed PMID: 19505943; PubMed Central PMCID: PMC2723002.
@@ -82,10 +74,6 @@

> Hao Y, Hao S, Andersen-Nissen E, Mauck WM, 3rd, Zheng S, Butler A, Lee MJ, Wilk AJ, Darby C, Zager M, Hoffman P, Stoeckius M, Papalexi E, Mimitou EP, Jain J, Srivastava A, Stuart T, Fleming LM, Yeung B, Rogers AJ, McElrath JM, Blish CA, Gottardo R, Smibert P, Satija R. Integrated analysis of multimodal single-cell data. Cell 2021 Jun 24; 184(13):3573-87 e29 doi:10.1016/j.cell.2021.04.048. PubMed PMID: 34062119; PubMed Central PMCID: PMC8238499.
- [tidyverse](https://joss.theoj.org/papers/10.21105/joss.01686)

> Wickham H, Averick M, Bryan J, Winston C, McGowan LD, François R, Grolemund G, Hayes A , Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H. Welcome to the Tidyverse. Journal of Open Source Software 2019, 4(43), 1686, doi:10.21105/joss.01686
## Python libraries

- [Biopython](https://pubmed.ncbi.nlm.nih.gov/19304878/)
33 changes: 13 additions & 20 deletions README.md
@@ -12,14 +12,10 @@

## Introduction

**nf-core/scnanoseq** is a bioinformatics best-practice analysis pipeline for 10X Genomics single-cell/nuclei RNA-seq for data derived from Oxford Nanopore Q20+ chemistry ([R10.4 flow cells (>Q20)](https://nanoporetech.com/about-us/news/oxford-nanopore-announces-technology-updates-nanopore-community-meeting)). Due to the expectation of >Q20 quality, the input data for the pipeline is not dependent on Illumina paired data.

<!-- TODO: after test write brief sentence on exon only vs intron method (1 and 2) --->
**nf-core/scnanoseq** is a bioinformatics best-practice analysis pipeline for 10X Genomics single-cell/nuclei RNA-seq for data derived from Oxford Nanopore Q20+ chemistry ([R10.4 flow cells (>Q20)](https://nanoporetech.com/about-us/news/oxford-nanopore-announces-technology-updates-nanopore-community-meeting)). Due to the expectation of >Q20 quality, the input data for the pipeline is not dependent on Illumina paired data. Please note `scnanoseq` can also process Oxford data with older chemistry, but we encourage usage of the Q20+ chemistry.

The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from [nf-core/modules](https://github.com/nf-core/modules) in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

<!-- TODO nf-core: Add full-sized test dataset and amend the paragraph below if applicable -->

On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the [nf-core website](https://nf-co.re/scnanoseq/results).

## Pipeline summary
@@ -48,7 +44,7 @@ On release, automated continuous integration tests run the pipeline on a full-si
14. UMI-based deduplication [`UMI-tools`](https://github.com/CGATOxford/UMI-tools)
15. Gene and transcript level matrices generation. [`IsoQuant`](https://github.com/ablab/IsoQuant)
16. Preliminary matrix QC ([`Seurat`](https://github.com/satijalab/seurat))
17. Present QC for raw reads, trimmed reads, pre and post-extracted reads, mapping metrics and preliminary single-cell/nuclei QC ([`MultiQC`](http://multiqc.info/))
17. Compile QC for raw reads, trimmed reads, pre and post-extracted reads, mapping metrics and preliminary single-cell/nuclei QC ([`MultiQC`](http://multiqc.info/))

## Usage

@@ -62,17 +58,17 @@ First, prepare a samplesheet with your input data that looks as follows:
`samplesheet.csv`:

```csv
sample,fastq_1,cell_count
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,1000
CONTROL_REP1,AEG588A1_S2_L002_R1_001.fastq.gz,1000
CONTROL_REP2,AEG588A2_S1_L002_R1_001.fastq.gz,1000
CONTROL_REP3,AEG588A3_S1_L002_R1_001.fastq.gz,1000
CONTROL_REP4,AEG588A4_S1_L002_R1_001.fastq.gz,1000
CONTROL_REP4,AEG588A4_S2_L002_R1_001.fastq.gz,1000
CONTROL_REP4,AEG588A4_S3_L002_R1_001.fastq.gz,1000
sample,fastq,cell_count
CONTROL_REP1,AEG588A1_S1.fastq.gz,5000
CONTROL_REP1,AEG588A1_S2.fastq.gz,5000
CONTROL_REP2,AEG588A2_S1.fastq.gz,5000
CONTROL_REP3,AEG588A3_S1.fastq.gz,5000
CONTROL_REP4,AEG588A4_S1.fastq.gz,5000
CONTROL_REP4,AEG588A4_S2.fastq.gz,5000
CONTROL_REP4,AEG588A4_S3.fastq.gz,5000
```

Each row represents a single-end fastq file. Rows with the same sample identifier are considered technical replicates and will be automatically merged. cell_count refers to the expected number of cells you expect
Each row represents a single-end FASTQ file. Rows with the same sample identifier are considered technical replicates and will be automatically merged. `cell_count` is the number of cells you expect in the sample.
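
To make the merging rule concrete, here is a minimal Python sketch that reads a samplesheet in the new `sample,fastq,cell_count` layout and groups FASTQ files by sample identifier. It is only an illustration of how replicate rows relate to each other, not the pipeline's own implementation. The function name `group_by_sample` is hypothetical, and the check that replicate rows share one `cell_count` is an assumption made for this example.

```python
import csv
import io
from collections import defaultdict

# Inline copy of a few samplesheet rows so the sketch is self-contained.
SAMPLESHEET = """\
sample,fastq,cell_count
CONTROL_REP1,AEG588A1_S1.fastq.gz,5000
CONTROL_REP1,AEG588A1_S2.fastq.gz,5000
CONTROL_REP2,AEG588A2_S1.fastq.gz,5000
"""


def group_by_sample(handle):
    """Collect FASTQ paths per sample identifier (hypothetical helper)."""
    groups = defaultdict(lambda: {"fastqs": [], "cell_count": None})
    for row in csv.DictReader(handle):
        entry = groups[row["sample"]]
        entry["fastqs"].append(row["fastq"])
        count = int(row["cell_count"])
        # Assumption for this sketch: replicates of one sample agree on cell_count.
        if entry["cell_count"] not in (None, count):
            raise ValueError(f"conflicting cell_count for sample {row['sample']}")
        entry["cell_count"] = count
    return dict(groups)


groups = group_by_sample(io.StringIO(SAMPLESHEET))
for sample, info in groups.items():
    # One line per sample: identifier, number of replicate files, expected cells.
    print(sample, len(info["fastqs"]), info["cell_count"])
```

Running a check like this before launching the pipeline can catch a misspelled sample identifier, which would otherwise silently create an extra sample instead of a merged replicate.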

```bash
nextflow run nf-core/scnanoseq \
@@ -94,20 +90,17 @@ To see the results of an example test run with a full size dataset refer to the
For more details about the output files and reports, please refer to the
[output documentation](https://nf-co.re/scnanoseq/output).

This pipeline produces feature barcode matrices at both the gene and transcript level and can retain introns within the counts themselves. These files are able to be ingested directly by most packages used for downstream analyses such as Seurat. In addition the pipeline produces a number of quality control metrics to assess in ensuring the confidence of the results of the samples that were processed.
This pipeline produces feature barcode matrices at both the gene and transcript level and can retain introns within the counts themselves. These files can be ingested directly by most packages used for downstream analyses, such as `Seurat`. In addition, the pipeline produces a number of quality control metrics to check that the processed samples meet expected metrics for single-cell/nuclei data.

## Credits

nf-core/scnanoseq was originally written by [Austyn Trull](https://github.com/atrull314), and [Dr. Lara Ianov](https://github.com/lianov).

We thank the following people for their extensive assistance in the development of this pipeline:

We would also like to thank the following people and groups for their support, including financial support:

- Dr. Elizabeth Worthey
- University of Alabama at Birmingham Biological Data Science Core (U-BDS), RRID:SCR_021766, https://github.com/U-BDS

<!-- TODO from Lara: check that all financial support has been stated -->
- Support from: 3P30CA013148-48S8

## Contributions and Support

