From 43600c07517a35605eee539b1c91f0651f86c7f7 Mon Sep 17 00:00:00 2001
From: Lara Ianov
Date: Fri, 8 Mar 2024 16:12:46 -0600
Subject: [PATCH] updated citations and README

---
 CITATIONS.md | 12 ------------
 README.md    | 33 +++++++++++++--------------------
 2 files changed, 13 insertions(+), 32 deletions(-)

diff --git a/CITATIONS.md b/CITATIONS.md
index 7cd2465..943e55d 100644
--- a/CITATIONS.md
+++ b/CITATIONS.md
@@ -10,10 +10,6 @@
 
 ## Pipeline tools
 
-- [BEDTools](https://pubmed.ncbi.nlm.nih.gov/20110278/)
-
-  > Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar 15;26(6):841-2. doi: 10.1093/bioinformatics/btq033. Epub 2010 Jan 28. PubMed PMID: 20110278; PubMed Central PMCID: PMC2832824.
-
 - [BLAZE](https://www.biorxiv.org/content/10.1101/2022.08.16.504056v1)
 
   > You Y, Prawer Y D, De Paoli-Iseppi R, Hunt C P, Parish C L, Shim H, Clark M B. Identification of cell barcodes from long-read single-cell RNA-seq with BLAZE. bioRxiv 2022 Aug .08.16.504056; doi: 10.1101/2022.08.16.504056.
@@ -48,10 +44,6 @@
 
 - [pigz](https://zlib.net/pigz/)
 
-- [Prowler](https://pubmed.ncbi.nlm.nih.gov/34473226/)
-
-  > Lee S, Nguyen LT, Hayes BJ, Ross E. Prowler: A novel trimming algorithm for Oxford Nanopore sequence data. Bioinformatics 2021 Sep 2 doi:10.1093/bioinformatics/btab630. PubMed PMID: 34473226
-
 - [SAMtools](https://pubmed.ncbi.nlm.nih.gov/19505943/)
 
   > Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8. PubMed PMID: 19505943; PubMed Central PMCID: PMC2723002.
@@ -82,10 +74,6 @@
 
   > Hao Y, Hao S, Andersen-Nissen E, Mauck WM, 3rd, Zheng S, Butler A, Lee MJ, Wilk AJ, Darby C, Zager M, Hoffman P, Stoeckius M, Papalexi E, Mimitou EP, Jain J, Srivastava A, Stuart T, Fleming LM, Yeung B, Rogers AJ, McElrath JM, Blish CA, Gottardo R, Smibert P, Satija R. Integrated analysis of multimodal single-cell data. Cell 2021 Jun 24; 184(13):3573-87 e29 doi:10.1016/j.cell.2021.04.048. PubMed PMID: 34062119; PubMed Central PMCID: PMC8238499.
 
-- [tidyverse](https://joss.theoj.org/papers/10.21105/joss.01686)
-
-  > Wickham H, Averick M, Bryan J, Winston C, McGowan LD, François R, Grolemund G, Hayes A , Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H. Welcome to the Tidyverse. Journal of Open Source Software 2019, 4(43), 1686, doi:10.21105/joss.01686
-
 ## Python libraries
 
 - [Biopython](https://pubmed.ncbi.nlm.nih.gov/19304878/)
diff --git a/README.md b/README.md
index 9dba8d5..153f1b8 100644
--- a/README.md
+++ b/README.md
@@ -12,14 +12,10 @@
 
 ## Introduction
 
-**nf-core/scnanoseq** is a bioinformatics best-practice analysis pipeline for 10X Genomics single-cell/nuclei RNA-seq for data derived from Oxford Nanopore Q20+ chemistry ([R10.4 flow cells (>Q20)](https://nanoporetech.com/about-us/news/oxford-nanopore-announces-technology-updates-nanopore-community-meeting)). Due to the expectation of >Q20 quality, the input data for the pipeline is not dependent on Illumina paired data.
-
-
+**nf-core/scnanoseq** is a bioinformatics best-practice analysis pipeline for 10X Genomics single-cell/nuclei RNA-seq data derived from Oxford Nanopore Q20+ chemistry ([R10.4 flow cells (>Q20)](https://nanoporetech.com/about-us/news/oxford-nanopore-announces-technology-updates-nanopore-community-meeting)). Due to the expectation of >Q20 quality, the input data for the pipeline is not dependent on Illumina paired data. Please note `scnanoseq` can also process Oxford Nanopore data generated with older chemistry, but we encourage usage of the Q20+ chemistry.
 
 The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from [nf-core/modules](https://github.com/nf-core/modules) in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!
 
-
-
 On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the [nf-core website](https://nf-co.re/scnanoseq/results).
 
 ## Pipeline summary
@@ -48,7 +44,7 @@ On release, automated continuous integration tests run the pipeline on a full-si
 14. UMI-based deduplication [`UMI-tools`](https://github.com/CGATOxford/UMI-tools)
 15. Gene and transcript level matrices generation. [`IsoQuant`](https://github.com/ablab/IsoQuant)
 16. Preliminary matrix QC ([`Seurat`](https://github.com/satijalab/seurat))
-17. Present QC for raw reads, trimmed reads, pre and post-extracted reads, mapping metrics and preliminary single-cell/nuclei QC ([`MultiQC`](http://multiqc.info/))
+17. Compile QC for raw reads, trimmed reads, pre- and post-extracted reads, mapping metrics and preliminary single-cell/nuclei QC ([`MultiQC`](http://multiqc.info/))
 
 ## Usage
 
@@ -62,17 +58,17 @@ First, prepare a samplesheet with your input data that looks as follows:
 `samplesheet.csv`:
 
 ```csv
-sample,fastq_1,cell_count
-CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,1000
-CONTROL_REP1,AEG588A1_S2_L002_R1_001.fastq.gz,1000
-CONTROL_REP2,AEG588A2_S1_L002_R1_001.fastq.gz,1000
-CONTROL_REP3,AEG588A3_S1_L002_R1_001.fastq.gz,1000
-CONTROL_REP4,AEG588A4_S1_L002_R1_001.fastq.gz,1000
-CONTROL_REP4,AEG588A4_S2_L002_R1_001.fastq.gz,1000
-CONTROL_REP4,AEG588A4_S3_L002_R1_001.fastq.gz,1000
+sample,fastq,cell_count
+CONTROL_REP1,AEG588A1_S1.fastq.gz,5000
+CONTROL_REP1,AEG588A1_S2.fastq.gz,5000
+CONTROL_REP2,AEG588A2_S1.fastq.gz,5000
+CONTROL_REP3,AEG588A3_S1.fastq.gz,5000
+CONTROL_REP4,AEG588A4_S1.fastq.gz,5000
+CONTROL_REP4,AEG588A4_S2.fastq.gz,5000
+CONTROL_REP4,AEG588A4_S3.fastq.gz,5000
 ```
 
-Each row represents a single-end fastq file. Rows with the same sample identifier are considered technical replicates and will be automatically merged. cell_count refers to the expected number of cells you expect
+Each row represents a single-end fastq file. Rows with the same sample identifier are considered technical replicates and will be automatically merged. `cell_count` refers to the number of cells you expect in the sample.
 
 ```bash
 nextflow run nf-core/scnanoseq \
@@ -94,20 +90,17 @@ To see the results of an example test run with a full size dataset refer to the
 
 For more details about the output files and reports, please refer to the [output documentation](https://nf-co.re/scnanoseq/output).
 
-This pipeline produces feature barcode matrices at both the gene and transcript level and can retain introns within the counts themselves. These files are able to be ingested directly by most packages used for downstream analyses such as Seurat. In addition the pipeline produces a number of quality control metrics to assess in ensuring the confidence of the results of the samples that were processed.
+This pipeline produces feature barcode matrices at both the gene and transcript level and can retain introns within the counts themselves. These files can be ingested directly by most packages used for downstream analyses, such as `Seurat`. In addition, the pipeline produces a number of quality control metrics to help verify that the processed samples meet the quality expected of single-cell/nuclei data.
 
 ## Credits
 
 nf-core/scnanoseq was originally written by [Austyn Trull](https://github.com/atrull314), and [Dr. Lara Ianov](https://github.com/lianov).
 
-We thank the following people for their extensive assistance in the development of this pipeline:
-
 We would also like to thank the following people and groups for their support, including financial support:
 
 - Dr. Elizabeth Worthey
 - University of Alabama at Birmingham Biological Data Science Core (U-BDS), RRID:SCR_021766, https://github.com/U-BDS
-
-
+- Support from: 3P30CA013148-48S8
 
 ## Contributions and Support
 
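The samplesheet rules in the README hunk (one single-end FASTQ per row, with rows that share a `sample` ID treated as technical replicates and merged) can be illustrated with a short Python sketch. It is not nf-core/scnanoseq's own samplesheet handling: the function `group_replicates` is hypothetical, and so is its rejection of replicate rows that disagree on `cell_count`.

```python
import csv
import io

# Column layout taken from the README example; assumed, not read from the pipeline schema.
EXPECTED_HEADER = ["sample", "fastq", "cell_count"]


def group_replicates(samplesheet_text):
    """Group samplesheet rows by sample ID (illustrative sketch only).

    Rows sharing a sample ID are collected as technical replicates of that sample.
    """
    reader = csv.DictReader(io.StringIO(samplesheet_text))
    if reader.fieldnames != EXPECTED_HEADER:
        raise ValueError(f"expected header {EXPECTED_HEADER}, got {reader.fieldnames}")

    samples = {}  # dicts keep insertion order, so samples appear in file order
    for row in reader:
        name = row["sample"]
        count = int(row["cell_count"])
        entry = samples.setdefault(name, {"fastqs": [], "cell_count": count})
        # Hypothetical check for this sketch: replicates of one sample
        # should agree on the expected number of cells.
        if entry["cell_count"] != count:
            raise ValueError(f"{name}: conflicting cell_count values")
        entry["fastqs"].append(row["fastq"])
    return samples


SHEET = """sample,fastq,cell_count
CONTROL_REP1,AEG588A1_S1.fastq.gz,5000
CONTROL_REP1,AEG588A1_S2.fastq.gz,5000
CONTROL_REP2,AEG588A2_S1.fastq.gz,5000
"""

for name, entry in group_replicates(SHEET).items():
    print(name, entry["cell_count"], entry["fastqs"])
```

Grouping on the `sample` column in file order mirrors the merge rule stated in the README; the pipeline's actual merging happens inside its Nextflow workflow and is not reproduced here. The sketch also rejects the old `fastq_1` header, since the patch renames that column to `fastq`.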