Add support for generating mag samplesheet #544

Open
wants to merge 27 commits into
base: dev
b0b37eb
Add chaining samplesheet with mag
sofstam Oct 10, 2024
590f9b1
Filter out correctly
sofstam Oct 14, 2024
738837e
Rename parameters
sofstam Oct 14, 2024
3d19176
Fix linting
sofstam Oct 14, 2024
80bed68
Fix linting
sofstam Oct 14, 2024
4a9fa88
Use same schema as createtaxdb
sofstam Oct 14, 2024
a3dea86
Use correct name of argument
sofstam Oct 14, 2024
b0ceef7
Add function
sofstam Oct 14, 2024
b5fe3db
Update nextflow_schema.json
sofstam Oct 14, 2024
a1cab25
Apply review suggestions
sofstam Oct 14, 2024
c315cae
Update docs/output.md
sofstam Oct 14, 2024
cd136d7
Review suggestions
sofstam Oct 14, 2024
7acbc4b
Merge branch 'generate-samplesheet' of https://github.com/sofstam/tax…
sofstam Oct 14, 2024
263c7d3
[automated] Fix code linting
nf-core-bot Oct 15, 2024
e3fa0ee
Add pattern to nextflow_schema.json
sofstam Oct 15, 2024
c6ac0cb
Prettier
sofstam Oct 15, 2024
aff979e
Review suggestions and new function
sofstam Oct 15, 2024
67f33e5
Update docs/output.md
sofstam Oct 16, 2024
94a28f4
Update docs/output.md
sofstam Oct 16, 2024
892b428
Remove tests folder
sofstam Oct 16, 2024
e0e83c3
Merge branch 'generate-samplesheet' of https://github.com/sofstam/tax…
sofstam Oct 16, 2024
0abfdf7
Use the same function as detaxizer
sofstam Oct 17, 2024
1e00c4c
LintinG
sofstam Oct 17, 2024
f79fc7d
[automated] Fix code linting
nf-core-bot Oct 22, 2024
7a25178
Apply suggestions from code review
jfy133 Oct 24, 2024
b729483
Get the samplesheet generate to generate se reads
jfy133 Oct 24, 2024
6eeb982
Fix run column
jfy133 Oct 24, 2024
4 changes: 3 additions & 1 deletion docs/output.md
@@ -758,7 +758,9 @@ The pipeline can also generate input files for the following downstream pipeline
<summary>Output files</summary>

- `downstream_samplesheets/`
- `mag.csv`: input sheet for that contains paths to preprocessed FASTQs (corresponding to what is saved with `--save_analysis_ready_fastqs`) that can be used to skip read preprocessing steps in nf-core/mag
- `mag-{pe,se}.csv`: input sheet for single-end and paired-end reads that contains paths to preprocessed short-read FASTQs (corresponding to what is saved with `--save_analysis_ready_fastqs`) that can be used to skip read preprocessing steps in nf-core/mag.
- Note: if you merge reads, these will be listed in teh `mag-se.csv`.
Collaborator Author

Cannot make a suggestion but:

`teh mag-se.csv.` -> `the mag-se.csv.`

- Note: the nf-core/mag mandatory `group` column is filled with a dummy ID (`0`), you may wish to change this depending on your nf-core/mag settings!

</details>
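
Because the `group` column is filled with the placeholder `0`, a user may want to rewrite it before handing the sheet to nf-core/mag. The following is a minimal sketch in plain Python, not part of this PR. The helper name `assign_groups` and the sample-to-group mapping are hypothetical; the column names are assumed to match the keys of the map built in `SAMPLESHEET_MAG`.

```python
import csv
import io


def assign_groups(csv_text, groups, default="0"):
    """Return samplesheet text with the `group` column rewritten.

    `groups` is a user-supplied mapping of sample ID -> group label
    (hypothetical); samples missing from it receive `default`.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Only the group column is touched; all paths are passed through as-is.
        row["group"] = groups.get(row["sample"], default)
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Illustrative rows only; real sheets contain the paths written by the pipeline.
    sheet = (
        "sample,run,group,short_reads_1,short_reads_2,long_reads\n"
        "s1,r1,0,s1_1.fastq.gz,s1_2.fastq.gz,\n"
        "s2,r1,0,s2_1.fastq.gz,s2_2.fastq.gz,\n"
    )
    print(assign_groups(sheet, {"s1": "A", "s2": "B"}))
```

Which grouping is appropriate depends on the nf-core/mag settings in use, so the mapping is deliberately left to the caller.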

58 changes: 28 additions & 30 deletions subworkflows/local/generate_downstream_samplesheets/main.nf
@@ -7,64 +7,62 @@ workflow SAMPLESHEET_MAG {
ch_processed_reads

main:
format = 'csv' // most common format in nf-core
format_sep = ','
format = 'csv'

ch_list_for_samplesheet = ch_processed_reads.view()
.map {
meta, sample_id, instrument_platform,fastq_1,fastq_2,fasta ->
def sample = meta.id
def run = meta.run_accession //this should be optional
def group = ""
def short_reads_1 = file(params.outdir).toString() + '/' + meta.id + '/' + fastq_1.getName()
def short_reads_2 = meta.single_end ? "" : file(params.outdir).toString() + '/' + meta.id + '/' + fastq_2.getName()
def long_reads = meta.is_fasta ? file(params.outdir).toString() + '/' + meta.id + '/' + fasta.getName() : ""
[sample: sample, run: run, group: group, short_reads_1: short_reads_1, short_reads_2: short_reads_2, long_reads: long_reads]
ch_list_for_samplesheet = ch_processed_reads
.dump()
.map { meta, reads ->
def sample = meta.id
def run = params.perform_runmerging ? '' : meta.run_accession
def group = "0"
//this should be optional
def short_reads_1 = meta.is_fasta ? "" : file(params.outdir).toString() + '/analysis_ready_fastqs/' + reads[0].getName()
def short_reads_2 = meta.is_fasta || meta.single_end ? "" : file(params.outdir).toString() + '/analysis_ready_fastqs/' + reads[1].getName()
def long_reads = meta.is_fasta ? file(params.outdir).toString() + '/analysis_ready_fastqs/' + reads[0].getName() : ""

[sample: sample, run: run, group: group, short_reads_1: short_reads_1, short_reads_2: short_reads_2, long_reads: long_reads]
}
.view()
.tap{ ch_list_for_samplesheet_all }
.filter{ it.short_reads_1 != "" }
.branch{
.tap { ch_list_for_samplesheet_all }
.filter { it.short_reads_1 != "" }
.branch {
se: it.short_reads_2 == ""
pe: true
}
pe: it.short_reads_2 != ""
unknown: true
Collaborator Author

Why is the `unknown: true` here?

}

// Throw a warning that only long reads are not supported yet by MAG
ch_list_for_samplesheet_all
.filter{ it.long_reads != "" && it.short_reads_1 == "" }
.collect{ log.warn("[nf-core/taxprofiler] WARNING: Standalone long reads are not yet supported by the nf-core/mag pipeline and will not be in present in `mag-*.csv`. Sample: ${it.sample}" )}
.filter { it.long_reads != "" && it.short_reads_1 == "" }
.collect { log.warn("[nf-core/taxprofiler] WARNING: Standalone long reads are not yet supported by the nf-core/mag pipeline and will not be in present in `mag-*.csv`. Sample: ${it.sample}") }

channelToSamplesheet(ch_list_for_samplesheet.pe,"${params.outdir}/downstream_samplesheets/mag-pe", format)
channelToSamplesheet(ch_list_for_samplesheet.pe, "${params.outdir}/downstream_samplesheets/mag-pe", format)
channelToSamplesheet(ch_list_for_samplesheet.se, "${params.outdir}/downstream_samplesheets/mag-se", format)

}
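
The row construction and the `se`/`pe`/`unknown` branching in the new version of `SAMPLESHEET_MAG` can be traced with a small plain-Python sketch. This is an approximation for illustration, not the pipeline code: `build_row` and `branch` are hypothetical names, `outdir` stands in for `file(params.outdir).toString()`, and `meta` is modelled as a dict. Under these assumptions, a row that passes the `short_reads_1 != ""` filter always matches `se` or `pe`, so the `unknown` branch appears to receive nothing as long as `short_reads_2` is a string.

```python
from pathlib import PurePosixPath


def build_row(meta, reads, outdir, perform_runmerging=False):
    """Approximate the `.map { meta, reads -> ... }` closure from the diff."""
    prefix = f"{outdir}/analysis_ready_fastqs/"
    is_fasta = meta.get("is_fasta", False)
    single_end = meta.get("single_end", False)
    # Only the file name is kept, as with `reads[0].getName()`.
    names = [PurePosixPath(r).name for r in reads]
    return {
        "sample": meta["id"],
        "run": "" if perform_runmerging else meta.get("run_accession", ""),
        "group": "0",  # dummy ID, as in the diff
        "short_reads_1": "" if is_fasta else prefix + names[0],
        "short_reads_2": "" if (is_fasta or single_end) else prefix + names[1],
        "long_reads": prefix + names[0] if is_fasta else "",
    }


def branch(row):
    """Return the branch name for a row; the first matching condition wins."""
    if row["short_reads_1"] == "":
        return None  # removed earlier by `.filter { it.short_reads_1 != "" }`
    if row["short_reads_2"] == "":
        return "se"
    if row["short_reads_2"] != "":
        return "pe"
    return "unknown"  # not reachable when short_reads_2 is a string


if __name__ == "__main__":
    paired = build_row(
        {"id": "s1", "run_accession": "r1", "single_end": False},
        ["/work/s1_1.fastq.gz", "/work/s1_2.fastq.gz"],
        "/results",
    )
    print(branch(paired), paired)
```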

workflow GENERATE_DOWNSTREAM_SAMPLESHEETS {

take:
ch_processed_reads

main:
def downstreampipeline_names = params.generate_pipeline_samplesheets.split(",")

if ( downstreampipeline_names.contains('mag') && params.save_analysis_ready_fastqs) {
if (downstreampipeline_names.contains('mag') && params.save_analysis_ready_fastqs) {
SAMPLESHEET_MAG(ch_processed_reads)
}

}

// Constructs the header string and then the strings of each row, and
def channelToSamplesheet(ch_list_for_samplesheet, path, format) {
format_sep = ["csv":",", "tsv":"\t", "txt":"\t"][format]
def format_sep = [csv: ",", tsv: "\t", txt: "\t"][format]

ch_header = ch_list_for_samplesheet
def ch_header = ch_list_for_samplesheet

ch_header
.first()
.map{ it.keySet().join(format_sep) }
.concat( ch_list_for_samplesheet.map{ it.values().join(format_sep) })
.map { it.keySet().join(format_sep) }
.concat(ch_list_for_samplesheet.map { it.values().join(format_sep) })
.collectFile(
name:"${path}.${format}",
name: "${path}.${format}",
newLine: true,
sort: false
)
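
The serialisation step in `channelToSamplesheet` (one header line taken from the keys of the first row, followed by one line of joined values per row) can be sketched in plain Python as follows. This is an illustrative approximation, not the pipeline code; `rows_to_samplesheet` is a hypothetical name and the rows are assumed to be dicts with identical key order.

```python
def rows_to_samplesheet(rows, fmt="csv"):
    """Join a list of row dicts into samplesheet text.

    Mirrors the approach in the diff: look up the separator by format,
    build the header from the first row's keys, then append each row's values.
    """
    sep = {"csv": ",", "tsv": "\t", "txt": "\t"}[fmt]
    if not rows:
        return ""
    header = sep.join(rows[0].keys())
    body = [sep.join(str(value) for value in row.values()) for row in rows]
    # One item per line with a trailing newline, as with `newLine: true`.
    return "\n".join([header, *body]) + "\n"


if __name__ == "__main__":
    rows = [
        {"sample": "s1", "run": "r1", "group": "0",
         "short_reads_1": "s1_1.fastq.gz", "short_reads_2": "s1_2.fastq.gz",
         "long_reads": ""},
    ]
    print(rows_to_samplesheet(rows, "csv"), end="")
```

One consequence of a plain join is that values are not quoted, so a path containing the separator would shift columns. Whether that matters here depends on the file names the pipeline produces.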
12 changes: 11 additions & 1 deletion subworkflows/local/utils_nfcore_taxprofiler_pipeline/main.nf
@@ -149,7 +149,17 @@ workflow PIPELINE_COMPLETION {
//
def validateInputParameters() {
genomeExistsError()
}//

if (params.generate_downstream_samplesheets && !params.generate_pipeline_samplesheets) {
error('[nf-core/taxprofiler] ERROR: If supplying `--generate_downstream_samplesheets`, you must also specify which pipeline to generate for with `--generate_pipeline_samplesheets`! Check input.')
}

if ( params.generate_downstream_samplesheets && params.generate_pipeline_samplesheets.split(",").contains('mag') && !params.save_analysis_ready_fastqs ) {
error("[nf-core/taxprofiler] ERROR: To generate downstream samplesheets for nf-core/mag, you must also specify `--save_analysis_ready_fastqs`")
}
}
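
The two parameter checks added to `validateInputParameters` can be restated as a small plain-Python sketch to make the dependency between the three parameters explicit. It is an approximation under stated assumptions: `params` is modelled as a dict, `validate_input_parameters` is a hypothetical name, and `ValueError` stands in for Nextflow's `error()`.

```python
def validate_input_parameters(params):
    """Raise ValueError when the samplesheet-related parameters are inconsistent."""
    generate = params.get("generate_downstream_samplesheets", False)
    pipelines = params.get("generate_pipeline_samplesheets")

    # Check 1: the switch requires a list of target pipelines.
    if generate and not pipelines:
        raise ValueError(
            "--generate_downstream_samplesheets requires --generate_pipeline_samplesheets"
        )

    # Check 2: nf-core/mag sheets point at saved FASTQs, so those must be saved.
    if (
        generate
        and "mag" in pipelines.split(",")
        and not params.get("save_analysis_ready_fastqs", False)
    ):
        raise ValueError(
            "samplesheets for nf-core/mag require --save_analysis_ready_fastqs"
        )


if __name__ == "__main__":
    validate_input_parameters({
        "generate_downstream_samplesheets": True,
        "generate_pipeline_samplesheets": "mag",
        "save_analysis_ready_fastqs": True,
    })
    print("parameters are consistent")
```

The order of the checks matters: the second one calls `split` on the pipeline list, which is only safe because the first check has already rejected a missing value.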

//
// Validate channels from input samplesheet
//
def validateInputSamplesheet(input) {
2 changes: 1 addition & 1 deletion workflows/taxprofiler.nf
@@ -334,7 +334,7 @@ workflow TAXPROFILER {
// Samplesheet generation
//
if ( params.generate_downstream_samplesheets ) {
GENERATE_DOWNSTREAM_SAMPLESHEETS ( samplesheet )
GENERATE_DOWNSTREAM_SAMPLESHEETS ( ch_reads_runmerged )
}

//