Unknown process directive: from #1

Open
NicMAlexandre opened this issue Jan 20, 2022 · 5 comments

@NicMAlexandre
Owner

  1. When I run the following:

nextflow run -params-file params.yml split_by_2megabases.nf -c nextflow.config

I get the error message:

[- ] process > makeWindows -
Unknown process directive: from

-- Check script 'split_by_2megabases.nf' at line: 53 or see '.nextflow.log' file for more details
Unexpected error [ClosedByInterruptException]

  2. For the GatherVcf and GzipVcf processes, I'm having trouble working out how to assign the output. GatherVcf requires a list of all the VCF files for a chromosome, i.e. all of the 2 Mb window VCFs for that particular chromosome. Alternatively, we could group all of the VCFs for an individual and output a single VCF (preferred).

@mahesh-panchal

> 1. When I run the following:
>
> nextflow run -params-file params.yml split_by_2megabases.nf -c nextflow.config
>
> I get the error message:
>
> [- ] process > makeWindows -
> Unknown process directive: from
>
> -- Check script 'split_by_2megabases.nf' at line: 53 or see '.nextflow.log' file for more details
> Unexpected error [ClosedByInterruptException]

It looks like DSL1 doesn't accept the from keyword on a new line. I'm not sure about this, though, so try keeping it on the same line as the input declaration. It's been a long time since I've used DSL1 (the DSL2 syntax is nicer in my opinion and easier to read: the channel connections are in a workflow block rather than spread across the processes).
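
If that guess is right, the fix is to keep each from (and into) on the same line as the input or output it belongs to. A minimal DSL1 sketch follows. The channel name vcf_files_ch, the glob, and the makewindows command are placeholders, not taken from split_by_2megabases.nf, and the redirect assumes the command writes to stdout:

```nextflow
// DSL1 sketch: channels are wired inside the process with from/into.
// The glob below is a hypothetical location for the input VCFs.
vcf_files_ch = Channel.fromPath('vcfs/*.vcf.gz')

process makeWindows {

    input:
    // 'from' stays on the same line as the input it feeds.
    // Moved to its own line, it is read as a separate statement,
    // which would explain "Unknown process directive: from".
    path vcf from vcf_files_ch

    output:
    path 'windows.tsv' into window_ch

    script:
    """
    makewindows $vcf > windows.tsv
    """
}
```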

> 2. For the GatherVcf and GzipVcf processes, I'm having trouble working out how to assign the output. GatherVcf requires a list of all the VCF files for a chromosome, i.e. all of the 2 Mb window VCFs for that particular chromosome. Alternatively, we could group all of the VCFs for an individual and output a single VCF (preferred).

I don't understand what you mean here. Can you explain which processes produce the output you want to combine as input for the GatherVcf process?
If you want to group outputs, you should make a key on which to join the lists, such as the file prefix.

e.g.

process FOO {

    input:
    path vcf from vcf_files_ch   // e.g. sample1.vcf.gz
    
    output:
    tuple val(prefix), path("windows.tsv") into window_ch // output here is a list [ 'sample1', file('windows.tsv') ]
    
    script:
    prefix = vcf.simpleName  // Groovy variable with the file name up to the first dot (sample1.vcf.gz -> sample1); baseName would only strip the final .gz.
    """
    makewindows $vcf
    """
   
}

Channel operators like join or groupTuple can then take outputs from multiple processes and combine them on the key (sample1 in this example), and the combined result can be supplied to the downstream process.

@NicMAlexandre
Owner Author

  1. Okay, so then I need to add the following line to the beginning of my script:

nextflow.enable.dsl=2

  2. How would I change the syntax I have to be in line with DSL2? Could you annotate a process so that I have an example?

  3. Essentially, the ImputeVcf step generates a VCF file for every 2 Mb of every chromosome of every sample. I then want to use GatherVcf to combine all of these into a single VCF file per sample. This step takes a list of all the VCFs for the sample as an argument. The last step, GzipVcf, just compresses this VCF.

@NicMAlexandre
Owner Author

So the process FOO above needs to take in a list of all the chromosome vcfs for that sample and produce a unique list for each sample

@NicMAlexandre
Owner Author

Currently, my files look like this:

vcfs:
Sample1.Chr1.vcf.gz
Sample1.Chr2.vcf.gz
Sample1.Chr3.vcf.gz
Sample2.Chr1.vcf.gz
Sample2.Chr2.vcf.gz
Sample2.Chr3.vcf.gz
Sample3.Chr1.vcf.gz
Sample3.Chr2.vcf.gz
Sample3.Chr3.vcf.gz

And I'm trying to get:
Sample1.imputed.vcf.gz
Sample2.imputed.vcf.gz
Sample3.imputed.vcf.gz

I added some functions into the main script, but I'm still having trouble with what goes with what.

@mahesh-panchal

  1. Exactly.

  2. To convert DSL1 to DSL2

    1. Add a workflow block.
    2. List the processes in the order they're executed.
    3. Move all channel manipulations (including creating channels) inside the workflow block.
    4. For each process, comment out the from and into statements, and add emit labels to each output for readability.
    5. Go back to the workflow block and connect process outputs to inputs again.

    I'll make a pull request so you can see what I mean.

  3. What you want is the groupTuple operator
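
As a rough illustration of points 2 and 3 together, here is a minimal DSL2 sketch that groups the per-chromosome VCFs by sample and passes each group to one process. It is not the code from the pull request. The params.vcfs glob, the GATHER_VCFS process name, and the use of bcftools concat as the merging tool are all assumptions made for the example, and the sample key relies on file names such as Sample1.Chr1.vcf.gz, where everything before the first dot is the sample name.

```nextflow
nextflow.enable.dsl = 2

// Hypothetical parameter: glob matching the per-chromosome VCFs.
params.vcfs = 'vcfs/*.vcf.gz'

// Hypothetical process standing in for GatherVcf.
// bcftools concat is an assumed tool choice, not the one used in the thread.
process GATHER_VCFS {

    input:
    // One item per sample, e.g. [ 'Sample1', [ Sample1.Chr1.vcf.gz, Sample1.Chr2.vcf.gz, ... ] ]
    tuple val(sample), path(vcfs)

    output:
    // The emit label names this output so the workflow block can refer to it.
    tuple val(sample), path("${sample}.imputed.vcf.gz"), emit: vcf

    script:
    """
    bcftools concat --output-type z --output ${sample}.imputed.vcf.gz ${vcfs}
    """
}

workflow {
    // All channel creation and manipulation lives in the workflow block.
    Channel
        .fromPath(params.vcfs)
        // Key each file by sample name: Sample1.Chr1.vcf.gz -> 'Sample1'
        .map { vcf -> tuple(vcf.simpleName, vcf) }
        // Collect every file that shares a key into one list per sample.
        .groupTuple()
        .set { per_sample_ch }

    // Connect the grouped channel to the process input.
    GATHER_VCFS(per_sample_ch)

    // Use the emit label to access the process output.
    GATHER_VCFS.out.vcf.view()
}
```

The order of files inside each group is not guaranteed, so if the merging tool needs chromosomes in a particular order, the list would have to be sorted before it is used in the script block.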
