diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md
index ad8a7f87..5d64f953 100644
--- a/.github/CONTRIBUTING.md
+++ b/.github/CONTRIBUTING.md
@@ -19,7 +19,7 @@ If you'd like to write some code for nf-core/proteinfold, the standard workflow
 1. Check that there isn't already an issue about your idea in the [nf-core/proteinfold issues](https://github.com/nf-core/proteinfold/issues) to avoid duplicating work. If there isn't one already, please create one so that others know you're working on this
 2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/proteinfold repository](https://github.com/nf-core/proteinfold) to your GitHub account
 3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions)
-4. Use `nf-core schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10).
+4. Use `nf-core pipelines schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10).
 5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged

 If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/).

@@ -40,7 +40,7 @@ There are typically two types of tests that run:
 ### Lint tests

 `nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to.
-To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint <pipeline-directory>` command.
+To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core pipelines lint <pipeline-directory>` command.

 If any failures or warnings are encountered, please follow the listed URL for more documentation.

@@ -75,7 +75,7 @@ If you wish to contribute a new step, please use the following coding standards:
 2. Write the process block (see below).
 3. Define the output channel if needed (see below).
 4. Add any new parameters to `nextflow.config` with a default (see below).
-5. Add any new parameters to `nextflow_schema.json` with help text (via the `nf-core schema build` tool).
+5. Add any new parameters to `nextflow_schema.json` with help text (via the `nf-core pipelines schema build` tool).
 6. Add sanity checks and validation for all relevant parameters.
 7. Perform local tests to validate that the new code works as expected.
 8. If applicable, add a new test command in `.github/workflow/ci.yml`.

@@ -86,11 +86,11 @@ If you wish to contribute a new step, please use the following coding standards:

 Parameters should be initialised / defined with default values in `nextflow.config` under the `params` scope.

-Once there, use `nf-core schema build` to add to `nextflow_schema.json`.
+Once there, use `nf-core pipelines schema build` to add to `nextflow_schema.json`.
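A minimal sketch of those two steps, using `foldseek_db` (a parameter this diff later adds to `conf/dbs.config`) as the example; the default value and any help text you supply are up to you:

```groovy
// nextflow.config -- new parameters live under the params scope with a default
params {
    foldseek_db = null // path to a Foldseek database; disabled by default
}
```

Running `nf-core pipelines schema build` afterwards detects the new parameter and prompts you to add it, with help text, to `nextflow_schema.json`.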
### Default processes resource requirements

-Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generic with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. A nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/conf/base.config), which has the default process as a single core-process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels.
+Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generically with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. An nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/main/nf_core/pipeline-template/conf/base.config), which has the default process as a single-core process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels.

 The process resources can be passed on to the tool dynamically within the process with the `${task.cpus}` and `${task.memory}` variables in the `script:` block.

@@ -103,7 +103,7 @@ Please use the following naming schemes, to make it easy to understand what is g

 ### Nextflow version bumping

-If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core bump-version --nextflow . [min-nf-version]`
+If you are using a new feature from core Nextflow, you may bump the minimum required version of Nextflow in the pipeline with: `nf-core pipelines bump-version --nextflow . [min-nf-version]`

 ### Images and figures
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
index 8dc3e6a4..992c391e 100644
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -17,7 +17,7 @@ Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/prot

 - [ ] If you've fixed a bug or added code that should be tested, add tests!
 - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/proteinfold/tree/master/.github/CONTRIBUTING.md)
 - [ ] If necessary, also make a PR on the nf-core/proteinfold _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository.
-- [ ] Make sure your code lints (`nf-core lint`).
+- [ ] Make sure your code lints (`nf-core pipelines lint`).
 - [ ] Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
 - [ ] Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`).
 - [ ] Usage Documentation in `docs/usage.md` is updated.
diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml
index 3774758d..ea0a032b 100644
--- a/.github/workflows/awsfulltest.yml
+++ b/.github/workflows/awsfulltest.yml
@@ -1,16 +1,21 @@
 name: nf-core AWS full size tests
-# This workflow is triggered on published releases.
+# This workflow is triggered on PRs opened against the master branch.
# It can be additionally triggered manually with GitHub actions workflow dispatch button. # It runs the -profile 'test_full' on AWS batch on: - release: - types: [published] + pull_request: + branches: + - master workflow_dispatch: + pull_request_review: + types: [submitted] + jobs: run-platform: name: Run AWS full tests - if: github.repository == 'nf-core/proteinfold' + # run only if the PR is approved by at least 2 reviewers and against the master branch or manually triggered + if: github.repository == 'nf-core/proteinfold' && github.event.review.state == 'approved' && github.event.pull_request.base.ref == 'master' || github.event_name == 'workflow_dispatch' runs-on: ubuntu-latest # Do a full-scale run on each of the mode strategy: @@ -27,6 +32,18 @@ jobs: "esmfold_multimer", ] steps: + - uses: octokit/request-action@v2.x + id: check_approvals + with: + route: GET /repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/reviews + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - id: test_variables + if: github.event_name != 'workflow_dispatch' + run: | + JSON_RESPONSE='${{ steps.check_approvals.outputs.data }}' + CURRENT_APPROVALS_COUNT=$(echo $JSON_RESPONSE | jq -c '[.[] | select(.state | contains("APPROVED")) ] | length') + test $CURRENT_APPROVALS_COUNT -ge 2 || exit 1 # At least 2 approvals are required - name: Launch workflow via Seqera Platform uses: seqeralabs/action-tower-launch@v2 with: diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 47ad6707..161ca5e8 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -7,9 +7,12 @@ on: pull_request: release: types: [published] + workflow_dispatch: env: NXF_ANSI_LOG: false + NXF_SINGULARITY_CACHEDIR: ${{ github.workspace }}/.singularity + NXF_SINGULARITY_LIBRARYDIR: ${{ github.workspace }}/.singularity concurrency: group: "${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}" @@ -17,16 +20,22 @@ concurrency: jobs: test: - name: Run pipeline with test data + name: "Run pipeline with test data (${{ matrix.NXF_VER }} | ${{ matrix.profile }} | ${{ matrix.test_profile }})" # Only run on push if this is the nf-core dev branch (merged PRs) if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/proteinfold') }}" runs-on: ubuntu-latest strategy: matrix: NXF_VER: - - "23.04.0" + - "24.04.2" - "latest-everything" - parameters: + profile: + - "conda" + - "docker" + - "singularity" + test_name: + - "test" + test_profile: - "test" - "test_alphafold2_split" - "test_alphafold2_download" @@ -34,19 +43,62 @@ jobs: - "test_colabfold_webserver" - "test_colabfold_download" - "test_esmfold" - + isMaster: + - ${{ github.base_ref == 'master' }} + # Exclude conda and singularity on dev + exclude: + - isMaster: false + profile: "conda" + - isMaster: false + profile: "singularity" steps: - name: Check out pipeline code uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 - - name: Install Nextflow + - name: Set up Nextflow uses: nf-core/setup-nextflow@v2 with: version: "${{ matrix.NXF_VER }}" - - name: Disk space cleanup + - name: Set up Apptainer + if: matrix.profile == 'singularity' + uses: eWaterCycle/setup-apptainer@main + + - name: Set up Singularity + if: matrix.profile == 'singularity' + run: | + mkdir -p $NXF_SINGULARITY_CACHEDIR + mkdir -p $NXF_SINGULARITY_LIBRARYDIR + + - name: Set up Miniconda + if: matrix.profile == 'conda' + uses: conda-incubator/setup-miniconda@a4260408e20b96e80095f42ff7f1a15b27dd94ca # v3 
+ with: + miniconda-version: "latest" + auto-update-conda: true + conda-solver: libmamba + channels: conda-forge,bioconda + + - name: Set up Conda + if: matrix.profile == 'conda' + run: | + echo $(realpath $CONDA)/condabin >> $GITHUB_PATH + echo $(realpath python) >> $GITHUB_PATH + + - name: Clean up Disk space uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1 - - name: Run pipeline with test data ${{ matrix.parameters }} profile + - name: Run pipeline with test data (docker) run: | - nextflow run ${GITHUB_WORKSPACE} -profile ${{ matrix.parameters }},docker --outdir ./results_${{ matrix.parameters }} + nextflow run ${GITHUB_WORKSPACE} -profile ${{ matrix.test_profile }},docker --outdir ./results + + - name: Run pipeline with test data (singularity) + run: | + nextflow run ${GITHUB_WORKSPACE} -profile ${{ matrix.test_profile }},singularity --outdir ./results + if: "${{ github.base_ref == 'master' }}" + + # ## Warning: Pipeline can not be run with conda + # - name: Run pipeline with test data (conda) + # run: | + # nextflow run ${GITHUB_WORKSPACE} -profile test,conda --outdir ./results + # if: "${{ github.base_ref == 'master' }}" diff --git a/.github/workflows/download_pipeline.yml b/.github/workflows/download_pipeline.yml index 640ac03c..51f84a59 100644 --- a/.github/workflows/download_pipeline.yml +++ b/.github/workflows/download_pipeline.yml @@ -1,4 +1,4 @@ -name: Test successful pipeline download with 'nf-core download' +name: Test successful pipeline download with 'nf-core pipelines download' # Run the workflow when: # - dispatched manually @@ -8,7 +8,7 @@ on: workflow_dispatch: inputs: testbranch: - description: "The specific branch you wish to utilize for the test execution of nf-core download." + description: "The specific branch you wish to utilize for the test execution of nf-core pipelines download." 
required: true default: "dev" pull_request: @@ -39,9 +39,11 @@ jobs: with: python-version: "3.12" architecture: "x64" - - uses: eWaterCycle/setup-singularity@931d4e31109e875b13309ae1d07c70ca8fbc8537 # v7 + + - name: Setup Apptainer + uses: eWaterCycle/setup-apptainer@4bb22c52d4f63406c49e94c804632975787312b3 # v2.0.0 with: - singularity-version: 3.8.3 + apptainer-version: 1.3.4 - name: Install dependencies run: | @@ -54,33 +56,64 @@ jobs: echo "REPOTITLE_LOWERCASE=$(basename ${GITHUB_REPOSITORY,,})" >> ${GITHUB_ENV} echo "REPO_BRANCH=${{ github.event.inputs.testbranch || 'dev' }}" >> ${GITHUB_ENV} + - name: Make a cache directory for the container images + run: | + mkdir -p ./singularity_container_images + - name: Download the pipeline env: - NXF_SINGULARITY_CACHEDIR: ./ + NXF_SINGULARITY_CACHEDIR: ./singularity_container_images run: | - nf-core download ${{ env.REPO_LOWERCASE }} \ + nf-core pipelines download ${{ env.REPO_LOWERCASE }} \ --revision ${{ env.REPO_BRANCH }} \ --outdir ./${{ env.REPOTITLE_LOWERCASE }} \ --compress "none" \ --container-system 'singularity' \ - --container-library "quay.io" -l "docker.io" -l "ghcr.io" \ + --container-library "quay.io" -l "docker.io" -l "community.wave.seqera.io" \ --container-cache-utilisation 'amend' \ - --download-configuration + --download-configuration 'yes' - name: Inspect download run: tree ./${{ env.REPOTITLE_LOWERCASE }} + - name: Count the downloaded number of container images + id: count_initial + run: | + image_count=$(ls -1 ./singularity_container_images | wc -l | xargs) + echo "Initial container image count: $image_count" + echo "IMAGE_COUNT_INITIAL=$image_count" >> ${GITHUB_ENV} + - name: Run the downloaded pipeline (stub) id: stub_run_pipeline continue-on-error: true env: - NXF_SINGULARITY_CACHEDIR: ./ + NXF_SINGULARITY_CACHEDIR: ./singularity_container_images NXF_SINGULARITY_HOME_MOUNT: true run: nextflow run ./${{ env.REPOTITLE_LOWERCASE }}/$( sed 's/\W/_/g' <<< ${{ env.REPO_BRANCH }}) -stub -profile test,singularity --outdir ./results - name: Run the downloaded pipeline (stub run not supported) id: run_pipeline if: ${{ job.steps.stub_run_pipeline.status == failure() }} env: - NXF_SINGULARITY_CACHEDIR: ./ + NXF_SINGULARITY_CACHEDIR: ./singularity_container_images NXF_SINGULARITY_HOME_MOUNT: true run: nextflow run ./${{ env.REPOTITLE_LOWERCASE }}/$( sed 's/\W/_/g' <<< ${{ env.REPO_BRANCH }}) -profile test,singularity --outdir ./results + + - name: Count the downloaded number of container images + id: count_afterwards + run: | + image_count=$(ls -1 ./singularity_container_images | wc -l | xargs) + echo "Post-pipeline run container image count: $image_count" + echo "IMAGE_COUNT_AFTER=$image_count" >> ${GITHUB_ENV} + + - name: Compare container image counts + run: | + if [ "${{ env.IMAGE_COUNT_INITIAL }}" -ne "${{ env.IMAGE_COUNT_AFTER }}" ]; then + initial_count=${{ env.IMAGE_COUNT_INITIAL }} + final_count=${{ env.IMAGE_COUNT_AFTER }} + difference=$((final_count - initial_count)) + echo "$difference additional container images were \n downloaded at runtime . The pipeline has no support for offline runs!" + tree ./singularity_container_images + exit 1 + else + echo "The pipeline can be downloaded successfully!" + fi diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml index 1fcafe88..a502573c 100644 --- a/.github/workflows/linting.yml +++ b/.github/workflows/linting.yml @@ -1,6 +1,6 @@ name: nf-core linting # This workflow is triggered on pushes and PRs to the repository. 
-# It runs the `nf-core lint` and markdown lint tests to ensure +# It runs the `nf-core pipelines lint` and markdown lint tests to ensure # that the code meets the nf-core guidelines. on: push: @@ -41,17 +41,32 @@ jobs: python-version: "3.12" architecture: "x64" + - name: read .nf-core.yml + uses: pietrobolcato/action-read-yaml@1.1.0 + id: read_yml + with: + config: ${{ github.workspace }}/.nf-core.yml + - name: Install dependencies run: | python -m pip install --upgrade pip - pip install nf-core + pip install nf-core==${{ steps.read_yml.outputs['nf_core_version'] }} + + - name: Run nf-core pipelines lint + if: ${{ github.base_ref != 'master' }} + env: + GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} + run: nf-core -l lint_log.txt pipelines lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md - - name: Run nf-core lint + - name: Run nf-core pipelines lint --release + if: ${{ github.base_ref == 'master' }} env: GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} - run: nf-core -l lint_log.txt lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md + run: nf-core -l lint_log.txt pipelines lint --release --dir ${GITHUB_WORKSPACE} --markdown lint_results.md - name: Save PR number if: ${{ always() }} diff --git a/.github/workflows/linting_comment.yml b/.github/workflows/linting_comment.yml index 40acc23f..42e519bf 100644 --- a/.github/workflows/linting_comment.yml +++ b/.github/workflows/linting_comment.yml @@ -11,7 +11,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Download lint results - uses: dawidd6/action-download-artifact@09f2f74827fd3a8607589e5ad7f9398816f540fe # v3 + uses: dawidd6/action-download-artifact@bf251b5aa9c2f7eeb574a96ee720e24f801b7c11 # v6 with: workflow: linting.yml workflow_conclusion: completed diff --git a/.github/workflows/release-announcements.yml b/.github/workflows/release-announcements.yml index 03ecfcf7..c6ba35df 100644 --- a/.github/workflows/release-announcements.yml +++ b/.github/workflows/release-announcements.yml @@ -12,7 +12,7 @@ jobs: - name: get topics and convert to hashtags id: get_topics run: | - echo "topics=$(curl -s https://nf-co.re/pipelines.json | jq -r '.remote_workflows[] | select(.full_name == "${{ github.repository }}") | .topics[]' | awk '{print "#"$0}' | tr '\n' ' ')" >> $GITHUB_OUTPUT + echo "topics=$(curl -s https://nf-co.re/pipelines.json | jq -r '.remote_workflows[] | select(.full_name == "${{ github.repository }}") | .topics[]' | awk '{print "#"$0}' | tr '\n' ' ')" | sed 's/-//g' >> $GITHUB_OUTPUT - uses: rzr/fediverse-action@master with: diff --git a/.github/workflows/template_version_comment.yml b/.github/workflows/template_version_comment.yml new file mode 100644 index 00000000..e8aafe44 --- /dev/null +++ b/.github/workflows/template_version_comment.yml @@ -0,0 +1,46 @@ +name: nf-core template version comment +# This workflow is triggered on PRs to check if the pipeline template version matches the latest nf-core version. +# It posts a comment to the PR, even if it comes from a fork. 
+ +on: pull_request_target + +jobs: + template_version: + runs-on: ubuntu-latest + steps: + - name: Check out pipeline code + uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 + with: + ref: ${{ github.event.pull_request.head.sha }} + + - name: Read template version from .nf-core.yml + uses: nichmor/minimal-read-yaml@v0.0.2 + id: read_yml + with: + config: ${{ github.workspace }}/.nf-core.yml + + - name: Install nf-core + run: | + python -m pip install --upgrade pip + pip install nf-core==${{ steps.read_yml.outputs['nf_core_version'] }} + + - name: Check nf-core outdated + id: nf_core_outdated + run: echo "OUTPUT=$(pip list --outdated | grep nf-core)" >> ${GITHUB_ENV} + + - name: Post nf-core template version comment + uses: mshick/add-pr-comment@b8f338c590a895d50bcbfa6c5859251edc8952fc # v2 + if: | + contains(env.OUTPUT, 'nf-core') + with: + repo-token: ${{ secrets.NF_CORE_BOT_AUTH_TOKEN }} + allow-repeats: false + message: | + > [!WARNING] + > Newer version of the nf-core template is available. + > + > Your pipeline is using an old version of the nf-core template: ${{ steps.read_yml.outputs['nf_core_version'] }}. + > Please update your pipeline to the latest version. + > + > For more documentation on how to update your pipeline, please see the [nf-core documentation](https://github.com/nf-core/tools?tab=readme-ov-file#sync-a-pipeline-with-the-template) and [Synchronisation documentation](https://nf-co.re/docs/contributing/sync). + # diff --git a/.gitignore b/.gitignore index 5124c9ac..a42ce016 100644 --- a/.gitignore +++ b/.gitignore @@ -6,3 +6,4 @@ results/ testing/ testing* *.pyc +null/ diff --git a/.gitpod.yml b/.gitpod.yml index 105a1821..46118637 100644 --- a/.gitpod.yml +++ b/.gitpod.yml @@ -4,17 +4,14 @@ tasks: command: | pre-commit install --install-hooks nextflow self-update - - name: unset JAVA_TOOL_OPTIONS - command: | - unset JAVA_TOOL_OPTIONS vscode: extensions: # based on nf-core.nf-core-extensionpack - - esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code + #- esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code - EditorConfig.EditorConfig # override user/workspace settings with settings found in .editorconfig files - Gruntfuggly.todo-tree # Display TODO and FIXME in a tree view in the activity bar - mechatroner.rainbow-csv # Highlight columns in csv files in different colors - # - nextflow.nextflow # Nextflow syntax highlighting + - nextflow.nextflow # Nextflow syntax highlighting - oderwat.indent-rainbow # Highlight indentation level - streetsidesoftware.code-spell-checker # Spelling checker for source code - charliermarsh.ruff # Code linter Ruff diff --git a/.nf-core.yml b/.nf-core.yml index 69e8d9bf..71682137 100644 --- a/.nf-core.yml +++ b/.nf-core.yml @@ -1,6 +1,23 @@ -repository_type: pipeline -nf_core_version: "2.14.1" +bump_version: null lint: + actions_ci: false files_unchanged: + - .github/workflows/linting.yml - .github/CONTRIBUTING.md multiqc_config: false +nf_core_version: 3.0.2 +org_path: null +repository_type: pipeline +template: + author: Athanasios Baltzis, Jose Espinosa-Carrasco, Harshil Patel + description: Protein 3D structure prediction pipeline + force: false + is_nfcore: true + name: proteinfold + org: nf-core + outdir: . 
+  skip_features:
+    - fastqc
+    - igenomes
+  version: 1.2.0dev
+update: null
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 4dc0f1dc..9e9f0e1c 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -7,7 +7,7 @@ repos:
         - prettier@3.2.5
   - repo: https://github.com/editorconfig-checker/editorconfig-checker.python
-    rev: "2.7.3"
+    rev: "3.0.3"
    hooks:
      - id: editorconfig-checker
        alias: ec
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 2fbddadb..25051396 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,12 +3,22 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## [[1.1.1](https://github.com/nf-core/proteinfold/releases/tag/1.1.1)] - 2025-07-30
+## v1.2.0dev - [date]

-- Minor patch release to fix multiqc report.
+### Enhancements & fixes
+
+- [[#177](https://github.com/nf-core/proteinfold/issues/177)] - Fix typo in some instances of model preset `alphafold2_ptm`.
+- [[PR #178](https://github.com/nf-core/proteinfold/pull/178)] - Enable running multiple modes in parallel.
+- [[#179](https://github.com/nf-core/proteinfold/issues/179)] - Produce an interactive HTML report for the predicted structures.
+- [[#180](https://github.com/nf-core/proteinfold/issues/180)] - Implement Foldseek.
+- [[#188](https://github.com/nf-core/proteinfold/issues/188)] - Fix the ColabFold image to run on GPUs.
+
+## [[1.1.1](https://github.com/nf-core/proteinfold/releases/tag/1.1.1)] - 2025-07-30

 ### Enhancements & fixes

+- Minor patch release to fix multiqc report.
+
 ## [[1.1.0](https://github.com/nf-core/proteinfold/releases/tag/1.1.0)] - 2025-06-25

 ### Credits
@@ -62,6 +72,7 @@ Thank you to everyone else that has contributed by reporting bugs, enhancements
 - [[PR #163](https://github.com/nf-core/proteinfold/pull/163)] - Fix full test CI.
 - [[#150](https://github.com/nf-core/proteinfold/issues/150)] - Add thanks to the AWS Open Data Sponsorship program in `README.md`.
 - [[PR #166](https://github.com/nf-core/proteinfold/pull/166)] - Create 2 different parameters for Colabfold and ESMfold number of recycles.
+- [[PR #205](https://github.com/nf-core/proteinfold/pull/205)] - Change input schema from `sequence,fasta` to `id,fasta`.
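Following the PR #205 rename (enforced by the `assets/schema_input.json` change further down this diff), a samplesheet now uses an `id` column rather than `sequence`; a minimal sketch, with an illustrative sample name and FASTA path:

```csv
id,fasta
T1024,/path/to/T1024.fasta
```

Per the updated schema, `id` must be non-empty and contain no spaces, and `fasta` must point to a file with a `.fa` or `.fasta` extension.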
 ### Parameters
diff --git a/README.md b/README.md
index 9653ad17..0cbbbea7 100644
--- a/README.md
+++ b/README.md
@@ -6,10 +6,10 @@

 [![GitHub Actions CI Status](https://github.com/nf-core/proteinfold/actions/workflows/ci.yml/badge.svg)](https://github.com/nf-core/proteinfold/actions/workflows/ci.yml)
-[![GitHub Actions Linting Status](https://github.com/nf-core/proteinfold/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-core/proteinfold/actions/workflows/linting.yml)[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/proteinfold/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX)
+[![GitHub Actions Linting Status](https://github.com/nf-core/proteinfold/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-core/proteinfold/actions/workflows/linting.yml)[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/proteinfold/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.13135393-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.13135393)
 [![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)

-[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A523.04.0-23aa62.svg)](https://www.nextflow.io/)
+[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A524.04.2-23aa62.svg)](https://www.nextflow.io/)
 [![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
 [![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
 [![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
@@ -139,8 +139,7 @@ The pipeline takes care of downloading the databases and parameters required by
 ```

 > [!WARNING]
-> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_;
-> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files).
+> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).

 For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/proteinfold/usage) and the [parameter documentation](https://nf-co.re/proteinfold/parameters).
diff --git a/assets/NO_FILE b/assets/NO_FILE
new file mode 100644
index 00000000..e69de29b
diff --git a/assets/comparison_template.html b/assets/comparison_template.html
new file mode 100644
index 00000000..44158b03
--- /dev/null
+++ b/assets/comparison_template.html
@@ -0,0 +1,789 @@
[New 789-line "Protein structure comparison" HTML report template; its markup was lost in extraction. Recoverable structure: an interactive 3D structure viewer with a Navigation panel ("Scroll up/down to zoom in and out", "Click + drag to rotate the structure", "CTRL + click + drag to move the structure", "Click an atom to bring it into focus"), Display toggles, an Information panel (Program: *prog_name*, ID: *sample_name*, Average pLDDT), Download buttons, pLDDT and Sequence Coverage sections, and footer acknowledgements: "The Australian BioCommons is supported by Bioplatforms Australia"; "Bioplatforms Australia is enabled by NCRIS".]
diff --git a/assets/multiqc_config.yml b/assets/multiqc_config.yml
index 3b58e3d0..f6acb16a 100644
--- a/assets/multiqc_config.yml
+++ b/assets/multiqc_config.yml
@@ -1,7 +1,7 @@
 report_comment: >
-  This report has been generated by the nf-core/proteinfold
+  This report has been generated by the nf-core/proteinfold
   analysis pipeline. For information about how to interpret these results, please see the
-  documentation.
+  documentation.
 report_section_order:
   "nf-core-proteinfold-methods-description":
     order: -1000
diff --git a/assets/proteinfold_template.html b/assets/proteinfold_template.html
new file mode 100644
index 00000000..2bb4c6ff
--- /dev/null
+++ b/assets/proteinfold_template.html
@@ -0,0 +1,872 @@
[New 872-line "Protein structure prediction" HTML report template; its markup was lost in extraction. Recoverable structure: an interactive 3D structure viewer with a pLDDT colour legend (<50, 70, 90+) and the note "Alphafold produces a per-residue confidence score (pLDDT) between 0 and 100. Some regions below 50 pLDDT may be unstructured in isolation."; an Information panel (Program: *prog_name*, ID: *sample_name*, Average pLDDT); a Navigation panel ("Scroll up/down to zoom in and out", "Click + drag to rotate the structure", "CTRL + click + drag to move the structure", "Click an atom to bring it into focus"); Representations, Display and Download controls; Sequence Coverage and pLDDT plot sections; and the same Australian BioCommons / Bioplatforms Australia / NCRIS footer acknowledgements.]
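Both templates are consumed by `bin/generate_report.py` and `bin/generate_comparison_report.py` (added below), which splice results into literal placeholders in the HTML. A minimal Python sketch of that contract; the HTML fragment is illustrative since the real templates were not recoverable here, but the tokens (`*sample_name*`, `*prog_name*`, `const MODELS = [];`, `const LDDT_AVERAGES = [];`) are the ones the scripts actually substitute:

```python
# Sketch of the placeholder substitution performed by the report scripts below.
# The template fragment is illustrative; the tokens match those in the scripts.
template = """<title>*sample_name* - *prog_name*</title>
<script>
const MODELS = [];
const LDDT_AVERAGES = [];
</script>"""

models = ["ranked_0.pdb", "ranked_1.pdb"]  # hypothetical aligned structures
lddt_averages = [91.2, 88.7]               # hypothetical per-model mean pLDDT

html = template.replace("*sample_name*", "T1024")
html = html.replace("*prog_name*", "AlphaFold2")
# Inject the model list as a JavaScript array literal
html = html.replace(
    "const MODELS = [];",
    "const MODELS = [" + ",\n".join(f'"{m}"' for m in models) + "];",
)
# A flat Python list of numbers prints as a valid JavaScript array literal
html = html.replace(
    "const LDDT_AVERAGES = [];",
    f"const LDDT_AVERAGES = {lddt_averages};",
)
print(html)
```

The scripts additionally replace `*_data_ranked_N.pdb*` tokens with the newline-escaped contents of each aligned PDB file, so the generated report is fully self-contained.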
    + + + diff --git a/assets/schema_input.json b/assets/schema_input.json index b16e3ae5..133802ac 100644 --- a/assets/schema_input.json +++ b/assets/schema_input.json @@ -1,5 +1,5 @@ { - "$schema": "http://json-schema.org/draft-07/schema", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://raw.githubusercontent.com/nf-core/proteinfold/master/assets/schema_input.json", "title": "nf-core/proteinfold pipeline - params.input schema", "description": "Schema for the file provided with params.input", @@ -7,7 +7,7 @@ "items": { "type": "object", "properties": { - "sequence": { + "id": { "type": "string", "pattern": "^\\S+$", "errorMessage": "Sequence name must be provided and cannot contain spaces", @@ -21,6 +21,6 @@ "errorMessage": "Fasta file must be provided, cannot contain spaces and must have extension '.fa' or '.fasta'" } }, - "required": ["sequence", "fasta"] + "required": ["id", "fasta"] } } diff --git a/bin/extract_output.py b/bin/extract_output.py new file mode 100755 index 00000000..3b22f8e5 --- /dev/null +++ b/bin/extract_output.py @@ -0,0 +1,32 @@ +#!/usr/bin/env python +import pickle +import os +import argparse + + +def read_pkl(id, pkl_files): + for pkl_file in pkl_files: + dict_data = pickle.load(open(pkl_file, "rb")) + if pkl_file.endswith("features.pkl"): + with open(f"{id}_msa.tsv", "w") as out_f: + for val in dict_data["msa"]: + out_f.write("\t".join([str(x) for x in val]) + "\n") + else: + model_id = ( + os.path.basename(pkl_file) + .replace("result_model_", "") + .replace("_pred_0.pkl", "") + ) + with open(f"{id}_lddt_{model_id}.tsv", "w") as out_f: + out_f.write("\t".join([str(x) for x in dict_data["plddt"]]) + "\n") + + +parser = argparse.ArgumentParser() +parser.add_argument("--pkls", dest="pkls", required=True, nargs="+") +parser.add_argument("--name", dest="name") +parser.add_argument("--output_dir", dest="output_dir") +parser.set_defaults(output_dir="") +parser.set_defaults(name="") +args = parser.parse_args() + +read_pkl(args.name, args.pkls) diff --git a/bin/generate_comparison_report.py b/bin/generate_comparison_report.py new file mode 100755 index 00000000..bea765f9 --- /dev/null +++ b/bin/generate_comparison_report.py @@ -0,0 +1,270 @@ +#!/usr/bin/env python + +import os +import argparse +from collections import OrderedDict +import base64 +import plotly.graph_objects as go +from Bio import PDB + + +def generate_output(plddt_data, name, out_dir, generate_tsv, pdb): + plddt_per_model = OrderedDict() + output_data = plddt_data + + if generate_tsv == "y": + for plddt_path in output_data: + with open(plddt_path, "r") as in_file: + plddt_per_model[os.path.basename(plddt_path)[:-4]] = [ + float(x) for x in in_file.read().strip().split() + ] + else: + for i, plddt_values_str in enumerate(output_data): + plddt_per_model[i] = [] + plddt_per_model[i] = [float(x) for x in plddt_values_str.strip().split()] + + fig = go.Figure() + for idx, (model_name, value_plddt) in enumerate(plddt_per_model.items()): + rank_label = os.path.splitext(pdb[idx])[0] + fig.add_trace( + go.Scatter( + x=list(range(len(value_plddt))), + y=value_plddt, + mode="lines", + name=rank_label, + text=[f"({i}, {value:.2f})" for i, value in enumerate(value_plddt)], + hoverinfo="text", + ) + ) + fig.update_layout( + title=dict(text="Predicted LDDT per position", x=0.5, xanchor="center"), + xaxis=dict( + title="Positions", showline=True, linecolor="black", gridcolor="WhiteSmoke" + ), + yaxis=dict( + title="Predicted LDDT", + range=[0, 100], + minallowed=0, + maxallowed=100, + 
showline=True, + linecolor="black", + gridcolor="WhiteSmoke", + ), + legend=dict(y=0, x=1), + plot_bgcolor="white", + width=600, + height=600, + modebar_remove=["toImage", "zoomIn", "zoomOut"], + ) + html_content = fig.to_html( + full_html=False, + include_plotlyjs="cdn", + config={"displayModeBar": True, "displaylogo": False, "scrollZoom": True}, + ) + + with open( + f"{out_dir}/{name+('_' if name else '')}coverage_LDDT.html", "w" + ) as out_file: + out_file.write(html_content) + + +def align_structures_old(structures): + parser = PDB.PDBParser(QUIET=True) + structures = [ + parser.get_structure(f"Structure_{i}", pdb) for i, pdb in enumerate(structures) + ] + + ref_structure = structures[0] + ref_atoms = [atom for atom in ref_structure.get_atoms()] + + super_imposer = PDB.Superimposer() + aligned_structures = [structures[0]] # Include the reference structure in the list + + for i, structure in enumerate(structures[1:], start=1): + target_atoms = [atom for atom in structure.get_atoms()] + + super_imposer.set_atoms(ref_atoms, target_atoms) + super_imposer.apply(structure.get_atoms()) + + aligned_structure = f"aligned_structure_{i}.pdb" + io = PDB.PDBIO() + io.set_structure(structure) + io.save(aligned_structure) + aligned_structures.append(aligned_structure) + + return aligned_structures + + +def align_structures(structures): + parser = PDB.PDBParser(QUIET=True) + structures = [ + parser.get_structure(f"Structure_{i}", pdb) for i, pdb in enumerate(structures) + ] + ref_structure = structures[0] + + common_atoms = set( + f"{atom.get_parent().get_id()[1]}-{atom.name}" + for atom in ref_structure.get_atoms() + ) + for i, structure in enumerate(structures[1:], start=1): + common_atoms = common_atoms.intersection( + set( + f"{atom.get_parent().get_id()[1]}-{atom.name}" + for atom in structure.get_atoms() + ) + ) + + ref_atoms = [ + atom + for atom in ref_structure.get_atoms() + if f"{atom.get_parent().get_id()[1]}-{atom.name}" in common_atoms + ] + # print(ref_atoms) + super_imposer = PDB.Superimposer() + aligned_structures = [structures[0]] # Include the reference structure in the list + + for i, structure in enumerate(structures[1:], start=1): + target_atoms = [ + atom + for atom in structure.get_atoms() + if f"{atom.get_parent().get_id()[1]}-{atom.name}" in common_atoms + ] + + super_imposer.set_atoms(ref_atoms, target_atoms) + super_imposer.apply(structure.get_atoms()) + + aligned_structure = f"aligned_structure_{i}.pdb" + io = PDB.PDBIO() + io.set_structure(structure) + io.save(aligned_structure) + aligned_structures.append(aligned_structure) + + return aligned_structures + + +def pdb_to_lddt(pdb_files, generate_tsv): + pdb_files_sorted = pdb_files + pdb_files_sorted.sort() + + output_lddt = [] + averages = [] + + for pdb_file in pdb_files_sorted: + plddt_values = [] + current_resd = [] + last = None + with open(pdb_file, "r") as infile: + for line in infile: + columns = line.split() + if len(columns) >= 11: + if last and last != columns[5]: + plddt_values.append(sum(current_resd) / len(current_resd)) + current_resd = [] + current_resd.append(float(columns[10])) + last = columns[5] + if len(current_resd) > 0: + plddt_values.append(sum(current_resd) / len(current_resd)) + + # Calculate the average PLDDT value for the current file + if plddt_values: + avg_plddt = sum(plddt_values) / len(plddt_values) + averages.append(round(avg_plddt, 3)) + else: + averages.append(0.0) + + if generate_tsv == "y": + output_file = f"{pdb_file.replace('.pdb', '')}_plddt.tsv" + with open(output_file, "w") as 
outfile: + outfile.write(" ".join(map(str, plddt_values)) + "\n") + output_lddt.append(output_file) + else: + plddt_values_string = " ".join(map(str, plddt_values)) + output_lddt.append(plddt_values_string) + + return output_lddt, averages + + +print("Starting...") + +version = "1.0.0" +parser = argparse.ArgumentParser() +parser.add_argument("--type", dest="in_type") +parser.add_argument( + "--generate_tsv", choices=["y", "n"], default="n", dest="generate_tsv" +) +parser.add_argument("--msa", dest="msa", required=True, nargs="+") +parser.add_argument("--pdb", dest="pdb", required=True, nargs="+") +parser.add_argument("--name", dest="name") +parser.add_argument("--output_dir", dest="output_dir") +parser.add_argument("--html_template", dest="html_template") +parser.add_argument("--version", action="version", version=f"{version}") +parser.set_defaults(output_dir="") +parser.set_defaults(in_type="comparison") +parser.set_defaults(name="") +args = parser.parse_args() + +lddt_data, lddt_averages = pdb_to_lddt(args.pdb, args.generate_tsv) + +generate_output(lddt_data, args.name, args.output_dir, args.generate_tsv, args.pdb) + +print("generating html report...") + +structures = args.pdb +# structures.sort() +aligned_structures = align_structures(structures) + +io = PDB.PDBIO() +ref_structure_path = "aligned_structure_0.pdb" +io.set_structure(aligned_structures[0]) +io.save(ref_structure_path) +aligned_structures[0] = ref_structure_path + +alphafold_template = open(args.html_template, "r").read() +alphafold_template = alphafold_template.replace("*sample_name*", args.name) +alphafold_template = alphafold_template.replace("*prog_name*", args.in_type) + +args_pdb_array_js = ( + "const MODELS = [" + ",\n".join([f'"{model}"' for model in structures]) + "];" +) +alphafold_template = alphafold_template.replace("const MODELS = [];", args_pdb_array_js) + +seq_cov_imgs = [] +for item in args.msa: + if item != "NO_FILE": + image_path = item + with open(image_path, "rb") as in_file: + encoded_image = base64.b64encode(in_file.read()).decode("utf-8") + seq_cov_imgs.append(f"data:image/png;base64,{encoded_image}") + +args_msa_array_js = ( + f"""const SEQ_COV_IMGS = [{", ".join([f'"{img}"' for img in seq_cov_imgs])}];""" +) +alphafold_template = alphafold_template.replace( + "const SEQ_COV_IMGS = [];", args_msa_array_js +) + +averages_js_array = f"const LDDT_AVERAGES = {lddt_averages};" +alphafold_template = alphafold_template.replace( + "const LDDT_AVERAGES = [];", averages_js_array +) + +i = 0 +for structure in aligned_structures: + alphafold_template = alphafold_template.replace( + f"*_data_ranked_{i}.pdb*", open(structure, "r").read().replace("\n", "\\n") + ) + i += 1 + +with open( + f"{args.output_dir}/{args.name + ('_' if args.name else '')}coverage_LDDT.html", + "r", +) as in_file: + lddt_html = in_file.read() + alphafold_template = alphafold_template.replace( + '
    ', lddt_html + ) + +with open( + f"{args.output_dir}/{args.name}_{args.in_type.lower()}_report.html", "w" +) as out_file: + out_file.write(alphafold_template) diff --git a/bin/generate_report.py b/bin/generate_report.py new file mode 100755 index 00000000..b6cfa390 --- /dev/null +++ b/bin/generate_report.py @@ -0,0 +1,399 @@ +#!/usr/bin/env python + +import os +import argparse +from matplotlib import pyplot as plt +from collections import OrderedDict +import base64 +import plotly.graph_objects as go +import re +from Bio import PDB + + +def generate_output_images(msa_path, plddt_data, name, out_dir, in_type, generate_tsv, pdb): + msa = [] + if in_type.lower() != "colabfold" and not msa_path.endswith("NO_FILE"): + with open(msa_path, "r") as in_file: + for line in in_file: + msa.append([int(x) for x in line.strip().split()]) + + seqid = [] + for sequence in msa: + matches = [ + 1.0 if first == other else 0.0 for first, other in zip(msa[0], sequence) + ] + seqid.append(sum(matches) / len(matches)) + + seqid_sort = sorted(range(len(seqid)), key=seqid.__getitem__) + + non_gaps = [] + for sequence in msa: + non_gaps.append( + [float(num != 21) if num != 21 else float("nan") for num in sequence] + ) + + sorted_non_gaps = [non_gaps[i] for i in seqid_sort] + final = [] + for sorted_seq, identity in zip( + sorted_non_gaps, [seqid[i] for i in seqid_sort] + ): + final.append( + [ + value * identity if not isinstance(value, str) else value + for value in sorted_seq + ] + ) + + # ################################################################## + plt.figure(figsize=(14, 14), dpi=100) + # ################################################################## + plt.title("Sequence coverage", fontsize=30, pad=36) + plt.imshow( + final, + interpolation="nearest", + aspect="auto", + cmap="rainbow_r", + vmin=0, + vmax=1, + origin="lower", + ) + + column_counts = [0] * len(msa[0]) + for col in range(len(msa[0])): + for row in msa: + if row[col] != 21: + column_counts[col] += 1 + + plt.plot(column_counts, color="black") + plt.xlim(-0.5, len(msa[0]) - 0.5) + plt.ylim(-0.5, len(msa) - 0.5) + + plt.tick_params(axis="both", which="both", labelsize=18) + + cbar = plt.colorbar() + cbar.set_label("Sequence identity to query", fontsize=24, labelpad=24) + cbar.ax.tick_params(labelsize=18) + plt.xlabel("Positions", fontsize=24, labelpad=24) + plt.ylabel("Sequences", fontsize=24, labelpad=36) + plt.savefig(f"{out_dir}/{name+('_' if name else '')}seq_coverage.png") + + # ################################################################## + + plddt_per_model = OrderedDict() + output_data = plddt_data + + if generate_tsv == "y": + for plddt_path in output_data: + with open(plddt_path, "r") as in_file: + plddt_per_model[os.path.basename(plddt_path)[:-4]] = [ + float(x) for x in in_file.read().strip().split() + ] + else: + for i, plddt_values_str in enumerate(output_data): + plddt_per_model[i] = [] + plddt_per_model[i] = [float(x) for x in plddt_values_str.strip().split()] + + fig = go.Figure() + for idx, (model_name, value_plddt) in enumerate(plddt_per_model.items()): + rank_label = os.path.splitext(pdb[idx])[0] + fig.add_trace( + go.Scatter( + x=list(range(len(value_plddt))), + y=value_plddt, + mode="lines", + name=rank_label, + text=[f"({i}, {value:.2f})" for i, value in enumerate(value_plddt)], + hoverinfo="text", + ) + ) + fig.update_layout( + title=dict(text="Predicted LDDT per position", x=0.5, xanchor="center"), + xaxis=dict( + title="Positions", showline=True, linecolor="black", gridcolor="WhiteSmoke" + ), + 
yaxis=dict( + title="Predicted LDDT", + range=[0, 100], + minallowed=0, + maxallowed=100, + showline=True, + linecolor="black", + gridcolor="WhiteSmoke", + ), + legend=dict(yanchor="bottom", y=0, xanchor="right", x=1.3), + plot_bgcolor="white", + width=600, + height=600, + modebar_remove=["toImage", "zoomIn", "zoomOut"], + ) + html_content = fig.to_html( + full_html=False, + include_plotlyjs="cdn", + config={"displayModeBar": True, "displaylogo": False, "scrollZoom": True}, + ) + + with open( + f"{out_dir}/{name+('_' if name else '')}coverage_LDDT.html", "w" + ) as out_file: + out_file.write(html_content) + + +def generate_plots(msa_path, plddt_paths, name, out_dir): + msa = [] + with open(msa_path, "r") as in_file: + for line in in_file: + msa.append([int(x) for x in line.strip().split()]) + + seqid = [] + for sequence in msa: + matches = [ + 1.0 if first == other else 0.0 for first, other in zip(msa[0], sequence) + ] + seqid.append(sum(matches) / len(matches)) + + seqid_sort = sorted(range(len(seqid)), key=seqid.__getitem__) + + non_gaps = [] + for sequence in msa: + non_gaps.append( + [float(num != 21) if num != 21 else float("nan") for num in sequence] + ) + + sorted_non_gaps = [non_gaps[i] for i in seqid_sort] + final = [] + for sorted_seq, identity in zip(sorted_non_gaps, [seqid[i] for i in seqid_sort]): + final.append( + [ + value * identity if not isinstance(value, str) else value + for value in sorted_seq + ] + ) + + # Plotting Sequence Coverage using Plotly + fig = go.Figure() + fig.add_trace( + go.Heatmap( + z=final, + colorscale="Rainbow", + zmin=0, + zmax=1, + ) + ) + fig.update_layout( + title="Sequence coverage", xaxis_title="Positions", yaxis_title="Sequences" + ) + # Save as interactive HTML instead of an image + fig.savefig(f"{out_dir}/{name+('_' if name else '')}seq_coverage.png") + + # Plotting Predicted LDDT per position using Plotly + plddt_per_model = OrderedDict() + plddt_paths.sort() + for plddt_path in plddt_paths: + with open(plddt_path, "r") as in_file: + plddt_per_model[os.path.basename(plddt_path)[:-4]] = [ + float(x) for x in in_file.read().strip().split() + ] + + i = 0 + for model_name, value_plddt in plddt_per_model.items(): + fig = go.Figure() + fig.add_trace( + go.Scatter( + x=list(range(len(value_plddt))), + y=value_plddt, + mode="lines", + name=model_name, + ) + ) + fig.update_layout(title="Predicted LDDT per Position") + fig.savefig(f"{out_dir}/{name+('_' if name else '')}coverage_LDDT_{i}.png") + i += 1 + + + +def align_structures(structures): + parser = PDB.PDBParser(QUIET=True) + structures = [ + parser.get_structure(f"Structure_{i}", pdb) for i, pdb in enumerate(structures) + ] + ref_structure = structures[0] + + common_atoms = set( + f"{atom.get_parent().get_id()[1]}-{atom.name}" + for atom in ref_structure.get_atoms() + ) + for i, structure in enumerate(structures[1:], start=1): + common_atoms = common_atoms.intersection( + set( + f"{atom.get_parent().get_id()[1]}-{atom.name}" + for atom in structure.get_atoms() + ) + ) + + ref_atoms = [ + atom + for atom in ref_structure.get_atoms() + if f"{atom.get_parent().get_id()[1]}-{atom.name}" in common_atoms + ] + # print(ref_atoms) + super_imposer = PDB.Superimposer() + aligned_structures = [structures[0]] # Include the reference structure in the list + + for i, structure in enumerate(structures[1:], start=1): + target_atoms = [ + atom + for atom in structure.get_atoms() + if f"{atom.get_parent().get_id()[1]}-{atom.name}" in common_atoms + ] + + super_imposer.set_atoms(ref_atoms, target_atoms) + 
super_imposer.apply(structure.get_atoms()) + + aligned_structure = f"aligned_structure_{i}.pdb" + io = PDB.PDBIO() + io.set_structure(structure) + io.save(aligned_structure) + aligned_structures.append(aligned_structure) + + return aligned_structures + + +def pdb_to_lddt(pdb_files, generate_tsv): + pdb_files_sorted = pdb_files + pdb_files_sorted.sort() + + output_lddt = [] + averages = [] + + for pdb_file in pdb_files_sorted: + plddt_values = [] + current_resd = [] + last = None + with open(pdb_file, "r") as infile: + for line in infile: + columns = line.split() + if len(columns) >= 11: + if last and last != columns[5]: + plddt_values.append(sum(current_resd) / len(current_resd)) + current_resd = [] + current_resd.append(float(columns[10])) + last = columns[5] + if len(current_resd) > 0: + plddt_values.append(sum(current_resd) / len(current_resd)) + + # Calculate the average PLDDT value for the current file + if plddt_values: + avg_plddt = sum(plddt_values) / len(plddt_values) + averages.append(round(avg_plddt, 3)) + else: + averages.append(0.0) + + if generate_tsv == "y": + output_file = f"{pdb_file.replace('.pdb', '')}_plddt.tsv" + with open(output_file, "w") as outfile: + outfile.write(" ".join(map(str, plddt_values)) + "\n") + output_lddt.append(output_file) + else: + plddt_values_string = " ".join(map(str, plddt_values)) + output_lddt.append(plddt_values_string) + + return output_lddt, averages + + +print("Starting...") + +version = "1.0.0" +model_name = { + "esmfold": "ESMFold", + "alphafold2": "AlphaFold2", + "colabfold": "ColabFold", +} + +parser = argparse.ArgumentParser() +parser.add_argument("--type", dest="in_type") +parser.add_argument( + "--generate_tsv", choices=["y", "n"], default="n", dest="generate_tsv" +) +parser.add_argument("--msa", dest="msa", default="NO_FILE") +parser.add_argument("--pdb", dest="pdb", required=True, nargs="+") +parser.add_argument("--name", dest="name") +parser.add_argument("--output_dir", dest="output_dir") +parser.add_argument("--html_template", dest="html_template") +parser.add_argument("--version", action="version", version=f"{version}") +parser.set_defaults(output_dir="") +parser.set_defaults(in_type="esmfold") +parser.set_defaults(name="") +args = parser.parse_args() + +lddt_data, lddt_averages = pdb_to_lddt(args.pdb, args.generate_tsv) + +generate_output_images( + args.msa, lddt_data, args.name, args.output_dir, args.in_type, args.generate_tsv, args.pdb +) +# generate_plots(args.msa, args.plddt, args.name, args.output_dir) + +print("generating html report...") +structures = args.pdb +structures.sort() +aligned_structures = align_structures(structures) + +io = PDB.PDBIO() +ref_structure_path = "aligned_structure_0.pdb" +io.set_structure(aligned_structures[0]) +io.save(ref_structure_path) +aligned_structures[0] = ref_structure_path + +proteinfold_template = open(args.html_template, "r").read() +proteinfold_template = proteinfold_template.replace("*sample_name*", args.name) +proteinfold_template = proteinfold_template.replace( + "*prog_name*", model_name[args.in_type.lower()] +) + +args_pdb_array_js = ",\n".join([f'"{model}"' for model in structures]) +proteinfold_template = re.sub( + r"const MODELS = \[.*?\];", # Match the existing MODELS array in HTML template + f"const MODELS = [\n {args_pdb_array_js}\n];", # Replace with the new array + proteinfold_template, + flags=re.DOTALL, +) + +averages_js_array = f"const LDDT_AVERAGES = {lddt_averages};" +proteinfold_template = proteinfold_template.replace( + "const LDDT_AVERAGES = [];", 
averages_js_array +) + +i = 0 +for structure in aligned_structures: + proteinfold_template = proteinfold_template.replace( + f"*_data_ranked_{i}.pdb*", open(structure, "r").read().replace("\n", "\\n") + ) + i += 1 + +if not args.msa.endswith("NO_FILE"): + image_path = ( + f"{args.output_dir}/{args.msa}" + if args.in_type.lower() == "colabfold" + else f"{args.output_dir}/{args.name + ('_' if args.name else '')}seq_coverage.png" + ) + with open(image_path, "rb") as in_file: + proteinfold_template = proteinfold_template.replace( + "seq_coverage.png", + f"data:image/png;base64,{base64.b64encode(in_file.read()).decode('utf-8')}", + ) +else: + pattern = r'
    .*?(.*?)*?
    \s*' + proteinfold_template = re.sub(pattern, "", proteinfold_template, flags=re.DOTALL) + +with open( + f"{args.output_dir}/{args.name + ('_' if args.name else '')}coverage_LDDT.html", + "r", +) as in_file: + lddt_html = in_file.read() + proteinfold_template = proteinfold_template.replace( + '
    ', lddt_html + ) + +with open( + f"{args.output_dir}/{args.name}_{args.in_type.lower()}_report.html", "w" +) as out_file: + out_file.write(proteinfold_template) diff --git a/conf/base.config b/conf/base.config index 69ad41e9..7e7a42dd 100644 --- a/conf/base.config +++ b/conf/base.config @@ -11,9 +11,9 @@ process { // TODO nf-core: Check the defaults for all processes - cpus = { check_max( 1 * task.attempt, 'cpus' ) } - memory = { check_max( 6.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + cpus = { 1 * task.attempt } + memory = { 6.GB * task.attempt } + time = { 4.h * task.attempt } errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' } maxRetries = 1 @@ -27,30 +27,30 @@ process { // TODO nf-core: Customise requirements for specific processes. // See https://www.nextflow.io/docs/latest/config.html#config-process-selectors withLabel:process_single { - cpus = { check_max( 1 , 'cpus' ) } - memory = { check_max( 6.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + cpus = { 1 } + memory = { 6.GB * task.attempt } + time = { 4.h * task.attempt } } withLabel:process_low { - cpus = { check_max( 2 * task.attempt, 'cpus' ) } - memory = { check_max( 12.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + cpus = { 2 * task.attempt } + memory = { 12.GB * task.attempt } + time = { 4.h * task.attempt } } withLabel:process_medium { - cpus = { check_max( 6 * task.attempt, 'cpus' ) } - memory = { check_max( 36.GB * task.attempt, 'memory' ) } - time = { check_max( 8.h * task.attempt, 'time' ) } + cpus = { 6 * task.attempt } + memory = { 36.GB * task.attempt } + time = { 8.h * task.attempt } } withLabel:process_high { - cpus = { check_max( 12 * task.attempt, 'cpus' ) } - memory = { check_max( 72.GB * task.attempt, 'memory' ) } - time = { check_max( 16.h * task.attempt, 'time' ) } + cpus = { 12 * task.attempt } + memory = { 72.GB * task.attempt } + time = { 16.h * task.attempt } } withLabel:process_long { - time = { check_max( 20.h * task.attempt, 'time' ) } + time = { 20.h * task.attempt } } withLabel:process_high_memory { - memory = { check_max( 200.GB * task.attempt, 'memory' ) } + memory = { 200.GB * task.attempt } } withLabel:error_ignore { errorStrategy = 'ignore' diff --git a/conf/dbs.config b/conf/dbs.config index 9fd0ec9a..d4e521a2 100644 --- a/conf/dbs.config +++ b/conf/dbs.config @@ -55,4 +55,9 @@ params { // Esmfold paths esmfold_params_path = "${params.esmfold_db}/*" + + // Foldseek databases paths + foldseek_db = null + foldseek_db_path = null + } diff --git a/conf/modules.config b/conf/modules.config index c12b372d..c56b11eb 100644 --- a/conf/modules.config +++ b/conf/modules.config @@ -41,7 +41,6 @@ process { enabled: false ] } - withName: 'MULTIQC' { ext.args = { params.multiqc_title ? "--title \"$params.multiqc_title\"" : '' } publishDir = [ @@ -51,4 +50,13 @@ process { ] } + withName: 'FOLDSEEK_EASYSEARCH' { + ext.args = { params.foldseek_easysearch_arg ? "$params.foldseek_easysearch_arg" : "--format-mode 3" } + publishDir = [ + path: { "${params.outdir}/foldseek_easysearch" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? 
null : filename } + ] + } + } diff --git a/conf/modules_alphafold2.config b/conf/modules_alphafold2.config index 4aae2d30..33b04c38 100644 --- a/conf/modules_alphafold2.config +++ b/conf/modules_alphafold2.config @@ -17,11 +17,18 @@ process { withName: 'GUNZIP|COMBINE_UNIPROT|DOWNLOAD_PDBMMCIF|ARIA2_PDB_SEQRES' { publishDir = [ - path: {"${params.outdir}/DBs/${params.mode}/${params.alphafold2_mode}"}, + path: {"${params.outdir}/DBs/alphafold2/${params.alphafold2_mode}"}, mode: 'symlink', saveAs: { filename -> filename.equals('versions.yml') ? null : filename }, ] } + withName: 'NFCORE_PROTEINFOLD:ALPHAFOLD2:MULTIQC' { + publishDir = [ + path: { "${params.outdir}/multiqc" }, + mode: 'copy', + saveAs: { filename -> filename.equals('versions.yml') ? null : "alphafold2_$filename" } + ] + } } if (params.alphafold2_mode == 'standard') { @@ -33,7 +40,7 @@ if (params.alphafold2_mode == 'standard') { params.max_template_date ? "--max_template_date ${params.max_template_date}" : '' ].join(' ').trim() publishDir = [ - path: { "${params.outdir}/${params.mode}/${params.alphafold2_mode}" }, + path: { "${params.outdir}/alphafold2/${params.alphafold2_mode}" }, mode: 'copy', saveAs: { filename -> filename.equals('versions.yml') ? null : filename }, ] @@ -47,7 +54,7 @@ if (params.alphafold2_mode == 'split_msa_prediction') { withName: 'RUN_ALPHAFOLD2_MSA' { ext.args = params.max_template_date ? "--max_template_date ${params.max_template_date}" : '' publishDir = [ - path: { "${params.outdir}/${params.mode}/${params.alphafold2_mode}" }, + path: { "${params.outdir}/alphafold2/${params.alphafold2_mode}" }, mode: 'copy', saveAs: { filename -> filename.equals('versions.yml') ? null : filename } ] @@ -57,7 +64,7 @@ if (params.alphafold2_mode == 'split_msa_prediction') { if(params.use_gpu) { accelerator = 1 } ext.args = params.use_gpu ? '--use_gpu_relax=true' : '--use_gpu_relax=false' publishDir = [ - path: { "${params.outdir}/${params.mode}/${params.alphafold2_mode}" }, + path: { "${params.outdir}/alphafold2/${params.alphafold2_mode}" }, mode: 'copy', saveAs: { filename -> filename.equals('versions.yml') ? null : filename } ] diff --git a/conf/modules_colabfold.config b/conf/modules_colabfold.config index a7a719b0..2efcfa01 100644 --- a/conf/modules_colabfold.config +++ b/conf/modules_colabfold.config @@ -10,6 +10,16 @@ ---------------------------------------------------------------------------------------- */ +process { + withName: 'NFCORE_PROTEINFOLD:COLABFOLD:MULTIQC' { + publishDir = [ + path: { "${params.outdir}/multiqc" }, + mode: 'copy', + saveAs: { filename -> filename.equals('versions.yml') ? null : "colabfold_$filename" } + ] + } +} + if (params.colabfold_server == 'webserver') { process { withName: 'COLABFOLD_BATCH' { @@ -20,7 +30,7 @@ if (params.colabfold_server == 'webserver') { params.host_url ? "--host-url ${params.host_url}" : '' ].join(' ').trim() publishDir = [ - path: { "${params.outdir}/${params.mode}/${params.colabfold_server}" }, + path: { "${params.outdir}/colabfold/${params.colabfold_server}" }, mode: 'copy', saveAs: { filename -> filename.equals('versions.yml') ? null : filename }, pattern: '*.*' @@ -57,7 +67,7 @@ if (params.colabfold_server == 'local') { params.use_templates ? '--templates' : '' ].join(' ').trim() publishDir = [ - path: { "${params.outdir}/${params.mode}/${params.colabfold_server}" }, + path: { "${params.outdir}/colabfold/${params.colabfold_server}" }, mode: 'copy', saveAs: { filename -> filename.equals('versions.yml') ? 
null : filename }, pattern: '*.*' diff --git a/conf/modules_esmfold.config b/conf/modules_esmfold.config index 81b3048f..d8356924 100644 --- a/conf/modules_esmfold.config +++ b/conf/modules_esmfold.config @@ -14,10 +14,19 @@ process { withName: 'RUN_ESMFOLD' { ext.args = {params.use_gpu ? '' : '--cpu-only'} publishDir = [ - path: { "${params.outdir}/${params.mode}" }, + path: { "${params.outdir}/esmfold" }, mode: 'copy', saveAs: { filename -> filename.equals('versions.yml') ? null : filename }, pattern: '*.*' ] } + + withName: 'NFCORE_PROTEINFOLD:ESMFOLD:MULTIQC' { + publishDir = [ + path: { "${params.outdir}/multiqc" }, + mode: 'copy', + saveAs: { filename -> filename.equals('versions.yml') ? null : "esmfold_$filename" } + ] + } + } diff --git a/conf/test.config b/conf/test.config index e6e18ac2..ff9ced39 100644 --- a/conf/test.config +++ b/conf/test.config @@ -12,15 +12,19 @@ stubRun = true +// Limit resources so that this can run on GitHub Actions +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} + params { config_profile_name = 'Test profile' config_profile_description = 'Minimal test dataset to check pipeline function' - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' - // Input data to test alphafold2 analysis mode = 'alphafold2' alphafold2_mode = 'standard' diff --git a/conf/test_alphafold_download.config b/conf/test_alphafold_download.config index 759ec61a..3393de33 100644 --- a/conf/test_alphafold_download.config +++ b/conf/test_alphafold_download.config @@ -12,15 +12,19 @@ stubRun = true +// Limit resources so that this can run on GitHub Actions +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} + params { config_profile_name = 'Test profile' config_profile_description = 'Minimal test dataset to check pipeline function' - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' - // Input data to test alphafold2 analysis mode = 'alphafold2' alphafold2_mode = 'standard' diff --git a/conf/test_alphafold_split.config b/conf/test_alphafold_split.config index 47d4f5d6..d4fdd168 100644 --- a/conf/test_alphafold_split.config +++ b/conf/test_alphafold_split.config @@ -12,15 +12,19 @@ stubRun = true +// Limit resources so that this can run on GitHub Actions +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} + params { config_profile_name = 'Test profile' config_profile_description = 'Minimal test dataset to check pipeline function' - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' - // Input data to test alphafold2 splitting MSA from prediction analysis mode = 'alphafold2' alphafold2_mode = 'split_msa_prediction' diff --git a/conf/test_colabfold_download.config b/conf/test_colabfold_download.config index 843fa07f..57ef9bf6 100644 --- a/conf/test_colabfold_download.config +++ b/conf/test_colabfold_download.config @@ -12,15 +12,19 @@ stubRun = true +// Limit resources so that this can run on GitHub Actions +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} + params { config_profile_name = 'Test profile' config_profile_description = 'Minimal test dataset to check pipeline function' - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' - // Input data to test colabfold analysis mode = 'colabfold' colabfold_server = 
'webserver' diff --git a/conf/test_colabfold_local.config b/conf/test_colabfold_local.config index b401c0aa..efa1ad63 100644 --- a/conf/test_colabfold_local.config +++ b/conf/test_colabfold_local.config @@ -10,15 +10,19 @@ stubRun = true +// Limit resources so that this can run on GitHub Actions +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} + params { config_profile_name = 'Test profile' config_profile_description = 'Minimal test dataset to check pipeline function' - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' - // Input data to test colabfold with a local server analysis mode = 'colabfold' colabfold_server = 'local' diff --git a/conf/test_colabfold_webserver.config b/conf/test_colabfold_webserver.config index 3cd74de7..8f56eae3 100644 --- a/conf/test_colabfold_webserver.config +++ b/conf/test_colabfold_webserver.config @@ -10,15 +10,19 @@ stubRun = true +// Limit resources so that this can run on GitHub Actions +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} + params { config_profile_name = 'Test profile' config_profile_description = 'Minimal test dataset to check pipeline function' - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' - // Input data to test colabfold with the colabfold webserver analysis mode = 'colabfold' colabfold_server = 'webserver' diff --git a/conf/test_esmfold.config b/conf/test_esmfold.config index ad984742..adf82da7 100644 --- a/conf/test_esmfold.config +++ b/conf/test_esmfold.config @@ -10,15 +10,19 @@ stubRun = true +// Limit resources so that this can run on GitHub Actions +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} + params { config_profile_name = 'Test profile' config_profile_description = 'Minimal test dataset to check pipeline function' - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' - // Input data to test esmfold mode = 'esmfold' esmfold_db = "${projectDir}/assets/dummy_db_dir" diff --git a/dockerfiles/Dockerfile_nfcore-proteinfold_colabfold b/dockerfiles/Dockerfile_nfcore-proteinfold_colabfold index 2ac1f851..cb63d343 100644 --- a/dockerfiles/Dockerfile_nfcore-proteinfold_colabfold +++ b/dockerfiles/Dockerfile_nfcore-proteinfold_colabfold @@ -1,8 +1,8 @@ -FROM nvidia/cuda:11.4.3-cudnn8-runtime-ubuntu18.04 +FROM nvidia/cuda:12.6.2-cudnn-runtime-ubuntu24.04 LABEL authors="Athanasios Baltzis, Jose Espinosa-Carrasco, Leila Mansouri" \ title="nfcore/proteinfold_colabfold" \ - Version="1.1.0" \ + Version="1.2.0dev" \ description="Docker image containing all software requirements to run the COLABFOLD_BATCH module using the nf-core/proteinfold pipeline" ENV PATH="/localcolabfold/colabfold-conda/bin:$PATH" @@ -14,7 +14,7 @@ ENV PATH="/MMseqs2/build/bin:$PATH" RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys A4B469963BF863CC RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \ build-essential \ - cuda-command-line-tools-11-4 \ + cuda-command-line-tools-12-6 \ git \ hmmer \ kalign \ @@ -25,13 +25,13 @@ RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \ && rm -rf /var/lib/apt/lists/* RUN cd / \ - && wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/82a3635/install_colabbatch_linux.sh \ + && wget
https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/07e87ed/install_colabbatch_linux.sh \ && sed -i "/colabfold.download/d" install_colabbatch_linux.sh \ - && sed -i "s|cudatoolkit==.*\sopenmm|cudatoolkit==11.1.1 openmm|g" install_colabbatch_linux.sh \ && bash install_colabbatch_linux.sh -RUN /localcolabfold/colabfold-conda/bin/python3.10 -m pip install tensorflow-cpu==2.11.0 +## Updated +RUN /localcolabfold/colabfold-conda/bin/python3.10 -m pip install tensorflow-cpu==2.17.0 -#Silence download of the AlphaFold2 params +# #Silence download of the AlphaFold2 params RUN sed -i "s|download_alphafold_params(|#download_alphafold_params(|g" /localcolabfold/colabfold-conda/lib/python3.10/site-packages/colabfold/batch.py RUN sed -i "s|if args\.num_models|#if args\.num_models|g" /localcolabfold/colabfold-conda/lib/python3.10/site-packages/colabfold/batch.py diff --git a/docs/output.md b/docs/output.md index 29d2337c..9b9a8fb8 100644 --- a/docs/output.md +++ b/docs/output.md @@ -183,9 +183,9 @@ Below you can find an indicative example of the TSV file with the pLDDT scores p Output files - `multiqc` - - multiqc_report.html: A standalone HTML file that can be viewed in your web browser. - - multiqc_data/: Directory containing parsed statistics from the different tools used in the pipeline. - - multiqc_plots/: Directory containing static images from the report in various formats. + - `_multiqc_report.html`: A standalone HTML file that can be viewed in your web browser. + - `_multiqc_data/`: Directory containing parsed statistics from the different tools used in the pipeline. + - `_multiqc_plots/`: Directory containing static images from the report in various formats. diff --git a/docs/usage.md b/docs/usage.md index be725651..cc7e0b15 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -28,10 +28,10 @@ T1026,https://raw.githubusercontent.com/nf-core/test-datasets/proteinfold/testda The samplesheet can have as many columns as you desire, however, there is a strict requirement for the first 2 columns to match those defined in the table below: -| Column | Description | -| ---------- | --------------------------------------------------------------------------------------------------- | -| `sequence` | Custom sequence name. Spaces in sequence names are automatically converted to underscores (`_`). | -| `fasta` | Full path to fasta file for the provided sequence. File has to have the extension ".fasta" or "fa". | +| Column | Description | +| ------- | --------------------------------------------------------------------------------------------------- | +| `id` | Custom sequence name. Spaces in sequence names are automatically converted to underscores (`_`). | +| `fasta` | Full path to fasta file for the provided sequence. File has to have the extension ".fasta" or "fa". | An [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline. @@ -39,6 +39,8 @@ An [example samplesheet](../assets/samplesheet.csv) has been provided with the p The typical commands for running the pipeline on AlphaFold2, Colabfold and ESMFold modes are shown below. +> You can run any combination of the models by providing them to the `--mode` parameter separated by a comma. For example: `--mode alphafold2,esmfold,colabfold` will run the three models in parallel. 
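+> As a sketch (assuming Docker and a placeholder output directory; the mode-specific database options described below still apply to every selected mode), such a combined run could look like:
+>
+> ```bash
+> nextflow run nf-core/proteinfold --input samplesheet.csv --outdir <OUTDIR> --mode alphafold2,esmfold -profile docker
+> ```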
+ AlphaFold2 regular can be run using this command: ```bash @@ -447,9 +449,9 @@ The above pipeline run specified with a params file in yaml format: nextflow run nf-core/proteinfold -profile docker -params-file params.yaml ``` -with `params.yaml` containing: +with: -```yaml +```yaml title="params.yaml" input: './samplesheet.csv' outdir: './results/' genome: 'GRCh37' diff --git a/main.nf b/main.nf index d6da0f09..e22337c0 100644 --- a/main.nf +++ b/main.nf @@ -9,21 +9,21 @@ ---------------------------------------------------------------------------------------- */ -nextflow.enable.dsl = 2 - /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ IMPORT FUNCTIONS / MODULES / SUBWORKFLOWS / WORKFLOWS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ -if (params.mode == "alphafold2") { +if (params.mode.toLowerCase().split(",").contains("alphafold2")) { include { PREPARE_ALPHAFOLD2_DBS } from './subworkflows/local/prepare_alphafold2_dbs' include { ALPHAFOLD2 } from './workflows/alphafold2' -} else if (params.mode == "colabfold") { +} +if (params.mode.toLowerCase().split(",").contains("colabfold")) { include { PREPARE_COLABFOLD_DBS } from './subworkflows/local/prepare_colabfold_dbs' include { COLABFOLD } from './workflows/colabfold' -} else if (params.mode == "esmfold") { +} +if (params.mode.toLowerCase().split(",").contains("esmfold")) { include { PREPARE_ESMFOLD_DBS } from './subworkflows/local/prepare_esmfold_dbs' include { ESMFOLD } from './workflows/esmfold' } @@ -33,6 +33,10 @@ include { PIPELINE_COMPLETION } from './subworkflows/local/utils_nf include { getColabfoldAlphafold2Params } from './subworkflows/local/utils_nfcore_proteinfold_pipeline' include { getColabfoldAlphafold2ParamsPath } from './subworkflows/local/utils_nfcore_proteinfold_pipeline' +include { GENERATE_REPORT } from './modules/local/generate_report' +include { COMPARE_STRUCTURES } from './modules/local/compare_structures' +include { FOLDSEEK_EASYSEARCH } from './modules/nf-core/foldseek/easysearch/main' + /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ COLABFOLD PARAMETER VALUES @@ -53,14 +57,19 @@ params.colabfold_alphafold2_params_path = getColabfoldAlphafold2ParamsPath() // workflow NFCORE_PROTEINFOLD { - main: - ch_multiqc = Channel.empty() - ch_versions = Channel.empty() + take: + samplesheet // channel: samplesheet read in from --input + main: + ch_samplesheet = samplesheet + ch_multiqc = Channel.empty() + ch_versions = Channel.empty() + ch_report_input = Channel.empty() + requested_modes = params.mode.toLowerCase().split(",") // // WORKFLOW: Run alphafold2 // - if(params.mode == "alphafold2") { + if(requested_modes.contains("alphafold2")) { // // SUBWORKFLOW: Prepare Alphafold2 DBs // @@ -96,6 +105,7 @@ workflow NFCORE_PROTEINFOLD { // WORKFLOW: Run nf-core/alphafold2 workflow // ALPHAFOLD2 ( + ch_samplesheet, ch_versions, params.full_dbs, params.alphafold2_mode, @@ -113,12 +123,15 @@ workflow NFCORE_PROTEINFOLD { ) ch_multiqc = ALPHAFOLD2.out.multiqc_report ch_versions = ch_versions.mix(ALPHAFOLD2.out.versions) + ch_report_input = ch_report_input.mix( + ALPHAFOLD2.out.pdb.join(ALPHAFOLD2.out.msa).map{it[0]["model"] = "alphafold2"; it} + ) } // // WORKFLOW: Run colabfold // - else if(params.mode == "colabfold") { + if(requested_modes.contains("colabfold")) { // // SUBWORKFLOW: Prepare Colabfold DBs // @@ -139,6 +152,7 @@ workflow NFCORE_PROTEINFOLD { // WORKFLOW: Run nf-core/colabfold workflow // 
COLABFOLD ( + ch_samplesheet, ch_versions, params.colabfold_model_preset, PREPARE_COLABFOLD_DBS.out.params, @@ -148,12 +162,19 @@ workflow NFCORE_PROTEINFOLD { ) ch_multiqc = COLABFOLD.out.multiqc_report ch_versions = ch_versions.mix(COLABFOLD.out.versions) + ch_report_input = ch_report_input.mix( + COLABFOLD + .out + .pdb + .join(COLABFOLD.out.msa) + .map { it[0]["model"] = "colabfold"; it } + ) } // // WORKFLOW: Run esmfold // - else if(params.mode == "esmfold") { + if(requested_modes.contains("esmfold")) { // // SUBWORKFLOW: Prepare esmfold DBs // @@ -170,16 +191,91 @@ workflow NFCORE_PROTEINFOLD { // WORKFLOW: Run nf-core/esmfold workflow // ESMFOLD ( + ch_samplesheet, ch_versions, PREPARE_ESMFOLD_DBS.out.params, params.num_recycles_esmfold ) ch_multiqc = ESMFOLD.out.multiqc_report ch_versions = ch_versions.mix(ESMFOLD.out.versions) + ch_report_input = ch_report_input.mix( + ESMFOLD.out.pdb.combine(Channel.fromPath("$projectDir/assets/NO_FILE")).map{it[0]["model"] = "esmfold"; it} + ) } + // + // POST PROCESSING: generate visualisation reports + // + if (!params.skip_visualisation){ + GENERATE_REPORT( + ch_report_input.map{[it[0], it[1]]}, + ch_report_input.map{[it[0], it[2]]}, + ch_report_input.map{it[0].model}, + Channel.fromPath("$projectDir/assets/proteinfold_template.html", checkIfExists: true).first() + ) + ch_versions = ch_versions.mix(GENERATE_REPORT.out.versions) + //GENERATE_REPORT.out.sequence_coverage.view() + if (requested_modes.size() > 1){ + ch_report_input.filter{it[0]["model"] == "esmfold"} + .map{[it[0]["id"], it[0], it[1], it[2]]} + .set{ch_comparison_report_files} + + if (requested_modes.contains("alphafold2")) { + ch_comparison_report_files = ch_comparison_report_files.mix( + ALPHAFOLD2 + .out + .main_pdb + .map{[it[0]["id"], it[0], it[1]]} + .join(GENERATE_REPORT.out.sequence_coverage + .filter{it[0]["model"] == "alphafold2"} + .map{[it[0]["id"], it[1]]}, remainder:true + ) + ) + } + if (requested_modes.contains("colabfold")) { + ch_comparison_report_files = ch_comparison_report_files.mix( + COLABFOLD + .out + .main_pdb + .map{[it[0]["id"], it[0], it[1]]} + .join(COLABFOLD.out.msa + .map{[it[0]["id"], it[1]]}, + remainder:true + ) + ) + } + + ch_comparison_report_files + .groupTuple(by: [0], size: requested_modes.size()) + .set{ch_comparison_report_input} + + COMPARE_STRUCTURES( + ch_comparison_report_input.map{it[1][0]["models"] = params.mode.toLowerCase(); [it[1][0], it[2]]}, + ch_comparison_report_input.map{it[1][0]["models"] = params.mode.toLowerCase(); [it[1][0], it[3]]}, + Channel.fromPath("$projectDir/assets/comparison_template.html", checkIfExists: true).first() + ) + ch_versions = ch_versions.mix(COMPARE_STRUCTURES.out.versions) + } + } + + if (params.foldseek_search == "easysearch"){ + ch_foldseek_db = channel.value([["id": params.foldseek_db], + file(params.foldseek_db_path, + checkIfExists: true)]) + + FOLDSEEK_EASYSEARCH( + ch_report_input + .map{ + if (it[0].model == "esmfold") + [it[0], it[1]] + else + [it[0], it[1][0]] + }, + ch_foldseek_db + ) + } + emit: - multiqc_report = ch_multiqc // channel: /path/to/multiqc_report.html - versions = ch_versions // channel: [version1, version2, ...]
+ multiqc_report = ch_multiqc } /* @@ -196,17 +292,19 @@ workflow { // PIPELINE_INITIALISATION ( params.version, - params.help, params.validate_params, params.monochrome_logs, args, - params.outdir + params.outdir, + params.input ) // // WORKFLOW: Run main workflow // - NFCORE_PROTEINFOLD () + NFCORE_PROTEINFOLD ( + PIPELINE_INITIALISATION.out.samplesheet + ) // // SUBWORKFLOW: Run completion tasks diff --git a/modules.json b/modules.json index cdb36bb6..2debf250 100644 --- a/modules.json +++ b/modules.json @@ -11,6 +11,12 @@ "installed_by": ["modules"], "patch": "modules/nf-core/aria2/aria2.diff" }, + "foldseek/easysearch": { + "branch": "master", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", + "installed_by": ["modules"], + "patch": "modules/nf-core/foldseek/easysearch/foldseek-easysearch.diff" + }, "gunzip": { "branch": "master", "git_sha": "5c460c5a4736974abde2843294f35307ee2b0e5e", @@ -44,17 +50,17 @@ "nf-core": { "utils_nextflow_pipeline": { "branch": "master", - "git_sha": "5caf7640a9ef1d18d765d55339be751bb0969dfa", + "git_sha": "3aa0aec1d52d492fe241919f0c6100ebf0074082", "installed_by": ["subworkflows"] }, "utils_nfcore_pipeline": { "branch": "master", - "git_sha": "92de218a329bfc9a9033116eb5f65fd270e72ba3", + "git_sha": "1b6b9a3338d011367137808b49b923515080e3ba", "installed_by": ["subworkflows"] }, - "utils_nfvalidation_plugin": { + "utils_nfschema_plugin": { "branch": "master", - "git_sha": "5caf7640a9ef1d18d765d55339be751bb0969dfa", + "git_sha": "bbd5a41f4535a8defafe6080e00ea74c45f4f96c", "installed_by": ["subworkflows"] } } diff --git a/modules/local/colabfold_batch.nf b/modules/local/colabfold_batch.nf index 5dab51fb..8710f9eb 100644 --- a/modules/local/colabfold_batch.nf +++ b/modules/local/colabfold_batch.nf @@ -7,7 +7,7 @@ process COLABFOLD_BATCH { error("Local COLABFOLD_BATCH module does not support Conda. Please use Docker / Singularity / Podman instead.") } - container "nf-core/proteinfold_colabfold:1.1.1" + container "nf-core/proteinfold_colabfold:dev" input: tuple val(meta), path(fasta) @@ -18,9 +18,11 @@ process COLABFOLD_BATCH { val numRec output: - path ("*") , emit: pdb - path ("*_mqc.png") , emit: multiqc - path "versions.yml", emit: versions + tuple val(meta), path ("${meta.id}_colabfold.pdb"), emit: main_pdb + tuple val(meta), path ("*_relaxed_rank_*.pdb"), emit: pdb + tuple val(meta), path ("*_coverage.png") , emit: msa + tuple val(meta), path ("*_mqc.png") , emit: multiqc + path "versions.yml" , emit: versions when: task.ext.when == null || task.ext.when @@ -40,6 +42,7 @@ process COLABFOLD_BATCH { \$PWD for i in `find *_relaxed_rank_001*.pdb`; do cp \$i `echo \$i | sed "s|_relaxed_rank_|\t|g" | cut -f1`"_colabfold.pdb"; done for i in `find *.png -maxdepth 0`; do cp \$i \${i%'.png'}_mqc.png; done + cp *_relaxed_rank_001*.pdb ${meta.id}_colabfold.pdb cat <<-END_VERSIONS > versions.yml "${task.process}": @@ -50,8 +53,13 @@ process COLABFOLD_BATCH { stub: def VERSION = '1.5.2' // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions. 
""" - touch ./"${fasta.baseName}"_colabfold.pdb - touch ./"${fasta.baseName}"_mqc.png + touch ./"${meta.id}"_colabfold.pdb + touch ./"${meta.id}"_mqc.png + touch ./${meta.id}_relaxed_rank_01.pdb + touch ./${meta.id}_relaxed_rank_02.pdb + touch ./${meta.id}_relaxed_rank_03.pdb + touch ./${meta.id}_coverage.png + touch ./${meta.id}_scores_rank.json cat <<-END_VERSIONS > versions.yml "${task.process}": diff --git a/modules/local/compare_structures.nf b/modules/local/compare_structures.nf new file mode 100644 index 00000000..756d2525 --- /dev/null +++ b/modules/local/compare_structures.nf @@ -0,0 +1,51 @@ +process COMPARE_STRUCTURES { + tag "$meta.id" + label 'process_single' + + conda "bioconda::multiqc:1.21" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'oras://community.wave.seqera.io/library/pip_biopython_matplotlib_plotly:e865101a15ad0014' : + 'community.wave.seqera.io/library/pip_biopython_matplotlib_plotly:4d51afeb4bb75495' }" + + input: + tuple val(meta), path(pdb) + tuple val(meta_msa), path(msa) + path(template) + + output: + tuple val(meta), path ("*report.html"), emit: report + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + + """ + generate_comparison_report.py \\ + --msa ${msa.join(' ')} \\ + --pdb ${pdb.join(' ')} \\ + --html_template ${template} \\ + --output_dir ./ \\ + --name ${meta.id} \\ + $args + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python3 --version | sed 's/Python //g') + generate_comparison_report.py: \$(python3 --version) + END_VERSIONS + """ + + stub: + """ + touch test_alphafold2_report.html + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python3 --version | sed 's/Python //g') + generate_comparison_report.py: \$(python3 --version) + END_VERSIONS + """ +} diff --git a/modules/local/generate_report.nf b/modules/local/generate_report.nf new file mode 100644 index 00000000..3bfdc04e --- /dev/null +++ b/modules/local/generate_report.nf @@ -0,0 +1,57 @@ +process GENERATE_REPORT { + tag "$meta.id-$meta.model" + label 'process_single' + + conda "bioconda::multiqc:1.21" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'oras://community.wave.seqera.io/library/pip_biopython_matplotlib_plotly:e865101a15ad0014' : + 'community.wave.seqera.io/library/pip_biopython_matplotlib_plotly:4d51afeb4bb75495' }" + + input: + tuple val(meta), path(pdb) + tuple val(meta_msa), path(msa) + val(output_type) + path(template) + + output: + tuple val(meta), path ("*report.html"), emit: report + tuple val(meta), path ("*seq_coverage.png"), optional: true, emit: sequence_coverage + tuple val(meta), path ("*_LDDT.html"), emit: plddt + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + + """ + generate_report.py \\ + --type ${output_type} \\ + --msa ${msa} \\ + --pdb ${pdb.join(' ')} \\ + --html_template ${template} \\ + --output_dir ./ \\ + --name ${meta.id} \\ + $args \\ + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python3 --version | sed 's/Python //g') + generate_report.py: \$(python3 --version) + END_VERSIONS + """ + + stub: + """ + touch test_alphafold2_report.html + touch test_seq_coverage.png + touch test_LDDT.html + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python3 --version | sed 's/Python //g') + generate_report.py: \$(python3 --version) + END_VERSIONS + """ +} diff --git a/modules/local/mmseqs_colabfoldsearch.nf b/modules/local/mmseqs_colabfoldsearch.nf index c2140c5b..c6a2c9b0 100644 --- a/modules/local/mmseqs_colabfoldsearch.nf +++ b/modules/local/mmseqs_colabfoldsearch.nf @@ -7,7 +7,7 @@ process MMSEQS_COLABFOLDSEARCH { error("Local MMSEQS_COLABFOLDSEARCH module does not support Conda. Please use Docker / Singularity / Podman instead.") } - container "nf-core/proteinfold_colabfold:1.1.1" + container "nf-core/proteinfold_colabfold:dev" input: tuple val(meta), path(fasta) diff --git a/modules/local/run_alphafold2.nf b/modules/local/run_alphafold2.nf index cb3527d3..f41636a9 100644 --- a/modules/local/run_alphafold2.nf +++ b/modules/local/run_alphafold2.nf @@ -10,7 +10,7 @@ process RUN_ALPHAFOLD2 { error("Local RUN_ALPHAFOLD2 module does not support Conda. Please use Docker / Singularity / Podman instead.") } - container "nf-core/proteinfold_alphafold2_standard:1.1.1" + container "nf-core/proteinfold_alphafold2_standard:dev" input: tuple val(meta), path(fasta) @@ -29,7 +29,10 @@ process RUN_ALPHAFOLD2 { output: path ("${fasta.baseName}*") - path "*_mqc.tsv", emit: multiqc + tuple val(meta), path ("${meta.id}_alphafold2.pdb"), emit: main_pdb + tuple val(meta), path ("${fasta.baseName}/ranked*pdb"), emit: pdb + tuple val(meta), path ("${fasta.baseName}/*_msa.tsv") , emit: msa + tuple val(meta), path ("*_mqc.tsv") , emit: multiqc path "versions.yml", emit: versions when: @@ -63,7 +66,7 @@ process RUN_ALPHAFOLD2 { --random_seed=53343 \ $args - cp "${fasta.baseName}"/ranked_0.pdb ./"${fasta.baseName}".alphafold.pdb + cp "${fasta.baseName}"/ranked_0.pdb ./"${meta.id}"_alphafold2.pdb cd "${fasta.baseName}" awk '{print \$6"\\t"\$11}' ranked_0.pdb | uniq > ranked_0_plddt.tsv for i in 1 2 3 4 @@ -71,7 +74,10 @@ process RUN_ALPHAFOLD2 { done paste ranked_0_plddt.tsv ranked_1_plddt.tsv ranked_2_plddt.tsv ranked_3_plddt.tsv ranked_4_plddt.tsv > plddt.tsv echo -e Positions"\\t"rank_0"\\t"rank_1"\\t"rank_2"\\t"rank_3"\\t"rank_4 > header.tsv - cat header.tsv plddt.tsv > ../"${fasta.baseName}"_plddt_mqc.tsv + cat header.tsv plddt.tsv > ../"${meta.id}"_plddt_mqc.tsv + + extract_output.py --name ${meta.id} \\ + --pkls features.pkl cd .. 
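+ # NOTE: extract_output.py (assumed to be a pipeline helper shipped in bin/) is
+ # expected to parse the pickled AlphaFold2 features passed via --pkls and to
+ # write the *_msa.tsv coverage table that this module now emits for the
+ # visualisation report.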
cat <<-END_VERSIONS > versions.yml @@ -82,12 +88,19 @@ process RUN_ALPHAFOLD2 { stub: """ - touch ./"${fasta.baseName}".alphafold.pdb - touch ./"${fasta.baseName}"_mqc.tsv + touch ./"${meta.id}"_alphafold2.pdb + touch ./"${meta.id}"_mqc.tsv + mkdir "${fasta.baseName}" + touch "${fasta.baseName}/ranked_0.pdb" + touch "${fasta.baseName}/ranked_1.pdb" + touch "${fasta.baseName}/ranked_2.pdb" + touch "${fasta.baseName}/ranked_3.pdb" + touch "${fasta.baseName}/ranked_4.pdb" + touch "${fasta.baseName}/${fasta.baseName}_msa.tsv" cat <<-END_VERSIONS > versions.yml "${task.process}": - awk: \$(gawk --version| head -1 | sed 's/GNU Awk //; s/, API:.*//') + python: \$(python3 --version | sed 's/Python //g') END_VERSIONS """ } diff --git a/modules/local/run_alphafold2_msa.nf b/modules/local/run_alphafold2_msa.nf index fdc67e88..a4f00676 100644 --- a/modules/local/run_alphafold2_msa.nf +++ b/modules/local/run_alphafold2_msa.nf @@ -10,7 +10,7 @@ process RUN_ALPHAFOLD2_MSA { error("Local RUN_ALPHAFOLD2_MSA module does not support Conda. Please use Docker / Singularity / Podman instead.") } - container "nf-core/proteinfold_alphafold2_msa:1.1.1" + container "nf-core/proteinfold_alphafold2_msa:dev" input: tuple val(meta), path(fasta) @@ -29,7 +29,7 @@ process RUN_ALPHAFOLD2_MSA { output: path ("${fasta.baseName}*") - path ("${fasta.baseName}.features.pkl"), emit: features + tuple val(meta), path ("${fasta.baseName}.features.pkl"), emit: features path "versions.yml" , emit: versions when: diff --git a/modules/local/run_alphafold2_pred.nf b/modules/local/run_alphafold2_pred.nf index 92b5d2a5..d5e1b9b5 100644 --- a/modules/local/run_alphafold2_pred.nf +++ b/modules/local/run_alphafold2_pred.nf @@ -10,7 +10,7 @@ process RUN_ALPHAFOLD2_PRED { error("Local RUN_ALPHAFOLD2_PRED module does not support Conda. Please use Docker / Singularity / Podman instead.") } - container "nf-core/proteinfold_alphafold2_split:1.1.1" + container "nf-core/proteinfold_alphafold2_split:dev" input: tuple val(meta), path(fasta) @@ -26,11 +26,14 @@ process RUN_ALPHAFOLD2_PRED { path ('uniref90/*') path ('pdb_seqres/*') path ('uniprot/*') - path msa + tuple val(meta), path(msa) output: path ("${fasta.baseName}*") - path "*_mqc.tsv", emit: multiqc + tuple val(meta), path ("${meta.id}_alphafold2.pdb"), emit: main_pdb + tuple val(meta), path ("${fasta.baseName}/ranked*pdb"), emit: pdb + tuple val(meta), path ("*_msa.tsv"), emit: msa + tuple val(meta), path ("*_mqc.tsv"), emit: multiqc path "versions.yml", emit: versions when: @@ -49,7 +52,7 @@ process RUN_ALPHAFOLD2_PRED { --msa_path=${msa} \ $args - cp "${fasta.baseName}"/ranked_0.pdb ./"${fasta.baseName}".alphafold.pdb + cp "${fasta.baseName}"/ranked_0.pdb ./"${meta.id}"_alphafold2.pdb cd "${fasta.baseName}" awk '{print \$6"\\t"\$11}' ranked_0.pdb | uniq > ranked_0_plddt.tsv for i in 1 2 3 4 @@ -57,9 +60,11 @@ process RUN_ALPHAFOLD2_PRED { done paste ranked_0_plddt.tsv ranked_1_plddt.tsv ranked_2_plddt.tsv ranked_3_plddt.tsv ranked_4_plddt.tsv > plddt.tsv echo -e Positions"\\t"rank_0"\\t"rank_1"\\t"rank_2"\\t"rank_3"\\t"rank_4 > header.tsv - cat header.tsv plddt.tsv > ../"${fasta.baseName}"_plddt_mqc.tsv - cd .. + cat header.tsv plddt.tsv > ../"${meta.id}"_plddt_mqc.tsv + cd .. 
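+ # As in RUN_ALPHAFOLD2, the extract_output.py helper (assumed to be shipped in
+ # the pipeline's bin/) presumably derives the *_msa.tsv table for the report;
+ # here the features pickle produced by RUN_ALPHAFOLD2_MSA is reused via the
+ # msa input.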
+ extract_output.py --name ${meta.id} \\ + --pkls ${msa} cat <<-END_VERSIONS > versions.yml "${task.process}": python: \$(python3 --version | sed 's/Python //g') @@ -68,12 +73,19 @@ process RUN_ALPHAFOLD2_PRED { stub: """ - touch ./"${fasta.baseName}".alphafold.pdb - touch ./"${fasta.baseName}"_mqc.tsv + touch ./"${meta.id}"_alphafold2.pdb + touch ./"${meta.id}"_mqc.tsv + mkdir "${fasta.baseName}" + touch "${fasta.baseName}/ranked_0.pdb" + touch "${fasta.baseName}/ranked_1.pdb" + touch "${fasta.baseName}/ranked_2.pdb" + touch "${fasta.baseName}/ranked_3.pdb" + touch "${fasta.baseName}/ranked_4.pdb" + touch ${meta.id}_msa.tsv cat <<-END_VERSIONS > versions.yml "${task.process}": - awk: \$(gawk --version| head -1 | sed 's/GNU Awk //; s/, API:.*//') + python: \$(python3 --version | sed 's/Python //g') END_VERSIONS """ } diff --git a/modules/local/run_esmfold.nf b/modules/local/run_esmfold.nf index 66c5bbc7..83397be1 100644 --- a/modules/local/run_esmfold.nf +++ b/modules/local/run_esmfold.nf @@ -6,7 +6,7 @@ process RUN_ESMFOLD { error("Local RUN_ESMFOLD module does not support Conda. Please use Docker / Singularity / Podman instead.") } - container "nf-core/proteinfold_esmfold:1.1.1" + container "nf-core/proteinfold_esmfold:dev" input: tuple val(meta), path(fasta) @@ -14,8 +14,8 @@ process RUN_ESMFOLD { val numRec output: - path ("${fasta.baseName}*.pdb"), emit: pdb - path ("${fasta.baseName}_plddt_mqc.tsv"), emit: multiqc + tuple val(meta), path ("${meta.id}_esmfold.pdb") , emit: pdb + tuple val(meta), path ("${meta.id}_plddt_mqc.tsv"), emit: multiqc path "versions.yml", emit: versions when: @@ -33,9 +33,12 @@ process RUN_ESMFOLD { --num-recycles ${numRec} \ $args - awk '{print \$2"\\t"\$3"\\t"\$4"\\t"\$6"\\t"\$11}' "${fasta.baseName}"*.pdb | grep -v 'N/A' | uniq > plddt.tsv + mv *.pdb tmp.pdb + mv tmp.pdb ${meta.id}_esmfold.pdb + + awk '{print \$2"\\t"\$3"\\t"\$4"\\t"\$6"\\t"\$11}' ${meta.id}_esmfold.pdb | grep -v 'N/A' | uniq > plddt.tsv echo -e Atom_serial_number"\\t"Atom_name"\\t"Residue_name"\\t"Residue_sequence_number"\\t"pLDDT > header.tsv - cat header.tsv plddt.tsv > "${fasta.baseName}"_plddt_mqc.tsv + cat header.tsv plddt.tsv > ${meta.id}_plddt_mqc.tsv cat <<-END_VERSIONS > versions.yml "${task.process}": @@ -46,8 +49,8 @@ process RUN_ESMFOLD { stub: def VERSION = '1.0.3' // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions. 
""" - touch ./"${fasta.baseName}".pdb - touch ./"${fasta.baseName}"_plddt_mqc.tsv + touch ./${meta.id}_esmfold.pdb + touch ./${meta.id}_plddt_mqc.tsv cat <<-END_VERSIONS > versions.yml "${task.process}": diff --git a/modules/nf-core/aria2/environment.yml b/modules/nf-core/aria2/environment.yml index 5dc58a07..50c54a6e 100644 --- a/modules/nf-core/aria2/environment.yml +++ b/modules/nf-core/aria2/environment.yml @@ -2,6 +2,5 @@ name: aria2 channels: - conda-forge - bioconda - - defaults dependencies: - conda-forge::aria2=1.36.0 diff --git a/modules/nf-core/foldseek/easysearch/environment.yml b/modules/nf-core/foldseek/easysearch/environment.yml new file mode 100644 index 00000000..c38d477d --- /dev/null +++ b/modules/nf-core/foldseek/easysearch/environment.yml @@ -0,0 +1,6 @@ +name: foldseek_easysearch +channels: + - conda-forge + - bioconda +dependencies: + - bioconda::foldseek=9.427df8a diff --git a/modules/nf-core/foldseek/easysearch/foldseek-easysearch.diff b/modules/nf-core/foldseek/easysearch/foldseek-easysearch.diff new file mode 100644 index 00000000..81e91964 --- /dev/null +++ b/modules/nf-core/foldseek/easysearch/foldseek-easysearch.diff @@ -0,0 +1,42 @@ +Changes in module 'nf-core/foldseek/easysearch' +--- modules/nf-core/foldseek/easysearch/main.nf ++++ modules/nf-core/foldseek/easysearch/main.nf +@@ -12,7 +12,8 @@ + tuple val(meta_db), path(db) + + output: +- tuple val(meta), path("${meta.id}.m8"), emit: aln ++ tuple val(meta), path("${meta.id}.m8"), emit: aln, optional: true ++ tuple val(meta), path("${meta.id}_${meta.model.toLowerCase()}_foldseek.html"), emit: report, optional: true + path "versions.yml" , emit: versions + + when: +@@ -21,12 +22,17 @@ + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" ++ def output_file = "${prefix}.m8" ++ if (args.contains("--format-mode 3")){ ++ output_file = "${meta.id}_${meta.model.toLowerCase()}_foldseek.html" ++ } ++ + """ + foldseek \\ + easy-search \\ + ${pdb} \\ + ${db}/${meta_db.id} \\ +- ${prefix}.m8 \\ ++ ${output_file} \\ + tmpFolder \\ + ${args} + +@@ -42,6 +48,7 @@ + + """ + touch ${prefix}.m8 ++ touch ${prefix}.html + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + +************************************************************ diff --git a/modules/nf-core/foldseek/easysearch/main.nf b/modules/nf-core/foldseek/easysearch/main.nf new file mode 100644 index 00000000..b8a431b0 --- /dev/null +++ b/modules/nf-core/foldseek/easysearch/main.nf @@ -0,0 +1,58 @@ +process FOLDSEEK_EASYSEARCH { + tag "$meta.id" + label 'process_medium' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/foldseek:9.427df8a--pl5321hb365157_0': + 'biocontainers/foldseek:9.427df8a--pl5321hb365157_0' }" + + input: + tuple val(meta) , path(pdb) + tuple val(meta_db), path(db) + + output: + tuple val(meta), path("${meta.id}.m8"), emit: aln, optional: true + tuple val(meta), path("${meta.id}_${meta.model.toLowerCase()}_foldseek.html"), emit: report, optional: true + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def output_file = "${prefix}.m8" + if (args.contains("--format-mode 3")){ + output_file = "${meta.id}_${meta.model.toLowerCase()}_foldseek.html" + } + + """ + foldseek \\ + easy-search \\ + ${pdb} \\ + ${db}/${meta_db.id} \\ + ${output_file} \\ + tmpFolder \\ + ${args} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + foldseek: \$(foldseek --help | grep Version | sed 's/.*Version: //') + END_VERSIONS + """ + + stub: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + + """ + touch ${prefix}.m8 + touch ${prefix}.html + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + foldseek: \$(foldseek --help | grep Version | sed 's/.*Version: //') + END_VERSIONS + """ +} diff --git a/modules/nf-core/foldseek/easysearch/meta.yml b/modules/nf-core/foldseek/easysearch/meta.yml new file mode 100644 index 00000000..c5482137 --- /dev/null +++ b/modules/nf-core/foldseek/easysearch/meta.yml @@ -0,0 +1,58 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/yaml-schema.json +name: "foldseek_easysearch" +description: Search for protein structural hits against a foldseek database of protein + structures +keywords: + - protein + - structure + - comparisons +tools: + - "foldseek": + description: "Foldseek: fast and accurate protein structure search" + homepage: "https://search.foldseek.com/search" + documentation: "https://github.com/steineggerlab/foldseek" + tool_dev_url: "https://github.com/steineggerlab/foldseek" + doi: "10.1038/s41587-023-01773-0" + licence: ["GPL v3"] + identifier: biotools:foldseek +input: + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'test', single_end:false ]` + - pdb: + type: file + description: Protein structure(s) in PDB, mmCIF or mmJSON format to compare + against a foldseek database (also works with folder input) + pattern: "*.{pdb,mmcif,mmjson}" + - - meta_db: + type: map + description: | + Groovy Map containing sample information for the foldseek db + e.g. `[ id:'test', single_end:false ]` + - db: + type: directory + description: foldseek database from protein structures + pattern: "*" +output: + - aln: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
`[ id:'test', single_end:false ]` + - ${meta.id}.m8: + type: file + description: | + Structural comparisons file output + Query, Target, Identity, Alignment length, Mismatches, Gap openings, + Query start, Query end, Target start, Target end, E-value, Bit score + pattern: "*.{m8}" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@vagkaratzas" diff --git a/modules/nf-core/foldseek/easysearch/tests/main.nf.test b/modules/nf-core/foldseek/easysearch/tests/main.nf.test new file mode 100644 index 00000000..c71e2743 --- /dev/null +++ b/modules/nf-core/foldseek/easysearch/tests/main.nf.test @@ -0,0 +1,66 @@ +nextflow_process { + + name "Test Process FOLDSEEK_EASYSEARCH" + script "../main.nf" + process "FOLDSEEK_EASYSEARCH" + tag "modules" + tag "modules_nfcore" + tag "foldseek" + tag "foldseek/createdb" + tag "foldseek/easysearch" + + setup { + run("FOLDSEEK_CREATEDB") { + script "../../createdb/main.nf" + process { + """ + input[0] = [ [ id:'test_db' ], [ file(params.modules_testdata_base_path + 'proteomics/pdb/1tim.pdb', checkIfExists: true) ] ] + """ + } + } + } + + test("proteomics - pdb") { + + when { + process { + """ + input[0] = [ [ id:'test_search' ], [ file(params.modules_testdata_base_path + 'proteomics/pdb/8tim.pdb', checkIfExists: true) ] ] + input[1] = FOLDSEEK_CREATEDB.out.db + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert path(process.out.aln.get(0).get(1)).readLines().contains("8tim_A\t1tim_A\t0.967\t247\t8\t0\t1\t247\t1\t247\t1.152E-43\t1523") }, + { assert process.out.versions } + ) + } + + } + + test("proteomics - pdb -stub") { + + options "-stub" + + when { + process { + """ + input[0] = [ [ id:'test_search' ], [ file(params.modules_testdata_base_path + 'proteomics/pdb/8tim.pdb', checkIfExists: true) ] ] + input[1] = FOLDSEEK_CREATEDB.out.db + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + + } + +} diff --git a/modules/nf-core/foldseek/easysearch/tests/main.nf.test.snap b/modules/nf-core/foldseek/easysearch/tests/main.nf.test.snap new file mode 100644 index 00000000..819648dd --- /dev/null +++ b/modules/nf-core/foldseek/easysearch/tests/main.nf.test.snap @@ -0,0 +1,31 @@ +{ + "proteomics - pdb -stub": { + "content": [ + { + "0": [ + [ + { + "id": "test_search" + }, + "test_search.m8:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + "versions.yml:md5,ddc75b2e08b63a7082ecad353073fd3b" + ], + "aln": [ + [ + { + "id": "test_search" + }, + "test_search.m8:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,ddc75b2e08b63a7082ecad353073fd3b" + ] + } + ], + "timestamp": "2024-07-02T13:55:57.915188646" + } +} \ No newline at end of file diff --git a/modules/nf-core/foldseek/easysearch/tests/tags.yml b/modules/nf-core/foldseek/easysearch/tests/tags.yml new file mode 100644 index 00000000..58db1c24 --- /dev/null +++ b/modules/nf-core/foldseek/easysearch/tests/tags.yml @@ -0,0 +1,3 @@ +foldseek/easysearch: + - modules/nf-core/foldseek/easysearch/** + - modules/nf-core/foldseek/createdb/** diff --git a/modules/nf-core/multiqc/environment.yml b/modules/nf-core/multiqc/environment.yml index ca39fb67..6f5b867b 100644 --- a/modules/nf-core/multiqc/environment.yml +++ b/modules/nf-core/multiqc/environment.yml @@ -1,7 +1,5 @@ -name: multiqc channels: - conda-forge - bioconda - - defaults dependencies: - - bioconda::multiqc=1.21 + - bioconda::multiqc=1.25.1 diff --git 
a/modules/nf-core/multiqc/main.nf b/modules/nf-core/multiqc/main.nf index bef8f50b..7d6cf081 100644 --- a/modules/nf-core/multiqc/main.nf +++ b/modules/nf-core/multiqc/main.nf @@ -3,14 +3,16 @@ process MULTIQC { conda "${moduleDir}/environment.yml" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? - 'https://depot.galaxyproject.org/singularity/multiqc:1.21--pyhdfd78af_0' : - 'biocontainers/multiqc:1.21--pyhdfd78af_0' }" + 'https://depot.galaxyproject.org/singularity/multiqc:1.25.1--pyhdfd78af_0' : + 'biocontainers/multiqc:1.25.1--pyhdfd78af_0' }" input: path multiqc_files path(multiqc_config) path(extra_multiqc_config) path(multiqc_logo) + path(replace_names) + path(sample_names) output: path "*multiqc_report.html", emit: report @@ -23,16 +25,22 @@ process MULTIQC { script: def args = task.ext.args ?: '' + def prefix = task.ext.prefix ? "--filename ${task.ext.prefix}.html" : '' def config = multiqc_config ? "--config $multiqc_config" : '' def extra_config = extra_multiqc_config ? "--config $extra_multiqc_config" : '' - def logo = multiqc_logo ? /--cl-config 'custom_logo: "${multiqc_logo}"'/ : '' + def logo = multiqc_logo ? "--cl-config 'custom_logo: \"${multiqc_logo}\"'" : '' + def replace = replace_names ? "--replace-names ${replace_names}" : '' + def samples = sample_names ? "--sample-names ${sample_names}" : '' """ multiqc \\ --force \\ $args \\ $config \\ + $prefix \\ $extra_config \\ $logo \\ + $replace \\ + $samples \\ . cat <<-END_VERSIONS > versions.yml @@ -44,7 +52,7 @@ process MULTIQC { stub: """ mkdir multiqc_data - touch multiqc_plots + mkdir multiqc_plots touch multiqc_report.html cat <<-END_VERSIONS > versions.yml diff --git a/modules/nf-core/multiqc/meta.yml b/modules/nf-core/multiqc/meta.yml index 45a9bc35..b16c1879 100644 --- a/modules/nf-core/multiqc/meta.yml +++ b/modules/nf-core/multiqc/meta.yml @@ -1,5 +1,6 @@ name: multiqc -description: Aggregate results from bioinformatics analyses across many samples into a single report +description: Aggregate results from bioinformatics analyses across many samples into + a single report keywords: - QC - bioinformatics tools @@ -12,40 +13,59 @@ tools: homepage: https://multiqc.info/ documentation: https://multiqc.info/docs/ licence: ["GPL-3.0-or-later"] + identifier: biotools:multiqc input: - - multiqc_files: - type: file - description: | - List of reports / files recognised by MultiQC, for example the html and zip output of FastQC - - multiqc_config: - type: file - description: Optional config yml for MultiQC - pattern: "*.{yml,yaml}" - - extra_multiqc_config: - type: file - description: Second optional config yml for MultiQC. Will override common sections in multiqc_config. - pattern: "*.{yml,yaml}" - - multiqc_logo: - type: file - description: Optional logo file for MultiQC - pattern: "*.{png}" + - - multiqc_files: + type: file + description: | + List of reports / files recognised by MultiQC, for example the html and zip output of FastQC + - - multiqc_config: + type: file + description: Optional config yml for MultiQC + pattern: "*.{yml,yaml}" + - - extra_multiqc_config: + type: file + description: Second optional config yml for MultiQC. Will override common sections + in multiqc_config. + pattern: "*.{yml,yaml}" + - - multiqc_logo: + type: file + description: Optional logo file for MultiQC + pattern: "*.{png}" + - - replace_names: + type: file + description: | + Optional two-column sample renaming file. 
First column a set of + patterns, second column a set of corresponding replacements. Passed via + MultiQC's `--replace-names` option. + pattern: "*.{tsv}" + - - sample_names: + type: file + description: | + Optional TSV file with headers, passed to the MultiQC --sample_names + argument. + pattern: "*.{tsv}" output: - report: - type: file - description: MultiQC report file - pattern: "multiqc_report.html" + - "*multiqc_report.html": + type: file + description: MultiQC report file + pattern: "multiqc_report.html" - data: - type: directory - description: MultiQC data dir - pattern: "multiqc_data" + - "*_data": + type: directory + description: MultiQC data dir + pattern: "multiqc_data" - plots: - type: file - description: Plots created by MultiQC - pattern: "*_data" + - "*_plots": + type: file + description: Plots created by MultiQC + pattern: "*_data" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@abhi18av" - "@bunop" diff --git a/modules/nf-core/multiqc/tests/main.nf.test b/modules/nf-core/multiqc/tests/main.nf.test index f1c4242e..33316a7d 100644 --- a/modules/nf-core/multiqc/tests/main.nf.test +++ b/modules/nf-core/multiqc/tests/main.nf.test @@ -8,6 +8,8 @@ nextflow_process { tag "modules_nfcore" tag "multiqc" + config "./nextflow.config" + test("sarscov2 single-end [fastqc]") { when { @@ -17,6 +19,8 @@ nextflow_process { input[1] = [] input[2] = [] input[3] = [] + input[4] = [] + input[5] = [] """ } } @@ -41,6 +45,8 @@ nextflow_process { input[1] = Channel.of(file("https://github.com/nf-core/tools/raw/dev/nf_core/pipeline-template/assets/multiqc_config.yml", checkIfExists: true)) input[2] = [] input[3] = [] + input[4] = [] + input[5] = [] """ } } @@ -66,6 +72,8 @@ nextflow_process { input[1] = [] input[2] = [] input[3] = [] + input[4] = [] + input[5] = [] """ } } diff --git a/modules/nf-core/multiqc/tests/main.nf.test.snap b/modules/nf-core/multiqc/tests/main.nf.test.snap index bfebd802..2fcbb5ff 100644 --- a/modules/nf-core/multiqc/tests/main.nf.test.snap +++ b/modules/nf-core/multiqc/tests/main.nf.test.snap @@ -2,14 +2,14 @@ "multiqc_versions_single": { "content": [ [ - "versions.yml:md5,21f35ee29416b9b3073c28733efe4b7d" + "versions.yml:md5,41f391dcedce7f93ca188f3a3ffa0916" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-02-29T08:48:55.657331" + "timestamp": "2024-10-02T17:51:46.317523" }, "multiqc_stub": { "content": [ @@ -17,25 +17,25 @@ "multiqc_report.html", "multiqc_data", "multiqc_plots", - "versions.yml:md5,21f35ee29416b9b3073c28733efe4b7d" + "versions.yml:md5,41f391dcedce7f93ca188f3a3ffa0916" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-02-29T08:49:49.071937" + "timestamp": "2024-10-02T17:52:20.680978" }, "multiqc_versions_config": { "content": [ [ - "versions.yml:md5,21f35ee29416b9b3073c28733efe4b7d" + "versions.yml:md5,41f391dcedce7f93ca188f3a3ffa0916" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-02-29T08:49:25.457567" + "timestamp": "2024-10-02T17:52:09.185842" } } \ No newline at end of file diff --git a/modules/nf-core/multiqc/tests/nextflow.config b/modules/nf-core/multiqc/tests/nextflow.config new file mode 100644 index 00000000..c537a6a3 --- /dev/null 
+++ b/modules/nf-core/multiqc/tests/nextflow.config @@ -0,0 +1,5 @@ +process { + withName: 'MULTIQC' { + ext.prefix = null + } +} diff --git a/nextflow.config b/nextflow.config index 7a0c5c4e..d8fc2623 100644 --- a/nextflow.config +++ b/nextflow.config @@ -79,8 +79,13 @@ params { // Esmfold paths esmfold_params_path = null + // Foldseek params + foldseek_search = null + foldseek_easysearch_arg = null + // Process skipping options skip_multiqc = false + skip_visualisation = false // MultiQC options multiqc_config = null @@ -98,48 +103,28 @@ params { monochrome_logs = false hook_url = null help = false + help_full = false + show_hidden = false version = false pipelines_testdata_base_path = 'https://raw.githubusercontent.com/nf-core/test-datasets/' // Config options config_profile_name = null config_profile_description = null + custom_config_version = 'master' custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" config_profile_contact = null config_profile_url = null - // Max resource options - // Defaults only, expecting to be overwritten - max_memory = '128.GB' - max_cpus = 16 - max_time = '240.h' - // Schema validation default options - validationFailUnrecognisedParams = false - validationLenientMode = false - validationSchemaIgnoreParams = '' - validationShowHiddenParams = false - validate_params = true + validate_params = true } // Load base.config by default for all pipelines includeConfig 'conf/base.config' -// Load nf-core custom profiles from different Institutions -try { - includeConfig "${params.custom_config_base}/nfcore_custom.config" -} catch (Exception e) { - System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config") -} - -// Load nf-core/proteinfold custom profiles from different institutions. -try { - includeConfig "${params.custom_config_base}/pipeline/proteinfold.config" -} catch (Exception e) { - System.err.println("WARNING: Could not load nf-core/config/proteinfold profiles: ${params.custom_config_base}/pipeline/proteinfold.config") -} profiles { debug { dumpHashes = true @@ -154,7 +139,7 @@ profiles { podman.enabled = false shifter.enabled = false charliecloud.enabled = false - conda.channels = ['conda-forge', 'bioconda', 'defaults'] + conda.channels = ['conda-forge', 'bioconda'] apptainer.enabled = false } mamba { @@ -169,7 +154,6 @@ profiles { } docker { docker.enabled = true - docker.userEmulation = true if (params.use_gpu) { docker.runOptions = '--gpus all' } else { @@ -267,18 +251,20 @@ profiles { test_full_esmfold_multimer { includeConfig 'conf/test_full_esmfold_multimer.config' } } -// Set default registry for Apptainer, Docker, Podman and Singularity independent of -profile -// Will not be used unless Apptainer / Docker / Podman / Singularity are enabled -// Set to your registry if you have a mirror of containers -apptainer.registry = 'quay.io' -docker.registry = 'quay.io' -podman.registry = 'quay.io' -singularity.registry = 'quay.io' +// Load nf-core custom profiles from different Institutions +includeConfig !System.getenv('NXF_OFFLINE') && params.custom_config_base ? "${params.custom_config_base}/nfcore_custom.config" : "/dev/null" -// Nextflow plugins -plugins { - id 'nf-validation@1.1.3' // Validation of pipeline parameters and creation of an input channel from a sample sheet -} +// Load nf-core/proteinfold custom profiles from different institutions. +includeConfig !System.getenv('NXF_OFFLINE') && params.custom_config_base ? 
"${params.custom_config_base}/pipeline/proteinfold.config" : "/dev/null" + +// Set default registry for Apptainer, Docker, Podman, Charliecloud and Singularity independent of -profile +// Will not be used unless Apptainer / Docker / Podman / Charliecloud / Singularity are enabled +// Set to your registry if you have a mirror of containers +apptainer.registry = 'quay.io' +docker.registry = 'quay.io' +podman.registry = 'quay.io' +singularity.registry = 'quay.io' +charliecloud.registry = 'quay.io' // Export these variables to prevent local Python/R libraries from conflicting with those in the container // The JULIA depot path has been adjusted to a fixed path `/usr/local/share/julia` that needs to be used for packages in the container. @@ -291,8 +277,15 @@ env { JULIA_DEPOT_PATH = "/usr/local/share/julia" } -// Capture exit codes from upstream processes when piping -process.shell = ['/bin/bash', '-euo', 'pipefail'] +// Set bash options +process.shell = """\ +bash + +set -e # Exit if a tool returns a non-zero status/exit code +set -u # Treat unset variables and parameters as an error +set -o pipefail # Returns the status of the last command to exit with a non-zero status or zero if all successfully execute +set -C # No clobber - prevent output redirection from overwriting files. +""" // Disable process selector warnings by default. Use debug profile to enable warnings. nextflow.enable.configProcessNamesValidation = false @@ -321,58 +314,60 @@ manifest { homePage = 'https://github.com/nf-core/proteinfold' description = """Protein 3D structure prediction pipeline""" mainScript = 'main.nf' - nextflowVersion = '!>=23.04.0' - version = '1.1.1' + nextflowVersion = '!>=24.04.2' + version = '1.2.0dev' doi = '10.5281/zenodo.7629996' } +// Nextflow plugins +plugins { + id 'nf-schema@2.1.1' // Validation of pipeline parameters and creation of an input channel from a sample sheet +} + +validation { + defaultIgnoreParams = ["genomes"] + help { + enabled = true + command = "nextflow run $manifest.name -profile --input samplesheet.csv --outdir " + fullParameter = "help_full" + showHiddenParameter = "show_hidden" + beforeText = """ +-\033[2m----------------------------------------------------\033[0m- + \033[0;32m,--.\033[0;30m/\033[0;32m,-.\033[0m +\033[0;34m ___ __ __ __ ___ \033[0;32m/,-._.--~\'\033[0m +\033[0;34m |\\ | |__ __ / ` / \\ |__) |__ \033[0;33m} {\033[0m +\033[0;34m | \\| | \\__, \\__/ | \\ |___ \033[0;32m\\`-._,-`-,\033[0m + \033[0;32m`._,._,\'\033[0m +\033[0;35m ${manifest.name} ${manifest.version}\033[0m +-\033[2m----------------------------------------------------\033[0m- +""" + afterText = """${manifest.doi ? "* The pipeline\n" : ""}${manifest.doi.tokenize(",").collect { " https://doi.org/${it.trim().replace('https://doi.org/','')}"}.join("\n")}${manifest.doi ? 
"\n" : ""} +* The nf-core framework + https://doi.org/10.1038/s41587-020-0439-x + +* Software dependencies + https://github.com/${manifest.name}/blob/master/CITATIONS.md +""" + } + summary { + beforeText = validation.help.beforeText + afterText = validation.help.afterText + } +} + // Load modules.config for DSL2 module specific options includeConfig 'conf/modules.config' // Load modules config for pipeline specific modes -if (params.mode == 'alphafold2') { +if (params.mode.toLowerCase().split(",").contains("alphafold2")) { includeConfig 'conf/modules_alphafold2.config' -} else if (params.mode == 'colabfold') { +} +if (params.mode.toLowerCase().split(",").contains("colabfold")) { includeConfig 'conf/modules_colabfold.config' -} else if (params.mode == 'esmfold') { +} +if (params.mode.toLowerCase().split(",").contains("esmfold")) { includeConfig 'conf/modules_esmfold.config' } // Load links to DBs and parameters includeConfig 'conf/dbs.config' - -// Function to ensure that resource requirements don't go beyond -// a maximum limit -def check_max(obj, type) { - if (type == 'memory') { - try { - if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1) - return params.max_memory as nextflow.util.MemoryUnit - else - return obj - } catch (all) { - println " ### ERROR ### Max memory '${params.max_memory}' is not valid! Using default value: $obj" - return obj - } - } else if (type == 'time') { - try { - if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1) - return params.max_time as nextflow.util.Duration - else - return obj - } catch (all) { - println " ### ERROR ### Max time '${params.max_time}' is not valid! Using default value: $obj" - return obj - } - } else if (type == 'cpus') { - try { - return Math.min( obj, params.max_cpus as int ) - } catch (all) { - println " ### ERROR ### Max cpus '${params.max_cpus}' is not valid! Using default value: $obj" - return obj - } - } -} - - - diff --git a/nextflow_schema.json b/nextflow_schema.json index df0bbfe3..313997a8 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -1,10 +1,10 @@ { - "$schema": "http://json-schema.org/draft-07/schema", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://raw.githubusercontent.com/nf-core/proteinfold/master/nextflow_schema.json", "title": "nf-core/proteinfold pipeline parameters", "description": "Protein 3D structure prediction pipeline", "type": "object", - "definitions": { + "$defs": { "input_output_options": { "title": "Global options", "type": "object", @@ -32,8 +32,7 @@ "mode": { "type": "string", "default": "alphafold2", - "description": "Specifies the mode in which the pipeline will be run", - "enum": ["alphafold2", "colabfold", "esmfold"], + "description": "Specifies the mode in which the pipeline will be run. 
The mode can be any combination of ['alphafold2', 'colabfold', 'esmfold'], separated by a comma (',') with no spaces, e.g. 'alphafold2,esmfold'.", "fa_icon": "fas fa-cogs" }, "use_gpu": { @@ -194,6 +193,38 @@ } } }, + "foldseek_options": { + "title": "Foldseek options", + "type": "object", + "fa_icon": "fas fa-coins", + "description": "Foldseek options.", + "properties": { + "foldseek_search": { + "type": "string", + "enum": [null, "easysearch"], + "default": null, + "description": "Specifies the Foldseek search mode; currently only 'easysearch' is supported.", + "fa_icon": "fas fa-search" + }, + "foldseek_db": { + "type": "string", + "description": "The ID of the Foldseek database.", + "fa_icon": "fas fa-server" + }, + "foldseek_db_path": { + "type": "string", + "format": "path", + "exists": true, + "description": "Specifies the path to the Foldseek databases used by 'foldseek'.", + "fa_icon": "fas fa-folder-open" + }, + "foldseek_easysearch_arg": { + "type": "string", + "description": "Specifies the arguments to be passed to the foldseek easysearch command.", + "fa_icon": "fas fa-server" + } + } + }, "process_skipping_options": { "title": "Process skipping options", "type": "object", @@ -204,6 +235,11 @@ "type": "boolean", "description": "Skip MultiQC.", "fa_icon": "fas fa-fast-forward" + }, + "skip_visualisation": { + "type": "boolean", + "description": "Skip visualisation reports.", + "fa_icon": "fas fa-fast-forward" + } } }, @@ -255,41 +291,6 @@ } } }, - "max_job_request_options": { - "title": "Max job request options", - "type": "object", - "fa_icon": "fab fa-acquisitions-incorporated", - "description": "Set the top limit for requested resources for any single job.", - "help_text": "If you are running on a smaller system, a pipeline step requesting more resources than are available may cause Nextflow to stop the run with an error. These options allow you to cap the maximum resources requested by any single job so that the pipeline will run on your system.\n\nNote that you can not _increase_ the resources requested by any job using these options. For that you will need your own configuration file. See [the nf-core website](https://nf-co.re/usage/configuration) for details.", - "properties": { - "max_cpus": { - "type": "integer", - "description": "Maximum number of CPUs that can be requested for any single job.", - "default": 16, - "fa_icon": "fas fa-microchip", - "hidden": true, - "help_text": "Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. `--max_cpus 1`" - }, - "max_memory": { - "type": "string", - "description": "Maximum amount of memory that can be requested for any single job.", - "default": "128.GB", - "fa_icon": "fas fa-memory", - "pattern": "^\\d+(\\.\\d+)?\\.?\\s*(K|M|G|T)?B$", - "hidden": true, - "help_text": "Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory '8.GB'`" - }, - "max_time": { - "type": "string", - "description": "Maximum amount of time that can be requested for any single job.", - "default": "240.h", - "fa_icon": "far fa-clock", - "pattern": "^(\\d+\\.?\\s*(s|m|h|d|day)\\s*)+$", - "hidden": true, - "help_text": "Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. `--max_time '2.h'`" - } - } - },
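
The three `max_*` parameters removed above are superseded by Nextflow's built-in `resourceLimits` process directive, available since Nextflow 24.04 (which this release now requires). A minimal sketch of an equivalent cap in a user-supplied config, mirroring the old defaults; the values are illustrative, not pipeline requirements:

    process {
        resourceLimits = [
            cpus: 16,
            memory: 128.GB,
            time: 240.h
        ]
    }

Unlike the removed `check_max()` helper, requests above these limits are clamped by Nextflow itself, so no pipeline-side validation code is needed.
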
"alphafold2_dbs_and_parameters_link_options": { "title": "Alphafold2 DBs and parameters links options", "type": "object", @@ -475,7 +476,7 @@ "fa_icon": "fas fa-folder-open" }, "colabfold_alphafold2_params_tags": { - "type": "string", + "type": "object", "description": "Dictionary with Alphafold2 parameters tags", "fa_icon": "fas fa-stream" } @@ -527,12 +528,6 @@ "description": "Less common options for the pipeline, typically set in a config file.", "help_text": "These options are common to all nf-core pipelines and allow you to customise some of the core preferences for how the pipeline runs.\n\nTypically these options would be set in a Nextflow config file loaded for all pipeline runs, such as `~/.nextflow/config`.", "properties": { - "help": { - "type": "boolean", - "description": "Display help text.", - "fa_icon": "fas fa-question-circle", - "hidden": true - }, "version": { "type": "boolean", "description": "Display version and exit.", @@ -616,27 +611,6 @@ "fa_icon": "fas fa-check-square", "hidden": true }, - "validationShowHiddenParams": { - "type": "boolean", - "fa_icon": "far fa-eye-slash", - "description": "Show all params when using `--help`", - "hidden": true, - "help_text": "By default, parameters set as _hidden_ in the schema are not shown on the command line when a user runs with `--help`. Specifying this option will tell the pipeline to show all parameters." - }, - "validationFailUnrecognisedParams": { - "type": "boolean", - "fa_icon": "far fa-check-circle", - "description": "Validation of parameters fails when an unrecognised parameter is found.", - "hidden": true, - "help_text": "By default, when an unrecognised parameter is found, it returns a warning." - }, - "validationLenientMode": { - "type": "boolean", - "fa_icon": "far fa-check-circle", - "description": "Validation of parameters in lenient mode.", - "hidden": true, - "help_text": "Allows string values that are parseable as numbers or booleans. For further information see [JSONSchema docs](https://github.com/everit-org/json-schema#lenient-mode)."
- }, "pipelines_testdata_base_path": { "type": "string", "fa_icon": "far fa-check-circle", @@ -649,46 +623,46 @@ }, "allOf": [ { - "$ref": "#/definitions/input_output_options" + "$ref": "#/$defs/input_output_options" }, { - "$ref": "#/definitions/alphafold2_options" + "$ref": "#/$defs/alphafold2_options" }, { - "$ref": "#/definitions/colabfold_options" + "$ref": "#/$defs/colabfold_options" }, { - "$ref": "#/definitions/esmfold_options" + "$ref": "#/$defs/esmfold_options" }, { - "$ref": "#/definitions/process_skipping_options" + "$ref": "#/$defs/foldseek_options" }, { - "$ref": "#/definitions/institutional_config_options" + "$ref": "#/$defs/process_skipping_options" }, { - "$ref": "#/definitions/max_job_request_options" + "$ref": "#/$defs/institutional_config_options" }, { - "$ref": "#/definitions/alphafold2_dbs_and_parameters_link_options" + "$ref": "#/$defs/alphafold2_dbs_and_parameters_link_options" }, { - "$ref": "#/definitions/alphafold2_dbs_and_parameters_path_options" + "$ref": "#/$defs/alphafold2_dbs_and_parameters_path_options" }, { - "$ref": "#/definitions/colabfold_dbs_and_parameters_link_options" + "$ref": "#/$defs/colabfold_dbs_and_parameters_link_options" }, { - "$ref": "#/definitions/colabfold_dbs_and_parameters_path_options" + "$ref": "#/$defs/colabfold_dbs_and_parameters_path_options" }, { - "$ref": "#/definitions/esmfold_parameters_link_options" + "$ref": "#/$defs/esmfold_parameters_link_options" }, { - "$ref": "#/definitions/esmfold_parameters_path_options" + "$ref": "#/$defs/esmfold_parameters_path_options" }, { - "$ref": "#/definitions/generic_options" + "$ref": "#/$defs/generic_options" } ] } diff --git a/subworkflows/local/utils_nfcore_proteinfold_pipeline/main.nf b/subworkflows/local/utils_nfcore_proteinfold_pipeline/main.nf index 742d460a..fa0545a6 100644 --- a/subworkflows/local/utils_nfcore_proteinfold_pipeline/main.nf +++ b/subworkflows/local/utils_nfcore_proteinfold_pipeline/main.nf @@ -8,34 +8,34 @@ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ -include { UTILS_NFVALIDATION_PLUGIN } from '../../nf-core/utils_nfvalidation_plugin' -include { paramsSummaryMap } from 'plugin/nf-validation' -include { UTILS_NEXTFLOW_PIPELINE } from '../../nf-core/utils_nextflow_pipeline' +include { UTILS_NFSCHEMA_PLUGIN } from '../../nf-core/utils_nfschema_plugin' +include { paramsSummaryMap } from 'plugin/nf-schema' +include { samplesheetToList } from 'plugin/nf-schema' include { completionEmail } from '../../nf-core/utils_nfcore_pipeline' include { completionSummary } from '../../nf-core/utils_nfcore_pipeline' -include { dashedLine } from '../../nf-core/utils_nfcore_pipeline' -include { nfCoreLogo } from '../../nf-core/utils_nfcore_pipeline' include { imNotification } from '../../nf-core/utils_nfcore_pipeline' include { UTILS_NFCORE_PIPELINE } from '../../nf-core/utils_nfcore_pipeline' -include { workflowCitation } from '../../nf-core/utils_nfcore_pipeline' +include { UTILS_NEXTFLOW_PIPELINE } from '../../nf-core/utils_nextflow_pipeline' /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW TO INITIALISE PIPELINE -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow PIPELINE_INITIALISATION { take: version // boolean: Display version and exit - help // boolean: 
Display help text validate_params // boolean: Boolean whether to validate parameters against the schema at runtime monochrome_logs // boolean: Do not use coloured log outputs nextflow_cli_args // array: List of positional nextflow CLI args outdir // string: The output directory where the results will be saved + input // string: Path to input samplesheet main: + ch_versions = Channel.empty() + // // Print version and exit if required and dump pipeline parameters to JSON file // @@ -49,16 +49,10 @@ workflow PIPELINE_INITIALISATION { // // Validate parameters and generate parameter summary to stdout // - pre_help_text = nfCoreLogo(monochrome_logs) - post_help_text = '\n' + workflowCitation() + '\n' + dashedLine(monochrome_logs) - def String workflow_command = "nextflow run ${workflow.manifest.name} -profile --input samplesheet.csv --outdir " - UTILS_NFVALIDATION_PLUGIN ( - help, - workflow_command, - pre_help_text, - post_help_text, + UTILS_NFSCHEMA_PLUGIN ( + workflow, validate_params, - "nextflow_schema.json" + null ) // @@ -67,12 +61,21 @@ workflow PIPELINE_INITIALISATION { UTILS_NFCORE_PIPELINE ( nextflow_cli_args ) + + // + // Create channel from input file provided through params.input + // + ch_samplesheet = Channel.fromList(samplesheetToList(params.input, "assets/schema_input.json")) + + emit: + samplesheet = ch_samplesheet + versions = ch_versions } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW FOR PIPELINE COMPLETION -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow PIPELINE_COMPLETION { @@ -87,7 +90,6 @@ workflow PIPELINE_COMPLETION { multiqc_report // string: Path to MultiQC report main: - summary_params = paramsSummaryMap(workflow, parameters_schema: "nextflow_schema.json") // @@ -95,11 +97,18 @@ workflow PIPELINE_COMPLETION { // workflow.onComplete { if (email || email_on_fail) { - completionEmail(summary_params, email, email_on_fail, plaintext_email, outdir, monochrome_logs, multiqc_report.toList()) + completionEmail( + summary_params, + email, + email_on_fail, + plaintext_email, + outdir, + monochrome_logs, + multiqc_report.toList() + ) } completionSummary(monochrome_logs) - if (hook_url) { imNotification(summary_params, hook_url) } @@ -111,9 +120,9 @@ workflow PIPELINE_COMPLETION { } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FUNCTIONS -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ // @@ -152,7 +161,6 @@ def toolCitationText() { // Uncomment function in methodsDescriptionText to render in MultiQC report def citation_text = [ "Tools used in the workflow included:", - "FastQC (Andrews 2010),", "MultiQC (Ewels et al. 2016)", "." ].join(' ').trim() @@ -165,7 +173,6 @@ def toolBibliographyText() { // Can use ternary operators to dynamically construct based conditions, e.g. params["run_xyz"] ? "
<li>Author (2023) Pub name, Journal, DOI</li>" : "", // Uncomment function in methodsDescriptionText to render in MultiQC report def reference_text = [ - "<li>Andrews S, (2010) FastQC, URL: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/).</li>", "<li>Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics , 32(19), 3047–3048. doi: /10.1093/bioinformatics/btw354</li>" ].join(' ').trim() @@ -184,8 +191,10 @@ def methodsDescriptionText(mqc_methods_yaml) { // Removing `https://doi.org/` to handle pipelines using DOIs vs DOI resolvers // Removing ` ` since the manifest.doi is a string and not a proper list def temp_doi_ref = "" - String[] manifest_doi = meta.manifest_map.doi.tokenize(",") - for (String doi_ref: manifest_doi) temp_doi_ref += "(doi: ${doi_ref.replace("https://doi.org/", "").replace(" ", "")}), " + def manifest_doi = meta.manifest_map.doi.tokenize(",") + manifest_doi.each { doi_ref -> + temp_doi_ref += "(doi: ${doi_ref.replace("https://doi.org/", "").replace(" ", "")}), " + } meta["doi_text"] = temp_doi_ref.substring(0, temp_doi_ref.length() - 2) } else meta["doi_text"] = "" meta["nodoi_text"] = meta.manifest_map.doi ? "" : "<li>If available, make sure to update the text to include the Zenodo DOI of version of the pipeline used.</li>" @@ -204,3 +213,4 @@ def methodsDescriptionText(mqc_methods_yaml) { return description_html.toString() } + diff --git a/subworkflows/nf-core/utils_nextflow_pipeline/main.nf b/subworkflows/nf-core/utils_nextflow_pipeline/main.nf index ac31f28f..0fcbf7b3 100644 --- a/subworkflows/nf-core/utils_nextflow_pipeline/main.nf +++ b/subworkflows/nf-core/utils_nextflow_pipeline/main.nf @@ -2,18 +2,13 @@ // Subworkflow with functionality that may be useful for any Nextflow pipeline // -import org.yaml.snakeyaml.Yaml -import groovy.json.JsonOutput -import nextflow.extension.FilesEx - /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW DEFINITION -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow UTILS_NEXTFLOW_PIPELINE { - take: print_version // boolean: print version dump_parameters // boolean: dump parameters @@ -26,7 +21,7 @@ workflow UTILS_NEXTFLOW_PIPELINE { // Print workflow version and exit on --version // if (print_version) { - log.info "${workflow.manifest.name} ${getWorkflowVersion()}" + log.info("${workflow.manifest.name} ${getWorkflowVersion()}") System.exit(0) } @@ -49,16 +44,16 @@ } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FUNCTIONS -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ // // Generate version string // def getWorkflowVersion() { - String version_string = "" + def version_string = "" as String if (workflow.manifest.version) { def prefix_v = workflow.manifest.version[0] != 'v' ? 'v' : '' version_string += "${prefix_v}${workflow.manifest.version}" @@ -76,13 +71,13 @@ def getWorkflowVersion() { // Dump pipeline parameters to a JSON file // def dumpParametersToJSON(outdir) { - def timestamp = new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss') - def filename = "params_${timestamp}.json" - def temp_pf = new File(workflow.launchDir.toString(), ".${filename}") - def jsonStr = JsonOutput.toJson(params) - temp_pf.text = JsonOutput.prettyPrint(jsonStr) + def timestamp = new java.util.Date().format('yyyy-MM-dd_HH-mm-ss') + def filename = "params_${timestamp}.json" + def temp_pf = new File(workflow.launchDir.toString(), ".${filename}") + def jsonStr = groovy.json.JsonOutput.toJson(params) + temp_pf.text = groovy.json.JsonOutput.prettyPrint(jsonStr) - FilesEx.copyTo(temp_pf.toPath(), "${outdir}/pipeline_info/params_${timestamp}.json") + nextflow.extension.FilesEx.copyTo(temp_pf.toPath(), "${outdir}/pipeline_info/params_${timestamp}.json") temp_pf.delete() } @@ -90,37 +85,40 @@ // When running with -profile conda, warn if channels have not been set-up appropriately // def checkCondaChannels() { - Yaml parser = new Yaml() + def parser = new org.yaml.snakeyaml.Yaml() def channels = [] try { def config = parser.load("conda config --show channels".execute().text) channels = config.channels - } catch(NullPointerException | IOException e) { - log.warn "Could not verify conda channel configuration."
- return + } + catch (NullPointerException e) { + log.warn("Could not verify conda channel configuration.") + return null + } + catch (IOException e) { + log.warn("Could not verify conda channel configuration.") + return null } // Check that all channels are present // This channel list is ordered by required channel priority. - def required_channels_in_order = ['conda-forge', 'bioconda', 'defaults'] + def required_channels_in_order = ['conda-forge', 'bioconda'] def channels_missing = ((required_channels_in_order as Set) - (channels as Set)) as Boolean // Check that they are in the right order - def channel_priority_violation = false - def n = required_channels_in_order.size() - for (int i = 0; i < n - 1; i++) { - channel_priority_violation |= !(channels.indexOf(required_channels_in_order[i]) < channels.indexOf(required_channels_in_order[i+1])) - } + def channel_priority_violation = required_channels_in_order != channels.findAll { ch -> ch in required_channels_in_order } if (channels_missing | channel_priority_violation) { - log.warn "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" + - " There is a problem with your Conda configuration!\n\n" + - " You will need to set-up the conda-forge and bioconda channels correctly.\n" + - " Please refer to https://bioconda.github.io/\n" + - " The observed channel order is \n" + - " ${channels}\n" + - " but the following channel order is required:\n" + - " ${required_channels_in_order}\n" + - "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" + log.warn """\ + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + There is a problem with your Conda configuration! + You will need to set-up the conda-forge and bioconda channels correctly. 
+ Please refer to https://bioconda.github.io/ + The observed channel order is + ${channels} + but the following channel order is required: + ${required_channels_in_order} + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" + """.stripIndent(true) } } diff --git a/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config b/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config index d0a926bf..a09572e5 100644 --- a/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config +++ b/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config @@ -3,7 +3,7 @@ manifest { author = """nf-core""" homePage = 'https://127.0.0.1' description = """Dummy pipeline""" - nextflowVersion = '!>=23.04.0' + nextflowVersion = '!>=23.04.0' version = '9.9.9' doi = 'https://doi.org/10.5281/zenodo.5070524' } diff --git a/subworkflows/nf-core/utils_nfcore_pipeline/main.nf b/subworkflows/nf-core/utils_nfcore_pipeline/main.nf index 14558c39..5cb7bafe 100644 --- a/subworkflows/nf-core/utils_nfcore_pipeline/main.nf +++ b/subworkflows/nf-core/utils_nfcore_pipeline/main.nf @@ -2,17 +2,13 @@ // Subworkflow with utility functions specific to the nf-core pipeline template // -import org.yaml.snakeyaml.Yaml -import nextflow.extension.FilesEx - /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW DEFINITION -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow UTILS_NFCORE_PIPELINE { - take: nextflow_cli_args @@ -25,23 +21,20 @@ workflow UTILS_NFCORE_PIPELINE { } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FUNCTIONS -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ // // Warn if a -profile or Nextflow config has not been provided to run the pipeline // def checkConfigProvided() { - valid_config = true + def valid_config = true as Boolean if (workflow.profile == 'standard' && workflow.configFiles.size() <= 1) { - log.warn "[$workflow.manifest.name] You are attempting to run the pipeline without any custom configuration!\n\n" + - "This will be dependent on your local compute environment but can be achieved via one or more of the following:\n" + - " (1) Using an existing pipeline profile e.g. `-profile docker` or `-profile singularity`\n" + - " (2) Using an existing nf-core/configs for your Institution e.g. `-profile crick` or `-profile uppmax`\n" + - " (3) Using your own local custom config e.g. `-c /path/to/your/custom.config`\n\n" + - "Please refer to the quick start section and usage docs for the pipeline.\n " + log.warn( + "[${workflow.manifest.name}] You are attempting to run the pipeline without any custom configuration!\n\n" + "This will be dependent on your local compute environment but can be achieved via one or more of the following:\n" + " (1) Using an existing pipeline profile e.g. `-profile docker` or `-profile singularity`\n" + " (2) Using an existing nf-core/configs for your Institution e.g. `-profile crick` or `-profile uppmax`\n" + " (3) Using your own local custom config e.g. 
`-c /path/to/your/custom.config`\n\n" + "Please refer to the quick start section and usage docs for the pipeline.\n " + ) valid_config = false } return valid_config @@ -52,12 +45,14 @@ def checkConfigProvided() { // def checkProfileProvided(nextflow_cli_args) { if (workflow.profile.endsWith(',')) { - error "The `-profile` option cannot end with a trailing comma, please remove it and re-run the pipeline!\n" + - "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + error( + "The `-profile` option cannot end with a trailing comma, please remove it and re-run the pipeline!\n" + "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + ) } if (nextflow_cli_args[0]) { - log.warn "nf-core pipelines do not accept positional arguments. The positional argument `${nextflow_cli_args[0]}` has been detected.\n" + - "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + log.warn( + "nf-core pipelines do not accept positional arguments. The positional argument `${nextflow_cli_args[0]}` has been detected.\n" + "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + ) } } @@ -66,25 +61,21 @@ def checkProfileProvided(nextflow_cli_args) { // def workflowCitation() { def temp_doi_ref = "" - String[] manifest_doi = workflow.manifest.doi.tokenize(",") - // Using a loop to handle multiple DOIs + def manifest_doi = workflow.manifest.doi.tokenize(",") + // Handling multiple DOIs // Removing `https://doi.org/` to handle pipelines using DOIs vs DOI resolvers // Removing ` ` since the manifest.doi is a string and not a proper list - for (String doi_ref: manifest_doi) temp_doi_ref += " https://doi.org/${doi_ref.replace('https://doi.org/', '').replace(' ', '')}\n" - return "If you use ${workflow.manifest.name} for your analysis please cite:\n\n" + - "* The pipeline\n" + - temp_doi_ref + "\n" + - "* The nf-core framework\n" + - " https://doi.org/10.1038/s41587-020-0439-x\n\n" + - "* Software dependencies\n" + - " https://github.com/${workflow.manifest.name}/blob/master/CITATIONS.md" + manifest_doi.each { doi_ref -> + temp_doi_ref += " https://doi.org/${doi_ref.replace('https://doi.org/', '').replace(' ', '')}\n" + } + return "If you use ${workflow.manifest.name} for your analysis please cite:\n\n" + "* The pipeline\n" + temp_doi_ref + "\n" + "* The nf-core framework\n" + " https://doi.org/10.1038/s41587-020-0439-x\n\n" + "* Software dependencies\n" + " https://github.com/${workflow.manifest.name}/blob/master/CITATIONS.md" } // // Generate workflow version string // def getWorkflowVersion() { - String version_string = "" + def version_string = "" as String if (workflow.manifest.version) { def prefix_v = workflow.manifest.version[0] != 'v' ? 
'v' : '' version_string += "${prefix_v}${workflow.manifest.version}" @@ -102,8 +93,8 @@ def getWorkflowVersion() { // Get software versions for pipeline // def processVersionsFromYAML(yaml_file) { - Yaml yaml = new Yaml() - versions = yaml.load(yaml_file).collectEntries { k, v -> [ k.tokenize(':')[-1], v ] } + def yaml = new org.yaml.snakeyaml.Yaml() + def versions = yaml.load(yaml_file).collectEntries { k, v -> [k.tokenize(':')[-1], v] } return yaml.dumpAsMap(versions).trim() } @@ -113,8 +104,8 @@ def processVersionsFromYAML(yaml_file) { def workflowVersionToYAML() { return """ Workflow: - $workflow.manifest.name: ${getWorkflowVersion()} - Nextflow: $workflow.nextflow.version + ${workflow.manifest.name}: ${getWorkflowVersion()} + Nextflow: ${workflow.nextflow.version} """.stripIndent().trim() } @@ -122,11 +113,7 @@ def workflowVersionToYAML() { // Get channel of software versions used in pipeline in YAML format // def softwareVersionsToYAML(ch_versions) { - return ch_versions - .unique() - .map { processVersionsFromYAML(it) } - .unique() - .mix(Channel.of(workflowVersionToYAML())) + return ch_versions.unique().map { version -> processVersionsFromYAML(version) }.unique().mix(Channel.of(workflowVersionToYAML())) } // @@ -134,25 +121,31 @@ def softwareVersionsToYAML(ch_versions) { // def paramsSummaryMultiqc(summary_params) { def summary_section = '' - for (group in summary_params.keySet()) { - def group_params = summary_params.get(group) // This gets the parameters of that particular group - if (group_params) { - summary_section += "
    <p style=\"font-size:110%\"><b>$group</b></p>\n" - summary_section += "    <dl class=\"dl-horizontal\">\n" - for (param in group_params.keySet()) { - summary_section += "        <dt>$param</dt><dd><samp>${group_params.get(param) ?: '<span style=\"color:#999999;\">N/A</a>'}</samp></dd>\n" + summary_params + .keySet() + .each { group -> + def group_params = summary_params.get(group) + // This gets the parameters of that particular group + if (group_params) { + summary_section += "    <p style=\"font-size:110%\"><b>${group}</b></p>\n" + summary_section += "    <dl class=\"dl-horizontal\">\n" + group_params + .keySet() + .sort() + .each { param -> + summary_section += "        <dt>${param}</dt><dd><samp>${group_params.get(param) ?: '<span style=\"color:#999999;\">N/A</a>'}</samp></dd>\n" + } + summary_section += "    </dl>\n" } - summary_section += "    </dl>\n" } - } - String yaml_file_text = "id: '${workflow.manifest.name.replace('/','-')}-summary'\n" - yaml_file_text += "description: ' - this information is collected when the pipeline is started.'\n" - yaml_file_text += "section_name: '${workflow.manifest.name} Workflow Summary'\n" - yaml_file_text += "section_href: 'https://github.com/${workflow.manifest.name}'\n" - yaml_file_text += "plot_type: 'html'\n" - yaml_file_text += "data: |\n" - yaml_file_text += "${summary_section}" + def yaml_file_text = "id: '${workflow.manifest.name.replace('/', '-')}-summary'\n" as String + yaml_file_text += "description: ' - this information is collected when the pipeline is started.'\n" + yaml_file_text += "section_name: '${workflow.manifest.name} Workflow Summary'\n" + yaml_file_text += "section_href: 'https://github.com/${workflow.manifest.name}'\n" + yaml_file_text += "plot_type: 'html'\n" + yaml_file_text += "data: |\n" + yaml_file_text += "${summary_section}" return yaml_file_text } @@ -161,7 +154,7 @@ // nf-core logo // def nfCoreLogo(monochrome_logs=true) { - Map colors = logColours(monochrome_logs) + def colors = logColours(monochrome_logs) as Map String.format( """\n ${dashedLine(monochrome_logs)} @@ -180,7 +173,7 @@ // Return dashed line // def dashedLine(monochrome_logs=true) { - Map colors = logColours(monochrome_logs) + def colors = logColours(monochrome_logs) as Map return "-${colors.dim}----------------------------------------------------${colors.reset}-" } @@ -188,7 +181,7 @@ // ANSII colours used for terminal logging // def logColours(monochrome_logs=true) { - Map colorcodes = [:] + def colorcodes = [:] as Map // Reset / Meta colorcodes['reset'] = monochrome_logs ? '' : "\033[0m" @@ -200,54 +193,54 @@ colorcodes['hidden'] = monochrome_logs ? '' : "\033[8m" // Regular Colors - colorcodes['black'] = monochrome_logs ? '' : "\033[0;30m" - colorcodes['red'] = monochrome_logs ? '' : "\033[0;31m" - colorcodes['green'] = monochrome_logs ? '' : "\033[0;32m" - colorcodes['yellow'] = monochrome_logs ? '' : "\033[0;33m" - colorcodes['blue'] = monochrome_logs ? '' : "\033[0;34m" - colorcodes['purple'] = monochrome_logs ? '' : "\033[0;35m" - colorcodes['cyan'] = monochrome_logs ? '' : "\033[0;36m" - colorcodes['white'] = monochrome_logs ? '' : "\033[0;37m" + colorcodes['black'] = monochrome_logs ? '' : "\033[0;30m" + colorcodes['red'] = monochrome_logs ? '' : "\033[0;31m" + colorcodes['green'] = monochrome_logs ? '' : "\033[0;32m" + colorcodes['yellow'] = monochrome_logs ? '' : "\033[0;33m" + colorcodes['blue'] = monochrome_logs ? '' : "\033[0;34m" + colorcodes['purple'] = monochrome_logs ? '' : "\033[0;35m" + colorcodes['cyan'] = monochrome_logs ? '' : "\033[0;36m" + colorcodes['white'] = monochrome_logs ? '' : "\033[0;37m" // Bold - colorcodes['bblack'] = monochrome_logs ? '' : "\033[1;30m" - colorcodes['bred'] = monochrome_logs ?
'' : "\033[1;31m" + colorcodes['bgreen'] = monochrome_logs ? '' : "\033[1;32m" + colorcodes['byellow'] = monochrome_logs ? '' : "\033[1;33m" + colorcodes['bblue'] = monochrome_logs ? '' : "\033[1;34m" + colorcodes['bpurple'] = monochrome_logs ? '' : "\033[1;35m" + colorcodes['bcyan'] = monochrome_logs ? '' : "\033[1;36m" + colorcodes['bwhite'] = monochrome_logs ? '' : "\033[1;37m" // Underline - colorcodes['ublack'] = monochrome_logs ? '' : "\033[4;30m" - colorcodes['ured'] = monochrome_logs ? '' : "\033[4;31m" - colorcodes['ugreen'] = monochrome_logs ? '' : "\033[4;32m" - colorcodes['uyellow'] = monochrome_logs ? '' : "\033[4;33m" - colorcodes['ublue'] = monochrome_logs ? '' : "\033[4;34m" - colorcodes['upurple'] = monochrome_logs ? '' : "\033[4;35m" - colorcodes['ucyan'] = monochrome_logs ? '' : "\033[4;36m" - colorcodes['uwhite'] = monochrome_logs ? '' : "\033[4;37m" + colorcodes['ublack'] = monochrome_logs ? '' : "\033[4;30m" + colorcodes['ured'] = monochrome_logs ? '' : "\033[4;31m" + colorcodes['ugreen'] = monochrome_logs ? '' : "\033[4;32m" + colorcodes['uyellow'] = monochrome_logs ? '' : "\033[4;33m" + colorcodes['ublue'] = monochrome_logs ? '' : "\033[4;34m" + colorcodes['upurple'] = monochrome_logs ? '' : "\033[4;35m" + colorcodes['ucyan'] = monochrome_logs ? '' : "\033[4;36m" + colorcodes['uwhite'] = monochrome_logs ? '' : "\033[4;37m" // High Intensity - colorcodes['iblack'] = monochrome_logs ? '' : "\033[0;90m" - colorcodes['ired'] = monochrome_logs ? '' : "\033[0;91m" - colorcodes['igreen'] = monochrome_logs ? '' : "\033[0;92m" - colorcodes['iyellow'] = monochrome_logs ? '' : "\033[0;93m" - colorcodes['iblue'] = monochrome_logs ? '' : "\033[0;94m" - colorcodes['ipurple'] = monochrome_logs ? '' : "\033[0;95m" - colorcodes['icyan'] = monochrome_logs ? '' : "\033[0;96m" - colorcodes['iwhite'] = monochrome_logs ? '' : "\033[0;97m" + colorcodes['iblack'] = monochrome_logs ? '' : "\033[0;90m" + colorcodes['ired'] = monochrome_logs ? '' : "\033[0;91m" + colorcodes['igreen'] = monochrome_logs ? '' : "\033[0;92m" + colorcodes['iyellow'] = monochrome_logs ? '' : "\033[0;93m" + colorcodes['iblue'] = monochrome_logs ? '' : "\033[0;94m" + colorcodes['ipurple'] = monochrome_logs ? '' : "\033[0;95m" + colorcodes['icyan'] = monochrome_logs ? '' : "\033[0;96m" + colorcodes['iwhite'] = monochrome_logs ? '' : "\033[0;97m" // Bold High Intensity - colorcodes['biblack'] = monochrome_logs ? '' : "\033[1;90m" - colorcodes['bired'] = monochrome_logs ? '' : "\033[1;91m" - colorcodes['bigreen'] = monochrome_logs ? '' : "\033[1;92m" - colorcodes['biyellow'] = monochrome_logs ? '' : "\033[1;93m" - colorcodes['biblue'] = monochrome_logs ? '' : "\033[1;94m" - colorcodes['bipurple'] = monochrome_logs ? '' : "\033[1;95m" - colorcodes['bicyan'] = monochrome_logs ? '' : "\033[1;96m" - colorcodes['biwhite'] = monochrome_logs ? '' : "\033[1;97m" + colorcodes['biblack'] = monochrome_logs ? '' : "\033[1;90m" + colorcodes['bired'] = monochrome_logs ? '' : "\033[1;91m" + colorcodes['bigreen'] = monochrome_logs ? '' : "\033[1;92m" + colorcodes['biyellow'] = monochrome_logs ? '' : "\033[1;93m" + colorcodes['biblue'] = monochrome_logs ? '' : "\033[1;94m" + colorcodes['bipurple'] = monochrome_logs ? '' : "\033[1;95m" + colorcodes['bicyan'] = monochrome_logs ? '' : "\033[1;96m" + colorcodes['biwhite'] = monochrome_logs ? 
'' : "\033[1;97m" return colorcodes } @@ -262,14 +255,15 @@ def attachMultiqcReport(multiqc_report) { mqc_report = multiqc_report.getVal() if (mqc_report.getClass() == ArrayList && mqc_report.size() >= 1) { if (mqc_report.size() > 1) { - log.warn "[$workflow.manifest.name] Found multiple reports from process 'MULTIQC', will use only one" + log.warn("[${workflow.manifest.name}] Found multiple reports from process 'MULTIQC', will use only one") } mqc_report = mqc_report[0] } } - } catch (all) { + } + catch (Exception all) { if (multiqc_report) { - log.warn "[$workflow.manifest.name] Could not attach MultiQC report to summary email" + log.warn("[${workflow.manifest.name}] Could not attach MultiQC report to summary email") } } return mqc_report @@ -281,26 +275,35 @@ def attachMultiqcReport(multiqc_report) { def completionEmail(summary_params, email, email_on_fail, plaintext_email, outdir, monochrome_logs=true, multiqc_report=null) { // Set up the e-mail variables - def subject = "[$workflow.manifest.name] Successful: $workflow.runName" + def subject = "[${workflow.manifest.name}] Successful: ${workflow.runName}" if (!workflow.success) { - subject = "[$workflow.manifest.name] FAILED: $workflow.runName" + subject = "[${workflow.manifest.name}] FAILED: ${workflow.runName}" } def summary = [:] - for (group in summary_params.keySet()) { - summary << summary_params[group] - } + summary_params + .keySet() + .sort() + .each { group -> + summary << summary_params[group] + } def misc_fields = [:] misc_fields['Date Started'] = workflow.start misc_fields['Date Completed'] = workflow.complete misc_fields['Pipeline script file path'] = workflow.scriptFile misc_fields['Pipeline script hash ID'] = workflow.scriptId - if (workflow.repository) misc_fields['Pipeline repository Git URL'] = workflow.repository - if (workflow.commitId) misc_fields['Pipeline repository Git Commit'] = workflow.commitId - if (workflow.revision) misc_fields['Pipeline Git branch/tag'] = workflow.revision - misc_fields['Nextflow Version'] = workflow.nextflow.version - misc_fields['Nextflow Build'] = workflow.nextflow.build + if (workflow.repository) { + misc_fields['Pipeline repository Git URL'] = workflow.repository + } + if (workflow.commitId) { + misc_fields['Pipeline repository Git Commit'] = workflow.commitId + } + if (workflow.revision) { + misc_fields['Pipeline Git branch/tag'] = workflow.revision + } + misc_fields['Nextflow Version'] = workflow.nextflow.version + misc_fields['Nextflow Build'] = workflow.nextflow.build misc_fields['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp def email_fields = [:] @@ -338,39 +341,41 @@ def completionEmail(summary_params, email, email_on_fail, plaintext_email, outdi // Render the sendmail template def max_multiqc_email_size = (params.containsKey('max_multiqc_email_size') ? 
params.max_multiqc_email_size : 0) as nextflow.util.MemoryUnit - def smail_fields = [ email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, projectDir: "${workflow.projectDir}", mqcFile: mqc_report, mqcMaxSize: max_multiqc_email_size.toBytes() ] + def smail_fields = [email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, projectDir: "${workflow.projectDir}", mqcFile: mqc_report, mqcMaxSize: max_multiqc_email_size.toBytes()] def sf = new File("${workflow.projectDir}/assets/sendmail_template.txt") def sendmail_template = engine.createTemplate(sf).make(smail_fields) def sendmail_html = sendmail_template.toString() // Send the HTML e-mail - Map colors = logColours(monochrome_logs) + def colors = logColours(monochrome_logs) as Map if (email_address) { try { - if (plaintext_email) { throw GroovyException('Send plaintext e-mail, not HTML') } + if (plaintext_email) { + throw new org.codehaus.groovy.GroovyException('Send plaintext e-mail, not HTML') + } // Try to send HTML e-mail using sendmail def sendmail_tf = new File(workflow.launchDir.toString(), ".sendmail_tmp.html") sendmail_tf.withWriter { w -> w << sendmail_html } - [ 'sendmail', '-t' ].execute() << sendmail_html - log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (sendmail)-" - } catch (all) { + ['sendmail', '-t'].execute() << sendmail_html + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.green} Sent summary e-mail to ${email_address} (sendmail)-") + } + catch (Exception all) { // Catch failures and try with plaintext - def mail_cmd = [ 'mail', '-s', subject, '--content-type=text/html', email_address ] + def mail_cmd = ['mail', '-s', subject, '--content-type=text/html', email_address] mail_cmd.execute() << email_html - log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (mail)-" + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.green} Sent summary e-mail to ${email_address} (mail)-") } } // Write summary e-mail HTML to a file def output_hf = new File(workflow.launchDir.toString(), ".pipeline_report.html") output_hf.withWriter { w -> w << email_html } - FilesEx.copyTo(output_hf.toPath(), "${outdir}/pipeline_info/pipeline_report.html"); + nextflow.extension.FilesEx.copyTo(output_hf.toPath(), "${outdir}/pipeline_info/pipeline_report.html") output_hf.delete() // Write summary e-mail TXT to a file def output_tf = new File(workflow.launchDir.toString(), ".pipeline_report.txt") output_tf.withWriter { w -> w << email_txt } - FilesEx.copyTo(output_tf.toPath(), "${outdir}/pipeline_info/pipeline_report.txt"); + nextflow.extension.FilesEx.copyTo(output_tf.toPath(), "${outdir}/pipeline_info/pipeline_report.txt") output_tf.delete() } @@ -378,15 +383,17 @@ // Print pipeline summary on completion // def completionSummary(monochrome_logs=true) { - Map colors = logColours(monochrome_logs) + def colors = logColours(monochrome_logs) as Map if (workflow.success) { if (workflow.stats.ignoredCount == 0) { - log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Pipeline completed successfully${colors.reset}-" - } else { - log.info "-${colors.purple}[$workflow.manifest.name]${colors.yellow} Pipeline completed successfully, but with errored process(es) ${colors.reset}-" + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.green} Pipeline completed
successfully${colors.reset}-") + } + else { + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.yellow} Pipeline completed successfully, but with errored process(es) ${colors.reset}-") } - } else { - log.info "-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed with errors${colors.reset}-" + } + else { + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.red} Pipeline completed with errors${colors.reset}-") } } @@ -395,21 +402,30 @@ def completionSummary(monochrome_logs=true) { // def imNotification(summary_params, hook_url) { def summary = [:] - for (group in summary_params.keySet()) { - summary << summary_params[group] - } + summary_params + .keySet() + .sort() + .each { group -> + summary << summary_params[group] + } def misc_fields = [:] - misc_fields['start'] = workflow.start - misc_fields['complete'] = workflow.complete - misc_fields['scriptfile'] = workflow.scriptFile - misc_fields['scriptid'] = workflow.scriptId - if (workflow.repository) misc_fields['repository'] = workflow.repository - if (workflow.commitId) misc_fields['commitid'] = workflow.commitId - if (workflow.revision) misc_fields['revision'] = workflow.revision - misc_fields['nxf_version'] = workflow.nextflow.version - misc_fields['nxf_build'] = workflow.nextflow.build - misc_fields['nxf_timestamp'] = workflow.nextflow.timestamp + misc_fields['start'] = workflow.start + misc_fields['complete'] = workflow.complete + misc_fields['scriptfile'] = workflow.scriptFile + misc_fields['scriptid'] = workflow.scriptId + if (workflow.repository) { + misc_fields['repository'] = workflow.repository + } + if (workflow.commitId) { + misc_fields['commitid'] = workflow.commitId + } + if (workflow.revision) { + misc_fields['revision'] = workflow.revision + } + misc_fields['nxf_version'] = workflow.nextflow.version + misc_fields['nxf_build'] = workflow.nextflow.build + misc_fields['nxf_timestamp'] = workflow.nextflow.timestamp def msg_fields = [:] msg_fields['version'] = getWorkflowVersion() @@ -434,13 +450,13 @@ def imNotification(summary_params, hook_url) { def json_message = json_template.toString() // POST - def post = new URL(hook_url).openConnection(); + def post = new URL(hook_url).openConnection() post.setRequestMethod("POST") post.setDoOutput(true) post.setRequestProperty("Content-Type", "application/json") - post.getOutputStream().write(json_message.getBytes("UTF-8")); - def postRC = post.getResponseCode(); - if (! postRC.equals(200)) { - log.warn(post.getErrorStream().getText()); + post.getOutputStream().write(json_message.getBytes("UTF-8")) + def postRC = post.getResponseCode() + if (!postRC.equals(200)) { + log.warn(post.getErrorStream().getText()) } } diff --git a/subworkflows/nf-core/utils_nfschema_plugin/main.nf b/subworkflows/nf-core/utils_nfschema_plugin/main.nf new file mode 100644 index 00000000..4994303e --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/main.nf @@ -0,0 +1,46 @@ +// +// Subworkflow that uses the nf-schema plugin to validate parameters and render the parameter summary +// + +include { paramsSummaryLog } from 'plugin/nf-schema' +include { validateParameters } from 'plugin/nf-schema' + +workflow UTILS_NFSCHEMA_PLUGIN { + + take: + input_workflow // workflow: the workflow object used by nf-schema to get metadata from the workflow + validate_params // boolean: validate the parameters + parameters_schema // string: path to the parameters JSON schema. 
+ // this has to be the same as the schema given to `validation.parametersSchema` + // when this input is empty it will automatically use the configured schema or + // "${projectDir}/nextflow_schema.json" as default. This input should not be empty + // for meta pipelines + + main: + + // + // Print parameter summary to stdout. This will display the parameters + // that differ from the default given in the JSON schema + // + if(parameters_schema) { + log.info paramsSummaryLog(input_workflow, parameters_schema:parameters_schema) + } else { + log.info paramsSummaryLog(input_workflow) + } + + // + // Validate the parameters using nextflow_schema.json or the schema + // given via the validation.parametersSchema configuration option + // + if(validate_params) { + if(parameters_schema) { + validateParameters(parameters_schema:parameters_schema) + } else { + validateParameters() + } + } + + emit: + dummy_emit = true +} + diff --git a/subworkflows/nf-core/utils_nfschema_plugin/meta.yml b/subworkflows/nf-core/utils_nfschema_plugin/meta.yml new file mode 100644 index 00000000..f7d9f028 --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/meta.yml @@ -0,0 +1,35 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/subworkflows/yaml-schema.json +name: "utils_nfschema_plugin" +description: Run nf-schema to validate parameters and create a summary of changed parameters +keywords: + - validation + - JSON schema + - plugin + - parameters + - summary +components: [] +input: + - input_workflow: + type: object + description: | + The workflow object of the used pipeline. + This object contains meta data used to create the params summary log + - validate_params: + type: boolean + description: Validate the parameters and error if invalid. + - parameters_schema: + type: string + description: | + Path to the parameters JSON schema. + This has to be the same as the schema given to the `validation.parametersSchema` config + option. When this input is empty it will automatically use the configured schema or + "${projectDir}/nextflow_schema.json" as default. The schema should not be given in this way + for meta pipelines. 
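
For orientation, this subworkflow is invoked the same way `PIPELINE_INITIALISATION` calls it earlier in this diff; a minimal standalone sketch, assuming the standard nf-core repository layout for the include path:

    include { UTILS_NFSCHEMA_PLUGIN } from './subworkflows/nf-core/utils_nfschema_plugin'

    workflow {
        UTILS_NFSCHEMA_PLUGIN (
            workflow,   // workflow metadata object used for the params summary
            true,       // validate_params: fail the run on invalid parameters
            null        // use the configured schema or the default nextflow_schema.json
        )
    }

Passing `null` as the third input makes nf-schema fall back to `validation.parametersSchema` or "${projectDir}/nextflow_schema.json", as the description above explains.
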
+output: + - dummy_emit: + type: boolean + description: Dummy emit to make nf-core subworkflows lint happy +authors: + - "@nvnieuwk" +maintainers: + - "@nvnieuwk" diff --git a/subworkflows/nf-core/utils_nfschema_plugin/tests/main.nf.test b/subworkflows/nf-core/utils_nfschema_plugin/tests/main.nf.test new file mode 100644 index 00000000..842dc432 --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/tests/main.nf.test @@ -0,0 +1,117 @@ +nextflow_workflow { + + name "Test Subworkflow UTILS_NFSCHEMA_PLUGIN" + script "../main.nf" + workflow "UTILS_NFSCHEMA_PLUGIN" + + tag "subworkflows" + tag "subworkflows_nfcore" + tag "subworkflows/utils_nfschema_plugin" + tag "plugin/nf-schema" + + config "./nextflow.config" + + test("Should run nothing") { + + when { + + params { + test_data = '' + } + + workflow { + """ + validate_params = false + input[0] = workflow + input[1] = validate_params + input[2] = "" + """ + } + } + + then { + assertAll( + { assert workflow.success } + ) + } + } + + test("Should validate params") { + + when { + + params { + test_data = '' + outdir = 1 + } + + workflow { + """ + validate_params = true + input[0] = workflow + input[1] = validate_params + input[2] = "" + """ + } + } + + then { + assertAll( + { assert workflow.failed }, + { assert workflow.stdout.any { it.contains('ERROR ~ Validation of pipeline parameters failed!') } } + ) + } + } + + test("Should run nothing - custom schema") { + + when { + + params { + test_data = '' + } + + workflow { + """ + validate_params = false + input[0] = workflow + input[1] = validate_params + input[2] = "${projectDir}/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json" + """ + } + } + + then { + assertAll( + { assert workflow.success } + ) + } + } + + test("Should validate params - custom schema") { + + when { + + params { + test_data = '' + outdir = 1 + } + + workflow { + """ + validate_params = true + input[0] = workflow + input[1] = validate_params + input[2] = "${projectDir}/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json" + """ + } + } + + then { + assertAll( + { assert workflow.failed }, + { assert workflow.stdout.any { it.contains('ERROR ~ Validation of pipeline parameters failed!') } } + ) + } + } +} diff --git a/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow.config b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow.config new file mode 100644 index 00000000..0907ac58 --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow.config @@ -0,0 +1,8 @@ +plugins { + id "nf-schema@2.1.0" +} + +validation { + parametersSchema = "${projectDir}/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json" + monochromeLogs = true +} \ No newline at end of file diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json similarity index 95% rename from subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json rename to subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json index 7626c1c9..331e0d2f 100644 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json +++ b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json @@ -1,10 +1,10 @@ { - "$schema": "http://json-schema.org/draft-07/schema", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://raw.githubusercontent.com/./master/nextflow_schema.json", "title": ". 
pipeline parameters", "description": "", "type": "object", - "definitions": { + "$defs": { "input_output_options": { "title": "Input/output options", "type": "object", @@ -87,10 +87,10 @@ }, "allOf": [ { - "$ref": "#/definitions/input_output_options" + "$ref": "#/$defs/input_output_options" }, { - "$ref": "#/definitions/generic_options" + "$ref": "#/$defs/generic_options" } ] } diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/main.nf b/subworkflows/nf-core/utils_nfvalidation_plugin/main.nf deleted file mode 100644 index 2585b65d..00000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/main.nf +++ /dev/null @@ -1,62 +0,0 @@ -// -// Subworkflow that uses the nf-validation plugin to render help text and parameter summary -// - -/* -======================================================================================== - IMPORT NF-VALIDATION PLUGIN -======================================================================================== -*/ - -include { paramsHelp } from 'plugin/nf-validation' -include { paramsSummaryLog } from 'plugin/nf-validation' -include { validateParameters } from 'plugin/nf-validation' - -/* -======================================================================================== - SUBWORKFLOW DEFINITION -======================================================================================== -*/ - -workflow UTILS_NFVALIDATION_PLUGIN { - - take: - print_help // boolean: print help - workflow_command // string: default commmand used to run pipeline - pre_help_text // string: string to be printed before help text and summary log - post_help_text // string: string to be printed after help text and summary log - validate_params // boolean: validate parameters - schema_filename // path: JSON schema file, null to use default value - - main: - - log.debug "Using schema file: ${schema_filename}" - - // Default values for strings - pre_help_text = pre_help_text ?: '' - post_help_text = post_help_text ?: '' - workflow_command = workflow_command ?: '' - - // - // Print help message if needed - // - if (print_help) { - log.info pre_help_text + paramsHelp(workflow_command, parameters_schema: schema_filename) + post_help_text - System.exit(0) - } - - // - // Print parameter summary to stdout - // - log.info pre_help_text + paramsSummaryLog(workflow, parameters_schema: schema_filename) + post_help_text - - // - // Validate parameters relative to the parameter JSON schema - // - if (validate_params){ - validateParameters(parameters_schema: schema_filename) - } - - emit: - dummy_emit = true -} diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml b/subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml deleted file mode 100644 index 3d4a6b04..00000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml +++ /dev/null @@ -1,44 +0,0 @@ -# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/subworkflows/yaml-schema.json -name: "UTILS_NFVALIDATION_PLUGIN" -description: Use nf-validation to initiate and validate a pipeline -keywords: - - utility - - pipeline - - initialise - - validation -components: [] -input: - - print_help: - type: boolean - description: | - Print help message and exit - - workflow_command: - type: string - description: | - The command to run the workflow e.g. 
"nextflow run main.nf" - - pre_help_text: - type: string - description: | - Text to print before the help message - - post_help_text: - type: string - description: | - Text to print after the help message - - validate_params: - type: boolean - description: | - Validate the parameters and error if invalid. - - schema_filename: - type: string - description: | - The filename of the schema to validate against. -output: - - dummy_emit: - type: boolean - description: | - Dummy emit to make nf-core subworkflows lint happy -authors: - - "@adamrtalbot" -maintainers: - - "@adamrtalbot" - - "@maxulysse" diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test b/subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test deleted file mode 100644 index 5784a33f..00000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test +++ /dev/null @@ -1,200 +0,0 @@ -nextflow_workflow { - - name "Test Workflow UTILS_NFVALIDATION_PLUGIN" - script "../main.nf" - workflow "UTILS_NFVALIDATION_PLUGIN" - tag "subworkflows" - tag "subworkflows_nfcore" - tag "plugin/nf-validation" - tag "'plugin/nf-validation'" - tag "utils_nfvalidation_plugin" - tag "subworkflows/utils_nfvalidation_plugin" - - test("Should run nothing") { - - when { - - params { - monochrome_logs = true - test_data = '' - } - - workflow { - """ - help = false - workflow_command = null - pre_help_text = null - post_help_text = null - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success } - ) - } - } - - test("Should run help") { - - - when { - - params { - monochrome_logs = true - test_data = '' - } - workflow { - """ - help = true - workflow_command = null - pre_help_text = null - post_help_text = null - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success }, - { assert workflow.exitStatus == 0 }, - { assert workflow.stdout.any { it.contains('Input/output options') } }, - { assert workflow.stdout.any { it.contains('--outdir') } } - ) - } - } - - test("Should run help with command") { - - when { - - params { - monochrome_logs = true - test_data = '' - } - workflow { - """ - help = true - workflow_command = "nextflow run noorg/doesntexist" - pre_help_text = null - post_help_text = null - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success }, - { assert workflow.exitStatus == 0 }, - { assert workflow.stdout.any { it.contains('nextflow run noorg/doesntexist') } }, - { assert workflow.stdout.any { it.contains('Input/output options') } }, - { assert workflow.stdout.any { it.contains('--outdir') } } - ) - } - } - - test("Should run help with extra text") { - - - when { - - params { - monochrome_logs = true - test_data = '' - } - workflow { - """ - help = true - workflow_command = "nextflow run noorg/doesntexist" - pre_help_text = "pre-help-text" - 
post_help_text = "post-help-text" - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success }, - { assert workflow.exitStatus == 0 }, - { assert workflow.stdout.any { it.contains('pre-help-text') } }, - { assert workflow.stdout.any { it.contains('nextflow run noorg/doesntexist') } }, - { assert workflow.stdout.any { it.contains('Input/output options') } }, - { assert workflow.stdout.any { it.contains('--outdir') } }, - { assert workflow.stdout.any { it.contains('post-help-text') } } - ) - } - } - - test("Should validate params") { - - when { - - params { - monochrome_logs = true - test_data = '' - outdir = 1 - } - workflow { - """ - help = false - workflow_command = null - pre_help_text = null - post_help_text = null - validate_params = true - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.failed }, - { assert workflow.stdout.any { it.contains('ERROR ~ ERROR: Validation of pipeline parameters failed!') } } - ) - } - } -} diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/tags.yml b/subworkflows/nf-core/utils_nfvalidation_plugin/tests/tags.yml deleted file mode 100644 index 60b1cfff..00000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/tags.yml +++ /dev/null @@ -1,2 +0,0 @@ -subworkflows/utils_nfvalidation_plugin: - - subworkflows/nf-core/utils_nfvalidation_plugin/** diff --git a/tower.yml b/tower.yml index 787aedfe..b479a247 100644 --- a/tower.yml +++ b/tower.yml @@ -1,5 +1,23 @@ reports: - multiqc_report.html: - display: "MultiQC HTML report" + esmfold_multiqc_report.html: + display: "ESMFOLD - MultiQC HTML report" + alphafold2_multiqc_report.html: + display: "ALPHAFOLD2 - MultiQC HTML report" + colabfold_multiqc_report.html: + display: "COLABFOLD - MultiQC HTML report" samplesheet.csv: display: "Auto-created samplesheet with collated metadata and FASTQ paths" + "*_alphafold2_report.html": + display: "ALPHAFOLD2 - Predicted structures" + "*_esmfold_report.html": + display: "ESMFOLD - Predicted structures" + "*_colabfold_report.html": + display: "COLABFOLD - Predicted structures" + "*_colabfold_foldseek.html": + display: "COLABFOLD - Foldseek output" + "*_alphafold2_foldseek.html": + display: "ALPHAFOLD2 - Foldseek output" + "*_esmfold_foldseek.html": + display: "ESMFOLD - Foldseek output" + "*_comparison_report.html": + display: "Structure comparison" diff --git a/workflows/alphafold2.nf b/workflows/alphafold2.nf index 9a1aebae..2a753e63 100644 --- a/workflows/alphafold2.nf +++ b/workflows/alphafold2.nf @@ -25,8 +25,7 @@ include { MULTIQC } from '../modules/nf-core/multiqc/main' // // SUBWORKFLOW: Consisting entirely of nf-core/modules // -include { paramsSummaryMap } from 'plugin/nf-validation' -include { fromSamplesheet } from 'plugin/nf-validation' +include { paramsSummaryMap } from 'plugin/nf-schema' include { paramsSummaryMultiqc } from '../subworkflows/nf-core/utils_nfcore_pipeline' include { softwareVersionsToYAML } from '../subworkflows/nf-core/utils_nfcore_pipeline' include { methodsDescriptionText } from '../subworkflows/local/utils_nfcore_proteinfold_pipeline' @@ -40,6 
diff --git a/workflows/alphafold2.nf b/workflows/alphafold2.nf
index 9a1aebae..2a753e63 100644
--- a/workflows/alphafold2.nf
+++ b/workflows/alphafold2.nf
@@ -25,8 +25,7 @@ include { MULTIQC } from '../modules/nf-core/multiqc/main'
 //
 // SUBWORKFLOW: Consisting entirely of nf-core/modules
 //
-include { paramsSummaryMap } from 'plugin/nf-validation'
-include { fromSamplesheet } from 'plugin/nf-validation'
+include { paramsSummaryMap } from 'plugin/nf-schema'
 include { paramsSummaryMultiqc } from '../subworkflows/nf-core/utils_nfcore_pipeline'
 include { softwareVersionsToYAML } from '../subworkflows/nf-core/utils_nfcore_pipeline'
 include { methodsDescriptionText } from '../subworkflows/local/utils_nfcore_proteinfold_pipeline'
@@ -40,6 +39,7 @@ include { methodsDescriptionText } from '../subworkflows/local/utils_nfcore_prot
 workflow ALPHAFOLD2 {
 
     take:
+    ch_samplesheet  // channel: samplesheet read in from --input
    ch_versions     // channel: [ path(versions.yml) ]
     full_dbs        // boolean: Use full databases (otherwise reduced version)
     alphafold2_mode // string: Mode to run Alphafold2 in
@@ -57,22 +57,18 @@ workflow ALPHAFOLD2 {
 
     main:
     ch_multiqc_files = Channel.empty()
-
-    //
-    // Create input channel from input file provided through params.input
-    //
-    Channel
-        .fromSamplesheet("input")
-        .set { ch_fasta }
+    ch_pdb      = Channel.empty()
+    ch_main_pdb = Channel.empty()
+    ch_msa      = Channel.empty()
 
     if (alphafold2_model_preset != 'multimer') {
-        ch_fasta
+        ch_samplesheet
             .map {
                 meta, fasta ->
                 [ meta, fasta.splitFasta(file:true) ]
             }
             .transpose()
-            .set { ch_fasta }
+            .set { ch_samplesheet }
     }
 
     if (alphafold2_mode == 'standard') {
@@ -80,7 +76,7 @@ workflow ALPHAFOLD2 {
         // SUBWORKFLOW: Run Alphafold2 standard mode
         //
         RUN_ALPHAFOLD2 (
-            ch_fasta,
+            ch_samplesheet,
             full_dbs,
             alphafold2_model_preset,
             ch_alphafold2_params,
@@ -94,7 +90,10 @@ workflow ALPHAFOLD2 {
             ch_pdb_seqres,
             ch_uniprot
         )
-        ch_multiqc_rep = RUN_ALPHAFOLD2.out.multiqc.collect()
+        ch_pdb      = ch_pdb.mix(RUN_ALPHAFOLD2.out.pdb)
+        ch_main_pdb = ch_main_pdb.mix(RUN_ALPHAFOLD2.out.main_pdb)
+        ch_msa      = ch_msa.mix(RUN_ALPHAFOLD2.out.msa)
+        ch_multiqc_rep = RUN_ALPHAFOLD2.out.multiqc.map{it[1]}.collect()
         ch_versions = ch_versions.mix(RUN_ALPHAFOLD2.out.versions)
 
     } else if (alphafold2_mode == 'split_msa_prediction') {
@@ -102,7 +101,7 @@ workflow ALPHAFOLD2 {
         // SUBWORKFLOW: Run Alphafold2 split mode, MSA and predicition
         //
         RUN_ALPHAFOLD2_MSA (
-            ch_fasta,
+            ch_samplesheet,
             full_dbs,
             alphafold2_model_preset,
             ch_alphafold2_params,
@@ -119,7 +118,7 @@ workflow ALPHAFOLD2 {
         ch_versions = ch_versions.mix(RUN_ALPHAFOLD2_MSA.out.versions)
 
         RUN_ALPHAFOLD2_PRED (
-            ch_fasta,
+            ch_samplesheet,
             full_dbs,
             alphafold2_model_preset,
             ch_alphafold2_params,
@@ -134,7 +133,10 @@ workflow ALPHAFOLD2 {
             ch_uniprot,
             RUN_ALPHAFOLD2_MSA.out.features
         )
-        ch_multiqc_rep = RUN_ALPHAFOLD2_PRED.out.multiqc.collect()
+        ch_pdb      = ch_pdb.mix(RUN_ALPHAFOLD2_PRED.out.pdb)
+        ch_main_pdb = ch_main_pdb.mix(RUN_ALPHAFOLD2_PRED.out.main_pdb)
+        ch_msa      = ch_msa.mix(RUN_ALPHAFOLD2_PRED.out.msa)
+        ch_multiqc_rep = RUN_ALPHAFOLD2_PRED.out.multiqc.map{it[1]}.collect()
         ch_versions = ch_versions.mix(RUN_ALPHAFOLD2_PRED.out.versions)
     }
 
@@ -142,9 +144,8 @@ workflow ALPHAFOLD2 {
     // Collate and save software versions
     //
     softwareVersionsToYAML(ch_versions)
-        .collectFile(storeDir: "${params.outdir}/pipeline_info", name: 'nf_core_proteinfold_software_mqc_versions.yml', sort: true, newLine: true)
+        .collectFile(storeDir: "${params.outdir}/pipeline_info", name: 'nf_core_proteinfold_software_mqc_alphafold2_versions.yml', sort: true, newLine: true)
         .set { ch_collated_versions }
-
     //
     // MODULE: MultiQC
     //
@@ -169,12 +170,17 @@ workflow ALPHAFOLD2 {
             ch_multiqc_files.collect(),
             ch_multiqc_config.toList(),
             ch_multiqc_custom_config.toList(),
-            ch_multiqc_logo.toList()
+            ch_multiqc_logo.toList(),
+            [],
+            []
         )
         ch_multiqc_report = MULTIQC.out.report.toList()
     }
 
     emit:
+    main_pdb       = ch_main_pdb       // channel: /path/to/*.pdb
+    pdb            = ch_pdb            // channel: /path/to/*.pdb
+    msa            = ch_msa            // channel: /path/to/*msa.tsv
     multiqc_report = ch_multiqc_report // channel: /path/to/multiqc_report.html
     versions       = ch_versions       // channel: [ path(versions.yml) ]
 }
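Note: with `fromSamplesheet` removed from 'plugin/nf-validation', the workflow no longer reads --input itself; it now takes `ch_samplesheet` from the caller. A minimal sketch of what that call site could look like with the nf-schema plugin follows; the schema path and the surrounding workflow block are assumptions for illustration and are not part of this diff (`samplesheetToList` is nf-schema's replacement for `fromSamplesheet`):

// Sketch only: how a caller might build ch_samplesheet with nf-schema.
// The schema path below is an assumption, not taken from this diff.
include { samplesheetToList } from 'plugin/nf-schema'

workflow {
    // Validate params.input against the JSON schema and emit [ meta, fasta ] tuples
    ch_samplesheet = Channel.fromList(
        samplesheetToList(params.input, "${projectDir}/assets/schema_input.json")
    )
    ch_samplesheet.view() // e.g. [ [id:'T1024'], /path/to/T1024.fasta ]
}
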
diff --git a/workflows/colabfold.nf b/workflows/colabfold.nf
index eafc222c..c2a4f2b5 100644
--- a/workflows/colabfold.nf
+++ b/workflows/colabfold.nf
@@ -25,8 +25,7 @@ include { MULTIQC } from '../modules/nf-core/multiqc/main'
 //
 // SUBWORKFLOW: Consisting entirely of nf-core/modules
 //
-include { paramsSummaryMap } from 'plugin/nf-validation'
-include { fromSamplesheet } from 'plugin/nf-validation'
+include { paramsSummaryMap } from 'plugin/nf-schema'
 include { paramsSummaryMultiqc } from '../subworkflows/nf-core/utils_nfcore_pipeline'
 include { softwareVersionsToYAML } from '../subworkflows/nf-core/utils_nfcore_pipeline'
 include { methodsDescriptionText } from '../subworkflows/local/utils_nfcore_proteinfold_pipeline'
@@ -40,6 +39,7 @@ include { methodsDescriptionText } from '../subworkflows/local/utils_nfcore_prot
 workflow COLABFOLD {
 
     take:
+    ch_samplesheet         // channel: samplesheet read in from --input
     ch_versions            // channel: [ path(versions.yml) ]
     colabfold_model_preset // string: Specifies the model preset to use for colabfold
     ch_colabfold_params    // channel: path(colabfold_params)
@@ -50,20 +50,13 @@ workflow COLABFOLD {
 
     main:
     ch_multiqc_files = Channel.empty()
-    //
-    // Create input channel from input file provided through params.input
-    //
-    Channel
-        .fromSamplesheet("input")
-        .set { ch_fasta }
-
     if (params.colabfold_server == 'webserver') {
         //
         // MODULE: Run colabfold
         //
         if (params.colabfold_model_preset != 'alphafold2_ptm' && params.colabfold_model_preset != 'alphafold2') {
             MULTIFASTA_TO_CSV(
-                ch_fasta
+                ch_samplesheet
             )
             ch_versions = ch_versions.mix(MULTIFASTA_TO_CSV.out.versions)
             COLABFOLD_BATCH(
@@ -77,7 +70,7 @@ workflow COLABFOLD {
             ch_versions = ch_versions.mix(COLABFOLD_BATCH.out.versions)
         } else {
             COLABFOLD_BATCH(
-                ch_fasta,
+                ch_samplesheet,
                 colabfold_model_preset,
                 ch_colabfold_params,
                 [],
@@ -93,7 +86,7 @@ workflow COLABFOLD {
         //
         if (params.colabfold_model_preset != 'alphafold2_ptm' && params.colabfold_model_preset != 'alphafold2') {
             MULTIFASTA_TO_CSV(
-                ch_fasta
+                ch_samplesheet
             )
             ch_versions = ch_versions.mix(MULTIFASTA_TO_CSV.out.versions)
             MMSEQS_COLABFOLDSEARCH (
@@ -105,7 +98,7 @@ workflow COLABFOLD {
             ch_versions = ch_versions.mix(MMSEQS_COLABFOLDSEARCH.out.versions)
         } else {
             MMSEQS_COLABFOLDSEARCH (
-                ch_fasta,
+                ch_samplesheet,
                 ch_colabfold_params,
                 ch_colabfold_db,
                 ch_uniref30
@@ -131,7 +124,7 @@ workflow COLABFOLD {
     // Collate and save software versions
     //
     softwareVersionsToYAML(ch_versions)
-        .collectFile(storeDir: "${params.outdir}/pipeline_info", name: 'nf_core_proteinfold_software_mqc_versions.yml', sort: true, newLine: true)
+        .collectFile(storeDir: "${params.outdir}/pipeline_info", name: 'nf_core_proteinfold_software_mqc_colabfold_versions.yml', sort: true, newLine: true)
         .set { ch_collated_versions }
 
     //
@@ -152,18 +145,23 @@ workflow COLABFOLD {
     ch_multiqc_files = ch_multiqc_files.mix(ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml'))
     ch_multiqc_files = ch_multiqc_files.mix(ch_methods_description.collectFile(name: 'methods_description_mqc.yaml'))
     ch_multiqc_files = ch_multiqc_files.mix(ch_collated_versions)
-    ch_multiqc_files = ch_multiqc_files.mix(COLABFOLD_BATCH.out.multiqc.collect())
+    ch_multiqc_files = ch_multiqc_files.mix(COLABFOLD_BATCH.out.multiqc.map{it[1]}.collect())
     MULTIQC (
         ch_multiqc_files.collect(),
         ch_multiqc_config.toList(),
         ch_multiqc_custom_config.toList(),
-        ch_multiqc_logo.toList()
+        ch_multiqc_logo.toList(),
+        [],
+        []
     )
     ch_multiqc_report = MULTIQC.out.report.toList()
 
     emit:
+    pdb            = COLABFOLD_BATCH.out.pdb      // channel: /path/to/*.pdb
+    main_pdb       = COLABFOLD_BATCH.out.main_pdb // channel: /path/to/*.pdb
+    msa            = COLABFOLD_BATCH.out.msa      // channel: /path/to/*_coverage.png
     multiqc_report = ch_multiqc_report // channel: /path/to/multiqc_report.html
     versions       = ch_versions // channel: [ path(versions.yml) ]
 }
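Note: `COLABFOLD_BATCH.out.multiqc`, like the other `*.out.multiqc` channels in this diff, now emits [ meta, file ] tuples rather than bare files, so each workflow strips the meta map with `.map{it[1]}` before `.collect()` hands a flat file list to MULTIQC. A self-contained toy example of the idiom (the channel contents are made up for illustration):

// Toy illustration of the .map{ it[1] } + .collect() pattern used above.
workflow {
    Channel
        .of( [ [id:'seq1'], file('seq1_mqc.tsv') ],
             [ [id:'seq2'], file('seq2_mqc.tsv') ] )
        .map { it[1] }   // drop the meta map, keep only the file
        .collect()       // single emission: [ seq1_mqc.tsv, seq2_mqc.tsv ]
        .view()
}
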
diff --git a/workflows/esmfold.nf b/workflows/esmfold.nf
index 962c01a1..ab7e5a81 100644
--- a/workflows/esmfold.nf
+++ b/workflows/esmfold.nf
@@ -24,8 +24,7 @@ include { MULTIQC } from '../modules/nf-core/multiqc/main'
 //
 // SUBWORKFLOW: Consisting entirely of nf-core/modules
 //
-include { paramsSummaryMap } from 'plugin/nf-validation'
-include { fromSamplesheet } from 'plugin/nf-validation'
+include { paramsSummaryMap } from 'plugin/nf-schema'
 include { paramsSummaryMultiqc } from '../subworkflows/nf-core/utils_nfcore_pipeline'
 include { softwareVersionsToYAML } from '../subworkflows/nf-core/utils_nfcore_pipeline'
 include { methodsDescriptionText } from '../subworkflows/local/utils_nfcore_proteinfold_pipeline'
@@ -39,6 +38,7 @@ include { methodsDescriptionText } from '../subworkflows/local/utils_nfcore_prot
 workflow ESMFOLD {
 
     take:
+    ch_samplesheet    // channel: samplesheet read in from --input
     ch_versions       // channel: [ path(versions.yml) ]
     ch_esmfold_params // directory: /path/to/esmfold/params/
     ch_num_recycles   // int: Number of recycles for esmfold
@@ -46,19 +46,12 @@ workflow ESMFOLD {
 
     main:
     ch_multiqc_files = Channel.empty()
-    //
-    // Create input channel from input file provided through params.input
-    //
-    Channel
-        .fromSamplesheet("input")
-        .set { ch_fasta }
-
     //
     // MODULE: Run esmfold
     //
     if (params.esmfold_model_preset != 'monomer') {
         MULTIFASTA_TO_SINGLEFASTA(
-            ch_fasta
+            ch_samplesheet
         )
         ch_versions = ch_versions.mix(MULTIFASTA_TO_SINGLEFASTA.out.versions)
         RUN_ESMFOLD(
@@ -69,7 +62,7 @@ workflow ESMFOLD {
         ch_versions = ch_versions.mix(RUN_ESMFOLD.out.versions)
     } else {
         RUN_ESMFOLD(
-            ch_fasta,
+            ch_samplesheet,
             ch_esmfold_params,
             ch_num_recycles
         )
@@ -80,7 +73,7 @@ workflow ESMFOLD {
     // Collate and save software versions
     //
     softwareVersionsToYAML(ch_versions)
-        .collectFile(storeDir: "${params.outdir}/pipeline_info", name: 'nf_core_proteinfold_software_mqc_versions.yml', sort: true, newLine: true)
+        .collectFile(storeDir: "${params.outdir}/pipeline_info", name: 'nf_core_proteinfold_software_mqc_esmfold_versions.yml', sort: true, newLine: true)
         .set { ch_collated_versions }
 
     //
@@ -101,20 +94,23 @@ workflow ESMFOLD {
     ch_multiqc_files = ch_multiqc_files.mix(ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml'))
     ch_multiqc_files = ch_multiqc_files.mix(ch_methods_description.collectFile(name: 'methods_description_mqc.yaml'))
     ch_multiqc_files = ch_multiqc_files.mix(ch_collated_versions)
-    ch_multiqc_files = ch_multiqc_files.mix(RUN_ESMFOLD.out.multiqc.collect())
+    ch_multiqc_files = ch_multiqc_files.mix(RUN_ESMFOLD.out.multiqc.map{it[1]}.collect())
     MULTIQC (
         ch_multiqc_files.collect(),
         ch_multiqc_config.toList(),
         ch_multiqc_custom_config.toList(),
-        ch_multiqc_logo.toList()
+        ch_multiqc_logo.toList(),
+        [],
+        []
     )
     ch_multiqc_report = MULTIQC.out.report.toList()
 
     emit:
-    multiqc_report = ch_multiqc_report // channel: /path/to/multiqc_report.html
-    versions       = ch_versions // channel: [ path(versions.yml) ]
+    pdb            = RUN_ESMFOLD.out.pdb // channel: /path/to/*pdb
+    multiqc_report = ch_multiqc_report   // channel: /path/to/multiqc_report.html
+    versions       = ch_versions         // channel: [ path(versions.yml) ]
 }
 
 /*
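Note: each folding workflow now emits its predicted structures (`pdb`, plus `main_pdb` and `msa` for AlphaFold2 and ColabFold), and tower.yml gains a "*_comparison_report.html" entry, which suggests the top-level pipeline collects the per-method predictions and compares them. A hypothetical consumer is sketched below; COMPARE_STRUCTURES is an illustrative name, not a process defined in this diff:

// Hypothetical downstream wiring; COMPARE_STRUCTURES is illustrative only.
ch_all_pdb = Channel.empty()
    .mix(ALPHAFOLD2.out.main_pdb)
    .mix(COLABFOLD.out.main_pdb)
    .mix(ESMFOLD.out.pdb)
    .groupTuple()   // collate predictions of the same input sequence by meta

COMPARE_STRUCTURES(ch_all_pdb)   // would produce *_comparison_report.html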