fix: fix excluding events from the report #332

Merged
4 commits merged on Oct 23, 2024


huzuner
Contributor

@huzuner huzuner commented Oct 17, 2024

Previously, excluding events from the report did not work as intended: all events with report set to False in the config still got datavzrd reports, because get_calling_events(calling_type) always returned all events. This fix ensures that only events with report set to True get datavzrd reports.
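To make the failure mode concrete, here is a minimal, self-contained sketch in Python. It is not the workflow's code: `expand_sketch` is a hypothetical stand-in for Snakemake's `expand`, the event names and the `batch` value are made up, and `get_calling_events` is stubbed to return every event, as described above. The only difference between the two variants is the `event=` argument.

```python
from itertools import product


# Hypothetical stand-in for Snakemake's expand(): fills the pattern with every
# combination of the given wildcard values (like expand's default product mode).
def expand_sketch(pattern, **wildcards):
    keys = list(wildcards)
    values = [v if isinstance(v, list) else [v] for v in wildcards.values()]
    return [pattern.format(**dict(zip(keys, combo))) for combo in product(*values)]


# Hypothetical excerpt of config["calling"]["fdr-control"]["events"].
EVENTS = {
    "somatic": {"report": True},
    "present": {"report": False},  # should be left out of the report
}


def get_calling_events(calling_type):
    # Stub that returns every event, matching the behavior described in the PR.
    return list(EVENTS)


PATTERN = "results/datavzrd-report/{batch}.{event}.{calling_type}.fdr-controlled"


def report_outputs(calling_type, buggy):
    outputs = []
    for event in EVENTS:
        if not EVENTS[event]["report"]:
            continue
        outputs.extend(
            expand_sketch(
                PATTERN,
                batch="all",  # hypothetical batch name
                # Buggy: expands over all events, ignoring the loop variable.
                # Fixed: expands only the event that passed the report check.
                event=get_calling_events(calling_type) if buggy else event,
                calling_type=calling_type,
            )
        )
    return outputs


print(report_outputs("variants", buggy=True))
print(report_outputs("variants", buggy=False))
```

With `buggy=True`, the `present` event still gets a `results/datavzrd-report/all.present.variants.fdr-controlled` target even though its `report` flag is false; with `buggy=False`, only the `somatic` target remains.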

Summary by CodeRabbit

  • New Features

    • Improved output file path generation for fusions and variants.
    • Enhanced handling of group samples and aliases with new helper functions.
    • Streamlined logic for generating reports and managing mutational burden targets.
    • New configuration options for reporting within false discovery rate control.
  • Bug Fixes

    • Corrected event name retrieval in output paths.
    • Fixed indentation issues in configuration files.
  • Refactor

    • Simplified and clarified the output generation logic for better maintainability.


coderabbitai bot commented Oct 17, 2024

Walkthrough

The changes made in the workflow/rules/common.smk file involve substantial modifications to the logic for generating output file paths and managing sample data. The get_final_output function has been refactored to simplify event name retrieval by transitioning from dynamic function calls to direct variable references. Additionally, the handling of calling_types and groups has been enhanced with new helper functions for retrieving group samples and aliases. The .test/config-simple/config.yaml file has also been updated to improve configurability for reporting and variant calling. While the overall structure remains intact, these modifications improve clarity and maintainability in the output generation process.

Changes

  • workflow/rules/common.smk: Refactored get_final_output for simplified event name retrieval; updated logic for output paths. Introduced new helper functions for managing group samples and report batches. Streamlined handling of calling_types and output file inclusion based on configuration flags.
  • .test/config-simple/config.yaml: Added report: false under fdr-control.events.present; corrected indentation for local; added description for variants with moderate impact in the calling section.

Possibly related PRs

  • fix: processing vembrane config #330: The changes in this PR involve modifications to the workflow/rules/common.smk file, specifically enhancing the get_report_batches function, which is directly related to the refactoring of the get_final_output function in the main PR. Both PRs focus on improving output generation logic and handling of report batches.

Suggested reviewers

  • dlaehnemann

Poem

🐇 In the meadow where outputs gleam,
A rabbit hops, chasing a dream.
With paths refined and helpers new,
Clarity blooms in the morning dew.
Let's celebrate this code so bright,
Hooray for changes that bring delight! 🌼


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed in the PR between commits e6518e3 and 0b43367.

📒 Files selected for processing (1)
  • .test/config-simple/config.yaml (1 hunk)
🔇 Additional comments (2)
.test/config-simple/config.yaml (2)

Line range hint 1-146: Overall configuration appears consistent and well-structured.

The changes made to the calling.fdr-control.events.present section are focused and align well with the PR objectives. The rest of the configuration file remains unchanged and consistent with the workflow's requirements. No additional issues or inconsistencies were identified.

🧰 Tools
🪛 yamllint

[error] 51-51: trailing spaces (trailing-spaces)

[error] 54-54: trailing spaces (trailing-spaces)

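For reference, the trailing-spaces findings can be reproduced with a few lines of Python. This is a rough approximation written for illustration (`find_trailing_spaces` and `strip_trailing_spaces` are hypothetical helpers), not yamllint's own implementation; `strip_trailing_spaces` assumes `\n` line endings.

```python
def find_trailing_spaces(text):
    """Return 1-based numbers of lines that end in spaces or tabs."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line != line.rstrip(" \t")
    ]


def strip_trailing_spaces(text):
    """Remove trailing spaces/tabs from every line, keeping the final newline."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


sample = "events:\n  present:  \n    report: false\t\n"
print(find_trailing_spaces(sample))
print(repr(strip_trailing_spaces(sample)))
```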

Line range hint 53-58: Changes align with PR objectives and improve configuration clarity.

The modifications in this section address the PR's main objective and enhance the configuration's clarity:

  1. The addition of report: false aligns with the fix for excluding events from the report, as mentioned in the PR description.
  2. Changing filter from candidates to moderate narrows down the filtering criteria, which may affect the variants included in this event.
  3. The new desc key provides a clear description of the event's purpose.

These changes are consistent with the modifications in the workflow/rules/common.smk file mentioned in the AI-generated summary.

To ensure these changes are properly reflected in the workflow, please run the following script:

This script will help verify that the workflow correctly utilizes these new configuration options.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (7)
workflow/rules/common.smk (7)

Line range hint 107-111: Clarify the Extraction of 'calling_types'

The line where calling_types is defined:

calling_types = samples["calling"].str.split(",").explode().unique().tolist()

This line flattens the list of calling types across all samples. While functional, consider adding a comment to explain the purpose for better readability.
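As an illustration of what that one-liner computes, the sketch below is a pure-Python equivalent for a plain list of strings. `flatten_calling_types` is a hypothetical name; unlike the pandas chain, it does not handle missing values, and, like `str.split(",")`, it does not strip whitespace around the commas.

```python
def flatten_calling_types(calling_column):
    """Pure-Python equivalent of
    samples["calling"].str.split(",").explode().unique().tolist()
    for a list of comma-separated strings without missing values.
    """
    # Split every entry on commas and chain all parts into one stream.
    flattened = (part for entry in calling_column for part in entry.split(","))
    # dict.fromkeys keeps first-seen order, like pandas' unique().
    return list(dict.fromkeys(flattened))


print(flatten_calling_types(["variants", "variants,fusions", "fusions"]))
```

So a calling entry written as `variants, fusions` would yield the type ` fusions` with a leading space, which may be worth mentioning in the suggested comment.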


Line range hint 620-622: Compatibility of 'match' Statement with Python Version

The match statement used in get_map_reads_sorting_params requires Python 3.10 or higher. Ensure that the execution environment supports Python 3.10. If not, refactor the code using if-else statements for compatibility.

Refactored code using if-else statements:

-def get_map_reads_sorting_params(wildcards, ordering=False):
-    match (sample_has_umis(wildcards.sample), ordering):
-        case (True, True):
-            return "queryname"
-        case (True, False):
-            return "fgbio"
-        case (False, True):
-            return "coordinate"
-        case (False, False):
-            return "samtools"
+def get_map_reads_sorting_params(wildcards, ordering=False):
+    has_umis = sample_has_umis(wildcards.sample)
+    if has_umis and ordering:
+        return "queryname"
+    elif has_umis and not ordering:
+        return "fgbio"
+    elif not has_umis and ordering:
+        return "coordinate"
+    else:
+        return "samtools"

Line range hint 729-731: Simplify Configuration Lookups with 'dict.get'

In places where lookup is used to access configuration values with a default, consider using the standard dict.get method for simplicity unless lookup provides additional functionality.

Example:

- if lookup(dpath="maf/activate", within=config, default=False):
+ if config.get("maf", {}).get("activate", False):
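One difference worth weighing before switching: a chained `.get()` fails when an intermediate key is present but empty, as with a bare `maf:` line in YAML, which loads as `None`. The hypothetical `lookup_path` helper below walks a slash-separated path through plain nested dicts and returns the default in that case; it is a sketch for comparison, not a reimplementation of Snakemake's `lookup`.

```python
def lookup_path(dpath, within, default=None):
    """Hypothetical helper: follow a slash-separated path through nested dicts.

    Returns `default` as soon as a level is missing or is not a dict.
    """
    node = within
    for key in dpath.split("/"):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


config = {"maf": {"activate": True}}
print(lookup_path("maf/activate", config, default=False))
print(lookup_path("maf/activate", {"maf": None}, default=False))
```

By contrast, `{"maf": None}.get("maf", {}).get("activate", False)` raises `AttributeError`, because `.get("maf", {})` returns the stored `None` rather than the default.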

Line range hint 243-245: Correct Syntax Error in 'wildcard_constraints' Definition

The definition of wildcard_constraints is missing a colon.

Apply this diff to fix the syntax error:

-wildcard_constraints
+wildcard_constraints:

Line range hint 512-514: Handle Missing 'datasources' in Configuration

In the function get_dgidb_datasources, if the key 'datasources' is not present in config["annotations"]["dgidb"], it returns an empty string.

def get_dgidb_datasources():
    if config["annotations"]["dgidb"].get("datasources", ""):
        return "-s {}".format(" ".join(config["annotations"]["dgidb"]["datasources"]))
    return ""

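The .get() call already prevents a KeyError when 'datasources' is missing; a KeyError can still occur if config["annotations"] or config["annotations"]["dgidb"] is itself absent. Consider handling that absence explicitly, e.g. with nested .get() calls or a default in the config schema.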
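A minimal sketch of a variant that reads every level with `.get()`. It takes `config` as a parameter (an assumption, so it can be exercised outside Snakemake) and treats a missing or empty section as "no datasources".

```python
def get_dgidb_datasources(config):
    """Return a "-s <sources>" argument, or "" when no datasources are configured.

    Every level is read with .get(), so a config without an "annotations"
    or "dgidb" section yields "" instead of raising a KeyError.
    """
    dgidb = (config.get("annotations") or {}).get("dgidb") or {}
    datasources = dgidb.get("datasources") or []
    if datasources:
        return "-s {}".format(" ".join(datasources))
    return ""


print(get_dgidb_datasources({"annotations": {"dgidb": {"datasources": ["DrugBank"]}}}))
```

Returning an empty string keeps the rule running without the `-s` option; if a missing `dgidb` section should be treated as a configuration error instead, raising there would be the stricter choice.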


Line range hint 223-225: Ensure 'primer_panels' DataFrame is Properly Initialized

The primer_panels variable is conditionally initialized based on the presence of a configuration key.

primer_panels = (
    (
        pd.read_csv(
            config["primers"]["trimming"]["tsv"],
            sep="\t",
            dtype={"panel": str, "fa1": str, "fa2": str},
            comment="#",
        )
        .set_index(["panel"], drop=False)
        .sort_index()
    )
    if config["primers"]["trimming"].get("tsv", "")
    else None
)

Ensure that the configuration provides the expected tsv path, and handle cases where primer_panels might be None to prevent AttributeError when accessed later.
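One way to handle the `None` case, sketched with a hypothetical `require_primer_panels` guard: callers that need panel information go through it and get an explicit error message instead of an `AttributeError` from `None`.

```python
def require_primer_panels(primer_panels):
    """Hypothetical guard for the module-level primer_panels value.

    Returns the table unchanged, or raises a descriptive error when no
    panel TSV was configured and primer_panels is therefore None.
    """
    if primer_panels is None:
        raise ValueError(
            'No primer panel table loaded: set config["primers"]["trimming"]["tsv"] '
            "to use panel-specific primers."
        )
    return primer_panels
```

For example, a panel lookup could be written as `require_primer_panels(primer_panels).loc[panel]` instead of accessing `primer_panels` directly.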


Line range hint 1079-1083: Ensure Compatibility of Regular Expressions

In get_fastqc_results, the regular expression used may need verification to ensure it matches the intended files.

valid = re.compile(r"^[^/]+\.tsv$")

Confirm that the pattern correctly matches the filenames you expect to process.
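A quick way to confirm the behavior is to run the pattern against a few representative names. The candidate names below are made up for illustration.

```python
import re

# Pattern quoted from get_fastqc_results: a single path component ending in ".tsv".
valid = re.compile(r"^[^/]+\.tsv$")

candidates = [
    "sample1.tsv",     # plain file name
    "qc/sample1.tsv",  # contains a directory separator
    "sample1.tsv.gz",  # compressed
    "sample1.TSV",     # upper-case extension
    "sample1.tsv\n",   # trailing newline: "$" also matches just before it
]

for name in candidates:
    print(repr(name), bool(valid.match(name)))
```

Note the last case: in Python, `$` also matches just before a trailing newline. If such names could occur and should be rejected, `\Z` instead of `$` is the stricter anchor.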

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed in the PR between commits 1629c6d and b92dbc9.

📒 Files selected for processing (1)
  • workflow/rules/common.smk (1 hunk)
🔇 Additional comments (4)
workflow/rules/common.smk (4)

Line range hint 411-413: Potential Unhandled Case in 'get_trimming_input' Function

In get_trimming_input, if neither calc_consensus_reads nor remove_duplicates is activated, the function defaults to returning a path based on the aligner.

def get_trimming_input(wildcards):
    if is_activated("remove_duplicates"):
        return "results/dedup/{sample}.bam"
    else:
        aligner = "star" if get_sample_datatype(wildcards.sample) == "rna" else "bwa"
        return "results/mapped/{aligner}/{{sample}}.bam".format(aligner=aligner)

Consider whether handling is needed for cases when neither condition is met, especially if primer trimming requires input from a different step.
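The two branches differ in how they produce the `{sample}` wildcard: the first returns a literal pattern, while the second relies on `{{sample}}` being an escaped brace pair in `str.format()`. A dependency-free sketch (hypothetical `get_trimming_input_sketch`, with the activation flag and datatype passed in as plain arguments in place of `is_activated` and `get_sample_datatype`) shows that both yield Snakemake patterns with `{sample}` intact.

```python
def get_trimming_input_sketch(remove_duplicates, datatype):
    """Hypothetical, dependency-free version of get_trimming_input."""
    if remove_duplicates:
        # Literal pattern: {sample} is left for Snakemake to fill in.
        return "results/dedup/{sample}.bam"
    aligner = "star" if datatype == "rna" else "bwa"
    # "{{sample}}" becomes "{sample}" after .format(), so it stays a wildcard.
    return "results/mapped/{aligner}/{{sample}}.bam".format(aligner=aligner)


print(get_trimming_input_sketch(False, "dna"))
```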


Line range hint 696-706: Validation of 'variant_caller' Configuration

The variant_caller variable is created based on the activation of certain callers and whether samples contain 'variants' in their calling configuration.

variant_caller = list(
    filter(
        None,
        [
            (
                "freebayes"
                if is_activated("calling/freebayes")
                and samples["calling"].str.contains("variants").any()
                else None
            ),
            (
                "delly"
                if is_activated("calling/delly")
                and samples["calling"].str.contains("variants").any()
                else None
            ),
        ],
    )
)

Ensure that at least one variant caller is activated; otherwise, variant_caller will be an empty list, potentially causing issues in downstream processes.
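Besides inspecting the config, the empty case can be made explicit in code. A minimal sketch with hypothetical names: `select_variant_callers` takes the activation flags as a plain dict (standing in for `is_activated`) and a single boolean for whether any sample requests variant calling, so that check is computed once instead of per caller. It raises instead of silently returning an empty list; whether to fail or to skip variant calling is a design choice for the workflow.

```python
def select_variant_callers(activated, samples_call_variants):
    """Hypothetical equivalent of the variant_caller list in common.smk.

    activated: flags keyed like is_activated() paths, e.g.
        {"calling/freebayes": True, "calling/delly": False} (an assumption).
    samples_call_variants: result of
        samples["calling"].str.contains("variants").any(), computed once.
    """
    if not samples_call_variants:
        return []
    callers = [
        name
        for name in ("freebayes", "delly")
        if activated.get(f"calling/{name}", False)
    ]
    if not callers:
        # Fail early instead of handing an empty list to downstream rules.
        raise ValueError(
            "Samples request variant calling, but neither calling/freebayes "
            "nor calling/delly is activated."
        )
    return callers


print(select_variant_callers({"calling/freebayes": True, "calling/delly": True}, True))
```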

Execute the following script to check the activation status:

#!/bin/bash
# Show each caller section and the lines after it, where its "activate:" flag is expected
grep -E -A 2 '^[[:space:]]*(freebayes|delly):' config/config.yaml

Line range hint 290-292: Ensure Correct Retrieval of Calling Events

The get_calling_events function retrieves events based on the calling_type. Verify that this function returns the expected list of events and that it aligns with the current configuration in config["calling"]["fdr-control"]["events"].

Run the following script to list events for each calling type:
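A minimal sketch of such a listing in Python. The event names are made up, and the `types` key on each event is an assumption about how events are tied to calling types; it should be checked against the workflow's config schema before relying on it. Events without a `report` entry show `None`.

```python
# Hypothetical excerpt of config["calling"]["fdr-control"]["events"].
# The "types" key is an assumption made for this sketch.
EVENTS = {
    "present": {"types": ["variants", "fusions"], "report": False},
    "somatic": {"types": ["variants"], "report": True},
}


def get_calling_events(calling_type, events=EVENTS):
    """Assumed behavior: keep the events declared for this calling type."""
    return [name for name, spec in events.items() if calling_type in spec.get("types", [])]


def list_events_by_type(calling_types, events=EVENTS):
    """Map each calling type to its events and their report flags."""
    return {
        calling_type: {
            name: events[name].get("report")
            for name in get_calling_events(calling_type, events)
        }
        for calling_type in calling_types
    }


for calling_type, flags in list_events_by_type(["variants", "fusions"]).items():
    print(calling_type, flags)
```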


Line range hint 860-862: Check Division Operations for Zero Division Errors

Any division operations in the code should have checks to prevent division by zero.

Run the following script to find division operations:
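A minimal sketch of such a search in Python, working on tokens rather than raw text so that the many `/` characters inside path strings are not reported. `find_divisions` is a hypothetical helper; that Snakemake rule syntax tokenizes cleanly with Python's `tokenize` module is an assumption worth checking on the real file. Each hit still needs a manual look to decide whether the divisor can be zero.

```python
import io
import tokenize

DIVISION_OPS = {"/", "//", "/=", "//="}


def find_divisions(source):
    """Return (line, column, operator) for every division operator token.

    String literals are single tokens, so "/" inside paths such as
    "results/mapped/{sample}.bam" is not reported.
    """
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.OP and tok.string in DIVISION_OPS:
            hits.append((tok.start[0], tok.start[1], tok.string))
    return hits


# Usage on a file: find_divisions(open("workflow/rules/common.smk").read())
print(find_divisions('path = "results/x.bam"\nratio = a / b\n'))
```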

 else fusions_groups
 ),
-event=get_calling_events(calling_type),
+event=event,

⚠️ Potential issue

Potential Issue with Event Parameter Usage in 'expand' Function

In the get_final_output function, the event parameter in the expand call has been changed to use the variable event directly. Ensure that event is correctly defined in this context and that it provides the expected values. Previously, a function call like get_calling_events(calling_type) might have provided a list of events, whereas event here may be a single value from the loop. This could affect the expanded outputs if not handled properly.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
workflow/rules/common.smk (1)

Line range hint 137-224: Consider refactoring for improved readability and maintainability

The get_final_output function is quite complex and handles multiple responsibilities. Consider the following suggestions to improve its structure and readability:

  1. Extract the logic for generating different types of outputs (e.g., reports, final calls, MAF files) into separate helper functions.
  2. Use list comprehensions or generator expressions instead of nested loops where possible to make the code more concise.
  3. Consider using a configuration-driven approach to determine which outputs to generate, reducing the need for multiple conditionals.

Example refactoring (partial):

def get_report_outputs(config, groups):
    if config["report"]["activate"]:
        return [
            f"results/datavzrd-report/{group}.coverage"
            for group in groups
        ]
    return []

def get_final_call_outputs(config, groups, events, calling_types):
    if not config["report"]["activate"]:
        return [
            f"results/final-calls/{group}.{event}.{calling_type}.fdr-controlled.bcf"
            for group in groups
            for event in events
            for calling_type in calling_types
        ]
    return []

def get_final_output(wildcards):
    # groups, events and calling_types are assumed to be defined
    # elsewhere in common.smk; this example is only partial.
    final_output = [
        f"results/qc/multiqc/{group}.html"
        for group in groups
    ]

    final_output.extend(get_report_outputs(config, groups))
    final_output.extend(get_final_call_outputs(config, groups, events, calling_types))
    # ... other output types ...

    return final_output

This refactoring would make the main function more readable and easier to maintain, with each type of output handled by a separate, focused function.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed in the PR between commits b92dbc9 and b530bb4.

📒 Files selected for processing (1)
  • workflow/rules/common.smk (2 hunks)
🔇 Additional comments (3)
workflow/rules/common.smk (3)

166-166: LGTM: Consistent with previous change

This change is consistent with the modification on line 153, directly using the event variable. The previous comment and verification steps apply to this line as well.


Line range hint 1-1000: Summary of changes and suggestions

The changes in this file primarily involve simplifying the usage of the event variable in the get_final_output function. While these changes appear to be improvements, it's crucial to verify that the event variable is correctly defined and provides the expected values in all contexts.

Key points from this review:

  1. Verify the correct usage and scope of the event variable.
  2. Consider refactoring the get_final_output function for improved readability and maintainability.
  3. The overall structure and logic of the function remain intact, with only minor modifications to variable usage.

These changes should improve code simplicity without altering the core functionality. However, thorough testing is recommended to ensure that all output paths are still correctly generated after these modifications.


153-153: Verify the correctness of event variable usage

The change from a function call to directly using the event variable simplifies the code. However, we need to ensure that event is correctly defined and provides the expected values in this context.

To confirm the correct usage of the event variable, please run the following script:

This will help us confirm that the event variable is properly defined and matches the expected events from the configuration.

✅ Verification successful

event Variable Usage Verified Successfully

The event variable is correctly defined and utilized within the get_final_output function. The events in config/config.yaml align with the expected values, ensuring the simplified code functions as intended.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of 'event' variable in get_final_output function

# Test: Check if 'event' is defined within the function scope
rg -A 10 -B 10 'def get_final_output\(' workflow/rules/common.smk

# Test: Verify the events defined in the config
grep -A 5 'events:' config/config.yaml

Length of output: 1338

Contributor

@dlaehnemann dlaehnemann left a comment


Very good catch!
I would suggest one small change, namely to move the get_calling_events(calling_type) to the for event in [...] loop, to keep its functionality in (the correct) place.

@@ -150,7 +150,7 @@ def get_final_output(wildcards):
 expand(
 "results/datavzrd-report/{batch}.{event}.{calling_type}.fdr-controlled",
 batch=get_report_batches(calling_type),
-event=get_calling_events(calling_type),
+event=event,
Contributor

If we remove the use of this utility function here, maybe we can instead use it above, to replace this:

for event in config["calling"]["fdr-control"]["events"]:

with the function call like this:

for event in get_calling_events(calling_type):

This ensures that the calling_type variable from the enclosing for-loop is also respected, fixing a potential bug.
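A minimal sketch of why the loop source matters, with made-up event names and an assumed `types` key per event: iterating over the config directly also yields a target for a fusions-only event under the `variants` calling type, while iterating over `get_calling_events(calling_type)` does not. `report_targets` and the `all` batch name are hypothetical.

```python
# Hypothetical config excerpt; the "types" and "report" keys are assumptions.
EVENTS = {
    "somatic": {"types": ["variants"], "report": False},
    "present": {"types": ["variants", "fusions"], "report": True},
    "fusion_only": {"types": ["fusions"], "report": True},
}


def get_calling_events(calling_type):
    # Assumed behavior: only the events declared for this calling type.
    return [name for name, spec in EVENTS.items() if calling_type in spec["types"]]


def report_targets(calling_types, use_helper):
    targets = []
    for calling_type in calling_types:
        # use_helper=False mimics: for event in config[...]["events"]
        events = get_calling_events(calling_type) if use_helper else list(EVENTS)
        for event in events:
            if EVENTS[event]["report"]:
                targets.append(
                    f"results/datavzrd-report/all.{event}.{calling_type}.fdr-controlled"
                )
    return targets


print(report_targets(["variants", "fusions"], use_helper=True))
```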

@huzuner
Contributor Author

huzuner commented Oct 18, 2024

Makes sense, I just applied the change. I will test this on the workflow, but I can't do it right now because the test workflow needs to rerun almost all jobs; hopefully I can test it sometime next week. If you think testing isn't really necessary, you can merge it now, or it can wait.

@dlaehnemann
Contributor

Hmm, maybe we could even add the new report: False entry to one of the events in one of the simple CI tests in the workflow? This way it would get executed and one could either run that test manually and check the resulting report for the omission of the respective event, or check the artifacts from the CI tests, if there is a way to get to them.

@huzuner
Contributor Author

huzuner commented Oct 21, 2024

Makes sense, I just added it to one of the tests. Now we can check this feature manually by running the tests, but I'm not sure how this could be done by checking the artifacts from the CI tests.

3 participants