-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support derived predicates between static predicates and plain predicates #145
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThe changes in this pull request involve enhancements to the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Predicates
participant DataFrame
User->>Predicates: Call get_predicates_df()
Predicates->>DataFrame: Retrieve static variables
DataFrame-->>Predicates: Return static variable values
Predicates->>DataFrame: Process derived predicates
DataFrame-->>User: Return updated DataFrame
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🧰 Additional context used🪛 GitHub Check: codecov/patch
🔇 Additional comments (5)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
src/aces/predicates.py (1)
702-711
: LGTM! Efficient implementation of static predicate propagation.The implementation correctly handles static variables in derived predicates by propagating their values within subject_id groups using polars' optimized window functions.
Consider adding a brief comment explaining the propagation logic:
if any(x in code.static for x in code.input_predicates): data = data.with_columns( [ + # Propagate static values by taking the first value for each subject_id pl.col(static_var) .first() .over("subject_id") # take the first value in each subject_id group and propagate it .alias(static_var) for static_var in code.static ] )
src/aces/config.py (1)
1296-1296
: Add validation for demographic keys and document assumptions.The code assumes all demographics are static predicates. Consider:
- Adding validation to ensure demographic keys are valid predicate names
- Documenting this assumption in the code
- Making the static nature of demographics configurable if needed
Example implementation:
if "expr" in p: - predicate_objs[n] = DerivedPredicateConfig(**p, static=list(final_demographics.keys())) + # Validate demographic keys + invalid_keys = [k for k in final_demographics.keys() if not re.match(r"^\w+$", k)] + if invalid_keys: + raise ValueError(f"Invalid demographic keys: {invalid_keys}") + + # All demographics are treated as static predicates + predicate_objs[n] = DerivedPredicateConfig( + **p, + # Demographics are assumed to be static (constant within same subject_id) + static=list(final_demographics.keys()) + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- src/aces/config.py (3 hunks)
- src/aces/predicates.py (2 hunks)
🔇 Additional comments (1)
src/aces/predicates.py (1)
613-668
: LGTM! Well-documented example demonstrating static predicates in derived predicates.The example effectively illustrates how to combine static and regular predicates in derived predicates, with clear test data and expected output that validates the functionality.
Codecov ReportAttention: Patch coverage is
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
- .github/workflows/code-quality-main.yaml (1 hunks)
- .github/workflows/code-quality-pr.yaml (2 hunks)
- .github/workflows/tests.yml (1 hunks)
- .pre-commit-config.yaml (1 hunks)
- pyproject.toml (1 hunks)
- src/aces/predicates.py (2 hunks)
🔇 Additional comments (13)
.github/workflows/code-quality-main.yaml (4)
16-16
: LGTM: Checkout action upgrade is safe.The upgrade to
actions/checkout@v4
is a stable release that includes improved performance and better support for newer Git features.
23-25
: LGTM: Development dependencies installation step is appropriate.Adding
pip install .[dev]
ensures all development dependencies are available for code quality checks.Let's verify the dev dependencies are properly defined:
#!/bin/bash # Description: Check if dev dependencies are defined in project configuration # Look for [dev] or dev-dependencies section in project config files rg -A 5 '\[dev\]|\[tool\.poetry\.dev-dependencies\]' pyproject.toml setup.cfg
28-28
: LGTM: Minor version upgrade for pre-commit action.The upgrade to
pre-commit/[email protected]
is a patch release that maintains compatibility.Let's verify pre-commit configuration compatibility:
#!/bin/bash # Description: Check pre-commit configuration for any version constraints # Look for version specifications in pre-commit config cat .pre-commit-config.yaml
18-21
: LGTM: Python setup configuration is well-defined.The upgrade to
actions/setup-python@v5
with explicit Python version 3.10 follows best practices for reproducible workflows.Let's verify Python version compatibility with the project:
✅ Verification successful
Python 3.10 is compatible with the project setup
The verification confirms that Python 3.10 is appropriate for this project based on:
- The pre-commit configuration explicitly uses
--py310-plus
in pyupgrade hook, indicating Python 3.10 as the minimum version- No conflicting version requirements found in requirements.txt or other configuration files
- The workflow's Python 3.10 setup aligns with the project's development tooling
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check if Python 3.10 is within the supported versions range # Look for Python version specifications in pyproject.toml or setup.cfg rg -l 'python_requires|python_version' pyproject.toml setup.cfgLength of output: 3517
Script:
#!/bin/bash # Check for Python version requirements in multiple potential locations echo "=== Checking requirements.txt ===" fd requirements.txt --exec cat {} echo -e "\n=== Checking setup.py ===" fd setup.py --exec cat {} echo -e "\n=== Checking Python files for version checks ===" rg "python_version|sys\.version|requires" --type py -C 2 echo -e "\n=== Checking pyupgrade configuration ===" rg "py\d+" .pre-commit-config.yamlLength of output: 2033
.github/workflows/tests.yml (2)
20-20
: LGTM! Good updates to the GitHub Actions.The upgrades to
actions/checkout@v4
andactions/setup-python@v5
bring performance improvements and security enhancements to the workflow.Also applies to: 23-23
29-29
: Verify if removing editable install mode is intentional.Removing the
-e
flag changes from an editable install to a regular package install. While this is generally fine for CI environments, please verify this won't impact any development workflows that might depend on immediate reflection of code changes..github/workflows/code-quality-pr.yaml (3)
19-19
: LGTM! Good practice to use latest stable checkout action.The upgrade to
actions/checkout@v4
is a good security practice and aligns with other workflow files.
26-28
: LGTM! Standard practice for installing development dependencies.Using
.[dev]
is the recommended way to install development dependencies.
40-42
: Verify impact of running pre-commit on all files.Changing from checking only modified files to checking all files (
--all-files
) will provide more thorough validation but may increase PR check duration. This could impact developer productivity if the codebase is large.✅ Verification successful
Running pre-commit on all files is acceptable given the repository size and hooks
The repository has only 82 files in total, and while there are numerous pre-commit hooks (24 in total), most of them are lightweight checks like trailing whitespace, YAML validation, and code formatting. The hooks are focused on Python code quality, documentation, and general file hygiene. Given the small codebase size, running these checks on all files should not significantly impact PR check duration.
Key observations:
- Small repository size (82 files)
- Standard code quality hooks that are typically fast to execute
- No heavy computational hooks (e.g., no test runners or complex static analyzers)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check the potential impact of running pre-commit on all files # Count total number of files that would be checked echo "Number of files that would be checked:" fd -t f | wc -l # List all pre-commit hooks to understand the scope if [ -f .pre-commit-config.yaml ]; then echo -e "\nPre-commit hooks that would run:" grep "id:" .pre-commit-config.yaml fiLength of output: 878
pyproject.toml (1)
40-40
: LGTM! Version constraint helps maintain build stability.The addition of
<4
constraint for pre-commit is a good practice to prevent potential breaking changes from future major versions.Let's verify the compatibility with the current pre-commit setup:
✅ Verification successful
Version constraint is appropriate and compatible with current setup
The pre-commit configuration uses hooks from pre-commit-hooks v5.0.0 and other dependencies that are compatible with pre-commit v3.x. The constraint
pre-commit<4
in pyproject.toml is appropriate as it:
- Ensures stability with the current pre-commit hooks ecosystem
- Prevents potential breaking changes from pre-commit v4
- Aligns with the project's existing pre-commit configuration
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify pre-commit configuration compatibility # Expected: Should find pre-commit-config.yaml and confirm hooks versions # Check pre-commit config exists and show its content if [ -f ".pre-commit-config.yaml" ]; then echo "Found .pre-commit-config.yaml:" cat .pre-commit-config.yaml else echo "Warning: .pre-commit-config.yaml not found" fiLength of output: 3531
src/aces/predicates.py (3)
613-668
: LGTM! Well-documented test cases.The added test cases effectively demonstrate the usage of static variables in derived predicates, providing clear examples of configuration and expected behavior.
701-701
: LGTM! Clear identification of static variables.The implementation efficiently extracts static variables from plain predicates.
703-712
: Verify handling of inconsistent static values.The implementation propagates the first value of static variables within each subject_id group. Please verify if this is the intended behavior when a static variable has different values within the same subject_id group.
Run the following script to check for inconsistent static values:
✅ Verification successful
Based on the gathered information, I can now provide a final response:
Taking first value for static variables is the intended behavior
The implementation is correct because:
- Static variables in this context represent patient demographics that should remain constant per subject
- The code is part of a larger system where static variables are explicitly marked in configuration (
static=True
)- The README confirms that static variables are handled specially, either stored with null timestamps in MEDS or in a separate
subjects_df
table in ESGPT- If inconsistencies exist in static variables for a subject, taking the first value is a reasonable default as these should be constant by definition
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for subjects with inconsistent static variable values. # Test: Search for test cases or examples that verify this behavior. rg -A 5 "static.*=.*True" | rg -B 5 "subject_id.*=.*\["Length of output: 57
Script:
#!/bin/bash # Let's try a different approach to find relevant test cases and usage patterns # Search for test files related to static variables and predicates fd "test.*\.py" | xargs rg -l "static.*variable" # Search for actual usage of static variables in the codebase rg -B 3 -A 3 "static_variables" # Look for test data or fixtures that might contain static variable examples fd "fixtures|test_data" | xargs rg -l "static" # Search for documentation or comments about static variables rg -B 2 -A 2 "static.*variable" -g "*.md" -g "*.py"Length of output: 7186
Script:
#!/bin/bash # Let's look for the implementation of static variable handling and constraints # Check the implementation of check_static_variables function rg -A 15 "^def check_static_variables" src/aces/constraints.py # Look for documentation about static variables in the configuration rg -A 5 "static.*=.*True" src/aces/config.py # Search for test cases with static variables fd "test_.*\.py" | xargs rg -B 3 -A 3 "check_static_variables"Length of output: 1725
@mmcdermott I tried to implement a way to allow for derived predicates joining static and plain predicates - basically I just copy over the static predicate values from the null timestamp rows throughout the same subject_id if they exist since these are static and do not change. Then the logic for the evaluation of the derived predicates shouldn't need to change. Does this seem reasonable? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (6)
docs/source/configuration.md (2)
52-56
: LGTM! Consider adding an example for clarity.The memory optimization note is a valuable addition that will help users avoid potential performance issues.
Consider adding a brief example to illustrate the recommended approach:
**Note**: Each individual definition of `PlainPredicateConfig` and `code` will generate a separate predicate column. Thus, for memory optimization, it is strongly recommended to match multiple values using either the List of Values or Regular Expression formats whenever possible. + + For example, instead of: + ```yaml + - code: "CODE1" + - code: "CODE2" + - code: "CODE3" + ``` + + Use: + ```yaml + - code: + any: ["CODE1", "CODE2", "CODE3"] + # or + - code: + regex: "CODE[1-3]" + ```
63-74
: Fix formatting and consider simplifying the explanation.The clarification of value constraints is helpful, but there are some formatting inconsistencies and opportunities for more concise wording.
Apply these changes to improve readability:
- `value_max` (with these options being decided on the basis of `value_max_inclusive`, where - `value_max_inclusive=True` indicating that an observation satisfies this predicate if its value is less - than or equal to `value_max`, and `value_max_inclusive=False` indicating a less than but not equal to - will be used). + `value_max`: If specified, requires the observation's value to be: + - Less than or equal to `value_max` when `value_max_inclusive=True` + - Less than `value_max` when `value_max_inclusive=False` - `value_min_inclusive`: See `value_min` + `value_min_inclusive`: Controls whether `value_min` is inclusive (True) or exclusive (False) - `value_max_inclusive`: See `value_max` + `value_max_inclusive`: Controls whether `value_max` is inclusive (True) or exclusive (False)Also, fix the loose punctuation marks by removing extra spaces before list items.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~64-~64: Loose punctuation mark.
Context: ...equal to will be used). -value_max
: If specified, an observation will only ...(UNLIKELY_OPENING_PUNCTUATION)
[style] ~66-~66: ‘on the basis of’ might be wordy. Consider a shorter alternative.
Context: ..._max(with these options being decided on the basis of
value_max_inclusive, where
value_m...(EN_WORDINESS_PREMIUM_ON_THE_BASIS_OF)
[uncategorized] ~71-~71: Loose punctuation mark.
Context: ... will be used). -value_min_inclusive
: Seevalue_min
- `value_max_inclusive...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~73-~73: Loose punctuation mark.
Context: ...Seevalue_min
-value_max_inclusive
: Seevalue_max
-other_cols
: This o...(UNLIKELY_OPENING_PUNCTUATION)
README.md (2)
221-222
: LGTM! Consider adding examples for the new predicate formats.The documentation clearly explains the new list-based and regex-based predicate formats. To further enhance user understanding, consider adding concrete examples:
- a list of strings as above in the form of `{any: [???, ???, ...]}`, which will match any of the listed codes. - a regex in the form of `{regex: "???"}`, which will match any code that matches that regular expression. + - a list of strings as above in the form of `{any: ["lab//SpO2", "lab//O2_sat"]}`, which will match any of the listed codes. + - a regex in the form of `{regex: "lab//.*O2.*"}`, which will match any code that matches that regular expression.
229-230
: LGTM! Consider adding performance metrics.The memory optimization note provides valuable guidance. To make it more compelling, consider adding quantitative metrics:
- **Note**: For memory optimization, we strongly recommend using either the List of Values or Regular Expression formats whenever possible, especially when needing to match multiple values. Defining each code as an individual string will increase memory usage significantly, as each code generates a separate predicate column. Using a list or regex consolidates multiple matching codes under a single column, reducing the overall memory footprint. + **Note**: For memory optimization, we strongly recommend using either the List of Values or Regular Expression formats whenever possible, especially when needing to match multiple values. Defining each code as an individual string will increase memory usage significantly, as each code generates a separate predicate column. Using a list or regex consolidates multiple matching codes under a single column, reducing the overall memory footprint. For example, consolidating 10 related predicates into a single list-based predicate can reduce memory usage by up to 90% in typical scenarios.src/aces/predicates.py (1)
699-714
: LGTM! Clean implementation of static variable handling.The implementation efficiently handles static variables in derived predicates by propagating the first value within each subject_id group. This aligns well with the PR objectives.
Consider adding a brief comment explaining why we take the first value for static variables:
if any(x in static_variables for x in code.input_predicates): + # For static variables, take the first value in each subject_id group as these values + # should remain constant throughout the subject's timeline data = data.with_columns( [ pl.col(static_var)src/aces/config.py (1)
Line range hint
1198-1320
: Enhance error message with format examples.The error handling for string predicates is good, but the error message could be more helpful by including examples of the correct format.
Consider updating the error message to include format examples:
- f"Predicate '{n}' is not defined correctly in the configuration file. " - f"Currently defined as the string: {p}. " - "Please refer to the documentation for the supported formats." + f"Predicate '{n}' is not defined correctly in the configuration file. " + f"Currently defined as the string: {p}. " + "Expected a dictionary with 'code' field, for example:\n" + " admission:\n" + " code: 'admission'\n" + "Please refer to the documentation for all supported formats."
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
- README.md (1 hunks)
- docs/source/configuration.md (1 hunks)
- src/aces/config.py (2 hunks)
- src/aces/predicates.py (2 hunks)
- src/aces/query.py (2 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/source/configuration.md
[uncategorized] ~57-~57: Loose punctuation mark.
Context: ...ormats whenever possible. -value_min
: If specified, an observation will only ...(UNLIKELY_OPENING_PUNCTUATION)
[style] ~59-~59: ‘on the basis of’ might be wordy. Consider a shorter alternative.
Context: ..._min(with these options being decided on the basis of
value_min_inclusive, where
value_m...(EN_WORDINESS_PREMIUM_ON_THE_BASIS_OF)
[uncategorized] ~64-~64: Loose punctuation mark.
Context: ...equal to will be used). -value_max
: If specified, an observation will only ...(UNLIKELY_OPENING_PUNCTUATION)
[style] ~66-~66: ‘on the basis of’ might be wordy. Consider a shorter alternative.
Context: ..._max(with these options being decided on the basis of
value_max_inclusive, where
value_m...(EN_WORDINESS_PREMIUM_ON_THE_BASIS_OF)
[uncategorized] ~71-~71: Loose punctuation mark.
Context: ... will be used). -value_min_inclusive
: Seevalue_min
- `value_max_inclusive...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~73-~73: Loose punctuation mark.
Context: ...Seevalue_min
-value_max_inclusive
: Seevalue_max
-other_cols
: This o...(UNLIKELY_OPENING_PUNCTUATION)
🪛 GitHub Check: codecov/patch
src/aces/query.py
[warning] 142-143: src/aces/query.py#L142-L143
Added lines #L142 - L143 were not covered by tests
[warning] 149-151: src/aces/query.py#L149-L151
Added lines #L149 - L151 were not covered by tests
🔇 Additional comments (3)
src/aces/query.py (2)
7-8
: LGTM: Clean import additionThe Counter import is well-placed and appropriately used for analyzing label distributions.
142-147
: Add test coverage for label validation logicThe label validation is a critical diagnostic feature that helps users identify potential configuration issues. However, this code segment currently lacks test coverage.
Would you like me to help create test cases for:
- Single-value label scenario
- Multi-value label scenario
- Edge cases (empty labels, null values)?
Here's a script to verify the current test coverage:
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 142-143: src/aces/query.py#L142-L143
Added lines #L142 - L143 were not covered by testssrc/aces/predicates.py (1)
613-668
: LGTM! Well-documented example usage.The new example effectively demonstrates how to use static variables in derived predicates, showing both the configuration and expected behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- src/aces/query.py (2 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/aces/query.py
[warning] 142-143: src/aces/query.py#L142-L143
Added lines #L142 - L143 were not covered by tests
[warning] 149-152: src/aces/query.py#L149-L152
Added lines #L149 - L152 were not covered by tests
[warning] 155-155: src/aces/query.py#L155
Added line #L155 was not covered by tests
🔇 Additional comments (2)
src/aces/query.py (2)
7-7
: LGTM!The Counter import is appropriately placed and well-suited for calculating label distributions.
142-157
: Add test coverage for label validation logic.The new label validation and distribution logging code lacks test coverage. Please add test cases to cover:
- Single-label scenario triggering the warning
- Multi-label scenario with distribution logging
Would you like help creating the test cases for these scenarios?
✅ Verification successful
Based on the shell script results, I can now provide a final response. We can see that there's an established testing infrastructure with test utilities (
tests/utils.py
) that includes DataFrame assertions and test data handling. We also see examples of test patterns intest_e2e.py
and other test files.The review comment is valid, but needs refinement based on the codebase context
The label validation logic should indeed be tested, but we should leverage the existing test infrastructure:
- Add test cases to
test_e2e.py
since it already handles similar DataFrame-based scenarios and contains label-related test data- Use the
assert_df_equal
utility fromtests/utils.py
for DataFrame comparisons- Follow the existing pattern of using polars DataFrame construction as seen in the test files
Key test locations to add:
tests/test_e2e.py
: Add new test cases for both single-label and multi-label scenarios- Utilize the existing test fixtures and utilities shown in
tests/utils.py
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for existing test files and patterns # Test: Look for existing test patterns we can follow rg -A 5 "def test.*label" tests/Length of output: 34
Script:
#!/bin/bash # Let's check for existing test files related to query.py fd "test.*query.*\.py$" tests/ # Let's also look for any existing test patterns in the test files rg -l "test.*cohort" tests/ # Check the actual implementation to understand the test context better cat src/aces/query.pyLength of output: 7585
Script:
#!/bin/bash # Let's check for the test directory structure and existing test files fd "test_.*\.py$" tests/ # Let's also check for any existing test fixtures or utilities rg -l "fixture" tests/ # Let's specifically look for any DataFrame-related test patterns rg -A 5 "DataFrame" tests/Length of output: 6947
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 142-143: src/aces/query.py#L142-L143
Added lines #L142 - L143 were not covered by tests
[warning] 149-152: src/aces/query.py#L149-L152
Added lines #L149 - L152 were not covered by tests
[warning] 155-155: src/aces/query.py#L155
Added line #L155 was not covered by tests
Might be a bit difficult to properly test the new error/warning messages in |
@justin13601, can you add tests to get the codecov up to the desired target? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
src/aces/query.py (1)
140-146
: Add pragma comment to exclude logging from coverage.The warning logic is well-implemented, but needs to be excluded from coverage requirements.
Add the following pragma comment to line 140:
- if result["label"].n_unique() == 1: + if result["label"].n_unique() == 1: # pragma: no cover🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 140-141: src/aces/query.py#L140-L141
Added lines #L140 - L141 were not covered by tests
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- .github/workflows/code-quality-pr.yaml (2 hunks)
- src/aces/query.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/code-quality-pr.yaml
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/aces/query.py
[warning] 140-141: src/aces/query.py#L140-L141
Added lines #L140 - L141 were not covered by tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
README.md (1)
220-223
: LGTM! Consider varying sentence structure.The documentation clearly explains the different formats for the
code
field. However, there's a minor style suggestion to improve readability.Consider varying the sentence beginnings to improve flow:
- - a string matching values in a column named `code` (for `MEDS` only). - - a string with a `//` sequence separating the column name and the matching column value (for `ESGPT` only). - - a list of strings as above in the form of `{any: \[???, ???, ...\]}` (or the corresponding expanded indented `YAML` format), which will match any of the listed codes. - - a regex in the form of `{regex: "???"}` (or the corresponding expanded indented `YAML` format), which will match any code that matches that regular expression. + - For MEDS: a string matching values in a column named `code`. + - For ESGPT: a string with a `//` sequence separating the column name and the matching column value. + - Multiple values: a list of strings in the form of `{any: \[???, ???, ...\]}` (or the corresponding expanded indented `YAML` format), which will match any of the listed codes. + - Pattern matching: a regex in the form of `{regex: "???"}` (or the corresponding expanded indented `YAML` format), which will match any code that matches that regular expression.🧰 Tools
🪛 LanguageTool
[style] ~223-~223: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...will match any of the listed codes. - a regex in the form of{regex: "???"}
(...(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- README.md (1 hunks)
- src/aces/query.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/aces/query.py
🧰 Additional context used
🪛 LanguageTool
README.md
[style] ~223-~223: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...will match any of the listed codes. - a regex in the form of{regex: "???"}
(...(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🔇 Additional comments (1)
README.md (1)
230-231
: LGTM! Excellent performance optimization guidance.The added note about memory optimization is valuable, providing clear guidance on using List of Values or Regular Expression formats to reduce memory usage. This is particularly important for scalability.
Also closes #94 |
…erence ranges for male/female)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
src/aces/config.py (1)
1198-1214
: Enhance error message clarity for invalid predicate definitions.The error message could be more helpful by providing an example of the correct format.
Consider updating the error message to include an example:
- raise ValueError( - f"Predicate '{n}' is not defined correctly in the configuration file. " - f"Currently defined as the string: {p}. " - "Please refer to the documentation for the supported formats." - ) + raise ValueError( + f"Predicate '{n}' is not defined correctly in the configuration file. " + f"Currently defined as the string: {p}. " + "Predicates must be defined as dictionaries with a 'code' field, for example: " + "{'code': 'admission', 'static': false}" + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- src/aces/config.py (5 hunks)
- src/aces/query.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/aces/query.py
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/aces/config.py
[warning] 1311-1311: src/aces/config.py#L1311
Added line #L1311 was not covered by tests
[warning] 1313-1314: src/aces/config.py#L1313-L1314
Added lines #L1313 - L1314 were not covered by tests
[warning] 1316-1317: src/aces/config.py#L1316-L1317
Added lines #L1316 - L1317 were not covered by tests
[warning] 1323-1326: src/aces/config.py#L1323-L1326
Added lines #L1323 - L1326 were not covered by tests
[warning] 1329-1329: src/aces/config.py#L1329
Added line #L1329 was not covered by tests
🔇 Additional comments (4)
src/aces/config.py (4)
1338-1343
: Improved validation for string predicates.The added validation prevents silent failures by explicitly checking for and rejecting string-based predicate definitions. This is a good defensive programming practice.
1392-1394
: Enhanced error message readability.The error message formatting has been improved to show the count of missing relationships, making it easier to understand the scope of configuration issues.
1292-1305
: Improved referenced predicates validation.The code now properly collects and validates all referenced predicates, including those from windows and triggers. This helps catch configuration errors early.
1310-1330
: Verify recursive predicate validation implementation.The new recursive validation logic for derived predicates is not covered by tests. While the logic appears sound, we should ensure it handles all edge cases correctly.
Consider adding test cases that cover:
- Deeply nested derived predicates
- Circular references
- Missing nested predicates
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 1311-1311: src/aces/config.py#L1311
Added line #L1311 was not covered by tests
[warning] 1313-1314: src/aces/config.py#L1313-L1314
Added lines #L1313 - L1314 were not covered by tests
[warning] 1316-1317: src/aces/config.py#L1316-L1317
Added lines #L1316 - L1317 were not covered by tests
[warning] 1323-1326: src/aces/config.py#L1323-L1326
Added lines #L1323 - L1326 were not covered by tests
[warning] 1329-1329: src/aces/config.py#L1329
Added line #L1329 was not covered by tests
…s nested derived predicates
@mmcdermott I realized nested derived predicates will most likely be needed for creating these different reference ranges so I tried to enable that. Let me know if the following seems reasonable: To create different reference ranges (for illustration purposes, not following actual clinical guidelines): Different normal ranges:
Different abnormal ranges (may be worth looking into
|
For #144
Summary by CodeRabbit
New Features
Documentation
Chores