pre-commit hook + ruff usage #2013

Open: wants to merge 12 commits into master


cpelley (Contributor) commented Jul 12, 2024

  • ruff usage (set up to be equivalent to flake8 + isort + black), as per dagrunner, improver_suite, etc.
    • Performance improvements from its use.
    • Standardisation across repositories.
  • Copyright and __init__.py checks have also been turned into pre-commit checks.
    • There is no longer any need for improver_tests/test_source_code.py (removed in this PR).
    • __init__.py checks are handled via the init_check script, which supports auto-fix (it creates missing __init__.py files).
    • The copyright check will add missing copyright headers.
  • Updated ci.yml formatting while I was at it, to improve readability (blank lines between steps).
  • In CI (GitHub Actions), the workflow that runs pre-commit (formerly flake8 and black) now only checks files touched by the PR.
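The `__init__.py` check and its auto-fix can be sketched roughly as follows. This is a minimal illustration, not the PR's init_check script: it assumes the rule is "every directory under the package root that contains `.py` files must also contain an `__init__.py`", and both function names are hypothetical.

```python
import os
from pathlib import Path
from typing import List


def find_missing_init_files(package_root: Path) -> List[Path]:
    """Return the __init__.py paths missing from directories holding .py files.

    Assumed rule (hypothetical): any directory below package_root that
    contains at least one .py file should also contain an __init__.py.
    """
    missing = []
    for dirpath, dirnames, filenames in os.walk(package_root):
        # Prune hidden directories and bytecode caches from the walk.
        dirnames[:] = [
            d for d in dirnames if d != "__pycache__" and not d.startswith(".")
        ]
        has_python = any(name.endswith(".py") for name in filenames)
        if has_python and "__init__.py" not in filenames:
            missing.append(Path(dirpath) / "__init__.py")
    return sorted(missing)


def fix_missing_init_files(package_root: Path) -> List[Path]:
    """Auto-fix mode: create each missing __init__.py as an empty file."""
    created = find_missing_init_files(package_root)
    for path in created:
        path.touch()
    return created
```

A hook wrapper around this would report the returned paths and exit non-zero when the list is non-empty; pre-commit also treats a hook that modifies files as failed, so auto-created files still have to be staged before the commit goes through.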

Note that you can run pre-commit manually (you needn't wait for a commit):

All files:

pre-commit run --all-files

Specific files:

pre-commit run --files <some-file> <some-other-file>
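Restricting CI to the files a PR touches amounts to feeding a file list to `pre-commit run --files`. A minimal sketch, assuming the list comes from `git diff --name-only` against the base branch; the helper names are hypothetical and not taken from the PR's ci.yml.

```python
from typing import List, Optional


def changed_files(diff_name_only_output: str) -> List[str]:
    """Parse `git diff --name-only` output into a list of paths (blank lines dropped)."""
    return [line.strip() for line in diff_name_only_output.splitlines() if line.strip()]


def precommit_command(files: Optional[List[str]] = None) -> List[str]:
    """Build a pre-commit invocation: all files if none are given, else just those."""
    if not files:
        return ["pre-commit", "run", "--all-files"]
    return ["pre-commit", "run", "--files", *files]
```

The resulting list can be passed straight to `subprocess.run`. Adding `--diff-filter=d` to the git command excludes deleted paths, and pre-commit can also compute the changed set itself via `pre-commit run --from-ref <base> --to-ref HEAD`.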

Issues

Why ruff?

This PR switches us from black, isort, and flake8 to ruff for code formatting and linting. Ruff is gaining popularity because of its speed and flexibility. It combines the functionality of black, isort, and flake8 in a single tool (and more, if we enable additional rulesets), reducing the complexity of our tooling setup. Its fast execution and ability to auto-fix issues help keep the codebase consistent, and adopting it gives us a unified approach to code quality across our other repositories, in line with wider industry trends.

Note

Some differences between black and ruff:
https://docs.astral.sh/ruff/formatter/#black-compatibility
(CI now only runs pre-commit on changed files, so these formatting differences will be applied to files gradually, as people touch them.)

cpelley marked this pull request as draft July 12, 2024 10:29
cpelley force-pushed the PRE_COMMIT branch 2 times, most recently from b890199 to 9aba4cf July 12, 2024 10:57
cpelley marked this pull request as ready for review July 12, 2024 11:09
cpelley self-assigned this Jul 12, 2024
cpelley changed the title from "pre-commit hook usage" to "pre-commit hook + ruff usage" Jul 12, 2024
cpelley (Contributor Author) commented Jul 12, 2024

This may be of interest to you, @s-boardman, as a step towards consistency between repositories.

cpelley (Contributor Author) commented Jul 16, 2024

@gavinevans, are you happy for us to proceed with this? I can chase up a second review from the technical team. I just wasn't sure whether the science team would be happy leaving this as a purely technical-team change.
Cheers

cpelley (Contributor Author) commented Jul 23, 2024

I have updated the code style guide to provide more context.

Review thread on pyproject.toml (outdated, resolved)
SamGriffithsMO (Contributor) left a comment


A few more points; I have been testing on improver/nbhood/nbhood.py:

  1. There appears to be an issue with the copyright_check: it is failing when I don't think it should. Can you add a fix option to copyright_check so users don't have to figure out what the correct header is? (This will also tell me whether it is working correctly.)
  2. bin/improver_tests still contains references to the old tools (black/isort/flake8). Is this used? Can it be removed, or does it need updating?
  3. This change appears to reduce the line length from 100 to 88 (it's not totally clear to me why); note, I am okay with it going to 88. My best guess for the cause is the flake8 configuration in setup.cfg (which is still there and presumably could be removed?).
  4. There is a formatting change that I think is quite bad for readability; can we do anything about it? Specifically, the spacing in explicit arrays, e.g.
-                    [[[ 0.75,  0.75,  0.5 ,  0.5 ,  0.5 ,  0.75,  0.75],
-                      [ 0.75,  0.55,  0.55,  0.5 ,  0.55,  0.55,  0.55],
-                      [ 0.55,  0.55,  0.5 ,  0.5 ,  0.5 ,  0.5 ,  0.5 ],
-                      [ 0.5 ,  0.5 ,  0.5 ,  0.5 ,  0.5 ,  0.5 ,  0.5 ],
-                      [ 0.5 ,  0.5 ,  0.5 ,  0.5 ,  0.5 ,  0.55,  0.55],
-                      [ 0.55,  0.55,  0.55,  0.5 ,  0.55,  0.55,  0.75],
-                      [ 0.75,  0.75,  0.5 ,  0.5 ,  0.5 ,  0.75,  0.75]]],
+                    (
+                        [
+                            [
+                                [0.75, 0.75, 0.5, 0.5, 0.5, 0.75, 0.75],
+                                [0.75, 0.55, 0.55, 0.5, 0.55, 0.55, 0.55],
+                                [0.55, 0.55, 0.5, 0.5, 0.5, 0.5, 0.5],
+                                [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
+                                [0.5, 0.5, 0.5, 0.5, 0.5, 0.55, 0.55],
+                                [0.55, 0.55, 0.55, 0.5, 0.55, 0.55, 0.75],
+                                [0.75, 0.75, 0.5, 0.5, 0.5, 0.75, 0.75],
+                            ]
+                        ],
+                    )

cpelley (Contributor Author) commented Jul 23, 2024

There appears to be an issue with the copyright_check, it is failing when I don't think it should.

Can you give an example of this case that failed when you think it shouldn't have?

Can you add a fix option to copyright_check so users don't have to figure out what the correct header is?

The copyright check will already add a copyright header automatically if it is missing. When a header appears to be present but isn't correct, it's very difficult to fix it automatically or present a diff in the general case. Checking whether a file 'appears' to have a copyright notice is done by searching for the term 'copyright' within a comment line ('#'). Beyond that, comments can contain arbitrary information, so there is no sure way to know where the incorrect copyright notice ends, only that it doesn't match.
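The detection logic described above can be sketched as a three-way classification. This is a minimal illustration rather than the PR's copyright_check script: the expected header text is an assumption (only the phrases "Crown Copyright" and "released under the BSD 3-Clause license" come from this thread), and `classify_header` is a hypothetical name.

```python
from typing import List

# Assumed header text for illustration; the real template may differ.
EXPECTED_HEADER = [
    "# (C) Crown Copyright, Met Office. All rights reserved.",
    "#",
    "# This file is part of 'IMPROVER' and is released under the BSD 3-Clause license.",
    "# See LICENSE in the root of the repository for full licensing details.",
]


def classify_header(lines: List[str], expected: List[str] = EXPECTED_HEADER) -> str:
    """Return 'ok', 'incorrect' or 'missing' for a file's copyright header."""
    # Allow a shebang line before the header.
    start = 1 if lines and lines[0].startswith("#!") else 0
    if lines[start : start + len(expected)] == expected:
        return "ok"
    # A header only "appears" to exist if some comment line mentions copyright.
    appears_present = any(
        line.lstrip().startswith("#") and "copyright" in line.lower() for line in lines
    )
    # Where an incorrect header ends is unknowable, so it is reported, not fixed.
    return "incorrect" if appears_present else "missing"
```

Only the "missing" case is safe to auto-fix by prepending the template; the "incorrect" case is exactly the ambiguity discussed above.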

EDIT: I suppose we could derive a diff based on the number of lines in the correct copyright notice we expect to be there. We could do this, but wouldn't it be simpler to delete the incorrect header and have it populated automatically for you (rather than modifying it by hand)?
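The idea in the EDIT, diffing only as many lines as the expected notice has, can be sketched with the standard library's `difflib`. The header text is an assumption for illustration, `header_diff` is a hypothetical name, and the sample file reproduces the two mismatches reported later in this thread for nbhood.py ("copyright" vs "Copyright", "a" vs "the").

```python
import difflib
from typing import List

# Assumed header text for illustration; the real template may differ.
EXPECTED_HEADER = [
    "# (C) Crown Copyright, Met Office. All rights reserved.",
    "#",
    "# This file is part of 'IMPROVER' and is released under the BSD 3-Clause license.",
    "# See LICENSE in the root of the repository for full licensing details.",
]


def header_diff(lines: List[str], expected: List[str] = EXPECTED_HEADER) -> str:
    """Unified diff of the file's first len(expected) lines against the header.

    Returns an empty string when the header matches exactly.
    """
    actual = lines[: len(expected)]
    diff = difflib.unified_diff(
        expected, actual, fromfile="expected", tofile="actual", lineterm=""
    )
    return "\n".join(diff)


# Hypothetical file start with a lower-case "copyright" and "a" instead of "the".
nbhood_like = [
    "# (C) Crown copyright, Met Office. All rights reserved.",
    "#",
    "# This file is part of 'IMPROVER' and is released under a BSD 3-Clause license.",
    "# See LICENSE in the root of the repository for full licensing details.",
    "",
]
print(header_diff(nbhood_like))
```

The fixed-length window is the design trade-off: it gives a readable diff when the wrong header has the same number of lines as the template, but it cannot tell where a longer or shorter incorrect header actually ends.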

EDIT2: Since improver_tests/test_source_code.py:test_py_licence, which this copyright check replaces, doesn't provide such a diff, I'm going to consider it out of scope for this PR. It may be worth considering in future, though I'm not sure how useful it would be, given that people have managed with the existing solution for some time without such diffing capability 🤷‍♂️

bin/improver_tests still contains references to old tools-black/isort/flake8 (is this used/ can it be removed? Or does it need updating)

I'll grep for references to flake8 and black 👍

This change appears to update the line-length from 100 down to 88 (not total clear to me why) - note, I am okay with it going to 88. Best guess for cause is the flake8 configuration in setup.cfg (which is still there and presumably could be removed?)

Good spotting; I didn't think to look at setup.cfg for the flake8 config.
I'll remove it and set the ruff line-length limit to 100 (for now, at least).

There is a formatting change that I think are actually quite bad (for readability), can we do anything about this. Specifically spacing in explicit arrays, e.g.

I'm not sure yet; I'll look into it. Ideally, I think this would be handled with an in-line rule exclusion for such cases.

cpelley marked this pull request as draft July 23, 2024 11:00
cpelley (Contributor Author) commented Jul 23, 2024

Pulled back into draft to address some points raised by @SamGriffithsMO (thanks for taking a close look at this).

SamGriffithsMO (Contributor)

Can you give an example of this case that failed when you think it shouldn't have?

I just ran it on nbhood/nbhood.py; it already has a header that, to me, looks like it should pass. Let me know if you can't reproduce it.

(improver_production) [sgriffit@vld456:/net/home/h03/sgriffit/repos/improver]$ git status
HEAD detached at origin/PRE_COMMIT
nothing to commit, working tree clean
(improver_production) [sgriffit@vld456:/net/home/h03/sgriffit/repos/improver]$ pre-commit run -v --files improver/nbhood/nbhood.py
ruff.....................................................................Failed
- hook id: ruff
- duration: 0.12s
- exit code: 1

improver/nbhood/nbhood.py:320:89: E501 Line too long (99 > 88)
improver/nbhood/nbhood.py:329:89: E501 Line too long (89 > 88)
improver/nbhood/nbhood.py:331:89: E501 Line too long (93 > 88)
improver/nbhood/nbhood.py:332:89: E501 Line too long (96 > 88)
improver/nbhood/nbhood.py:336:89: E501 Line too long (91 > 88)
improver/nbhood/nbhood.py:338:89: E501 Line too long (89 > 88)
improver/nbhood/nbhood.py:347:89: E501 Line too long (95 > 88)
improver/nbhood/nbhood.py:349:89: E501 Line too long (93 > 88)
improver/nbhood/nbhood.py:352:89: E501 Line too long (92 > 88)
improver/nbhood/nbhood.py:353:89: E501 Line too long (92 > 88)
improver/nbhood/nbhood.py:381:89: E501 Line too long (99 > 88)
improver/nbhood/nbhood.py:382:89: E501 Line too long (96 > 88)
improver/nbhood/nbhood.py:388:89: E501 Line too long (96 > 88)
improver/nbhood/nbhood.py:642:89: E501 Line too long (92 > 88)
Found 14 errors.

ruff-format..............................................................Failed
- hook id: ruff-format
- duration: 0.05s
- files were modified by this hook

1 file reformatted

Check copyright header...................................................Failed
- hook id: copyright_check
- duration: 0.11s
- exit code: 1

Incorrect Copyright header in 'improver/nbhood/nbhood.py'

Check for missing __init__.py files......................................Passed
- hook id: init_check
- duration: 0.02s

Since the improver_tests/test_source_code.py:test_py_licence that this copyright check replaces doesn't provide this diff, I'm going to consider it out of scope for this PR.

👍

mo-robert-purvis

The copyright check fails on nbhood.py because the first line has "Crown copyright" rather than "Crown Copyright" (case of the second word), and the second line has "released under a BSD 3-Clause license" rather than "released under the BSD 3-Clause license" ("a" vs "the").


In order to maintain a backlog of relevant PRs, we automatically label them as stale after 60 days of inactivity.

If this PR is still important to you, then please comment on this PR and the stale label will be removed.

Otherwise this PR will be automatically closed in 30 days' time.

github-actions bot added the Stale label Sep 22, 2024
cpelley force-pushed the PRE_COMMIT branch 3 times, most recently from 969fdfd to 546c212 October 22, 2024 09:24
cpelley (Contributor Author) commented Oct 22, 2024

Summary of new changes

  • Pre-commit now runs on all files in CI, not just changed files.
  • Updated all copyright headers to be consistent.
  • Updated all code to conform to ruff (which has some differences from black).
    • I ran black --skip-magic-trailing-comma . followed by pre-commit run --all-files. This reflects inconsistent trailing-comma usage within IMPROVER, which gives rise to inconsistent formatting.
  • Added --verbose and --fix arguments to the copyright header check (--verbose outputs a diff when a copyright notice exists but doesn't match; it is off by default).
  • Updated the conda environment files, removing black, isort and flake8; these are now managed through pre-commit.
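A rough sketch of how the `--fix` and `--verbose` options might be wired up: `--fix` prepends the expected header to files that have none (keeping any shebang line first), and `--verbose` would gate diff output. The parser, the function name and the header text are all hypothetical, not the PR's actual script.

```python
import argparse
from typing import List

# Assumed header text for illustration; the real template may differ.
EXPECTED_HEADER = [
    "# (C) Crown Copyright, Met Office. All rights reserved.",
    "#",
    "# This file is part of 'IMPROVER' and is released under the BSD 3-Clause license.",
    "# See LICENSE in the root of the repository for full licensing details.",
]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical command-line interface for a copyright header check."""
    parser = argparse.ArgumentParser(description="Check copyright headers.")
    parser.add_argument("files", nargs="*", help="Files to check.")
    parser.add_argument("--fix", action="store_true", help="Add missing headers.")
    parser.add_argument(
        "--verbose", action="store_true", help="Show a diff for mismatched headers."
    )
    return parser


def add_missing_header(lines: List[str], expected: List[str] = EXPECTED_HEADER) -> List[str]:
    """Return the file's lines with the expected header inserted at the top.

    Intended only for files with no header at all; a shebang line stays first.
    """
    if lines and lines[0].startswith("#!"):
        return [lines[0], *expected, *lines[1:]]
    return [*expected, *lines]
```

Keeping the fix limited to files with no header at all sidesteps the problem of guessing where an incorrect header ends.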

There is a formatting change that I think are actually quite bad (for readability), can we do anything about this. Specifically spacing in explicit arrays, e.g.

The best approach is to apply a noqa exception to it. People's requirements for vertical alignment aren't consistent, so ruff hasn't gone down the route of supporting such cases.
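For the formatter specifically, ruff honours Black-style `# fmt: off` / `# fmt: on` pragmas (and a trailing `# fmt: skip`), whereas `# noqa` comments suppress lint rules rather than formatting. A minimal sketch of protecting a hand-aligned test array, assuming the pragmas wrap whole statements (ruff applies them at statement level, not inside expressions); the variable name is hypothetical.

```python
# Everything between these pragmas is left exactly as written by `ruff format`,
# so the hand-aligned columns survive reformatting.
# fmt: off
expected = [[[ 0.75,  0.75,  0.5 ,  0.5 ,  0.5 ,  0.75,  0.75],
             [ 0.75,  0.55,  0.55,  0.5 ,  0.55,  0.55,  0.55],
             [ 0.55,  0.55,  0.5 ,  0.5 ,  0.5 ,  0.5 ,  0.5 ],
             [ 0.5 ,  0.5 ,  0.5 ,  0.5 ,  0.5 ,  0.5 ,  0.5 ],
             [ 0.5 ,  0.5 ,  0.5 ,  0.5 ,  0.5 ,  0.55,  0.55],
             [ 0.55,  0.55,  0.55,  0.5 ,  0.55,  0.55,  0.75],
             [ 0.75,  0.75,  0.5 ,  0.5 ,  0.5 ,  0.75,  0.75]]]
# fmt: on
```

If a lint rule (for example a pycodestyle whitespace check) also objects to the alignment, a `# noqa: <code>` comment on the affected line is the lint-side equivalent.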

bin/improver_tests still contains references to...

Done. Good spotting.

cpelley added the "BoM review required" label (PRs opened by non-BoM developers that require a BoM review) Oct 22, 2024
cpelley marked this pull request as ready for review October 22, 2024 10:09
cpelley (Contributor Author) commented Oct 22, 2024

@nivnac, you may be interested to look at this.

In summary, this is motivated by standardising formatting methods and testing across our repositories. Also:

  1. Smaller and Cleaner Production Environment: By removing developer tools like ruff from the package's core dependencies, the production environment stays lightweight.

  2. Easier Dependency Management: When developer-only dependencies are managed separately (e.g., using pre-commit), it becomes easier to maintain and update them without affecting the conda environment. This reduces the risk of conflicts between package dependencies and development tools.

  3. Improved Reproducibility: A clear separation between core and development dependencies ensures that production environments are consistent and reproducible. Development tools can introduce updates or changes that aren’t relevant to production, and by keeping them separate, you reduce potential variability across environments.

  4. Flexibility for Developers: Pre-commit hooks allow each developer to manage their local development environment while keeping the project’s core environment isolated. For example, developers can update or modify their linting tools without altering the package dependencies.

By keeping developer dependencies separate, we can streamline both development and deployment processes, reducing complexity and ensuring a clear distinction between development tools and production requirements.

Note that IMPROVER uses a number of other developer packages and this PR is likely just the first step (though I have no plans to look at this again anytime soon).

SamGriffithsMO (Contributor) left a comment


Looking good 👍; nbhood/nbhood.py just needs some special attention and then it's good to go.

Review threads on improver/nbhood/nbhood.py (four resolved, three of them outdated)
cpelley dismissed SamGriffithsMO’s stale review October 24, 2024 06:45

Thanks Sam, I have made changes to reflect feedback.

line_length = 88
[tool.ruff.lint]
extend-select = ["E", "F", "W", "I"] # add C90 later
ignore = ["E203", "E731", "E501", "E741"] # remove "E501", "E741" later
cpelley (Contributor Author) commented Oct 24, 2024

I wanted to draw attention to this: ideally we should work towards removing all of these exclusions. They exist for now to avoid making too many changes (raising standards) while first adopting ruff.

SamGriffithsMO (Contributor) left a comment

👍

mo-robert-purvis left a comment

That was a lot of files!

Labels: blue_team, BoM review required, test
3 participants