Implement PEP 723 support (script dependencies in a TOML block) #96

pfmoore · 2024-01-09T13:59:56Z

bswck

Great! 🚀
Just a few minor suggestions from me.

pip_run/scripts.py

bswck · 2024-01-19T18:30:51Z

pip_run/scripts.py

+        matches = list(
+            filter(lambda m: m.group('type') == name, re.finditer(TOML_BLOCK_REGEX, self.script))
+        )
+        if len(matches) > 1:
+            raise ValueError(f'Multiple {name} blocks found')
+        elif len(matches) == 1:
+            content = ''.join(
+                line[2:] if line.startswith('# ') else line[1:]
+                for line in matches[0].group('content').splitlines(keepends=True)
+            )
+            deps = tomllib.loads(content).get("dependencies", [])
+        else:
+            deps = []


The performance gain from filter() is already lost due to lambda. I'd suggest using a clean generator expression to enhance readability.

In the suggested change, because only one match will be needed, we take only one. Next up, we check if there's another one left and raise immediately, not knowing about the rest of the matches as we don't need them.
It's also safer to use .get("dependencies") or [] instead of .get("dependencies", []) in case "dependencies" is defined as null; and if there's an empty list already, we save time on not creating another default empty list.

line[2:] if line.startswith('# ') else line[1:] can also either be replaced with line[2 if line.startswith('# ') else 1:] or line[1+line.startswith('# '):].

Suggested change

-        matches = list(
-            filter(lambda m: m.group('type') == name, re.finditer(TOML_BLOCK_REGEX, self.script))
-        )
-        if len(matches) > 1:
-            raise ValueError(f'Multiple {name} blocks found')
-        elif len(matches) == 1:
-            content = ''.join(
-                line[2:] if line.startswith('# ') else line[1:]
-                for line in matches[0].group('content').splitlines(keepends=True)
-            )
-            deps = tomllib.loads(content).get("dependencies", [])
-        else:
-            deps = []
+        deps = []
+        iter_matches = (
+            m for m in re.finditer(TOML_BLOCK_REGEX, self.script)
+            if m.group('type') == name
+        )
+        match = next(iter_matches, None)
+        if match:
+            if any(iter_matches):  # Check if there are any more matches left.
+                raise ValueError(f'Multiple {name} blocks found')
+            content = ''.join(
+                line[1 + line.startswith('# '):]
+                for line in match.group('content').splitlines(keepends=True)
+            )
+            deps[:] = tomllib.loads(content).get("dependencies") or ()

This is taken straight from the PEP, where it's the canonical implementation of parsing. I'd rather not mess with it, simply because I don't see the benefit in risking the possibility that we introduce bugs by doing so.

As the author, I am a strong -1 on this change because it is not as easy to translate into other languages as before.

edit: I didn't realize which repository this was, feel free to do as you wish if it works but I agree with Paul that there is an inherent risk of introducing a bug.

> As the author, I am a strong -1 on this change because it is not as easy to translate into other languages as before.

Thank you for your feedback. I was suggesting the change for pip-run specifically. Is there any other reason you don't like the suggestions?

Nope sorry about that, as I mentioned in the edit to my comment feel free to do as you wish!

I tested my changes against tests in this PR and observed no regression. In my opinion, a more detailed analysis of the change can disprove the concern regarding the risk of introducing a bug.

What is more, the suggested change handles, at no extra cost, an edge case ("It's also safer to use .get("dependencies") or [] instead of .get("dependencies", []) in case "dependencies" is defined as null") that the original version does not handle, even though the original does assume the input data might be invalid (judging by its use of tomllib.loads(content).get("dependencies") rather than tomllib.loads(content)["dependencies"]).

Hi, sorry for joining an already crowded conversation, but @bswck asked for my thoughts, so here they are:

I do think it's worth checking that dependencies is a list of strings. Currently it looks like the user will get a confusing traceback if it's null or a number. Most importantly, if it's a single string, then each character will be treated as a dependency.
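
A rough sketch of what such a check could look like, operating on the already-parsed TOML mapping. The helper name `checked_dependencies` and the error message are made up for illustration and are not part of the PR.

```python
def checked_dependencies(metadata):
    """Hypothetical helper: return metadata['dependencies'] if it is a list of str."""
    deps = metadata.get('dependencies', [])
    if not isinstance(deps, list) or not all(isinstance(d, str) for d in deps):
        raise ValueError(
            f'"dependencies" must be a list of strings, got {deps!r}'
        )
    return deps


# Iterating over a bare string yields its characters, which is how a single
# string value would turn into one "dependency" per character.
print(list('foo'))  # ['f', 'o', 'o']
```

With this in place, a single string or a number fails early with a message that names the offending value.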

The rest of the suggestion doesn't seem to have any significant benefit and isn't worth this much discussion. If @pfmoore doesn't want to change this or talk about it then I support that, I'm sure he has plenty of things to do.

Checking the length of a list is vastly more readable than using next/any. I don't think it makes sense to worry about creating a big list of matches. If it does, I'd suggest using islice instead like here.
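
To make the islice idea concrete, a minimal sketch follows. The helper name `at_most_one` is hypothetical. It keeps the readable `len()` check while never materialising more than two items.

```python
from itertools import islice


def at_most_one(iterable, message='Multiple items found'):
    """Hypothetical helper: return the only item, None if empty, else raise."""
    # islice stops after two items, so a long iterable is never fully consumed.
    found = list(islice(iterable, 2))
    if len(found) > 1:
        raise ValueError(message)
    return found[0] if found else None
```

Applied to the code under review, the iterable would be the filtered `re.finditer(...)` results and the message would be the existing "Multiple ... blocks found" text.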

deps[:] = instead of deps = is jarring and weird to look at. And even if deps[:] = () is faster than deps = [] (is it?), even if the difference was significant (it obviously isn't), setting deps = () would be fine anyway.

line[1 + line.startswith('# '):] is fun and clever but not super readable. line[2 if line.startswith('# ') else 1:] is maybe an improvement but it's subjective and insignificant.

I do prefer a list comprehension over filter/lambda, but 🤷

I'll see if I can find the time to review the various suggestions. TBH, this was very much a "drive by" PR, motivated by the fact that I use pip-run and I wanted to ensure that it supported the new standard. Copying the reference implementation was the fastest way to achieve that. I didn't want to spend a lot of time over details that could be hashed out later.

In many ways I'd argue that a robust PEP 723 parser should go in a library somewhere, although I can sympathise if people don't want a new dependency just for that one thing. I'm a little bit sad if we end up with lots of different implementations of the parsing code, all with their own quirks and trade-offs.

My recommendation would be that unless I get the time to do an update, the code can go in as is, and it can be fixed in a follow-up PR. I'm absolutely not going to get upset about someone updating the code in a follow-up.

Co-authored-by: Bartosz Sławecki <[email protected]>

bswck · 2024-01-19T19:17:31Z

pip_run/scripts.py

+        >>> DepsReader('# /// script\n# dependencies = ["foo", "bar"]\n').read_toml()
+        []
+        """
+        TOML_BLOCK_REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$'


I would also suggest moving this out of the local scope to a global constant somewhere in the upper part of the module.

I'm inclined not to. Partly because I prefer keeping all the code that was directly copied from the PEP together, but also I don't see the benefit. Nothing else uses this value, and referencing it as a local is marginally faster than referencing it as a global. The performance is, of course, utterly irrelevant compared to the cost of reading the file we're parsing, so that's not a compelling argument. But equally, moving it to the top "because that's where global constants go" is also not a compelling argument (to me, anyway).

I see your point. My suggestion was not in any way motivated by performance.
This just looks like a constant and according to PEP 8, constants should be placed after globals, after imports.
When it comes to performance, it might be improved by compiling the pattern in the module scope, which I didn't suggest for the irrelevancy reasons you've mentioned.

If I may put my two cents in, I would make that pattern a global precompiled constant if it were up to me, it just sounds like good and consistent practice to me even though perf is irrelevant.
I personally don't find the "all the code that was directly copied from the PEP together" bit convincing at all 😄

> If I may put my two cents in, I would make that pattern a global precompiled constant if it were up to me, it just sounds like good and consistent practice to me even though perf is irrelevant. I personally don't find the "all the code that was directly copied from the PEP together" bit convincing at all 😄

Pre-compiling the pattern in the module scope has one underlying function that was not mentioned: validation.
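
Presumably the point is that `re.compile` at module scope checks the pattern as soon as the module is imported, whereas a pattern string passed to `re.finditer` is only compiled when that call actually runs. A small sketch under that reading; `TOML_BLOCK_PATTERN`, `block_types` and `is_valid_pattern` are hypothetical names.

```python
import re

# Compiled once at import time; a malformed pattern would raise re.error here,
# before any script is parsed.
TOML_BLOCK_PATTERN = re.compile(
    r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s'
    r'(?P<content>(^#(| .*)$\s)+)^# ///$'
)


def block_types(script):
    """Hypothetical helper: list the type of every metadata block found."""
    return [m.group('type') for m in TOML_BLOCK_PATTERN.finditer(script)]


def is_valid_pattern(pattern):
    """Return True if `pattern` compiles, False if re.compile rejects it."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True
```

For example, `is_valid_pattern(r'(?P<type>[a-z')` is false because the character set is never closed; with a module-level `re.compile`, that kind of typo surfaces on import rather than on first use.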

I don't particularly want to get into an extended debate here. Ultimately, I'd like to see this merged, and I'm not entirely sure if any of the current participants in this discussion are able to merge a PR here. I'm willing to move TOML_BLOCK_REGEX to a global compiled regex, or even just a global string, if @jaraco or another project maintainer feels it would be a useful change. But otherwise I'd prefer just to leave it as it stands. I'm not particularly convinced by "because PEP 8 says so" arguments, TBH.

(Yes, my comment about performance was silly. Sorry.)

> I'm willing to move TOML_BLOCK_REGEX to a global compiled regex, or even just a global string, if @jaraco or another project maintainer feels it would be a useful change. But otherwise I'd prefer just to leave it as it stands.

I was assigned to the issue this PR is meant to solve (#95) 1 week after your PR had been submitted. That's why I allowed myself to leave a review. But of course, I agree, everything is up to jaraco.

jaraco · 2024-01-19T21:38:39Z

Thanks Paul for the contrib. I apologize I missed it until recently (I limit my github notifications out of necessity). I'll definitely be merging this or something similar soon. Thanks @bswck for the additional review. I'll take the suggestions under consideration.

pfmoore · 2024-01-19T21:51:13Z

Thanks @jaraco - if you want me to make any changes, just let me know.

@bswck Sorry if it seemed I was suggesting your review wasn't welcome - on the contrary, I appreciate the feedback. My only concern was that I didn't want to make too many changes (increasing churn on the PR) without getting an indication from the project maintainers on their opinion. I'm mostly indifferent to matters of code structure (unless they actively harm readability/maintainability IMO) and so I find it all too easy to get sucked into making too many changes that are of marginal benefit if I don't push back a little.

pfmoore · 2024-01-19T21:56:27Z

One specific point - the coverage check is failing, which I assume is related to the issue @bswck pointed out, and so I need to add # pragma: no cover somewhere. I'm not 100% sure where, though - I assume what gets covered will depend on the Python version, so do I need to add it to all of the try...except block?

Also, if anyone can save me getting it wrong, what's the right form for combining a type: ignore and pragma: no cover comments on the same line?
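
On combining the two comments: PEP 484 asks that `# type: ignore` come first when it shares a line with other markers, and coverage.py's default exclusion pattern matches `# pragma: no cover` anywhere on the line, so the ordering below should satisfy both tools. A minimal sketch, assuming the try...except block in question is a tomllib/tomli import fallback (an assumption, since the block itself is not shown in this thread):

```python
try:
    import tomllib  # standard library on Python 3.11+
except ImportError:  # pragma: no cover
    # Assumed fallback for older Pythons; `tomli` is the third-party backport.
    import tomli as tomllib  # type: ignore  # pragma: no cover

# Either module exposes the same loads() function.
parsed = tomllib.loads('dependencies = ["foo"]')
```

Placing the pragma on the `except` line makes coverage.py exclude the whole clause, so the second pragma is only there to show the ordering. Which branch runs still depends on the Python version, so a per-version coverage configuration may be needed as well.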

jaraco

I agree with Paul - no need to belabor the details. Let's get something working first and then refine the implementation. If the implementation is meant to match the "reference implementation" in the PEP, I'd prefer that the implementation be in a separate module or dependency that encapsulates that behavior. If on the other hand, this package is going to own the implementation (my slight preference), then we can address the coverage issues and whatever other tweaks it deserves to match the style and philosophy of this project.

A few things I'd like to see before merging:

Fix the coverage error (preferably by actually covering the code) or decide to accept the failure (and address it later).
Update the docs (readme) to include this new format alongside the others.
Does this format supersede the "comment" format or should all three formats be supported indefinitely? If the former, let's log an issue to mark the comment form as deprecated.
Add a changelog entry (I'll do that).

jaraco · 2024-01-20T16:59:08Z

Hmm. While working on a test for the "multiple blocks" error, I've encountered an unexpected behavior with the regex. Consider this script:

# /// script
# ///

In my mind, that should be legal, a degenerate (empty) block, but the regex expects at least one line between the starting and ending line. Since the text of the PEP is silent on the matter, it seems it should be valid form (in the same way that the empty string is valid Python and valid TOML).
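
The behaviour is easy to reproduce in isolation. The sketch below compares the pattern from the diff with a relaxed variant in which the content group may repeat zero times (`+` changed to `*`). That relaxation is only one possible fix and is not necessarily what was merged; the names `ORIGINAL`, `RELAXED` and `empty_block` are for illustration.

```python
import re

# Pattern from the diff: the content group must match at least one line.
ORIGINAL = (
    r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s'
    r'(?P<content>(^#(| .*)$\s)+)^# ///$'
)
# Same pattern with the content group allowed to be empty: `+` becomes `*`.
RELAXED = (
    r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s'
    r'(?P<content>(^#(| .*)$\s)*)^# ///$'
)

empty_block = '# /// script\n# ///\n'

original_match = re.search(ORIGINAL, empty_block)  # None: no content line
relaxed_match = re.search(RELAXED, empty_block)    # matches, content is ''
```

With `RELAXED`, the empty block matches and its `content` group is the empty string, which is itself valid TOML.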

pfmoore · 2024-01-20T17:23:53Z

> Does this format supersede the "comment" format or should all three formats be supported indefinitely? If the former, let's log an issue to mark the comment form as deprecated.

IMO, this should supersede the comment format (PEP 722, which standardised the comment format, was rejected in favour of PEP 723). I've raised #97 for this.

Implement PEP 723 support (script dependencies in a TOML block)

2816515

ssweber mentioned this pull request Jan 9, 2024

PEP 723 compatibility #95

Closed

bswck suggested changes Jan 19, 2024

View reviewed changes

Update pip_run/scripts.py

f239978

Co-authored-by: Bartosz Sławecki <[email protected]>

bswck reviewed Jan 19, 2024

View reviewed changes

jaraco requested changes Jan 20, 2024

View reviewed changes

jaraco added 2 commits January 20, 2024 11:36

Add changelog entry.

d3df4b8

Move Python 3.10 compatibility logic into its own module and fix coverage check there.

21a641e

jaraco added 3 commits January 20, 2024 12:10

Allow the content of a block to be empty.

36a2ee6

Add test covering the 'multiple scripts' error case.

55c7aec

Update README to illustrate the PEP 723 support.

42ae216

jaraco approved these changes Jan 20, 2024

View reviewed changes

jaraco merged commit 3166619 into jaraco:main Jan 20, 2024
14 checks passed

pfmoore deleted the pep723 branch January 20, 2024 17:18
