[RFC] A plugin/extension architecture for pip
#12766
Comments
Thanks for putting together this proposal @woodruffw! Wanted to add some more evidence that such a plugin system would be used. A while ago I created a tool for generating Software Bill-of-Materials documents from Python environments/requirements files using existing (and proposed) PEP standards. Since Python packages + SBOMs is one area I'm focusing on in this upcoming year, it would be great to provide a more native-feeling experience as a pip extension :)
I remain sympathetic in principle to having a plugin API for pip. And I think we should acknowledge the reality that even though we have consistently stated that people must not rely on pip's internal API, nevertheless people do, and the sky hasn't (yet) fallen. But that doesn't mean we're now going to support using pip as a library, or be comfortable adding features that suggest we will.

The proposal for …

The main plugin proposal feels a bit light on detail. Defining an entry point namespace is fine, but unless there's some contract that defines at least how and when pip calls registered plugins, it doesn't say anything useful. And any such contract is, at some level, a guaranteed API that pip provides. So whether the plugin proposal is acceptable depends entirely on what that contract is. I'm willing to be open-minded about the possibility of being able to define something acceptable, but we've been around this loop many times, so I want to know what's different about this proposal.
I think the strongest arguments here are consistency and discovery: In terms of verbosity: my original thought was to propose it without the
This sounds like the right opportunity for me to go into detail, then! To make things more concrete, here's a kind of plugin that I could imagine: something that allows distributions to be introspected at various points before they're unarchived. Here's an example signature set for that plugin:

```python
from pathlib import Path

# Stand-in so the sketch runs; the real PluginType is not defined in this thread.
PluginType = str

def plugin_type() -> PluginType:
    # name subject to discussion!
    return "dist-inspector"

def pre_download(url: str) -> None:
    # contract: `pre_download` raises `ValueError` to terminate
    # the operation that intends to download `url`
    pass

def pre_extract(dist: Path) -> None:
    # contract: `pre_extract` raises `ValueError` to terminate
    # the operation that intends to unarchive `dist`
    pass
```

… when registered, …

This plugin is a little bit contrived, but demonstrates that useful things can be achieved while not committing
(There are pieces of this that would need to be hammered out, e.g. how caching is handled, and whether every download is trapped by the plugin, or only ones that follow completed candidate selection. But I think it demonstrates a workable approach that can be used to conservatively carve out very small stable API commitments.)
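To make the `dist-inspector` contract concrete, here is one way pip's side of it could look: skip (and warn about) plugins whose `plugin_type()` it does not recognize, then call a named hook on every remaining plugin and treat a `ValueError` as a veto. This is a hypothetical sketch; `usable_plugins`, `run_hook` and `PluginVeto` are illustrative names, not pip APIs, and plugins are duck-typed objects rather than loaded entry points.

```python
import warnings
from typing import Any, Iterable, List

# Plugin types this (hypothetical) pip build understands.
KNOWN_PLUGIN_TYPES = {"dist-inspector"}


class PluginVeto(Exception):
    """Raised on pip's side when a plugin hook asks to terminate an operation."""


def usable_plugins(plugins: Iterable[Any]) -> List[Any]:
    """Keep plugins of a recognized type; warn about and skip the rest."""
    usable = []
    for plugin in plugins:
        kind = plugin.plugin_type()
        if kind not in KNOWN_PLUGIN_TYPES:
            warnings.warn(f"ignoring plugin of unrecognized type {kind!r}")
            continue
        usable.append(plugin)
    return usable


def run_hook(hook_name: str, arg: Any, plugins: Iterable[Any]) -> None:
    """Call `hook_name(arg)` on every plugin; a ValueError vetoes the operation."""
    for plugin in plugins:
        hook = getattr(plugin, hook_name)
        try:
            hook(arg)
        except ValueError as exc:
            raise PluginVeto(f"{hook_name} rejected {arg!r}: {exc}") from exc
```

Under this sketch pip would call `run_hook("pre_download", url, plugins)` before fetching and `run_hook("pre_extract", path, plugins)` before unpacking. Hooks only ever receive a URL or a path and can only say "stop", which keeps the surface pip would be committing to small.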
Right. But that's just one example. And every example comes with a requirement for pip to add the infrastructure to call the plugin at the right time(s). So the proposal needs to define all the allowed plugin types. And that's basically a programmatic API. For example, does the
Yes, this is the problem. In effect, the plugin API becomes a "stealth" attempt to get pip to commit to API stability guarantees. And that's what we've been against doing from the start. There are a bunch of reasons for this:
Remember, pip is an application, not a library - and that's a choice, not an accident. Even applications which have a plugin interface don't let plugins dig around in the application internals - instead, they provide a carefully designed and controlled API that plugins can use. And pip doesn't have such an API. Let's give a really simple example. How can a plugin print a message to the user? It can't use raw …

Yes, there are pip internal routines for all of this. And yes, as I said above, people are using pip's internals and the sky hasn't fallen. But they know we don't support what they are doing, and they accept that. Once plugin support is added to pip, how do we ensure that message stays clear? Because we can't simply close every bug that says "I have a plugin..." saying we don't support plugins (which is what we do now when people say "I am importing pip...").

So I think this leaves us back at my previous comment - for this proposal to be considered, you'll need to specify what plugin types you're suggesting, and what the contract of each one is. And given that there are bound to be requests for additional hooks, how do we limit the scope up front so that maintainers aren't faced with an ongoing job of rejecting proposals for "just one more small hook"?

Also, and I'm somewhat surprised it's me having to ask this, what are the security implications here? Installing a wheel is currently (relatively) safe, because it doesn't run arbitrary code from the internet. But I'm pretty sure that if we had plugins I could write a wheel that installed a pip plugin which (1) did malicious stuff whenever pip is run, and (2) hijacked …
I might not be understanding, but I don't follow why this would be the case, for two reasons:
Sorry if my example didn't make this clear: my thinking for these hooks is that their execution contract is "pure" -- they have no side effects from (This brings up a good design point however, which is that loading these plugins probably shouldn't be unidirectional, i.e. shouldn't be triggered just by
I think this is a good example of what a (We're talking Python of course, so there's no formal enforcement of a plugin's contract. But someone who violates the plugin contract in this way is categorically indistinguishable from people who already violate
Thank you for calling this out. I would be happy to discuss this further as well, but per the points above about an execution contract, here are some fundamentals that I think would serve you as the
I would be happy to flesh these out more as well, and of course to contribute them to your developer and user docs as part of the implementation effort.
Thank you for calling this out as well! @sethmlarson and I discussed this a bit, and had a couple of thoughts on it:
I think (2) is probably preferable, at the cost of a slightly less smooth UX. But I'll note that the Python packaging model is fundamentally brittle from a security perspective, regardless of what we do here: the game is "over" as soon as an sdist is allowed to run arbitrary code, meaning that even bidirectional controls can be circumvented (e.g. by the sdist setting the config itself). So this is really just card shuffling to some degree, unless plugins could be identified further "up the stack" in a way that precludes installation from sdists (e.g. a …

TL;DR: As currently specified, I don't believe this plugin architecture represents a change to the default security posture in Python packaging, but only because it's already non-ideal 🙂. So it doesn't make the problem any worse, but it doesn't make it better either.
One of us is misunderstanding, but it might be me. What I think you're saying is that as a plugin author, I register an entry point that links to my module. Now, pip needs to call (at some point)
Well, isn't that the point? We're on record as stating, on numerous occasions, that we are not willing to support any public API. So unless I'm misunderstanding you, you're asking us to alter that stance and to support this public API, at least.
OK. No, that wasn't clear. Add to that the fact that plugin hooks need to adhere to the standard rule that they are not allowed to import any of pip's functionality, or use any pip internal APIs, and I guess they are basically just a notification API from pip to the plugin. Still an API (see my comment above), but certainly an extremely constrained one. I'm concerned that this would be the "thin end of the wedge", though, and we'd get requests for allowing additional interaction as the limitations of the pure notification interface become clearer.
I bet I can find reasons why plugins shouldn't do pretty much anything useful that you propose 😉 I'm not being facetious by saying this; I'm trying to point out that without any real motivating examples of plugin types or uses, it's impossible to pin down what plugins are allowed to do (beyond "nothing, just to be safe"). Again, this is part of the "we don't provide an API" issue - we don't offer any guarantees that the global Python interpreter is in any sort of usable state for code injected into the pip process. It probably is (that's what I meant when I said "the sky hasn't fallen") but there can be problems (we've had bug reports from people using pip in-process who have found the logging system isn't in the right state for them, for example).
While this is true in principle, it's much harder to write a tight plugin contract that excludes behaviours that we don't want to allow than it is to make a blanket statement that we don't support importing pip into your own process. And that's the fundamental issue here - I don't trust our ability to keep ourselves out of trouble once we loosen the current rules. This isn't to say I'm totally against this idea. But my instincts are to retain our current stance, and simply declare that while we have plugins, all uses of plugins are unsupported and may break at any time, without warning. Essentially, that's the same footing that projects like pip-tools have to live with right now, and if plugin authors don't like that, then so be it.
That's not really the sticking point here (well, it is a sticking point, but it's not the most important one). The important issue is that we don't have maintainer time to review multiple plugin proposals. So there's a good chance that new plugin types can expect delays of months, or quite possibly years, before getting approved. There are much more important pip features that have been stalled for that sort of time period. So the additional constraint is that anyone proposing a new plugin type must be prepared to stick with the proposal for that sort of timescale.
That's the sort of implied constraint that concerns me. There is ongoing work to try to switch pip to wheel-only downloads by default. One of the benefits of that proposal is that it improves security by removing the risk of running arbitrary code at install time. If the plugin proposal re-opens that risk, then our arguments for making the change to wheel-only get undermined. In addition, people who currently choose wheel-only installs because they are inherently more secure will now be exposed to new risks that they might not be in a position to mitigate. So we should have a transition plan - we could require an opt-in flag to install wheels that include a pip plugin hook, for example. But this further complicates the whole proposal, in terms of both implementation and UI.
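The opt-in idea can be checked before any plugin code runs, because a wheel declares its entry points in a static `*.dist-info/entry_points.txt` file in INI format. A sketch under that assumption: the name of the opt-in flag is not settled in the thread, and `wheel_declares_pip_plugin` and `check_install_allowed` are hypothetical helpers.

```python
import configparser
import zipfile

# Entry point group name taken from the proposal.
PLUGIN_GROUP = "pip.plugins"


def wheel_declares_pip_plugin(wheel) -> bool:
    """True if the wheel (a path or binary file object) registers any pip plugin."""
    with zipfile.ZipFile(wheel) as zf:
        for name in zf.namelist():
            # Only <name>-<version>.dist-info/entry_points.txt at the top level counts.
            top, _, rest = name.partition("/")
            if not (top.endswith(".dist-info") and rest == "entry_points.txt"):
                continue
            parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
            parser.optionxform = str  # entry point names are case-sensitive
            parser.read_string(zf.read(name).decode("utf-8"))
            if parser.has_section(PLUGIN_GROUP) and parser.options(PLUGIN_GROUP):
                return True
    return False


def check_install_allowed(wheel, allow_pip_plugins: bool) -> None:
    """Refuse wheels that would install a pip plugin unless the user opted in."""
    if wheel_declares_pip_plugin(wheel) and not allow_pip_plugins:
        raise PermissionError(
            "this wheel registers a pip plugin; re-run with the opt-in flag to allow it"
        )
```

Because the check reads metadata only, it adds no code execution of its own. It does nothing for sdists, which can still run arbitrary code while building.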
All the above being said, I do think you're broadly right here. The Python packaging ecosystem is not in a particularly good state right now as far as tight, auditable security is concerned. But does that mean we're OK with adding more features that we might not accept if we did have a strong security position? Sorry - another long message. And I don't think I'm saying much that I haven't already said in one form or another. Basically, I see the value of the feature, but I'm not sure I'm willing to accept the cost of supporting the feature.
Our (Datadog) primary interest would be a plugin that verifies downloads using the attestations provided by PEP 740. This cannot be done by pip alone because only pure-Python packages may be vendored and
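A download-verifying plugin would fit the `dist-inspector` shape sketched earlier in the thread. Real PEP 740 attestation verification is out of scope for a short example, so this sketch swaps in a simpler technique, plain SHA-256 pinning: `pre_extract` refuses any distribution whose digest is missing from, or does not match, a hypothetical `EXPECTED_SHA256` table. The hook only reads the file and signals through `ValueError`, in line with the "pure" hook contract discussed in the thread.

```python
import hashlib
from pathlib import Path
from typing import Dict

# Hypothetical pin table (e.g. filled from a lock file): filename -> hex SHA-256.
EXPECTED_SHA256: Dict[str, str] = {}


def plugin_type() -> str:
    return "dist-inspector"


def pre_download(url: str) -> None:
    # Nothing to check yet: the digest is only known once the file exists.
    pass


def pre_extract(dist: Path) -> None:
    # contract: raising ValueError terminates the unarchiving of `dist`
    expected = EXPECTED_SHA256.get(dist.name)
    if expected is None:
        raise ValueError(f"no pinned digest for {dist.name}")
    h = hashlib.sha256()
    with dist.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    if h.hexdigest() != expected:
        raise ValueError(f"SHA-256 mismatch for {dist.name}")
```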
I might have caused some confusion with the "spec vs. not spec" distinction, sorry! I didn't mean to imply that plugin type like
So,
Yeah. This is perhaps too fine of a hair to split -- what I was trying to say there was that it would be a public API being committed to, but not one that's "uniquely" stable. In other words you could deprecate/remove plugin APIs per your current deprecation policy, much like But in retrospect this is an obvious thing to say, and doesn't change the story for you at all (since it's still a public API). So I think this point is moot 🙂
Yeah, this is how I'm conceiving the API here -- I think only being able to notify I unfortunately agree with your concern, though: I think people will probably ask for all kinds of inadvisable things, and attempt to use the minimal interface propose here as a lever. But I also think that people can be politely (but firmly) redirected to docs/guidelines that explain why
It'd be interesting to hear from @ofek and @sethmlarson, but for my part: I'm personally okay with this! IMO this is a suitable footing, so long as plugin authors perform nightly and beta testing against
This seems like an exceedingly fair constraint to me 🙂
Fair point (along with your points above about this potentially undermining a move to a secure wheel-only default). Per your points about not making any stable API promises: maybe a mandatory From there, there could be a longer term pivot towards a special marker for
I'm curious if explicitly considering this unstable (with the burden for breakage being 100% on plugin authors) changes your mind at all here (and also what the other …

If you think this is still too onerous in the current state of affairs, I'd like to propose just the …

P.S.: No problem with the long messages! I'm also guilty of them, and I appreciate the effort you've put into reviewing these ideas and helping me clarify them so far.

P.P.S.: Sorry for the delay -- I thought I responded with this yesterday but found this tab unsent this morning.
I'll keep it short, just for variety 😉
If plugins are explicitly unsupported, I'd view them as essentially the same as build backends. We'd still get users raising issues, but "speak to the plugin project" would be our answer. As with build backends, plugin authors would get no formal support. For me, that's acceptable. But the other maintainers may be more cautious than me.

Also, note that if a PR to add plugin support is large and/or complex, getting someone to review and merge it might be a problem independently of any approval in principle of the idea.

PS: When I say "unsupported" that includes
Because guaranteed deprecation processes come under "support".
Makes sense! I'll await other maintainer opinions here as well 🙂.
Understood -- I think this work should be decomposable into PRs of no more than 200-300 lines each, which is hopefully not too big for independent reviews. But this is something I'll keep an eye on, and include as a design factor.
That's quite unsupported 😅
Indeed 🤣 In reality this may never happen, and we wouldn't deliberately do it, but I'm thinking very specifically of things like refactoring the internals to do things like parallel downloads, or partial downloads, or weird caching tricks. We could end up in a situation where we do a download far down a code path that doesn't have access to the active plugin list. Or we could fire a bunch of subprocesses to do downloads, which wouldn't be able to access plugins in the parent process. In any of these cases, I'd want to reserve the right to make the improvement and not worry about plugins (which I assume are going to be a niche part of our overall user base). And you should remember that the follow up to the conversation would be "you're welcome to submit a PR to fix this" - which actually isn't that different from the response you'd get if the plugin mechanism were supported 😉
The use-case that I'd like to support would be okay with this outcome as well: being a plugin means you'd need to be integrating and testing against pip aggressively, so I expect any breakages that do arise could be handled. Thanks for your consideration on this, @pfmoore!
Following up here: my colleague @facutuesca has been working on the architectural side of this (…

(As discussed above, we understand that it'll be important to emphasize the lack of stability guarantees around any changes that do get approved here. I'd love to have more discussions about how we can communicate that + contribute any and all docs necessary to keep users from expecting/burdening
Please note that as I've said previously, I'm a strong -1 on "basic entrypoint detection" in the absence of specific, documented entry point type definitions. It will be a waste of time to submit a proposal for entry point detection without any explicit API contracts, because there's nothing useful to debate/agree/reject. As regards
I just did that at work actually (with the help of a library I had to write for it), but arbitrarily extensible rather than under a command group:
How would you like these documented? Based on the conversation upthread I thought there was rough consensus on an (explicitly unstable) entrypoint for "dist-inspector", i.e. an interface capable of inspecting download and extraction states without actually being able to mutate them. I put a rough sketch of that idea in #12766 (comment), but I'm happy to create a break-out issue for it if you'd prefer. But it'd also be good to know the "directionality" of the review process here, i.e. whether you and the other
It's not so much unhappiness as that I think the two encompass distinct, but equally valuable, use cases. I think some of this was already articulated upthread, but to coalesce it:
In particular I think there's a strong value case for (2) even without extensive access to
I'm saying that I'd want a PR adding an actual plugin, not just one that adds the infrastructure. I don't care whether you do a PR for the infrastructure and a second PR for the dist-inspector plugin type, or put both in the same PR, but I don't want to do anything until both parts exist. I want to see how that plugin would be documented, and what the impact is on pip's codebase. I'd like to see the tests that would be added, as they are, in a fundamental sense, the minimum guarantees that we'll provide. I don't want to reason about this in the abstract, I want to see how it would work in practice, with a non-artificial example. Specifically, I'm not comfortable just adding an "architecture". This has to satisfy an actual, real-world, use case. And it can only do that if we add a plugin type that provides some genuine benefit at the same time as we add the architecture.
Sorry, I wasn't sufficiently clear. What I was talking about when I said "this would be a lot simpler" was specifically about subcommands, and not about entrypoints/plugins. And in particular, I was pointing out that we don't need an entry point mechanism to support subcommands. If we want to allow users to add custom subcommands, we can do this by simply saying that …

What I was asking was for you to articulate why you feel that you need more than this (if, in fact, you do). I know that people have a reluctance to write standalone utilities, but I've never got anyone to say why, and I'm always left with a feeling that the answer is something like "so that I can use pip internals", or "so that the pip maintainers will look after the code for me". The above "run a subprocess" API strips away all of those benefits (that we don't want to allow anyway), and leaves us with the pure question - is it only for consistency of naming? And if not, what is the reason?
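A "run a subprocess" subcommand mechanism resembles how git finds `git-<name>` programs on `PATH`. A minimal sketch of pip's side, assuming a `pip-<name>` naming convention (the exact convention does not survive in the text above): locate the executable, hand it the remaining arguments, and pass its exit status straight back. `run_external` is a hypothetical name.

```python
import shutil
import subprocess
from typing import Optional, Sequence


def run_external(name: str, args: Sequence[str], path: Optional[str] = None) -> Optional[int]:
    """Run `pip-<name> ARGS...`; return its exit code, or None if not found.

    `path` overrides the PATH search list (mainly useful for testing).
    """
    exe = shutil.which(f"pip-{name}", path=path)
    if exe is None:
        return None
    # Only argv and the inherited environment cross the process boundary;
    # no pip internals are reachable from the child process.
    return subprocess.run([exe, *args]).returncode
```

Everything beyond argv and environment variables (shared logging state, pip's internal helpers) is unavailable by construction. That is the trade-off described above.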
Understood, thank you for elaborating! I suspect the simplest thing for us to do is start with one big PR with the full "big picture," and then break it down as necessary once it passes muster.
Thank you for clarifying here; this was my misunderstanding! I am in 100% agreement that doing extensions via
I created a draft PR for the implementation here: #12985 (note that it only covers the in-process plugins loaded by entrypoint, not the external
On the Checking the system executable path would be a good supplement to support non-Python extensions, but entry points should have priority for Python tools. (the "inline activity monitor" proposal and the "external command" proposal feel like they should be separate issues, though) |
What's the problem this feature will solve?
Hello, pip maintainers!

This is (another) proposal for a plugin/extension system for pip. My goals with it are twofold:

1. … pip's API internals
2. … pip ext CMD hierarchy, allowing existing pip-* tooling (including tools that can't easily be integrated into pip itself or shouldn't be) to provide a better and more consistent UX.

TL;DR: a minimal plugin architecture for pip would allow for better integrations with external tooling, including codebases (e.g. cryptographic codebases with native components) that cannot be easily or desirably vendored into pip itself.

Describe the solution you'd like
I have two things in mind:

1. … pip, allowing third-party packages to register plugins.
2. … pip ext ... subcommand hierarchy, populated by plugins that register the appropriate entry point, allowing third-party packages to register wholly independent subcommands.

I think both of these would be nice to have, but I think either also makes a good proposal. So I'm curious to hear what others think!
Plugin architecture
My high level idea:

pip gains awareness of the pip.plugins entry point group.

For example, a plugin might register as:

...where plugin is a module object with the following minimal interface:

and PluginType is:

From here, the remaining attributes of the plugin module are determined by PluginType; the intended contract between pip and the plugin is that pip will ignore (and warn on?) any plugin of a type it does not recognize.

(I have ideas for an initial trial-run PluginType, but I want to make sure this basic approach/architecture is amenable before I get into the details there!)

pip ext commands

pip ext subcommands would be a specialization of the above architecture. For example, to register pip ext frobulate, a third-party package might register the following:

From here, the cli attribute is expected to be a module with the following attributes:

...where args is the list of arguments passed after pip ext frobulate.

Under this model, subcommands under pip ext are entirely responsible for their own lifecycle: pip provides no public APIs, no additional context besides args (and os.environ), and the subcommand is expected to handle its own errors.

The description callable is used solely to populate pip ext --list, e.g. to an effect like this (probably more nicely rendered):

```
$ pip ext --list
plugin      description
frobulate   a brief oneline description of the command
wangjangle  randomly install a python package
compile     run pip-compile
```
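A minimal sketch of the pip ext side of this contract. It assumes subcommands have already been collected into a name-to-module mapping (the entry point group for them is not shown above) and that the callable receiving `args` is named `main`. `main` is a placeholder, since only `description` is named in the text. The `--list` rendering follows the example table.

```python
import sys
from typing import Any, List, Mapping


def render_list(commands: Mapping[str, Any]) -> str:
    """Two-column table of command names and their one-line descriptions."""
    rows = [("plugin", "description")]
    rows += [(name, cli.description()) for name, cli in commands.items()]
    width = max(len(name) for name, _ in rows)
    return "\n".join(f"{name.ljust(width)}  {desc}" for name, desc in rows)


def dispatch(argv: List[str], commands: Mapping[str, Any]) -> int:
    """Handle `pip ext ARGV...`: list commands, or hand off to one of them."""
    if not argv or argv[0] == "--list":
        print(render_list(commands))
        return 0
    name, *args = argv
    cli = commands.get(name)
    if cli is None:
        print(f"pip ext: unknown command {name!r}", file=sys.stderr)
        return 1
    # The subcommand owns its lifecycle: it gets only `args` (plus os.environ)
    # and handles its own errors, so pip does not catch exceptions here.
    result = cli.main(args)
    return 0 if result is None else result
```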
Timeline
Either (or both) of these would be a significant feature addition to pip. As such, my thinking is that they should go through pip --use-feature like other experimental features, e.g.:

```
pip --use-feature=plugins
pip --use-feature=extensions
# or combined, no distinction?
pip --use-feature=plugins
```
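A small sketch of that gating, assuming the flag can be repeated as in the example: plugin discovery is skipped entirely unless `plugins` is among the requested features. It uses `argparse` to stay self-contained. This is not pip's own option parser, and `parse_features` and `maybe_load_plugins` are hypothetical names.

```python
import argparse
from typing import Callable, List, Set

# Feature names taken from the example above; the parser itself is illustrative.
KNOWN_FEATURES = ("extensions", "plugins")


def parse_features(argv: List[str]) -> Set[str]:
    """Collect every --use-feature value; unrelated arguments are ignored."""
    parser = argparse.ArgumentParser(prog="pip", add_help=False)
    parser.add_argument("--use-feature", dest="features", action="append",
                        default=[], choices=KNOWN_FEATURES)
    opts, _unknown = parser.parse_known_args(argv)
    return set(opts.features)


def maybe_load_plugins(argv: List[str], loader: Callable[[], list]) -> list:
    """Touch the plugin machinery only when the user explicitly opted in."""
    if "plugins" not in parse_features(argv):
        return []
    return loader()
```

Keeping the loader behind the flag means a user who never opts in pays no cost and runs no third-party plugin code.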
From there, plugin/pip ext developers could experiment with either feature before they're fully stabilized, without pip committing to an exact API/interface until stabilization.

Alternative Solutions
The minimal alternative here is "do nothing." 🙂
However, for each of the above:

- pip plugins: expect people to wrap pip instead via its public CLI (or a wrapper like pip-api). Where this isn't sufficiently introspective, users/communities could build their own one-off tools. This is more or less the status quo, and results in a lot of duplication/tools that buggily wrap pip (like some of my tools).
- pip ext subcommands: continue the status quo of people (informally) signaling the adjacency of their tool to pip via a pip- prefix, e.g. pip-compile, pip-tools, pip-audit, etc. This is workable, although it's not the nicest UX compared to a unified subcommand CLI. Moreover, it can result in weird mismatches (e.g. where pip uses one Python/environment and pip-compile uses another).

Additional context
A lot of ink has been spilled over plugin architectures before: #3999 and #3121 are probably the oldest and most immediately relevant, but there are references to user requests for various plugin/API architectures scattered throughout the issues. I can try to collate all of them, if desired 🙂
After discussion, if some variant of this proposal is amenable, I (and my colleagues) will happily implement it and provide ongoing maintenance for it (like we do for PyPI, twine, gh-action-pypi-publish, etc.) -- our objective is not to drop a pile of new code on pip and run away, but to work closely with you all and make sure that anything we propose strikes the right balance between value provided to end users, potential new error modes, and your limited maintenance time.