Gatekeeper needs a default+override functionality #3082
I think it's worth noting that Kyverno is actively working on this, and already has a default-and-overrides mechanism in alpha: https://kyverno.io/docs/writing-policies/exceptions/. But it's rather bolted on, and I think my proposal would be a lot more elegant in Gatekeeper.
This is an interesting idea. I'm wondering how generalizable it is. It seems to depend on a set of exceptions fitting into a hierarchy of permissiveness (e.g. sets of parameters that are increasingly strict, such that the first match provides an exception that "short circuits" stricter scrutiny). This definitely makes sense for numeric-type constraints (CPU, RAM limits and similar). It could also make sense for something like pod security standards, where privileged > baseline > restricted. How many other styles of policy fit into that mold? One example of a probably poor fit would be allowed container registries. It's possible to imagine a system of registries tiered by sensitivity, but I don't know if that's a common use case.

The simplicity also depends on the sets of exemption labels being disjoint. For example, let's say a user wants to use "baseline" but needs to have "host ports" disabled in their "networking" namespace. They also have to use "privileged" for their "system" namespace. Once the labels stop being disjoint, the hierarchical nature of the exceptions breaks down slightly (it's still expressible, but the complexity starts to creep back in).

I think knowing the range of use cases and how these exceptions grow in practice would be important for grounding/evaluating a design. If the overall usefulness is limited to a few constraints (specifically the numeric ones), one possibility may be to do the override in Rego, taking the override directly from the label. Here is a (partially implemented, pseudocode) example of what that might look like:

```yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8smaxram
spec:
  crd:
    spec:
      names:
        kind: K8sMaxRAM
      validation:
        openAPIV3Schema:
          type: object
          properties:
            defaultMaxRAM:
              type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8smaxram

        # Per-namespace override, read from a label on the namespace
        # (requires the Namespace kind to be synced into the inventory).
        get_max_ram_from_namespace(obj) = max_ram {
          ns := data.inventory.cluster.v1.Namespace[obj.metadata.namespace]
          max_ram := ns.metadata.labels["max-ram-exception"]
        }

        # Fall back to the default from the constraint's parameters.
        get_max_ram(obj) = max_ram {
          not get_max_ram_from_namespace(obj)
          max_ram := input.parameters.defaultMaxRAM
        }

        get_max_ram(obj) = max_ram {
          max_ram := get_max_ram_from_namespace(obj)
        }

        violation[{"msg": msg}] {
          max_ram := get_max_ram(input.review.object)
          # spec.ram is pseudocode; a real template would walk the
          # container resource limits instead.
          ram_used := input.review.object.spec.ram
          # parse quantities so the comparison is numeric rather than lexicographic
          units.parse_bytes(ram_used) > units.parse_bytes(max_ram)
          msg := sprintf("ram used must be less than %v", [max_ram])
        }
```

The advantage of using the label value directly would be that admins can easily set whatever maximum they want on a per-namespace basis without needing to modify any constraints or label selectors. This should work for any constraint where you want a per-namespace override, and it is possible today.
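For completeness, a usage sketch (the namespace name is made up; `max-ram-exception` is just the label key the template above happens to read):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-x
  labels:
    # overrides input.parameters.defaultMaxRAM for this namespace only
    max-ram-exception: "32Gi"
```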
No, I wouldn't say that. Creating such a heirarchy is of course possible with the construct I've proposed, but the policy exceptions I'm descrbing have no need to be "tiered" or "ordered" in any way. The key feature I'm looking for is a default. That just can't be done today with Gatekeeper without a lot of customization in the ConstraintTemplates themselves. What I would like to see is the ability to set a default policy enforcement behavior, and then punch out specific exceptions to the rule, without having to manually manage a collection of mutually-exclusive-collectively-exhaustive matching selectors. In programming languages, this is just a
Why would allowed container registries be a poor fit? Seems like a perfectly reasonable fit to me, if you think of it in terms of a multi-tenant cluster where you want a default behavior for most tenants with overrides for a few. Using my own proposed syntax, the "allowed container registries" policy would look something like this:
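Here's a reconstruction of what that could look like, assuming the `parameterSelection` field described in the issue body below; `K8sAllowedRepos` is the gatekeeper-library template of that name, and the project labels and registries are illustrative:

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: allowed-repos
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  # default for everybody else
  parameters:
    repos:
      - "registry.example.com/approved/"
  parameterSelection:
    # Special Team 1
    - match:
        namespaceSelector:
          matchLabels:
            field.cattle.io/projectId: special-team-1
      parameters:
        repos:
          - "registry.example.com/team1/"
    # Special Team 2
    - match:
        namespaceSelector:
          matchLabels:
            field.cattle.io/projectId: special-team-2
      parameters:
        repos:
          - "registry.example.com/team2/"
```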
Observe that in this case the default case is just the top-level `spec.parameters`; the overrides are evaluated in order, covering three populations:

- Special Team 1
- Special Team 2
- Everybody else
That "everybody else" part is what is unique and desirable about this way of configuring a Constraint. In the absence of a default+overrides mechanism, I would have to create three separate Constraints, and very carefully ensure that the
That's only if you think of this strictly in terms of a hierarchy. It's not a hierarchy. It's just a default behavior, with the ability to express overrides. Who cares if the labels and selectors are disjoint, because it's still "first one wins" (like a `switch` statement, where only the first matching case applies).
It would not be limited to just "orderable" or "sortable" things like numeric values. It will work for any set of parameters to a Constraint. The challenge with your proposal of using labels on the namespace itself to create these sorts of exceptions is that it's limited to just a namespace. That only works for namespaced multi-tenancy. With Project-based multi-tenancy, where groups of namespaces are assigned automatically to tenants upon creation (this is how Rancher does things and it's glorious), we would need some kind of controller that synchronizes these exception labels across all the namespaces in a given Project. But Rancher already does that for us -- it provides the `field.cattle.io/projectId` label on every namespace in a Project, which a `match` selector can key on.
I can provide dozens of use cases if desired. As I mentioned earlier, we are already doing this in production across nearly 100 multi-tenant Kubernetes clusters with nearly 1000 different development teams. And it works really well...except for the fact that I have to add quite a bit of custom Rego to every ConstraintTemplate I write, to patch in the default-and-overrides behavior I need.
I updated the top level description based on the discussion in #3081.

I should also clarify that my objective here is not to create a single, monolithic Constraint that handles everything in the cluster. I would still have dozens of Constraints. It's just that each Constraint would focus on a single policy/security "topic", with the parameters selected at runtime based on a ruleset (e.g. a `parameterSelection` list).

I should also clarify that I'm not looking to uproot the current model -- I was quite careful in my proposal to ensure that existing Constraints with a typical `match` and `parameters` would continue to behave exactly as they do today. For cluster administrators that have no need for a Constraint that has a "default" application across the cluster, or for those that don't do multi-tenancy, or for those that don't mind using the existing methods provided by `match`, nothing would need to change.
I'll reinforce again that the question of whether the basic concept of "default+overrides" is applicable to Gatekeeper has already been answered by the Kyverno team, as they're already doing it. They've created CRDs for it and everything. I don't personally like their solution (and I find Kyverno rather obtuse), but clearly there is demand out there for the ability to establish some kind of default behavior and then punch out overrides to that default as needed.
That's fair, I got distracted by the cascading nature of the RAM example.
Agreed. I assumed it was a poor fit due to the "ordered" lens I put on the initial idea.
+1 An alternative API design with the same effect could be to create a "MultiConstraint" object (better name TBD). Something like below:
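(A sketch under assumed names -- neither `MultiConstraint` nor its fields are an agreed API.)

```yaml
apiVersion: constraints.gatekeeper.sh/v1alpha1
kind: MultiConstraint # hypothetical kind
metadata:
  name: max-ram
spec:
  # first entry whose match applies wins
  constraints:
    - kind: K8sMaxRAM
      match:
        namespaceSelector:
          matchLabels:
            field.cattle.io/projectId: 32g-team
      parameters:
        maxRAM: 32Gi
    - kind: K8sMaxRAM # no match: the default for everybody else
      parameters:
        maxRAM: 16Gi
```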
Or perhaps, to preserve schema validation, create a per-template GVK.

Would need to think about the correct balance between expressiveness/DRY-ness, but doing it this way could allow for things like overriding enforcement actions, and would decouple the proposed default/exception model from how constraints may evolve in the future (for example, multi-enforcement-point actions). I'm guessing there are plenty of other roughly-logically-equivalent ways to express this MECE-achieving intent worth exploring.

There are a couple of ancillary concerns here:

**K8s Validating Admission Policy**

K8s Validating Admission Policy is intended to have the API server itself enforce admission policies. One plan Gatekeeper has is to project templates (at least those which can be projected) onto VAP primitives. I'm not sure VAP currently has enough expressiveness to make projection of this proposal viable. I don't think this is a deal breaker -- there will be plenty of constraint templates that will not be compatible with VAP (for instance, anything that uses referential data). However, it's worth considering whether there's anything to be done about that, or how to make the UX/asymmetry grok-able to users.

**Rego performance**

Right now we pre-cache constraints in Rego's data store. IIRC this is to avoid the serialization/deserialization penalty that injecting constraints at eval time would create. If we need to dynamically choose which constraint to use, that approach becomes less viable.

@davis-haba the work on iterable cached data may be useful here. I could imagine sideloading a "constraints" data storage object per-request, which could be dynamic AND bypass serialization. This would have the side benefit of allowing us to remove all of these constraint-caching methods from the driver.
I agree that there is room for debate and discussion about the technical implementation, and also that whatever model is accepted should fit cleanly into the rest of the Gatekeeper code stack and not create unnecessary divergence from upstream K8s architectural direction. Really the key thing that would satisfy my particular use case is a concise and flexible way to express MECE collections of Constraints. It's technically possible today, but it's not very concise and requires repetition across multiple Constraints to achieve MECE. Perhaps we can come up with a simple approach that would get us 80% of the way there with 20% of the effort.
If you're looking for suggestions, I like the names.
Curious if you have any thoughts on this direction, especially whether it could be an incremental move toward a more fully-realized solution.
Yep! That was my thought.
Curious if there are benefits other than clarity here, e.g. useful evaluation methods beyond "first matched only".

Another question would be whether all the constraints in a collection resource must be the same kind or not: mixed kinds may be more useful, but the accounting may be harder as templates are added/removed. Of course, mixed kinds could always be added later if room were left in the API.

One design feature of constraints I don't want to lose is composability (AFAICT nothing here threatens this): basically, that the consequences of adding/removing a constraint are well-defined, and that constraints are non-interacting amongst each other. This allows G8r to play nicely with K8s eventual consistency, and means users can mix constraints from various sources without worrying about a regression in enforcement.
Definitely like the names for the ideas proposed. I think a good path forward would be to figure out the shape of what we want (e.g. is there an 80% story in here somewhere); then we'll be in a good place to name things.
Really just for clarity. With traditional Constraints, grouping a bunch together requires something like a naming convention to make it obvious which Constraints are in or out of the group. The only "built in" grouping mechanism is the Constraint kind itself.
I could see the value in allowing mixed kinds. Which leaves the question of how to preserve per-kind schema validation.
Based on the discussion so far (which has been great!) I feel like we've landed on a few key architectural decisions:
Where it sounds like we still have some discussion is:
I feel like this is a way more elegant approach than Kyverno's, which implemented a special CRD just for expressing overrides. With what we're hopefully converging on here, one can start with a normal Constraint that behaves in some global fashion. Then, when that first exception case comes along, instead of having to create an entirely new resource (which one then has to remember interacts with the original "default" resource), the administrator just has to add a handful of lines of YAML to the original Constraint to clearly express the exceptional condition. As more exceptions appear, it's trivial to continue adding to the list. It's self-documenting (which, after all, is one of the hallmarks of Gatekeeper's approach over competing solutions) and transparent.
This is a very good feature; I hope it will be implemented soon. My 2 cents:
The problem: when several Constraints of the same kind match the same resource, Gatekeeper evaluates all of them, so a baseline Constraint cannot be superseded by a more specific one.
Possible solution? Give Constraints a precedence weight (e.g. via an annotation), so that when multiple Constraints of the same kind match a resource, only the winning weight (the lowest, in the example below) is enforced.
Example:

Template:

```yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8spsphostnetworkingports
  ...
spec:
  crd:
    spec:
      names:
        kind: K8sPSPHostNetworkingPorts
  ...
```

Baseline constraint:

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPHostNetworkingPorts
metadata:
  name: psp-host-network-ports
  annotations:
    metadata.gatekeeper.sh/weight: 100 # example new feature
spec:
  enforcementAction: warn
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    hostNetwork: true
    min: 8000
    max: 9000
```

Specific constraint:

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPHostNetworkingPorts
metadata:
  name: psp-host-network-ports-prometheus-node-exporter
  annotations:
    metadata.gatekeeper.sh/weight: 90 # example new feature
spec:
  ...
  match:
    namespaces:
      - monitoring
    name: kube-prometheus-stack-prometheus-node-exporter
  parameters:
    hostNetwork: true
    min: 9100
    max: 9100
```
**Describe the solution you'd like**
I need the ability to define a default behavior for the cluster, and then selectively implement overrides to that default behavior. With the toolkit available in Constraints today (notably, the `match` field), combined with the fact that ALL matching Constraints are evaluated for a resource to make a policy decision, this leads to operationally infeasible complexity when trying to implement anything with a default behavior.

Let's work through an example. I have a large multi-tenant cluster, using Rancher's Project model. This means I have hundreds of teams, each with the ability to create namespaces in their Project. Rancher Projects are just special labels on namespaces. All the namespaces with the same `field.cattle.io/projectId` label are in the same Project and subject to the same set of security and behavioral policies. Rancher has its own validating webhook that prevents unprivileged tenants from modifying that label, and only allows them to create namespaces in their own Project.

Now I, the Kubernetes administrator, want to impose some security or behavioral policy on the cluster. For this example I'll use memory limits, but the principle applies to virtually any Constraint I'd want to create in the cluster.

I start by creating a Constraint that declares that all Pods must have memory limits defined, which may not exceed 16GB.
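A sketch of that starting point (the `K8sMemoryLimits` kind and its `maxRAM` parameter are hypothetical stand-ins):

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sMemoryLimits # hypothetical template kind
metadata:
  name: memory-limits
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    maxRAM: 16Gi
```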
Immediately I have a problem...this enforcement has now applied to all of my administrative workloads as well, which is undesirable. So I need to exempt all those namespaces from the Constraint. What I need to do is exclude all namespaces with the "System" projectId, e.g. exclude namespaces with `field.cattle.io/projectId=system` or whatever. I can update the Constraint like so:
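```yaml
# continuing the hypothetical sketch from above
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sMemoryLimits
metadata:
  name: memory-limits
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaceSelector:
      matchExpressions:
        - key: field.cattle.io/projectId
          operator: NotIn
          values: ["system"]
  parameters:
    maxRAM: 16Gi
```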
Ok great...now my administrative workloads are exempted. But I still have a problem -- my 16GB limit applies to ALL other namespaces in the cluster. Inevitably, some team is going to come to me and ask for an exception. They need to be able to run Pods with a 32Gi limit. And after evaluation I agree that it's OK. Now I need two Constraints that implement MECE, to ensure that every namespace in the cluster matches one (and only one) of these memory limit constraints.
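Sketched with the same hypothetical kind (note how the `32g-team` label has to appear in both Constraints):

```yaml
# Default: everything except the system Project and the 32Gi team
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sMemoryLimits
metadata:
  name: memory-limits-default
spec:
  match:
    namespaceSelector:
      matchExpressions:
        - key: field.cattle.io/projectId
          operator: NotIn
          values: ["system", "32g-team"]
  parameters:
    maxRAM: 16Gi
---
# Override: only the 32Gi team
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sMemoryLimits
metadata:
  name: memory-limits-32g
spec:
  match:
    namespaceSelector:
      matchLabels:
        field.cattle.io/projectId: 32g-team
  parameters:
    maxRAM: 32Gi
```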
This was a little awkward, because I had to make mention of the `32g-team` label in the "primary" (default) Constraint as well as in the override Constraint. And I don't like repeating myself. But a few weeks later, another team needs a limit of 64Gi... now we have three Constraints...

Now imagine this same system, but with hundreds of tenants in the same cluster, and dozens of Constraints that each, as a group, have to be MECE-managed. A single slip-up that breaks MECE in the `match` sections of a group of Constraints will cause a production outage. Not good.

What I would very much rather have is that each Constraint for a given security/policy enforcement can be authored with a built-in ability to match the resource being evaluated against a list of matching rules, where the first one that "wins" gets that set of parameters (or overrides to a default set of parameters) applied. So in the example above with 16, 32, and 64 gig teams, instead of needing three separate Constraints, I can instead author a single constraint that guarantees MECE (because there's only one).
My proposal is to add a new field to Constraints called `parameterSelection`. This would be a list of objects. Each object may optionally contain a `match` field (if `match` is not specified then it matches all resources), and a required `parameters` field.

The Constraint's top-level `match` still behaves as it does today, providing initial guidance to Gatekeeper on whether the Constraint should be evaluated for the resource at all. Assuming the resource matches the top-level `spec.match` field, Gatekeeper then evaluates each of the `parameterSelection` items in order. The first one that has a `match` object matching the object under review has its `parameters` object merged over the `spec.parameters` object (or sets it, if `spec.parameters` is not present). Upon the first match, further evaluation of `parameterSelection` is halted. If the entire list of `parameterSelection` is exhausted with no match, then the Constraint is evaluated with `spec.parameters` unmodified (or, another top-level field can be added to Constraint to control the behavior -- I can imagine situations where aborting evaluation of the Constraint would be preferred if nothing in `parameterSelection` matches).
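To make that concrete, a sketch of the earlier 16/32/64 GiB scenario collapsed into a single Constraint (same hypothetical `K8sMemoryLimits` kind):

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sMemoryLimits # hypothetical template kind
metadata:
  name: memory-limits
spec:
  match:
    namespaceSelector:
      matchExpressions:
        - key: field.cattle.io/projectId
          operator: NotIn
          values: ["system"]
  parameters:
    maxRAM: 16Gi # the default
  parameterSelection:
    - match:
        namespaceSelector:
          matchLabels:
            field.cattle.io/projectId: 32g-team
      parameters:
        maxRAM: 32Gi
    - match:
        namespaceSelector:
          matchLabels:
            field.cattle.io/projectId: 64g-team
      parameters:
        maxRAM: 64Gi
```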
Additional attributes can be added to the top-level `spec` to further refine the behavior of `parameterSelection`. For example:

- `spec.onParameterSelectionNoMatch: {proceed,allow,warn,deny}` -- `proceed` continues, using `spec.parameters` unmodified. `allow` and `warn` skip evaluating the Constraint and assume allow (with `warn` issuing a warning in the log and an Event). `deny` skips evaluating the Constraint and assumes a denial.
- `spec.parameterSelectionBehavior: {MatchFirst,MatchAny}` -- `MatchFirst` will stop when the first match is encountered. `MatchAny` will continue and merge each matching item's `parameters` over `spec.parameters`. This would allow for far more expressive default-and-override behaviors.