Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add best practices policies in CEL expressions #925

Merged
merged 69 commits into from
Jun 3, 2024

Conversation

Chandan-DK
Copy link
Contributor

@Chandan-DK Chandan-DK commented Mar 6, 2024

Related Issue(s)

Partially addresses #891

Description

This PR includes the conversion of Kubernetes best practices policies to Kyverno CEL policies. Conversions of other validate policies will be addressed in separate pull requests.

Policies converted in this PR:

  • disallow-cri-sock-mount
  • disallow-default-namespace
  • disallow-empty-ingress-host
  • disallow-helm-tiller
  • disallow-latest-tag
  • require-drop-all
  • require-drop-cap-net-raw
  • require-labels
  • require-pod-requests-limits
  • require-probes
  • require-ro-rootfs
  • restrict-image-registries
  • restrict-node-port
  • restrict-service-external-ips

Checklist

  • I have read the policy contribution guidelines.
  • I have added test manifests and resources covering both positive and negative tests that prove this policy works as intended.
  • I have added the artifacthub-pkg.yml file and have verified it is complete and correct.

@Chandan-DK Chandan-DK force-pushed the convert-best-practices-to-cel branch from 923a7bd to 2908df9 Compare March 8, 2024 06:31
@Chandan-DK Chandan-DK force-pushed the convert-best-practices-to-cel branch from 06231b4 to a448e2e Compare March 9, 2024 14:15
@Chandan-DK Chandan-DK force-pushed the convert-best-practices-to-cel branch from a448e2e to 13f8cb5 Compare March 9, 2024 14:20
Copy link
Contributor

@MariamFahmy98 MariamFahmy98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some minor comments. Great job!

Copy link
Contributor

@MariamFahmy98 MariamFahmy98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks good to me. We are waiting for this fix to be merged in the Kyverno repo and then we can merge this one.
Good Job @Chandan-DK!

@Chandan-DK
Copy link
Contributor Author

Thanks Mariam!

@Chandan-DK
Copy link
Contributor Author

Once the fix is merged, I will add the kyverno tests in this PR itself and move it out of draft mode

@Chandan-DK Chandan-DK marked this pull request as ready for review April 19, 2024 17:43
@MariamFahmy98
Copy link
Contributor

MariamFahmy98 commented Apr 22, 2024

There are some flake tests:

  1. require-drop-all:
    l.go:53: ���������| 11:28:28 | require-drop-all | step-01  | PATCH     | ERROR | kyverno.io/v1/ClusterPolicy @ drop-all-capabilities
            === ERROR
            Operation cannot be fulfilled on clusterpolicies.kyverno.io "drop-all-capabilities": the object has been modified; please apply your changes to the latest version and try again
        l.go:53: ���������| 11:28:28 | require-drop-all | step-01  | PATCH     | ERROR | kyverno.io/v1/ClusterPolicy @ drop-all-capabilities
            === ERROR
            Patch "https://127.0.0.1:39511/apis/kyverno.io/v1/clusterpolicies/drop-all-capabilities": context deadline exceeded
        l.go:53: ���������| 11:28:28 | require-drop-all | step-01  | PATCH     | ERROR | kyverno.io/v1/ClusterPolicy @ drop-all-capabilities
            === ERROR
            Patch "https://127.0.0.1:39511/apis/kyverno.io/v1/clusterpolicies/drop-all-capabilities": context deadline exceeded
    
  2. restrict-image-registries:
    l.go:53: ���������| 11:24:11 | restrict-image-registries | step-01  | PATCH     | ERROR | kyverno.io/v1/ClusterPolicy @ restrict-image-registries
       === ERROR
       Operation cannot be fulfilled on clusterpolicies.kyverno.io "restrict-image-registries": the object has been modified; please apply your changes to the latest version and try again
    l.go:53: ���������| 11:24:14 | restrict-image-registries | step-01  | PATCH     | ERROR | kyverno.io/v1/ClusterPolicy @ restrict-image-registries
       === ERROR
       Operation cannot be fulfilled on clusterpolicies.kyverno.io "restrict-image-registries": the object has been modified; please apply your changes to the latest version and try again
    l.go:53: ���������| 11:24:15 | restrict-image-registries | step-01  | PATCH     | ERROR | kyverno.io/v1/ClusterPolicy @ restrict-image-registries
       === ERROR
       Patch "https://127.0.0.1:39511/apis/kyverno.io/v1/clusterpolicies/restrict-image-registries": context deadline exceeded
    l.go:53: ���������| 11:24:15 | restrict-image-registries | step-01  | PATCH     | ERROR | kyverno.io/v1/ClusterPolicy @ restrict-image-registries
       === ERROR
       Patch "https://127.0.0.1:39511/apis/kyverno.io/v1/clusterpolicies/restrict-image-registries": context deadline exceeded
    
  3. require-run-as-non-root-user: same error
  4. disallow-host-namespaces: same error

It seems that the issue arises when we patch the policy. Could you please have a look?

@MariamFahmy98 MariamFahmy98 self-assigned this Apr 23, 2024
@Chandan-DK
Copy link
Contributor Author

Chandan-DK commented Apr 29, 2024

CEL policies cause a lot of noise in the CI. If we check the past CI runs, we see that there's always a chainsaw test for a CEL policy failing due to the error you have mentioned. It is happening on versions 1.25 and 1.26 specifically. CEL flaky tests don't seem to occur on version 1.27 and above for some reason, I haven't observed any such failures recently.

I tried the following steps to reproduce it:

  1. Create a version 1.26 kind cluster
~/my-space/policies$ kind create cluster --image kindest/node:v1.26.14 --config ./.github/kind.yml
  1. Install latest kyverno
~/my-space/policies$ kubectl create -f https://github.com/kyverno/kyverno/raw/main/config/install-latest-testing.yaml
  1. Run chainsaw test for require-drop-all policy
~/my-space/policies/best-practices-cel/require-drop-all$ chainsaw test
Version: (devel)
Loading default configuration...
- Using test file: chainsaw-test
- TestDirs [.]
- SkipDelete false
- FailFast false
- ReportFormat ''
- ReportName 'chainsaw-report'
- Namespace ''
- FullName false
- IncludeTestRegex ''
- ExcludeTestRegex ''
- ApplyTimeout 5s
- AssertTimeout 30s
- CleanupTimeout 30s
- DeleteTimeout 15s
- ErrorTimeout 30s
- ExecTimeout 5s
- NoCluster false
Loading tests...
- require-drop-all (.chainsaw-test)
Loading values...
Running tests...
=== RUN   chainsaw
=== PAUSE chainsaw
=== CONT  chainsaw
=== RUN   chainsaw/require-drop-all
=== PAUSE chainsaw/require-drop-all
=== CONT  chainsaw/require-drop-all
    | 06:45:49 | require-drop-all | @setup   | CREATE    | OK    | v1/Namespace @ chainsaw-unified-longhorn
    | 06:45:49 | require-drop-all | step-01  | TRY       | RUN   |
    | 06:45:49 | require-drop-all | step-01  | APPLY     | RUN   | kyverno.io/v1/ClusterPolicy @ drop-all-capabilities
    | 06:45:50 | require-drop-all | step-01  | CREATE    | OK    | kyverno.io/v1/ClusterPolicy @ drop-all-capabilities
    | 06:45:50 | require-drop-all | step-01  | APPLY     | DONE  | kyverno.io/v1/ClusterPolicy @ drop-all-capabilities
    | 06:45:50 | require-drop-all | step-01  | PATCH     | RUN   | kyverno.io/v1/ClusterPolicy @ drop-all-capabilities
    | 06:45:52 | require-drop-all | step-01  | PATCH     | ERROR | kyverno.io/v1/ClusterPolicy @ drop-all-capabilities
        === ERROR
        Operation cannot be fulfilled on clusterpolicies.kyverno.io "drop-all-capabilities": the object has been modified; please apply your changes to the latest version and try again
    | 06:45:54 | require-drop-all | step-01  | PATCH     | ERROR | kyverno.io/v1/ClusterPolicy @ drop-all-capabilities
        === ERROR
        Operation cannot be fulfilled on clusterpolicies.kyverno.io "drop-all-capabilities": the object has been modified; please apply your changes to the latest version and try again
    | 06:45:55 | require-drop-all | step-01  | PATCH     | ERROR | kyverno.io/v1/ClusterPolicy @ drop-all-capabilities
        === ERROR
        Patch "https://127.0.0.1:43343/apis/kyverno.io/v1/clusterpolicies/drop-all-capabilities": context deadline exceeded
    | 06:45:55 | require-drop-all | step-01  | PATCH     | ERROR | kyverno.io/v1/ClusterPolicy @ drop-all-capabilities
        === ERROR
        Patch "https://127.0.0.1:43343/apis/kyverno.io/v1/clusterpolicies/drop-all-capabilities": context deadline exceeded
    | 06:45:55 | require-drop-all | step-01  | TRY       | DONE  |
    | 06:45:55 | require-drop-all | @cleanup | DELETE    | RUN   | kyverno.io/v1/ClusterPolicy @ drop-all-capabilities
    | 06:45:55 | require-drop-all | @cleanup | DELETE    | OK    | kyverno.io/v1/ClusterPolicy @ drop-all-capabilities
    | 06:45:55 | require-drop-all | @cleanup | DELETE    | DONE  | kyverno.io/v1/ClusterPolicy @ drop-all-capabilities
    | 06:45:55 | require-drop-all | @cleanup | DELETE    | RUN   | v1/Namespace @ chainsaw-unified-longhorn
    | 06:45:55 | require-drop-all | @cleanup | DELETE    | OK    | v1/Namespace @ chainsaw-unified-longhorn
    | 06:46:00 | require-drop-all | @cleanup | DELETE    | DONE  | v1/Namespace @ chainsaw-unified-longhorn
--- FAIL: chainsaw (0.00s)
    --- FAIL: chainsaw/require-drop-all (11.16s)
FAIL
Tests Summary...
- Passed  tests 0
- Failed  tests 1
- Skipped tests 0
Done with failures.
Error: some tests failed

Logs from the admission controller:

2024-04-29T06:45:50Z    INFO    setup.cluster-policy    logging/controller.go:45        resource added  {"type": "ClusterPolicy", "name": "drop-all-capabilities"}
2024-04-29T06:45:51Z    LEVEL(-2)       klog    rest/request.go:697     Waited for 1.048965431s due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/kyverno.io/v1beta1
2024-04-29T06:45:53Z    LEVEL(-2)       klog    rest/request.go:697     Waited for 1.04967128s due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/coordination.k8s.io/v1
2024-04-29T06:45:55Z    INFO    setup.cluster-policy    logging/controller.go:68        resource deleted        {"type": "ClusterPolicy", "name": "drop-all-capabilities"}
2024-04-29T06:45:55Z    LEVEL(-2)       klog    rest/request.go:697     Waited for 1.008841763s due to client-side throttling, not priority and fairness, request: GET:https://10.96.0.1:443/apis/node.k8s.io/v1
2024-04-29T06:45:56Z    ERROR   webhooks.policy.validate        handlers/error.go:12    an error has occurred   {"gvk": "kyverno.io/v1, Kind=ClusterPolicy", "gvr": {"group":"kyverno.io","version":"v1","resource":"clusterpolicies"}, "namespace": "", "name": "drop-all-capabilities", "operation": "UPDATE", "uid": "1eb28bc8-51c1-476d-ac2c-3e12be7d3ee0", "user": {"username":"kubernetes-admin","groups":["system:masters","system:authenticated"]}, "url": "/policyvalidate?timeout=10s", "error": "write tcp 10.244.3.2:9443->172.18.0.5:54374: write: broken pipe"}
2024-04-29T06:45:56Z    INFO    webhooks.server logging/log.go:182      2024/04/29 06:45:56 http: superfluous response.WriteHeader call from go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*respWriterWrapper).WriteHeader (wrap.go:88)

It seems like the problem is client-side throttling.

The troubleshooting guide suggests increasing clientRateLimitBurst and clientRateLimitQPS. I set both flags to 120 and kept running the same test. The test passed 14 out of 20 times. Without the flags set, the test passed 4 out of 20 times. I'm not sure if there's a perfect value to ensure that flakes don't occur.

I ran the same test on a kind 1.29.2 cluster without the flags set, and it didn't fail even once after 20 times.
However, I still got this error:

=== ERROR
        Operation cannot be fulfilled on clusterpolicies.kyverno.io "drop-all-capabilities": the object has been modified; please apply your changes to the latest version and try again
    | 08:09:14 | require-drop-all | step-01  | PATCH     | ERROR | kyverno.io/v1/ClusterPolicy @ drop-all-capabilities
        === ERROR
        Operation cannot be fulfilled on clusterpolicies.kyverno.io "drop-all-capabilities": the object has been modified; please apply your changes to the latest version and try again

But there was no error saying: context deadline exceeded

I'm not sure why this problem consistently occurs specifically on versions 1.25 and 1.26.

Anything we can do about it?

Edit: Not sure what has happened but flaky tests seem to have reduced on the latest CI runs

@MariamFahmy98
Copy link
Contributor

@Chandan-DK - Could you please check the failed tests?

@Chandan-DK
Copy link
Contributor Author

@Chandan-DK - Could you please check the failed tests?

Sure, I will send a fix. The tests fail due to chainsaw templating. @eddycharly recently bumped the chainsaw version and it causes this issue.
The solution is to set spec.template to false in the chainsaw test file (https://github.com/kyverno/policies/pull/1029/files#diff-d4f62d0006ca20570a1672eb23621e9dd77120edb09b685c50f0cda968f34753R9)

@Chandan-DK Chandan-DK force-pushed the convert-best-practices-to-cel branch from 08b7e8c to d6ad7cd Compare May 30, 2024 13:15
@MariamFahmy98
Copy link
Contributor

@Chandan-DK - Could you please check the failed tests?

Sure, I will send a fix. The tests fail due to chainsaw templating. @eddycharly recently bumped the chainsaw version and it causes this issue. The solution is to set spec.template to false in the chainsaw test file (https://github.com/kyverno/policies/pull/1029/files#diff-d4f62d0006ca20570a1672eb23621e9dd77120edb09b685c50f0cda968f34753R9)

Thank you!

@MariamFahmy98 MariamFahmy98 merged commit 7908f02 into kyverno:main Jun 3, 2024
174 checks passed
@Chandan-DK Chandan-DK deleted the convert-best-practices-to-cel branch June 5, 2024 03:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants