
Cannot create cluster by Application Credentials without role admin #2131

Open
nguyenhuukhoi opened this issue Jun 24, 2024 · 8 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@nguyenhuukhoi
Contributor

nguyenhuukhoi commented Jun 24, 2024

/kind bug

What steps did you take and what happened:

Creating a cluster with Application Credentials that lack the admin role causes routers and networks to be created over and over until the quota is exceeded. It works fine when using the password method.

What did you expect to happen:

The cluster is created properly with Application Credentials that do not have the admin role.

Anything else you would like to add:

Reconciler error err=<
failed to reconcile network: Expected HTTP response code [201 202] when accessing [POST https://x.x.net:9696/v2.0/networks], but got 409 instead
{"NeutronError": {"type": "OverQuota", "message": "Quota exceeded for resources: ['network'].", "detail": ""}}
controller="openstackcluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="OpenStackCluster" OpenStackCluster="default/capi-quickstartdck" namespace="default" name="capi-quickstartdck" reconcileID="24ecea61-905e-40e5-8266-6cc0b4d95918"

Environment:

  • Cluster API Provider OpenStack version (Or git rev-parse HEAD if manually built): v0.10.3
  • Cluster-API version: v1.7.2
  • OpenStack version: 2024.01
  • Minikube/KIND version:
  • Kubernetes version (use kubectl version): 1.27.4
  • OS (e.g. from /etc/os-release): 22.04
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 24, 2024
@nguyenhuukhoi
Contributor Author

I have an update:
If I use Application Credentials with the roles load-balancer_member, member, and admin (or reader), it works.
But with Application Credentials that have only load-balancer_member and member, it does not work.

@mdbooth
Contributor

mdbooth commented Jul 2, 2024

@nguyenhuukhoi From the error you've pasted, the problem is that you're over quota. This is presumably why admin can do this, because quotas don't apply to admin.

I'm going to close this because it looks like it's working as intended. I think you need to increase your networks quota.

@mdbooth mdbooth closed this as completed Jul 2, 2024
@nguyenhuukhoi
Contributor Author

Hello. If I have a quota of 10 networks, it uses all of them, and with a quota of 100 networks it is the same.

@nguyenhuukhoi
Contributor Author

OK, I get what you mean. Are you saying I should not create the cluster with the admin role? Please correct me if I am wrong.

@mdbooth
Contributor

mdbooth commented Jul 2, 2024

> Hello. if i have 10 network, it will take all and 100 network, it is same.

Can you paste some logs from the first network creation failure? The one you posted is just because it's out of quota. If you're saying the controller is looping creating networks until it runs out of whatever quota you gave it, that would be a bug.
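When the controller has been retrying for a while, the most recent error is usually just the quota failure, and the interesting one is further back. A minimal sketch of how the first non-quota reconcile error could be picked out of a list of collected error messages follows. The function name `first_root_cause` and the assumption that the messages are already split one per reconcile attempt, in chronological order, are hypothetical; the sample strings are abbreviated from errors quoted in this thread.

```python
from typing import Iterable, Optional


def first_root_cause(messages: Iterable[str]) -> Optional[str]:
    """Return the first reconcile failure that is not a quota error.

    Assumes `messages` is in chronological order, one entry per
    reconcile error (an entry may span several lines).
    """
    for msg in messages:
        if "failed to reconcile" in msg and "OverQuota" not in msg:
            return msg.strip()
    return None


# Abbreviated sample messages; "..." marks text left out here.
sample = [
    "failed to reconcile router: unable to create router interface: "
    "Resource not found: ... HTTPNotFound",
    "failed to reconcile network: Expected HTTP response code [201 202] ... "
    "but got 409 instead\n"
    '{"NeutronError": {"type": "OverQuota", ...}}',
]

print(first_root_cause(sample))
```

If every collected message is a quota error, the function returns `None`, which suggests the logs need to be captured from an earlier point in time.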

@mdbooth mdbooth reopened this Jul 2, 2024
@nguyenhuukhoi
Contributor Author

"If you're saying the controller is looping creating networks until it runs out of whatever quota you gave it". Yes, that is what I mean. I will collect the logs and post them as you asked.

@yankcrime

The bug (CAPO retrying network / subnet / router creation until you hit quota limits) might be triggered by a number of things, but specifically I've hit this recently and it was caused by changes to Neutron RBAC policies.

The original error which causes CAPO to get stuck in a reconciliation loop until resources are exhausted in my case was:

  "err": "failed to reconcile router: unable to create router interface: Resource not found: [PUT https://xxx.xxx:9696/v2.0/routers/cf143c1c-96c9-4467-b61b-d5e9be704163/add_router_interface], error message: {\"NeutronError\": {\"type\": \"HTTPNotFound\", \"message\": \"The resource could not be found.\", \"detail\": \"\"}}"
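To illustrate how a failure like this can end up looking like a quota problem, here is a toy simulation. It is not CAPO's code: `FakeNeutron`, `naive_reconcile`, and `run` are hypothetical names, and the sketch assumes a reconcile step that creates a new network on every attempt without first looking for one left over from a previous attempt. The later step fails each time, so every retry leaks a network until the quota is gone, after which only the quota error is visible.

```python
class QuotaExceeded(Exception):
    pass


class NotFound(Exception):
    pass


class FakeNeutron:
    """In-memory stand-in for a network API with a network quota."""

    def __init__(self, network_quota):
        self.network_quota = network_quota
        self.networks = []

    def create_network(self, name):
        if len(self.networks) >= self.network_quota:
            raise QuotaExceeded("Quota exceeded for resources: ['network'].")
        self.networks.append(name)

    def add_router_interface(self):
        # Stands in for the policy failure, which is reported as a 404.
        raise NotFound("The resource could not be found.")


def naive_reconcile(cloud):
    # Hypothetical non-idempotent reconcile: always creates a network.
    cloud.create_network("cluster-net")
    cloud.add_router_interface()


def run(cloud, max_attempts):
    """Retry the reconcile and record which error each attempt hit."""
    errors = []
    for _ in range(max_attempts):
        try:
            naive_reconcile(cloud)
            break
        except (QuotaExceeded, NotFound) as exc:
            errors.append(type(exc).__name__)
    return errors


cloud = FakeNeutron(network_quota=3)
print(run(cloud, max_attempts=5))
print(len(cloud.networks))
```

In this sketch the first attempts fail with `NotFound` and the later ones with `QuotaExceeded`, regardless of how large the quota is, which matches the behaviour reported earlier in the thread.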

The root cause was related to how the application credential had been created and new Neutron API RBAC policies that were introduced and made the default as of 2023.2:

https://docs.openstack.org/releasenotes/neutron/2023.2.html#upgrade-notes

From the Neutron side, you'll see something like this corresponding with the CAPO router creation request:

2024-07-23 13:29:01.713 26 DEBUG neutron.policy [None req-71805b6e-b6ab-4b34-94b5-dc37a2a2cd25 5a3bb6dcdbe142aeb20df4743e6a0dd0 c3ea7eb8de0c4ff8bfd98ab6aabeefce - - 14b8ac337152470eae38c67237eb59be 14b8ac337152470eae38c67237eb59be] Enforcing rules: ['get_router'] log_rule_list /var/lib/kolla/venv/lib/python3.10/site-packages/neutron/policy.py:457
2024-07-23 13:29:01.714 26 DEBUG neutron.policy [None req-71805b6e-b6ab-4b34-94b5-dc37a2a2cd25 5a3bb6dcdbe142aeb20df4743e6a0dd0 c3ea7eb8de0c4ff8bfd98ab6aabeefce - - 14b8ac337152470eae38c67237eb59be 14b8ac337152470eae38c67237eb59be] Failed policy enforce for 'get_router' enforce /var/lib/kolla/venv/lib/python3.10/site-packages/neutron/policy.py:530
2024-07-23 13:29:01.714 26 INFO neutron.api.v2.resource [None req-71805b6e-b6ab-4b34-94b5-dc37a2a2cd25 5a3bb6dcdbe142aeb20df4743e6a0dd0 c3ea7eb8de0c4ff8bfd98ab6aabeefce - - 14b8ac337152470eae38c67237eb59be 14b8ac337152470eae38c67237eb59be] add_router_interface failed (client error): The resource could not be found.
2024-07-23 13:29:01.715 26 INFO neutron.wsgi [None req-71805b6e-b6ab-4b34-94b5-dc37a2a2cd25 5a3bb6dcdbe142aeb20df4743e6a0dd0 c3ea7eb8de0c4ff8bfd98ab6aabeefce - - 14b8ac337152470eae38c67237eb59be 14b8ac337152470eae38c67237eb59be] 10.20.1.75,10.20.3.5 "PUT /v2.0/routers/26f86b86-5f5f-4e36-930d-877a075987b2/add_router_interface HTTP/1.1" status: 404  len: 285 time: 0.0809276

Updating the Neutron server configuration as recommended in the release notes solved the problem.
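The comment above does not say which settings were changed. As an assumption, the sketch below takes the relevant options to be `enforce_new_defaults` and `enforce_scope` in the `[oslo_policy]` section of `neutron.conf`, and treats a missing option as `True` on the understanding that this is the default from 2023.2 onward. The function name `new_rbac_defaults` is hypothetical. It only reports what a given config text would enable, as a quick way to check whether a deployment is running with the new policy defaults.

```python
import configparser


def new_rbac_defaults(conf_text):
    """Report the assumed oslo.policy switches found in a neutron.conf text.

    Options that are not set fall back to True (assumed 2023.2+ default).
    """
    # strict=False tolerates repeated keys, which oslo.config files may
    # contain; interpolation=None keeps literal '%' characters intact.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    return {
        opt: parser.getboolean("oslo_policy", opt, fallback=True)
        for opt in ("enforce_new_defaults", "enforce_scope")
    }


example = """
[DEFAULT]
debug = true

[oslo_policy]
enforce_new_defaults = False
enforce_scope = False
"""

print(new_rbac_defaults(example))
```

Whether to disable the new defaults or to recreate the application credential with roles that satisfy them is a deployment decision; the role combinations reported earlier in this thread suggest the second option can also work.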

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 21, 2024

5 participants