Bug 2004542

| Field | Value |
|---|---|
| Summary | [osp][octavia lb] cannot create LoadBalancer type svcs |
| Product | OpenShift Container Platform |
| Reporter | Jon Uriarte <juriarte> |
| Component | Cloud Compute |
| Assignee | Pierre Prinetti <pprinett> |
| Cloud Compute sub component | OpenStack Provider |
| QA Contact | Jon Uriarte <juriarte> |
| Status | CLOSED ERRATA |
| Docs Contact | |
| Severity | high |
| Priority | high |
| CC | andbartl, andcosta, aos-bugs, cshepher, emacchi, gferrazs, mabajodu, m.andre, mbooth, mdulko, mfedosin, mfojtik, nagrawal, pprinett |
| Version | 4.8 |
| Keywords | Triaged |
| Target Milestone | --- |
| Target Release | 4.10.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Whiteboard | |
| Fixed In Version | |
| Doc Type | Known Issue |

Doc Text:

Cause: There is a race condition between creation of the OpenStack credentials secret and kube-controller-manager startup.

Consequence: If it happens, the OpenStack cloud provider does not get configured with OpenStack credentials, effectively breaking creation of Octavia load balancers for LoadBalancer Services.

Workaround (if any): Restart the kube-controller-manager pods. Note that these are static pods, so simply deleting them through the OpenShift API does not do the job; the manifests on the master nodes have to be manipulated instead.

Result: After the kube-controller-manager restart, the problem should not repeat on the cluster.

| Field | Value |
|---|---|
| Story Points | --- |
| Clone Of | |
| | 2039373 (view as bug list) |
| Environment | |
| Last Closed | 2022-03-10 16:10:42 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 2039373 |

Description
Jon Uriarte
2021-09-15 14:24:33 UTC

This looks like an RBAC bug.

It seems it's affecting 4.8 as well. This should also mean that self-signed certificates are broken, I believe.

Another data point: it seems like the issue is a race condition between kube-controller-manager startup and secret creation. A possible workaround could be to restart the kube-controller-manager container (it's a static pod, so deleting it from the API doesn't do the job).

I can confirm that this is the race condition described above. To work around it you have to restart kube-controller-manager, but that isn't trivial, as it's a static pod defined in /etc/kubernetes/manifests on the masters. A simple way to do it is to make a dummy change to the openshift-config/cloud-provider-config ConfigMap. First edit it:

```
oc edit cm cloud-provider-config -n openshift-config
```

Then make a dummy change. I just added a "#foobar" comment at the "config" key like this:

```
config: |
  [Global]
  secret-name = openstack-credentials
  secret-namespace = kube-system
  region = regionOne
  ca-file = /etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem
  [LoadBalancer]
  #foobar
  use-octavia = True
```

This will force reconfiguration of all the nodes, effectively restarting the kube-controller-manager pods. It will take a long time, depending on the size of the cluster; you have to wait until no nodes are "NotReady,SchedulingDisabled" in `oc get nodes`. Once done, you can edit the ConfigMap again to remove the comment, but you'll need to wait for the reconfiguration again.

An alternative is to SSH into each of the master nodes and move /etc/kubernetes/manifests/kube-controller-manager-pod.yaml somewhere else and then back to that location, to force the pod to be deleted and recreated.

Given that there's a fairly simple workaround I'm setting blocker-.
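
For reference, a rough sketch of that per-master alternative, assuming SSH access as the core user; the hostnames and the pause length are placeholders, and only the manifest path comes from the comment above:

```sh
# Sketch only: restart kube-controller-manager on each master by moving its
# static pod manifest out of /etc/kubernetes/manifests and back again.
for master in master-0.example.com master-1.example.com master-2.example.com; do
  ssh core@"$master" '
    sudo mv /etc/kubernetes/manifests/kube-controller-manager-pod.yaml /tmp/
    sleep 30   # arbitrary pause, long enough for the kubelet to remove the pod
    sudo mv /tmp/kube-controller-manager-pod.yaml /etc/kubernetes/manifests/
  '
done
```

Either way, the recreated kube-controller-manager pods should pick up the openstack-credentials secret.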

I was able to verify this issue does not consistently reproduce on fresh deployments. For example, I didn't see the issue on MOC, on a fresh deployment:

```
moc-dev ❯ oc -n test1-ns describe svc test1-svc
Name:                     test1-svc
Namespace:                test1-ns
Labels:                   app=test1-dep
Annotations:              <none>
Selector:                 app=test1-dep
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.30.143.54
IPs:                      172.30.143.54
LoadBalancer Ingress:     128.31.26.238
Port:                     <unset>  80/TCP
TargetPort:               8080/TCP
NodePort:                 <unset>  30753/TCP
Endpoints:                10.129.2.11:8080,10.131.0.27:8080
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason                Age    From                Message
  ----    ------                ----   ----                -------
  Normal  EnsuringLoadBalancer  4m18s  service-controller  Ensuring load balancer
  Normal  EnsuredLoadBalancer   2m1s   service-controller  Ensured load balancer

moc-dev ❯ openstack loadbalancer show 12e0e9a3-c259-46f7-809f-71c176a02236
+---------------------+--------------------------------------------------------------+
| Field               | Value                                                        |
+---------------------+--------------------------------------------------------------+
| admin_state_up      | True                                                         |
| availability_zone   |                                                              |
| created_at          | 2021-09-30T13:15:28                                          |
| description         | Kubernetes external service a6aebc446d2454b2c9499a6ee4a8a259 |
| flavor_id           | None                                                         |
| id                  | 12e0e9a3-c259-46f7-809f-71c176a02236                         |
| listeners           | 5c068bf4-3792-4587-a032-003d2f12a728                         |
| name                | a6aebc446d2454b2c9499a6ee4a8a259                             |
| operating_status    | ONLINE                                                       |
| pools               | dd5654be-625e-4857-82d2-62909fcc280f                         |
| project_id          | f12f928576ae4d21bdb984da5dd1d3bf                             |
| provider            | amphora                                                      |
| provisioning_status | ACTIVE                                                       |
| updated_at          | 2021-09-30T13:17:35                                          |
| vip_address         | 10.0.131.95                                                  |
| vip_network_id      | 6f0026f1-7f4a-4d34-8552-2662d8514616                         |
| vip_port_id         | 2460bb42-a2c0-48c7-99c7-f8df5b2e615f                         |
| vip_qos_policy_id   | None                                                         |
| vip_subnet_id       | 80af4c30-64a7-431d-b7ca-4cc5c7a9a7f5                         |
+---------------------+--------------------------------------------------------------+

moc-dev ❯ curl http://128.31.26.238
test1-dep-86cd46676d-kjwcp: HELLO! I AM ALIVE!!!
```

The KCM pods haven't been restarted:

```
moc-dev ❯ oc get pods -A | grep kube-controller-manager-
openshift-kube-controller-manager-operator   kube-controller-manager-operator-7c94444795-jcxk5   1/1   Running   1 (27m ago)   32m
openshift-kube-controller-manager            kube-controller-manager-mandre-rrf85-master-0       4/4   Running   0             14m
openshift-kube-controller-manager            kube-controller-manager-mandre-rrf85-master-1       4/4   Running   0             15m
openshift-kube-controller-manager            kube-controller-manager-mandre-rrf85-master-2       4/4   Running   0             14m
```

Moving to the KCM team so they can have a look. Please let us know if there's something we can help with.

Honestly, I don't see a kube-controller-manager problem here, but rather an integration issue with a specific cloud provider, in this case OpenStack.

Hi Jon, can you provide us with a must-gather next time you encounter the issue?

*** Bug 2033632 has been marked as a duplicate of this bug. ***

*** Bug 1938188 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
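
For anyone re-running the verification above, a minimal sketch of the test Service it relies on follows. The namespace, names, selector, and ports are taken from the describe output; the pre-existing test Deployment and the exact commands are assumptions:

```sh
# Hypothetical reproduction of the verification above. A test Deployment named
# "test1-dep" listening on port 8080 in the "test1-ns" namespace is assumed to
# already exist; the Service mirrors the test1-svc shown in the describe output.
oc apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: test1-svc
  namespace: test1-ns
spec:
  type: LoadBalancer        # the OpenStack cloud provider creates an Octavia LB for this
  selector:
    app: test1-dep
  ports:
  - port: 80                # port exposed on the load balancer VIP / floating IP
    targetPort: 8080        # port the test pods listen on
EOF

# On a healthy cluster EXTERNAL-IP is populated once the Octavia load balancer
# is ACTIVE; on a cluster hitting this bug it typically stays <pending>.
oc -n test1-ns get svc test1-svc
openstack loadbalancer list
```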