Description of problem:

LoadBalancer type services cannot be created, due to what appears to be an error getting the openstack-credentials secret (the secret does exist, though).

Version-Release number of selected component (if applicable):
OCP 4.9.0-0.nightly-2021-09-14-200602
OSP 16.1.6

How reproducible:
Always

Steps to Reproduce:
1. Install OCP 4.9 on OSP
2. oc new-project test1-ns
3. oc create deployment test1-dep --image=quay.io/kuryr/demo
4. oc scale deployments/test1-dep --replicas=2
5. oc expose deployment test1-dep --name test1-svc --type=LoadBalancer --port 80 --target-port=8080

Actual results:
The LB is not created in OpenStack.

Expected results:
LB created in OpenStack.

Additional info:

$ oc get cm cloud-provider-config -n openshift-config -o yaml
...
[LoadBalancer]
use-octavia = True

$ oc -n test1-ns describe svc test1-svc
Name:                     test1-svc
Namespace:                test1-ns
Labels:                   app=test1-dep
Annotations:              <none>
Selector:                 app=test1-dep
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.30.153.52
IPs:                      172.30.153.52
Port:                     <unset>  80/TCP
TargetPort:               8080/TCP
NodePort:                 <unset>  31099/TCP
Endpoints:                10.128.2.10:8080,10.131.0.63:8080
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

The openshift-kube-controller-manager pod shows:

E0915 11:16:45.097228       1 openstack.go:284] cannot get secret openstack-credentials in namespace kube-system. error: "secret \"openstack-credentials\" not found"
E0915 11:16:45.097305       1 core.go:91] Failed to start service controller: the cloud provider does not support external load balancers

The secret openstack-credentials does exist:

$ oc -n kube-system describe secret openstack-credentials
Name:         openstack-credentials
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>

Type:  Opaque

Data
====
clouds.conf:  300 bytes
clouds.yaml:  437 bytes
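In case it helps triage, a quick way to check all control plane kube-controller-manager pods for the same errors is something like the following sketch (the container name assumes the standard openshift-kube-controller-manager static pod layout):

# Sketch: grep each kube-controller-manager pod's logs for the secret lookup
# and service-controller errors reported above.
for pod in $(oc -n openshift-kube-controller-manager get pods -o name | grep 'kube-controller-manager-'); do
  echo "== $pod =="
  oc -n openshift-kube-controller-manager logs "$pod" -c kube-controller-manager |
    grep -E 'openstack-credentials|external load balancers' || true
done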
This looks like an RBAC bug.
It seems it's affecting 4.8 as well.
I believe this should also mean that self-signed certificates are broken.
Another data point: it seems the issue is a race condition between kube-controller-manager startup and secret creation. A possible workaround could be to restart the kube-controller-manager container (it's a static pod, so deleting it from the API doesn't do the job).
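As a sketch of what restarting just the container could look like (not verified here; assumes debug access to the master and crictl on RHCOS, and <master-node> is a placeholder), stopping the container makes the kubelet recreate it from the static pod manifest:

# Sketch: stop the kube-controller-manager container so the kubelet restarts it.
oc debug node/<master-node> -- chroot /host sh -c \
  'crictl stop $(crictl ps --name "^kube-controller-manager$" -q)'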
I can confirm that this is the race condition described above. To work around it you have to restart kube-controller-manager, which isn't trivial, as it's a static pod defined in /etc/kubernetes/manifests on the masters.

A simple way to do it is to make a dummy change to the openshift-config/cloud-provider-config ConfigMap. First edit it:

oc edit cm cloud-provider-config -n openshift-config

Then make a dummy change. I just added a "#foobar" comment to the "config" key like this:

config: |
  [Global]
  secret-name = openstack-credentials
  secret-namespace = kube-system
  region = regionOne
  ca-file = /etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem
  [LoadBalancer]
  #foobar
  use-octavia = True

This will force reconfiguration of all the nodes, effectively restarting the kube-controller-manager pods. It'll take a long time, depending on the size of the cluster. You have to wait until no nodes are "NotReady,SchedulingDisabled" in `oc get nodes`. Once done you can edit the ConfigMap again to remove the comment, but you'll need to wait for the reconfiguration again.

An alternative is to SSH into each of the master nodes and move /etc/kubernetes/manifests/kube-controller-manager-pod.yaml somewhere else and then back to that location to force the pod deletion and creation.

Given that there's a fairly simple workaround I'm setting blocker-.
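For reference, the SSH alternative above could also be done via oc debug, roughly like this sketch (<master-node> is a placeholder, the 30s pause is arbitrary, and this has to be repeated on each master):

# Sketch: move the static pod manifest out and back to force the kubelet
# to delete and recreate the kube-controller-manager pod.
oc debug node/<master-node> -- chroot /host sh -c '
  mv /etc/kubernetes/manifests/kube-controller-manager-pod.yaml /root/ &&
  sleep 30 &&
  mv /root/kube-controller-manager-pod.yaml /etc/kubernetes/manifests/'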
I verified that this issue does not reproduce consistently on fresh deployments. For example, I didn't see the issue on MOC, on a fresh deployment:

moc-dev ❯ oc -n test1-ns describe svc test1-svc
Name:                     test1-svc
Namespace:                test1-ns
Labels:                   app=test1-dep
Annotations:              <none>
Selector:                 app=test1-dep
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.30.143.54
IPs:                      172.30.143.54
LoadBalancer Ingress:     128.31.26.238
Port:                     <unset>  80/TCP
TargetPort:               8080/TCP
NodePort:                 <unset>  30753/TCP
Endpoints:                10.129.2.11:8080,10.131.0.27:8080
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason                Age    From                Message
  ----    ------                ----   ----                -------
  Normal  EnsuringLoadBalancer  4m18s  service-controller  Ensuring load balancer
  Normal  EnsuredLoadBalancer   2m1s   service-controller  Ensured load balancer

moc-dev ❯ openstack loadbalancer show 12e0e9a3-c259-46f7-809f-71c176a02236
+---------------------+--------------------------------------------------------------+
| Field               | Value                                                        |
+---------------------+--------------------------------------------------------------+
| admin_state_up      | True                                                         |
| availability_zone   |                                                              |
| created_at          | 2021-09-30T13:15:28                                          |
| description         | Kubernetes external service a6aebc446d2454b2c9499a6ee4a8a259 |
| flavor_id           | None                                                         |
| id                  | 12e0e9a3-c259-46f7-809f-71c176a02236                         |
| listeners           | 5c068bf4-3792-4587-a032-003d2f12a728                         |
| name                | a6aebc446d2454b2c9499a6ee4a8a259                             |
| operating_status    | ONLINE                                                       |
| pools               | dd5654be-625e-4857-82d2-62909fcc280f                         |
| project_id          | f12f928576ae4d21bdb984da5dd1d3bf                             |
| provider            | amphora                                                      |
| provisioning_status | ACTIVE                                                       |
| updated_at          | 2021-09-30T13:17:35                                          |
| vip_address         | 10.0.131.95                                                  |
| vip_network_id      | 6f0026f1-7f4a-4d34-8552-2662d8514616                         |
| vip_port_id         | 2460bb42-a2c0-48c7-99c7-f8df5b2e615f                         |
| vip_qos_policy_id   | None                                                         |
| vip_subnet_id       | 80af4c30-64a7-431d-b7ca-4cc5c7a9a7f5                         |
+---------------------+--------------------------------------------------------------+

moc-dev ❯ curl http://128.31.26.238
test1-dep-86cd46676d-kjwcp: HELLO! I AM ALIVE!!!

The KCM pods haven't been restarted:

moc-dev ❯ oc get pods -A | grep kube-controller-manager-
openshift-kube-controller-manager-operator   kube-controller-manager-operator-7c94444795-jcxk5   1/1   Running   1 (27m ago)   32m
openshift-kube-controller-manager            kube-controller-manager-mandre-rrf85-master-0       4/4   Running   0             14m
openshift-kube-controller-manager            kube-controller-manager-mandre-rrf85-master-1       4/4   Running   0             15m
openshift-kube-controller-manager            kube-controller-manager-mandre-rrf85-master-2       4/4   Running   0             14m
Moving to the KCM team so they can have a look. Please let us know if there's anything we can help with.
Honestly, I don't see a problem with kube-controller-manager here, but rather an integration issue with a specific cloud provider, in this case OpenStack.
Hi Jon, can you provide us with a must-gather next time you encounter the issue?
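For reference, a default must-gather should be enough here; the destination directory below is just an example:

# Collect a default must-gather and attach the resulting archive to this bug.
oc adm must-gather --dest-dir=./must-gather-output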
*** Bug 2033632 has been marked as a duplicate of this bug. ***
*** Bug 1938188 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056