Created attachment 1858720 [details]
must-gather

Description of problem:
Changing the lb-provider in cloud-provider-config is not applied to the cluster: load balancers for newly created LoadBalancer services are still built with the previous provider.

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2022-02-02-000921
RHOS-16.1-RHEL-8-20210903.n.0

How reproducible:
always

Steps to Reproduce:

1. Install an OCP cluster, enabling the TP features by adding the featureGate manifest.

The cloud-provider-config created by the installation is:

apiVersion: v1
data:
  [...]
  config: |
    [Global]
    secret-name = openstack-credentials
    secret-namespace = kube-system
    ca-file = /etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem
    [LoadBalancer]
    use-octavia = True
kind: ConfigMap
metadata:
  creationTimestamp: "2022-02-02T10:18:11Z"
  name: cloud-provider-config
  namespace: openshift-config
  resourceVersion: "1803"
  uid: 7699608f-f7a9-42ee-9823-5dc533bc3b88

FeatureGate shows that TP features are enabled:

$ oc get featureGate/cluster -o yaml
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    release.openshift.io/create-only: "true"
  creationTimestamp: "2022-02-02T10:17:55Z"
  generation: 1
  name: cluster
  resourceVersion: "1368"
  uid: 48d43fd3-ee78-40ab-9c5e-5832de4faa0f
spec:
  featureSet: TechPreviewNoUpgrade

and the CCM pods are ready:

$ oc get pods -n openshift-cloud-controller-manager
NAME                                                  READY   STATUS    RESTARTS   AGE
openstack-cloud-controller-manager-6cc678df8b-nhhmr   1/1     Running   0          31m
openstack-cloud-controller-manager-6cc678df8b-vkwbz   1/1     Running   0          31m

2. Edit cloud-provider-config to use OVN as the load balancer provider:

$ oc get cm cloud-provider-config -n openshift-config -o yaml
[...]
  config: |
    [Global]
    secret-name = openstack-credentials
    secret-namespace = kube-system
    ca-file = /etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem
    [LoadBalancer]
    use-octavia = True
    lb-provider = ovn            <-------
    lb-method = SOURCE_IP_PORT   <----------
kind: ConfigMap
metadata:
  creationTimestamp: "2022-02-02T10:18:11Z"
  name: cloud-provider-config
  namespace: openshift-config
  resourceVersion: "1803"
  uid: 7699608f-f7a9-42ee-9823-5dc533bc3b88

The nodes go unschedulable and then come back to Ready, so the change is supposedly applied:

$ oc get nodes -w
NAME                          STATUS                        ROLES    AGE     VERSION
ostest-6cvll-master-0         Ready                         master   4h      v1.23.3+b63be7f
ostest-6cvll-master-1         Ready                         master   4h      v1.23.3+b63be7f
ostest-6cvll-master-2         Ready                         master   4h      v1.23.3+b63be7f
ostest-6cvll-worker-0-9zvbg   Ready                         worker   3h43m   v1.23.3+b63be7f
ostest-6cvll-worker-0-k6qk4   Ready                         worker   3h43m   v1.23.3+b63be7f
ostest-6cvll-worker-0-x884z   Ready                         worker   3h43m   v1.23.3+b63be7f
ostest-6cvll-master-1         Ready,SchedulingDisabled      master   4h1m    v1.23.3+b63be7f
ostest-6cvll-worker-0-x884z   Ready,SchedulingDisabled      worker   3h44m   v1.23.3+b63be7f
ostest-6cvll-worker-0-x884z   Ready,SchedulingDisabled      worker   3h44m   v1.23.3+b63be7f
ostest-6cvll-master-1         Ready,SchedulingDisabled      master   4h1m    v1.23.3+b63be7f
ostest-6cvll-worker-0-x884z   NotReady,SchedulingDisabled   worker   3h46m   v1.23.3+b63be7f
ostest-6cvll-worker-0-x884z   NotReady,SchedulingDisabled   worker   3h46m   v1.23.3+b63be7f
ostest-6cvll-worker-0-x884z   NotReady,SchedulingDisabled   worker   3h46m   v1.23.3+b63be7f
ostest-6cvll-master-1         NotReady,SchedulingDisabled   master   4h3m    v1.23.3+b63be7f
ostest-6cvll-master-1         NotReady,SchedulingDisabled   master   4h3m    v1.23.3+b63be7f
ostest-6cvll-master-1         NotReady,SchedulingDisabled   master   4h3m    v1.23.3+b63be7f
ostest-6cvll-worker-0-x884z   Ready,SchedulingDisabled      worker   3h46m   v1.23.3+b63be7f
ostest-6cvll-worker-0-x884z   Ready,SchedulingDisabled      worker   3h46m   v1.23.3+b63be7f
[...]
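For reference, the same edit can also be applied non-interactively instead of editing the ConfigMap by hand. A minimal sketch, merge-patching the whole config key (the value is a single INI string, so the full payload has to be re-supplied; adjust if your rendered config differs):

$ oc patch cm cloud-provider-config -n openshift-config --type=merge -p '{"data":{"config":"[Global]\nsecret-name = openstack-credentials\nsecret-namespace = kube-system\nca-file = /etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem\n[LoadBalancer]\nuse-octavia = True\nlb-provider = ovn\nlb-method = SOURCE_IP_PORT\n"}}'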
$ oc get nodes
NAME                          STATUS   ROLES    AGE     VERSION
ostest-6cvll-master-0         Ready    master   4h15m   v1.23.3+b63be7f
ostest-6cvll-master-1         Ready    master   4h15m   v1.23.3+b63be7f
ostest-6cvll-master-2         Ready    master   4h15m   v1.23.3+b63be7f
ostest-6cvll-worker-0-9zvbg   Ready    worker   3h58m   v1.23.3+b63be7f
ostest-6cvll-worker-0-k6qk4   Ready    worker   3h58m   v1.23.3+b63be7f
ostest-6cvll-worker-0-x884z   Ready    worker   3h58m   v1.23.3+b63be7f

Then create the test LoadBalancer service with the manifest below:

cat <<EOF | oc apply -f -
---
apiVersion: project.openshift.io/v1
kind: Project
metadata:
  name: lb-test-ns
  labels:
    kubernetes.io/metadata.name: lb-test-ns
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lb-test-dep
  namespace: lb-test-ns
  labels:
    app: lb-test-dep
spec:
  replicas: 2
  selector:
    matchLabels:
      app: lb-test-dep
  template:
    metadata:
      labels:
        app: lb-test-dep
    spec:
      containers:
      - image: quay.io/kuryr/demo
        name: demo
---
apiVersion: v1
kind: Service
metadata:
  name: lb-test-svc
  namespace: lb-test-ns
  labels:
    app: lb-test-dep
spec:
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: lb-test-dep
  type: LoadBalancer
EOF

and the CCM shows the logs below:

I0202 16:44:24.514476       1 event.go:294] "Event occurred" object="lb-test-ns/lb-test-svc" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
I0202 16:44:24.600205       1 loadbalancer.go:1917] "EnsureLoadBalancer" cluster="kubernetes" service="lb-test-ns/lb-test-svc"
I0202 16:44:26.324139       1 loadbalancer.go:1810] "Creating fully populated loadbalancer" lbName="kube_service_kubernetes_lb-test-ns_lb-test-svc" service="lb-test-ns/lb-test-svc"
I0202 16:44:29.458643       1 loadbalancer.go:151] "Waiting for load balancer ACTIVE" lbID="7f67a1b7-f878-4d67-9c40-fc690fe728e4"
I0202 16:45:59.636211       1 loadbalancer.go:165] "Load balancer ACTIVE" lbID="7f67a1b7-f878-4d67-9c40-fc690fe728e4"
E0202 16:45:59.636338       1 controller.go:310] error processing service lb-test-ns/lb-test-svc (will retry): failed to ensure load balancer: load balancer 7f67a1b7-f878-4d67-9c40-fc690fe728e4 is not ACTIVE, current provisioning status: PENDING_CREATE
I0202 16:45:59.636673       1 event.go:294] "Event occurred" object="lb-test-ns/lb-test-svc" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: load balancer 7f67a1b7-f878-4d67-9c40-fc690fe728e4 is not ACTIVE, current provisioning status: PENDING_CREATE"
I0202 16:46:04.636700       1 event.go:294] "Event occurred" object="lb-test-ns/lb-test-svc" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
I0202 16:46:04.636865       1 loadbalancer.go:1917] "EnsureLoadBalancer" cluster="kubernetes" service="lb-test-ns/lb-test-svc"
E0202 16:46:08.804873       1 controller.go:310] error processing service lb-test-ns/lb-test-svc (will retry): failed to ensure load balancer: failed to patch service object lb-test-ns/lb-test-svc: services "lb-test-svc" is forbidden: User "system:serviceaccount:kube-system:cloud-provider-openstack" cannot patch resource "services" in API group "" in the namespace "lb-test-ns"
I0202 16:46:08.805117       1 event.go:294] "Event occurred" object="lb-test-ns/lb-test-svc" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: failed to patch service object lb-test-ns/lb-test-svc: services \"lb-test-svc\" is forbidden: User \"system:serviceaccount:kube-system:cloud-provider-openstack\" cannot patch resource \"services\" in API group \"\" in the namespace \"lb-test-ns\""
resource \"services\" in API group \"\" in the namespace \"lb-test-ns\"" I0202 16:46:18.806492 1 loadbalancer.go:1917] "EnsureLoadBalancer" cluster="kubernetes" service="lb-test-ns/lb-test-svc" I0202 16:46:18.807001 1 event.go:294] "Event occurred" object="lb-test-ns/lb-test-svc" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer" I0202 16:46:19.648158 1 event.go:294] "Event occurred" object="lb-test-ns/lb-test-svc" kind="Service" apiVersion="v1" type="Normal" reason="EnsuredLoadBalancer" message="Ensured load balancer" I0202 16:46:19.661117 1 loadbalancer.go:1917] "EnsureLoadBalancer" cluster="kubernetes" service="lb-test-ns/lb-test-svc" I0202 16:46:19.662576 1 event.go:294] "Event occurred" object="lb-test-ns/lb-test-svc" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer" E0202 16:46:20.564024 1 controller.go:310] error processing service lb-test-ns/lb-test-svc (will retry): failed to ensure load balancer: failed to patch service object lb-test-ns/lb-test-svc: services "lb-test-svc" is forbidden: User "system:serviceaccount:kube-system:cloud-provider-openstack" cannot patch resource "services" in API group "" in the namespace "lb-test-ns" I0202 16:46:20.564142 1 event.go:294] "Event occurred" object="lb-test-ns/lb-test-svc" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: failed to patch service object lb-test-ns/lb-test-svc: services \"lb-test-svc\" is forbidden: User \"system:serviceaccount:kube-system:cloud-provider-openstack\" cannot patch resource \"services\" in API group \"\" in the namespace \"lb-test-ns\"" I0202 16:46:25.564293 1 loadbalancer.go:1917] "EnsureLoadBalancer" cluster="kubernetes" service="lb-test-ns/lb-test-svc" I0202 16:46:25.565090 1 event.go:294] "Event occurred" object="lb-test-ns/lb-test-svc" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer" I0202 16:46:26.402386 1 event.go:294] "Event occurred" object="lb-test-ns/lb-test-svc" kind="Service" apiVersion="v1" type="Normal" reason="EnsuredLoadBalancer" message="Ensured load balancer" The loadbalancer svc is working fine: $ oc get svc -n lb-test-ns NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE lb-test-svc LoadBalancer 172.30.108.184 10.46.44.59 80:30057/TCP 4m6s $ curl 10.46.44.59 lb-test-dep-68d6754b4d-rxt4l: HELLO! I AM ALIVE!!! 
But the provider used is still amphora:

$ openstack loadbalancer list
+--------------------------------------+------------------------------------------------+----------------------------------+-------------+---------------------+----------+
| id                                   | name                                           | project_id                       | vip_address | provisioning_status | provider |
+--------------------------------------+------------------------------------------------+----------------------------------+-------------+---------------------+----------+
| 7f67a1b7-f878-4d67-9c40-fc690fe728e4 | kube_service_kubernetes_lb-test-ns_lb-test-svc | 54b324430f664e01a95a18d4b9b1ae02 | 10.196.2.96 | ACTIVE              | amphora  |
+--------------------------------------+------------------------------------------------+----------------------------------+-------------+---------------------+----------+

$ openstack floating ip list
+--------------------------------------+---------------------+------------------+--------------------------------------+--------------------------------------+----------------------------------+
| ID                                   | Floating IP Address | Fixed IP Address | Port                                 | Floating Network                     | Project                          |
+--------------------------------------+---------------------+------------------+--------------------------------------+--------------------------------------+----------------------------------+
| 91b72dca-275c-4a87-a578-88f6179771c9 | 10.46.44.31         | 10.196.0.7       | 9235483b-b1a7-4018-baa9-bdcf77e649d4 | b8dcdaf3-e982-4333-8747-36135f6f0016 | 54b324430f664e01a95a18d4b9b1ae02 |
| 9c6d9150-a1a3-4a18-b57f-732a9a1837f4 | 10.46.44.38         | 10.196.0.5       | fa403b74-3bfc-4a1f-8009-c57da52d7985 | b8dcdaf3-e982-4333-8747-36135f6f0016 | 54b324430f664e01a95a18d4b9b1ae02 |
| bc137a18-48f9-4099-8bb6-bc4d3fede731 | 10.46.44.59         | 10.196.2.96      | 1ef6c458-b835-42c0-94a9-d0c27a24d5e0 | b8dcdaf3-e982-4333-8747-36135f6f0016 | 54b324430f664e01a95a18d4b9b1ae02 |
+--------------------------------------+---------------------+------------------+--------------------------------------+--------------------------------------+----------------------------------+

Actual results:
The cluster keeps using amphora as the lb-provider.

Expected results:
The cluster should use ovn as the lb-provider after the CCM config change.

Additional info:
must-gather attached.
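To check the driver of a single load balancer without listing everything, the provider column can be queried directly, e.g. for the LB created above:

$ openstack loadbalancer show 7f67a1b7-f878-4d67-9c40-fc690fe728e4 -c provider -f value
amphora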
This is a known issue. We're currently working on a design for it here:
https://github.com/openshift/enhancements/pull/1009

It's unlikely to be addressed in the 4.10 tech preview, but it would be a blocker for GA.
Removing the Triaged keyword because:
* the QE automation assessment (flag qe_test_coverage) is missing
Reassigning to Stephen as he wrote the enhancement proposal and is now working on the implementation in https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/178.
Verified in 4.11.0-0.nightly-2022-04-26-181148 on top of OSP 16.1.6.

Verification steps:

1. Install 4.11 with ExternalCloudProvider:

$ openshift-install create manifests --log-level=debug --dir=/home/stack/ostest/
$ cd ostest/
$ cat <<EOF >manifests/manifest_feature_gate.yaml
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    release.openshift.io/create-only: "true"
  name: cluster
spec:
  customNoUpgrade:
    enabled:
    - ExternalCloudProvider
  featureSet: CustomNoUpgrade
EOF
$ openshift-install create cluster --log-level=debug --dir=/home/stack/ostest/

2. Change the cloud provider Octavia config in order to use the OVN Octavia driver:

$ oc get cm cloud-provider-config -n openshift-config -o yaml
[...]
  config: |
    [Global]
    secret-name = openstack-credentials
    secret-namespace = kube-system
    ca-file = /etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem
    [LoadBalancer]
    use-octavia = True
    lb-provider = ovn            <-------
    lb-method = SOURCE_IP_PORT   <----------
kind: ConfigMap
[...]

The nodes go unschedulable and then come back to Ready, so the change is applied.

$ oc get nodes
NAME                          STATUS   ROLES    AGE   VERSION
ostest-q6vtt-master-0         Ready    master   74m   v1.23.3+54654d2
ostest-q6vtt-master-1         Ready    master   73m   v1.23.3+54654d2
ostest-q6vtt-master-2         Ready    master   75m   v1.23.3+54654d2
ostest-q6vtt-worker-0-5xjww   Ready    worker   62m   v1.23.3+54654d2
ostest-q6vtt-worker-0-jp9ng   Ready    worker   61m   v1.23.3+54654d2
ostest-q6vtt-worker-0-mdbfs   Ready    worker   62m   v1.23.3+54654d2

3. Create the LoadBalancer type svc with the manifest below:

cat <<EOF | oc apply -f -
---
apiVersion: project.openshift.io/v1
kind: Project
metadata:
  name: lb-test-ns
  labels:
    kubernetes.io/metadata.name: lb-test-ns
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lb-test-dep
  namespace: lb-test-ns
  labels:
    app: lb-test-dep
spec:
  replicas: 2
  selector:
    matchLabels:
      app: lb-test-dep
  template:
    metadata:
      labels:
        app: lb-test-dep
    spec:
      containers:
      - image: quay.io/kuryr/demo
        name: demo
---
apiVersion: v1
kind: Service
metadata:
  name: lb-test-svc
  namespace: lb-test-ns
  labels:
    app: lb-test-dep
spec:
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: lb-test-dep
  type: LoadBalancer
EOF

4. Check that the LB for the svc is created with the ovn provider:

LB
--
+--------------------------------------+------------------------------------------------+----------------------------------+-------------+---------------------+----------+
| id                                   | name                                           | project_id                       | vip_address | provisioning_status | provider |
+--------------------------------------+------------------------------------------------+----------------------------------+-------------+---------------------+----------+
| 7d1b2af8-817f-4490-91a1-032ac8e71895 | kube_service_kubernetes_lb-test-ns_lb-test-svc | 42a001f17da442e9a0e894a1a7052603 | 10.196.1.15 | ACTIVE              | ovn      |
...
+--------------------------------------+------------------------------------------------+----------------------------------+-------------+---------------------+----------+

svc
---
lb-test-ns   lb-test-svc   LoadBalancer   172.30.18.235   10.0.0.224   80:32343/TCP   5m34s

>> Note that the service does not work due to bug 2074606.
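As an extra sanity check, the operator-rendered config that the CCM actually consumes can be inspected. A sketch, assuming the operator syncs it into a cloud-conf ConfigMap in the openshift-cloud-controller-manager namespace (the ConfigMap and key names may vary by release):

$ oc get cm cloud-conf -n openshift-cloud-controller-manager -o jsonpath='{.data.cloud\.conf}' | grep lb-
lb-provider = ovn
lb-method = SOURCE_IP_PORT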
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069