Bug 2049775 - cloud-provider-config change not applied when ExternalCloudProvider enabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: low
Target Milestone: ---
Target Release: 4.11.0
Assignee: Stephen Finucane
QA Contact: Jon Uriarte
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-02-02 16:59 UTC by rlobillo
Modified: 2022-08-10 10:46 UTC
CC: 3 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
(no doc update needed since this is a bug in a newly introduced feature)
Clone Of:
Environment:
Last Closed: 2022-08-10 10:46:22 UTC
Target Upstream Version:
Embargoed:


Attachments
must-gather (11.60 MB, application/gzip)
2022-02-02 16:59 UTC, rlobillo


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-cloud-controller-manager-operator pull 178 0 None Merged Add generation of config for OpenStack CCM 2022-04-06 13:40:08 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 10:46:54 UTC

Description rlobillo 2022-02-02 16:59:39 UTC
Created attachment 1858720 [details]
must-gather

Description of problem:

Changes to the lb-provider in cloud-provider-config are not applied to the cluster when a new loadbalancer service is created.

Version-Release number of selected component (if applicable): 
4.10.0-0.nightly-2022-02-02-000921
RHOS-16.1-RHEL-8-20210903.n.0

How reproducible: always


Steps to Reproduce:

1. Install an OCP cluster with the TP features enabled by adding the featureGate manifest. The cloud-provider-config created by the installation is:

apiVersion: v1
data:
[...]
  config: |
    [Global]
    secret-name = openstack-credentials
    secret-namespace = kube-system
    ca-file = /etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem
    [LoadBalancer]
    use-octavia = True
kind: ConfigMap
metadata:
  creationTimestamp: "2022-02-02T10:18:11Z"
  name: cloud-provider-config
  namespace: openshift-config
  resourceVersion: "1803"
  uid: 7699608f-f7a9-42ee-9823-5dc533bc3b88

FeatureGate shows that TP features are enabled:

$ oc get featureGate/cluster -o yaml
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    release.openshift.io/create-only: "true"
  creationTimestamp: "2022-02-02T10:17:55Z"
  generation: 1
  name: cluster
  resourceVersion: "1368"
  uid: 48d43fd3-ee78-40ab-9c5e-5832de4faa0f
spec:
  featureSet: TechPreviewNoUpgrade


and the CCM pods are ready:

$ oc get pods -n openshift-cloud-controller-manager 
NAME                                                  READY   STATUS    RESTARTS   AGE
openstack-cloud-controller-manager-6cc678df8b-nhhmr   1/1     Running   0          31m
openstack-cloud-controller-manager-6cc678df8b-vkwbz   1/1     Running   0          31m


2. Perform the changes to use OVN as the load balancer provider:

$ oc get cm cloud-provider-config -n openshift-config -o yaml                                                                                                                                             
[...]
  config: |
    [Global]
    secret-name = openstack-credentials
    secret-namespace = kube-system
    ca-file = /etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem
    [LoadBalancer]
    use-octavia = True
    lb-provider = ovn <-------
    lb-method = SOURCE_IP_PORT <----------
kind: ConfigMap
metadata:
  creationTimestamp: "2022-02-02T10:18:11Z"
  name: cloud-provider-config
  namespace: openshift-config
  resourceVersion: "1803"
  uid: 7699608f-f7a9-42ee-9823-5dc533bc3b88
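
The edit above can also be scripted rather than done by hand; a minimal sketch (the local file name `cloud.conf` and the sed commands are illustrative, not part of the original report; the key/value pairs match the ConfigMap shown above):

```shell
# Start from a local copy of the cloud config (abridged to the keys above)
cat > cloud.conf <<'EOF'
[Global]
secret-name = openstack-credentials
secret-namespace = kube-system
[LoadBalancer]
use-octavia = True
EOF

# Insert the OVN provider settings into the [LoadBalancer] section (GNU sed)
sed -i '/^\[LoadBalancer\]/a lb-provider = ovn' cloud.conf
sed -i '/^lb-provider = ovn/a lb-method = SOURCE_IP_PORT' cloud.conf

cat cloud.conf
```

The edited file could then be pushed back into the ConfigMap, e.g. with `oc set data cm/cloud-provider-config -n openshift-config --from-file=config=cloud.conf` (assuming `oc set data` is available in the client version in use).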

The nodes become unschedulable and then return to Ready, so the change is presumably applied:

$ oc get nodes -w                                     
NAME                          STATUS   ROLES    AGE     VERSION                          
ostest-6cvll-master-0         Ready    master   4h      v1.23.3+b63be7f
ostest-6cvll-master-1         Ready    master   4h      v1.23.3+b63be7f          
ostest-6cvll-master-2         Ready    master   4h      v1.23.3+b63be7f                  
ostest-6cvll-worker-0-9zvbg   Ready    worker   3h43m   v1.23.3+b63be7f                  
ostest-6cvll-worker-0-k6qk4   Ready    worker   3h43m   v1.23.3+b63be7f                  
ostest-6cvll-worker-0-x884z   Ready    worker   3h43m   v1.23.3+b63be7f                  
ostest-6cvll-master-1         Ready,SchedulingDisabled   master   4h1m    v1.23.3+b63be7f
ostest-6cvll-worker-0-x884z   Ready,SchedulingDisabled   worker   3h44m   v1.23.3+b63be7f
ostest-6cvll-worker-0-x884z   Ready,SchedulingDisabled   worker   3h44m   v1.23.3+b63be7f
ostest-6cvll-master-1         Ready,SchedulingDisabled   master   4h1m    v1.23.3+b63be7f
ostest-6cvll-worker-0-x884z   NotReady,SchedulingDisabled   worker   3h46m   v1.23.3+b63be7f
ostest-6cvll-worker-0-x884z   NotReady,SchedulingDisabled   worker   3h46m   v1.23.3+b63be7f
ostest-6cvll-worker-0-x884z   NotReady,SchedulingDisabled   worker   3h46m   v1.23.3+b63be7f
ostest-6cvll-master-1         NotReady,SchedulingDisabled   master   4h3m    v1.23.3+b63be7f
ostest-6cvll-master-1         NotReady,SchedulingDisabled   master   4h3m    v1.23.3+b63be7f
ostest-6cvll-master-1         NotReady,SchedulingDisabled   master   4h3m    v1.23.3+b63be7f
ostest-6cvll-worker-0-x884z   Ready,SchedulingDisabled      worker   3h46m   v1.23.3+b63be7f
ostest-6cvll-worker-0-x884z   Ready,SchedulingDisabled      worker   3h46m   v1.23.3+b63be7f
[...]
$ oc get nodes
NAME                          STATUS   ROLES    AGE     VERSION
ostest-6cvll-master-0         Ready    master   4h15m   v1.23.3+b63be7f
ostest-6cvll-master-1         Ready    master   4h15m   v1.23.3+b63be7f
ostest-6cvll-master-2         Ready    master   4h15m   v1.23.3+b63be7f
ostest-6cvll-worker-0-9zvbg   Ready    worker   3h58m   v1.23.3+b63be7f
ostest-6cvll-worker-0-k6qk4   Ready    worker   3h58m   v1.23.3+b63be7f
ostest-6cvll-worker-0-x884z   Ready    worker   3h58m   v1.23.3+b63be7f


Then create the test loadbalancer service with the manifest below:

cat <<EOF | oc apply -f -
---
apiVersion: project.openshift.io/v1
kind: Project
metadata:
  name: lb-test-ns
  labels:
    kubernetes.io/metadata.name: lb-test-ns
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lb-test-dep
  namespace: lb-test-ns
  labels:
    app: lb-test-dep
spec:
  replicas: 2
  selector:
    matchLabels:
      app: lb-test-dep
  template:
    metadata:
      labels:
        app: lb-test-dep
    spec:
      containers:
      - image: quay.io/kuryr/demo
        name: demo
---
apiVersion: v1
kind: Service
metadata:
  name: lb-test-svc
  namespace: lb-test-ns
  labels:
    app: lb-test-dep
spec:
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: lb-test-dep
  type: LoadBalancer
EOF

and the CCM shows the following logs:

I0202 16:44:24.514476       1 event.go:294] "Event occurred" object="lb-test-ns/lb-test-svc" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"                                     
I0202 16:44:24.600205       1 loadbalancer.go:1917] "EnsureLoadBalancer" cluster="kubernetes" service="lb-test-ns/lb-test-svc"                                                                                                               
I0202 16:44:26.324139       1 loadbalancer.go:1810] "Creating fully populated loadbalancer" lbName="kube_service_kubernetes_lb-test-ns_lb-test-svc" service="lb-test-ns/lb-test-svc"                                                         
I0202 16:44:29.458643       1 loadbalancer.go:151] "Waiting for load balancer ACTIVE" lbID="7f67a1b7-f878-4d67-9c40-fc690fe728e4"                                                                                                            
I0202 16:45:59.636211       1 loadbalancer.go:165] "Load balancer ACTIVE" lbID="7f67a1b7-f878-4d67-9c40-fc690fe728e4"
E0202 16:45:59.636338       1 controller.go:310] error processing service lb-test-ns/lb-test-svc (will retry): failed to ensure load balancer: load balancer 7f67a1b7-f878-4d67-9c40-fc690fe728e4 is not ACTIVE, current provisioning status:
PENDING_CREATE
I0202 16:45:59.636673       1 event.go:294] "Event occurred" object="lb-test-ns/lb-test-svc" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: load balancer 7f67a1b7-f878-4d67-9c40-fc690fe728e4 is not ACTIVE, current provisioning status: PENDING_CREATE"
I0202 16:46:04.636700       1 event.go:294] "Event occurred" object="lb-test-ns/lb-test-svc" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"                                     
I0202 16:46:04.636865       1 loadbalancer.go:1917] "EnsureLoadBalancer" cluster="kubernetes" service="lb-test-ns/lb-test-svc"                                                                                                               
E0202 16:46:08.804873       1 controller.go:310] error processing service lb-test-ns/lb-test-svc (will retry): failed to ensure load balancer: failed to patch service object lb-test-ns/lb-test-svc: services "lb-test-svc" is forbidden: User "system:serviceaccount:kube-system:cloud-provider-openstack" cannot patch resource "services" in API group "" in the namespace "lb-test-ns"                                                                                                
I0202 16:46:08.805117       1 event.go:294] "Event occurred" object="lb-test-ns/lb-test-svc" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: failed to patch service object lb-test-ns/lb-test-svc: services \"lb-test-svc\" is forbidden: User \"system:serviceaccount:kube-system:cloud-provider-openstack\" cannot patch resource \"services\" in API group \"\" in the namespace \"lb-test-ns\""
I0202 16:46:18.806492       1 loadbalancer.go:1917] "EnsureLoadBalancer" cluster="kubernetes" service="lb-test-ns/lb-test-svc"                                                                                                               
I0202 16:46:18.807001       1 event.go:294] "Event occurred" object="lb-test-ns/lb-test-svc" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"                                     
I0202 16:46:19.648158       1 event.go:294] "Event occurred" object="lb-test-ns/lb-test-svc" kind="Service" apiVersion="v1" type="Normal" reason="EnsuredLoadBalancer" message="Ensured load balancer"                                       
I0202 16:46:19.661117       1 loadbalancer.go:1917] "EnsureLoadBalancer" cluster="kubernetes" service="lb-test-ns/lb-test-svc"                                                                                                               
I0202 16:46:19.662576       1 event.go:294] "Event occurred" object="lb-test-ns/lb-test-svc" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"                                     
E0202 16:46:20.564024       1 controller.go:310] error processing service lb-test-ns/lb-test-svc (will retry): failed to ensure load balancer: failed to patch service object lb-test-ns/lb-test-svc: services "lb-test-svc" is forbidden: User "system:serviceaccount:kube-system:cloud-provider-openstack" cannot patch resource "services" in API group "" in the namespace "lb-test-ns"                                                                                                
I0202 16:46:20.564142       1 event.go:294] "Event occurred" object="lb-test-ns/lb-test-svc" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: failed to patch service object lb-test-ns/lb-test-svc: services \"lb-test-svc\" is forbidden: User \"system:serviceaccount:kube-system:cloud-provider-openstack\" cannot patch resource \"services\" in API group \"\" in the namespace \"lb-test-ns\""
I0202 16:46:25.564293       1 loadbalancer.go:1917] "EnsureLoadBalancer" cluster="kubernetes" service="lb-test-ns/lb-test-svc"                                                                                                               
I0202 16:46:25.565090       1 event.go:294] "Event occurred" object="lb-test-ns/lb-test-svc" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"                                     
I0202 16:46:26.402386       1 event.go:294] "Event occurred" object="lb-test-ns/lb-test-svc" kind="Service" apiVersion="v1" type="Normal" reason="EnsuredLoadBalancer" message="Ensured load balancer"                                 

The loadbalancer svc is working fine:

$ oc get svc -n lb-test-ns
NAME          TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
lb-test-svc   LoadBalancer   172.30.108.184   10.46.44.59   80:30057/TCP   4m6s

$ curl 10.46.44.59
lb-test-dep-68d6754b4d-rxt4l: HELLO! I AM ALIVE!!!

But the provider used is still amphora:

$ openstack loadbalancer list
+--------------------------------------+------------------------------------------------+----------------------------------+-------------+---------------------+----------+
| id                                   | name                                           | project_id                       | vip_address | provisioning_status | provider |
+--------------------------------------+------------------------------------------------+----------------------------------+-------------+---------------------+----------+
| 7f67a1b7-f878-4d67-9c40-fc690fe728e4 | kube_service_kubernetes_lb-test-ns_lb-test-svc | 54b324430f664e01a95a18d4b9b1ae02 | 10.196.2.96 | ACTIVE              | amphora  |
+--------------------------------------+------------------------------------------------+----------------------------------+-------------+---------------------+----------+
$ openstack floating ip list
+--------------------------------------+---------------------+------------------+--------------------------------------+--------------------------------------+----------------------------------+
| ID                                   | Floating IP Address | Fixed IP Address | Port                                 | Floating Network                     | Project                          |
+--------------------------------------+---------------------+------------------+--------------------------------------+--------------------------------------+----------------------------------+
| 91b72dca-275c-4a87-a578-88f6179771c9 | 10.46.44.31         | 10.196.0.7       | 9235483b-b1a7-4018-baa9-bdcf77e649d4 | b8dcdaf3-e982-4333-8747-36135f6f0016 | 54b324430f664e01a95a18d4b9b1ae02 |
| 9c6d9150-a1a3-4a18-b57f-732a9a1837f4 | 10.46.44.38         | 10.196.0.5       | fa403b74-3bfc-4a1f-8009-c57da52d7985 | b8dcdaf3-e982-4333-8747-36135f6f0016 | 54b324430f664e01a95a18d4b9b1ae02 |
| bc137a18-48f9-4099-8bb6-bc4d3fede731 | 10.46.44.59         | 10.196.2.96      | 1ef6c458-b835-42c0-94a9-d0c27a24d5e0 | b8dcdaf3-e982-4333-8747-36135f6f0016 | 54b324430f664e01a95a18d4b9b1ae02 |
+--------------------------------------+---------------------+------------------+--------------------------------------+--------------------------------------+----------------------------------+
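
The provider column in output like the above can also be checked programmatically; a sketch against the captured table (the awk parsing is illustrative; with cloud access, `openstack loadbalancer show <lb-id> -c provider -f value` queries the field directly):

```shell
# Extract the "provider" column (7th pipe-separated field) from the captured
# `openstack loadbalancer list` row for the kube_service load balancer
provider=$(awk -F'|' '/kube_service/ {gsub(/ /,"",$7); print $7}' <<'EOF'
| 7f67a1b7-f878-4d67-9c40-fc690fe728e4 | kube_service_kubernetes_lb-test-ns_lb-test-svc | 54b324430f664e01a95a18d4b9b1ae02 | 10.196.2.96 | ACTIVE              | amphora  |
EOF
)
echo "provider=$provider"   # -> provider=amphora, although ovn was expected
```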


Actual results: Cluster keeps using amphora as lb-provider.


Expected results: Cluster should use ovn as lb-provider after the ccm config change.


Additional info: must-gather attached.

Comment 1 Matthew Booth 2022-02-02 17:07:34 UTC
This is a known issue. We're currently working on a design for it here: https://github.com/openshift/enhancements/pull/1009

It's unlikely to be addressed in 4.10 tech preview, but it would be a blocker for GA.

Comment 2 ShiftStack Bugwatcher 2022-02-03 07:04:10 UTC
Removing the Triaged keyword because:
* the QE automation assessment (flag qe_test_coverage) is missing

Comment 3 Martin André 2022-03-16 13:57:00 UTC
Reassigning to Stephen as he wrote the enhancement proposal and is now working on the implementation in https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/178.

Comment 5 Jon Uriarte 2022-04-28 09:27:58 UTC
Verified in 4.11.0-0.nightly-2022-04-26-181148 on top of OSP 16.1.6.

Verification steps:
1. Install 4.11 with ExternalCloudProvider

      $ openshift-install create manifests --log-level=debug --dir=/home/stack/ostest/
      $ cd ostest/
      $ cat <<EOF >manifests/manifest_feature_gate.yaml
      apiVersion: config.openshift.io/v1
      kind: FeatureGate
      metadata:
        annotations:
          include.release.openshift.io/self-managed-high-availability: "true"
          include.release.openshift.io/single-node-developer: "true"
          release.openshift.io/create-only: "true"
        name: cluster
      spec:
        customNoUpgrade:
          enabled:
          - ExternalCloudProvider
        featureSet: CustomNoUpgrade
      EOF

      $ openshift-install create cluster --log-level=debug --dir=/home/stack/ostest/
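
A quick sanity check of the generated manifest before launching the installer can be sketched as follows (the directory layout mirrors the steps above; the YAML body is the manifest from step 1, with the annotations omitted for brevity):

```shell
# Recreate the feature-gate manifest from step 1 (abridged) and verify that
# the CustomNoUpgrade set actually lists ExternalCloudProvider
mkdir -p manifests
cat <<'EOF' > manifests/manifest_feature_gate.yaml
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  customNoUpgrade:
    enabled:
    - ExternalCloudProvider
  featureSet: CustomNoUpgrade
EOF

grep -q 'featureSet: CustomNoUpgrade' manifests/manifest_feature_gate.yaml &&
grep -q 'ExternalCloudProvider' manifests/manifest_feature_gate.yaml &&
echo "feature gate manifest OK"
```

After installation, `oc get featuregate/cluster -o yaml` should show the same `customNoUpgrade` spec on the cluster side.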

2. Change the cloud provider Octavia config in order to use the OVN Octavia driver:

$ oc get cm cloud-provider-config -n openshift-config -o yaml                                                                                                                                             
[...]
  config: |
    [Global]
    secret-name = openstack-credentials
    secret-namespace = kube-system
    ca-file = /etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem
    [LoadBalancer]
    use-octavia = True
    lb-provider = ovn <-------
    lb-method = SOURCE_IP_PORT <----------
kind: ConfigMap
[...]

The nodes become unschedulable and then return to Ready, so the change is presumably applied.

$ oc get nodes
NAME                          STATUS   ROLES    AGE   VERSION
ostest-q6vtt-master-0         Ready    master   74m   v1.23.3+54654d2
ostest-q6vtt-master-1         Ready    master   73m   v1.23.3+54654d2
ostest-q6vtt-master-2         Ready    master   75m   v1.23.3+54654d2
ostest-q6vtt-worker-0-5xjww   Ready    worker   62m   v1.23.3+54654d2
ostest-q6vtt-worker-0-jp9ng   Ready    worker   61m   v1.23.3+54654d2
ostest-q6vtt-worker-0-mdbfs   Ready    worker   62m   v1.23.3+54654d2

3. Create the LoadBalancer-type svc with the manifest below:

cat <<EOF | oc apply -f -
---
apiVersion: project.openshift.io/v1
kind: Project
metadata:
  name: lb-test-ns
  labels:
    kubernetes.io/metadata.name: lb-test-ns
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lb-test-dep
  namespace: lb-test-ns
  labels:
    app: lb-test-dep
spec:
  replicas: 2
  selector:
    matchLabels:
      app: lb-test-dep
  template:
    metadata:
      labels:
        app: lb-test-dep
    spec:
      containers:
      - image: quay.io/kuryr/demo
        name: demo
---
apiVersion: v1
kind: Service
metadata:
  name: lb-test-svc
  namespace: lb-test-ns
  labels:
    app: lb-test-dep
spec:
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: lb-test-dep
  type: LoadBalancer
EOF

4. Check that the LB for the svc is created with the ovn provider:

LB
--
+--------------------------------------+------------------------------------------------+----------------------------------+-------------+---------------------+----------+                                                                  
| id                                   | name                                           | project_id                       | vip_address | provisioning_status | provider |                                                                  
+--------------------------------------+------------------------------------------------+----------------------------------+-------------+---------------------+----------+ 
| 7d1b2af8-817f-4490-91a1-032ac8e71895 | kube_service_kubernetes_lb-test-ns_lb-test-svc | 42a001f17da442e9a0e894a1a7052603 | 10.196.1.15 | ACTIVE              | ovn      |                                                                  
...
+--------------------------------------+------------------------------------------------+----------------------------------+-------------+---------------------+----------+                         

svc
---
lb-test-ns    lb-test-svc            LoadBalancer   172.30.18.235    10.0.0.224        80:32343/TCP          5m34s


>> Note that the service does not work due to bug 2074606

Comment 8 errata-xmlrpc 2022-08-10 10:46:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

