Bug 2051423 - migrate loadbalancers from amphora to ovn not working
Summary: migrate loadbalancers from amphora to ovn not working
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Michał Dulko
QA Contact: rlobillo
URL:
Whiteboard:
Depends On:
Blocks: 2055661
Reported: 2022-02-07 08:36 UTC by rlobillo
Modified: 2022-08-10 10:47 UTC (History)
0 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 10:47:24 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift kuryr-kubernetes pull 636 | 0 | None | open | Bug 2051423: Update KLB .spec.provider when required | 2022-02-17 08:54:54 UTC
Red Hat Product Errata | RHSA-2022:5069 | 0 | None | None | None | 2022-08-10 10:47:40 UTC

Description rlobillo 2022-02-07 08:36:37 UTC
Description of problem:

The procedure below for migrating load balancers from the amphora provider to the ovn provider does not work in 4.10:

https://docs.openshift.com/container-platform/4.9/networking/load-balancing-openstack.html#installation-osp-kuryr-octavia-configure_load-balancing-openstack

After following it, the existing amphora load balancers are not migrated to ovn.

Version-Release number of selected component (if applicable):
core_puddle: RHOS-16.1-RHEL-8-20211126.n.1
openshift_puddle: 4.10.0-0.nightly-2022-02-06-060618


How reproducible: Always


Steps to Reproduce:
1. Perform an OCP installation with only the amphora provider enabled. As a result, 58 amphora load balancers are created:

2022-02-06 14:48:14.212 | TASK [install : Get number of amphoras] ****************************************
2022-02-06 14:48:14.214 | task path: /home/rhos-ci/jenkins/workspace/DFG-osasinfra-shiftstack_periodic-16.1_director-rhel-virthost-3cont_3comp_1ipa-ipv4-geneve,vlan-hybrid_4.10_3master_3worker-updateoctavia/infrared/plugins/openshift/roles/install/tasks/update_octavia_provider.yml:8
2022-02-06 14:48:14.217 | Sunday 06 February 2022  14:48:14 +0000 (0:00:04.665)       0:54:50.222 ******* 
2022-02-06 14:48:19.067 | changed: [undercloud-0] => {
2022-02-06 14:48:19.070 |     "changed": true,
2022-02-06 14:48:19.073 |     "cmd": "source ~/overcloudrc\nopenstack loadbalancer list -c provider -f value | grep amphora | wc -l\n",
2022-02-06 14:48:19.076 |     "delta": "0:00:04.527868",
2022-02-06 14:48:19.078 |     "end": "2022-02-06 14:48:19.029284",
2022-02-06 14:48:19.081 |     "rc": 0,
2022-02-06 14:48:19.083 |     "start": "2022-02-06 14:48:14.501416"
2022-02-06 14:48:19.086 | }
2022-02-06 14:48:19.088 | 
2022-02-06 14:48:19.091 | STDOUT:
2022-02-06 14:48:19.093 | 
2022-02-06 14:48:19.096 | 58

2. Enable OVN again in octavia.conf on all the controllers:

- name: Enable ovn-octavia
  block:
    - name: Add ovn-octavia on enabled_provider_drivers option
      ini_file:
        path: /var/lib/config-data/puppet-generated/octavia/etc/octavia/octavia.conf
        section: api_settings
        option: enabled_provider_drivers
        value: "amphora: The Octavia Amphora driver.,octavia: Deprecated alias of the Octavia Amphora driver.,ovn: Octavia OVN driver."
      loop: "{{ groups['controller'] }}"
      delegate_to: "{{ item }}"

    - name: Restart Octavia containers
      shell: "podman restart octavia_api"
      loop: "{{ groups['controller'] }}"
      delegate_to: "{{ item }}"
  become: true


3. Remove the kuryr-octavia-provider annotation from the kuryr-config ConfigMap to trigger the update:

2022-02-06 14:53:04.737 | TASK [install : Octavia provider update - trigger update removing kuryr-octavia-provider annotation] ***
2022-02-06 14:53:04.740 | task path: /home/rhos-ci/jenkins/workspace/DFG-osasinfra-shiftstack_periodic-16.1_director-rhel-virthost-3cont_3comp_1ipa-ipv4-geneve,vlan-hybrid_4.10_3master_3worker-updateoctavia/infrared/plugins/openshift/roles/install/tasks/update_octavia_provider.yml:46
2022-02-06 14:53:04.743 | Sunday 06 February 2022  14:53:04 +0000 (0:00:06.444)       0:59:40.719 ******* 
2022-02-06 14:53:05.266 | changed: [undercloud-0] => {
2022-02-06 14:53:05.269 |     "changed": true,
2022-02-06 14:53:05.272 |     "cmd": "kubectl -n openshift-kuryr annotate cm kuryr-config networkoperator.openshift.io/kuryr-octavia-provider-\n",
2022-02-06 14:53:05.276 |     "delta": "0:00:00.205740",
2022-02-06 14:53:05.279 |     "end": "2022-02-06 14:53:05.231370",
2022-02-06 14:53:05.283 |     "rc": 0,
2022-02-06 14:53:05.286 |     "start": "2022-02-06 14:53:05.025630"
2022-02-06 14:53:05.290 | }
2022-02-06 14:53:05.293 | 
2022-02-06 14:53:05.297 | STDOUT:
2022-02-06 14:53:05.300 | 
2022-02-06 14:53:05.304 | configmap/kuryr-config annotated

4. Wait until the update starts (the annotation reappears, now set to ovn):

2022-02-06 14:53:05.310 | TASK [install : Active wait for the update to start (max 240 seconds)] *********
2022-02-06 14:53:05.314 | task path: /home/rhos-ci/jenkins/workspace/DFG-osasinfra-shiftstack_periodic-16.1_director-rhel-virthost-3cont_3comp_1ipa-ipv4-geneve,vlan-hybrid_4.10_3master_3worker-updateoctavia/infrared/plugins/openshift/roles/install/tasks/update_octavia_provider.yml:50
2022-02-06 14:53:05.317 | Sunday 06 February 2022  14:53:05 +0000 (0:00:00.593)       0:59:41.313 ******* 
2022-02-06 14:53:05.908 | FAILED - RETRYING: Active wait for the update to start (max 240 seconds) (24 retries left).
2022-02-06 14:53:16.381 | FAILED - RETRYING: Active wait for the update to start (max 240 seconds) (23 retries left).
2022-02-06 14:53:26.852 | FAILED - RETRYING: Active wait for the update to start (max 240 seconds) (22 retries left).
2022-02-06 14:53:37.301 | FAILED - RETRYING: Active wait for the update to start (max 240 seconds) (21 retries left).
2022-02-06 14:53:47.730 | FAILED - RETRYING: Active wait for the update to start (max 240 seconds) (20 retries left).
2022-02-06 14:53:58.199 | FAILED - RETRYING: Active wait for the update to start (max 240 seconds) (19 retries left).
2022-02-06 14:54:08.789 | changed: [undercloud-0] => {
2022-02-06 14:54:08.792 |     "attempts": 7,
2022-02-06 14:54:08.795 |     "changed": true,
2022-02-06 14:54:08.798 |     "cmd": "kubectl get -n openshift-kuryr cm/kuryr-config -o jsonpath='{.metadata.annotations.networkoperator\\.openshift\\.io/kuryr-octavia-provider}'\n",
2022-02-06 14:54:08.801 |     "delta": "0:00:00.249561",
2022-02-06 14:54:08.803 |     "end": "2022-02-06 14:54:08.749260",
2022-02-06 14:54:08.806 |     "rc": 0,
2022-02-06 14:54:08.808 |     "start": "2022-02-06 14:54:08.499699"
2022-02-06 14:54:08.811 | }
2022-02-06 14:54:08.813 | 
2022-02-06 14:54:08.816 | STDOUT:
2022-02-06 14:54:08.818 | 
2022-02-06 14:54:08.821 | ovn

5. Check the existing loadbalancers:

2022-02-06 14:59:20.413 | TASK [install : check number of loadbalancers after octavia update (> 4.8)] ****
2022-02-06 14:59:20.416 | task path: /home/rhos-ci/jenkins/workspace/DFG-osasinfra-shiftstack_periodic-16.1_director-rhel-virthost-3cont_3comp_1ipa-ipv4-geneve,vlan-hybrid_4.10_3master_3worker-updateoctavia/infrared/plugins/openshift/roles/install/tasks/update_octavia_provider.yml:93
2022-02-06 14:59:20.419 | Sunday 06 February 2022  14:59:20 +0000 (0:00:00.059)       1:05:56.433 ******* 
2022-02-06 14:59:20.454 | fatal: [undercloud-0]: FAILED! => {
2022-02-06 14:59:20.457 |     "assertion": "ovnlbs.stdout|int == total_lbs.stdout|int",
2022-02-06 14:59:20.459 |     "changed": false,
2022-02-06 14:59:20.462 |     "evaluated_to": false
2022-02-06 14:59:20.465 | }
2022-02-06 14:59:20.468 | 
2022-02-06 14:59:20.470 | MSG:
2022-02-06 14:59:20.473 | 
2022-02-06 14:59:20.476 | unexpected types of loadbalancer after octavia provider update.
2022-02-06 14:59:20.479 | ovn-octavia lbs: 0
2022-02-06 14:59:20.481 | Total: 59

Actual results: No load balancers are migrated; 0 of the 59 load balancers use the ovn provider.
Expected results: All load balancers are moved to the ovn provider.
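
The check that fails in step 5 can be sketched as a pair of small shell helpers. This is only an illustration, not the CI job's code: the function names `count_providers` and `all_migrated` are hypothetical, and both match the provider name exactly, whereas the tasks above use a substring `grep`.

```shell
#!/bin/sh
# Tally load balancers per provider.
# Input (stdin): one provider name per line, e.g. the output of
#   openstack loadbalancer list -c provider -f value
# Prints "amphora=<n> ovn=<n> total=<n>".
count_providers() {
    awk '
        $1 == "amphora" { amphora++ }
        $1 == "ovn"     { ovn++ }
        NF              { total++ }
        END { printf "amphora=%d ovn=%d total=%d\n", amphora, ovn, total }
    '
}

# Exit 0 only when every listed load balancer uses the ovn provider,
# mirroring the assertion "ovnlbs == total_lbs" from the failing task.
all_migrated() {
    awk '
        NF && $1 != "ovn" { other++ }
        END { exit (other > 0 ? 1 : 0) }
    '
}
```

With overcloud credentials sourced, the provider column from `openstack loadbalancer list -c provider -f value` can be piped into either helper.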

Comment 1 Michał Dulko 2022-02-08 11:30:44 UTC
Okay, I debugged this. I'm setting blocker- for several reasons:

* This will mostly happen on an OSP13->OSP16 upgrade, but 4.10 does not support OSP13, so that migration should happen in 4.9, where it works okay.
* There might be folks running OSP16 with OVS+Amphora, in which case they probably won't move to OVN anyway.
* Setups running OSP16 with OVN+Amphora might be affected, but as OVN-Octavia is preferred I bet this is a really minor use case.
* For the rare cases described above I can provide a workaround draft:

1. After executing step 3 in [1], manually update all your KuryrLoadBalancer CRs and set `.spec.provider` to "ovn". This should trigger the migration of the LBs.

[1] https://docs.openshift.com/container-platform/4.9/networking/load-balancing-openstack.html#installation-osp-kuryr-octavia-configure_load-balancing-openstack
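
A rough sketch of how the workaround could be scripted follows. It is not a tested procedure. The helper name `klb_patch_cmds` is hypothetical, and the sketch assumes the custom resource is reachable as `kuryrloadbalancer` and accepts a merge patch. The helper only prints `oc patch` commands so they can be reviewed before anything is changed.

```shell
#!/bin/sh
# Emit one "oc patch" command per KuryrLoadBalancer that sets
# .spec.provider to "ovn". Nothing is executed here; commands are only printed.
#
# Input (stdin): one "<namespace> <name>" pair per line, for example from
#   oc get kuryrloadbalancers -A \
#     -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}'
klb_patch_cmds() {
    while read -r ns name; do
        # Skip blank or incomplete lines.
        [ -n "$ns" ] && [ -n "$name" ] || continue
        printf "oc patch kuryrloadbalancer %s -n %s --type merge -p '%s'\n" \
            "$name" "$ns" '{"spec":{"provider":"ovn"}}'
    done
}
```

Once the printed commands look right, the output can be piped to `sh` to apply them.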

Comment 4 rlobillo 2022-03-01 13:58:53 UTC
Verified on 4.11.0-0.nightly-2022-02-27-122819 on top of OSP16.1 (RHOS-16.1-RHEL-8-20211126.n.1)

2022-03-01 13:01:15.024 | TASK [install : Get number of amphoras] ****************************************
2022-03-01 13:01:16.005 |     "invocation": {
2022-03-01 13:01:16.008 |         "module_args": {
2022-03-01 13:01:16.011 |             "_raw_params": "source ~/overcloudrc\nopenstack loadbalancer list -c provider -f value | grep amphora | wc -l\n",
2022-03-01 13:01:16.014 |             "_uses_shell": true,
2022-03-01 13:01:16.016 |             "argv": null,
2022-03-01 13:01:16.019 |             "chdir": null,
2022-03-01 13:01:16.022 |             "creates": null,
2022-03-01 13:01:16.024 |             "executable": null,
2022-03-01 13:01:16.027 |             "removes": null,
2022-03-01 13:01:16.029 |             "stdin": null,
2022-03-01 13:01:16.032 |             "stdin_add_newline": true,
2022-03-01 13:01:16.034 |             "strip_empty_ends": true,
2022-03-01 13:01:16.037 |             "warn": true
2022-03-01 13:01:16.039 |         }
2022-03-01 13:01:16.042 |     },
2022-03-01 13:01:16.044 |     "rc": 0,
2022-03-01 13:01:16.047 |     "start": "2022-03-01 13:01:12.025776"
2022-03-01 13:01:16.049 | }
2022-03-01 13:01:16.052 | 
2022-03-01 13:01:16.054 | STDOUT:
2022-03-01 13:01:16.057 | 
2022-03-01 13:01:16.061 | 58


2022-03-01 13:05:58.431 | TASK [install : Octavia provider update - trigger update removing kuryr-octavia-provider annotation] ***
2022-03-01 13:05:58.898 |     "invocation": {
2022-03-01 13:05:58.901 |         "module_args": {
2022-03-01 13:05:58.903 |             "_raw_params": "kubectl -n openshift-kuryr annotate cm kuryr-config networkoperator.openshift.io/kuryr-octavia-provider-\n",
2022-03-01 13:05:58.906 |             "_uses_shell": true,
2022-03-01 13:05:58.908 |             "argv": null,
2022-03-01 13:05:58.911 |             "chdir": null,
2022-03-01 13:05:58.914 |             "creates": null,
2022-03-01 13:05:58.917 |             "executable": null,
2022-03-01 13:05:58.921 |             "removes": null,
2022-03-01 13:05:58.924 |             "stdin": null,
2022-03-01 13:05:58.926 |             "stdin_add_newline": true,
2022-03-01 13:05:58.929 |             "strip_empty_ends": true,
2022-03-01 13:05:58.932 |             "warn": true
2022-03-01 13:05:58.934 |         }
2022-03-01 13:05:58.937 |     },
2022-03-01 13:05:58.940 |     "rc": 0,
2022-03-01 13:05:58.943 |     "start": "2022-03-01 13:05:58.690031"
2022-03-01 13:05:58.946 | }
2022-03-01 13:05:58.948 | 
2022-03-01 13:05:58.951 | STDOUT:
2022-03-01 13:05:58.954 | 
2022-03-01 13:05:58.956 | configmap/kuryr-config annotated

2022-03-01 13:12:13.168 | TASK [install : Get number of ovn lbs] *****************************************
2022-03-01 13:12:19.280 |     "invocation": {
2022-03-01 13:12:19.283 |         "module_args": {
2022-03-01 13:12:19.286 |             "_raw_params": "source ~/overcloudrc\nopenstack loadbalancer list -c provider -f value | grep ovn | wc -l\n",
2022-03-01 13:12:19.288 |             "_uses_shell": true,
2022-03-01 13:12:19.292 |             "argv": null,
2022-03-01 13:12:19.295 |             "chdir": null,
2022-03-01 13:12:19.298 |             "creates": null,
2022-03-01 13:12:19.302 |             "executable": null,
2022-03-01 13:12:19.305 |             "removes": null,
2022-03-01 13:12:19.308 |             "stdin": null,
2022-03-01 13:12:19.311 |             "stdin_add_newline": true,
2022-03-01 13:12:19.313 |             "strip_empty_ends": true,
2022-03-01 13:12:19.316 |             "warn": true
2022-03-01 13:12:19.319 |         }
2022-03-01 13:12:19.322 |     },
2022-03-01 13:12:19.325 |     "rc": 0,
2022-03-01 13:12:19.328 |     "start": "2022-03-01 13:12:13.975894"
2022-03-01 13:12:19.331 | }
2022-03-01 13:12:19.333 | 
2022-03-01 13:12:19.336 | STDOUT:
2022-03-01 13:12:19.338 | 
2022-03-01 13:12:19.341 | 59


and the verification passes:

$ oc get pods -n demo
NAME                   READY   STATUS    RESTARTS   AGE
demo-96c7f5d59-4kt2d   1/1     Running   0          42m
demo-96c7f5d59-hrvbv   1/1     Running   0          42m
demo-96c7f5d59-rxlpd   1/1     Running   0          42m

$ oc get svc -n demo
NAME   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
demo   ClusterIP   172.30.95.221   <none>        80/TCP    42m

$ openstack loadbalancer list | grep 172.30.95.221
| 9c3413e5-c650-4864-a5bc-cc8b3e74b562 | demo/demo                                                                    | d7e2f9864021463f903b6e2a71c4e7e7 | 172.30.95.221  | ACTIVE              | ovn      |

$ oc rsh -n demo demo-96c7f5d59-4kt2d curl 172.30.95.221
demo-96c7f5d59-hrvbv: HELLO! I AM ALIVE!!!
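
The per-service check above (finding the load balancer by ClusterIP and reading its provider) could be scripted roughly as follows. The helper name `provider_for_vip` is hypothetical, and the sketch assumes the `vip_address` and `provider` columns are printed in that order.

```shell
#!/bin/sh
# Print the Octavia provider of the load balancer that owns a given VIP.
# Usage: provider_for_vip <vip_address>
# Input (stdin): "<vip_address> <provider>" per line, e.g. from
#   openstack loadbalancer list -c vip_address -c provider -f value
# Exits 1 if no load balancer with that VIP is listed.
provider_for_vip() {
    awk -v vip="$1" '
        $1 == vip { print $2; found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

For the demo service above, the helper would be called with the service's ClusterIP as its argument.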

Comment 6 errata-xmlrpc 2022-08-10 10:47:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

