Bug 1808498 - Wrong attempt to recreate LB API when upgrading Octavia
Summary: Wrong attempt to recreate LB API when upgrading Octavia
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Maysa Macedo
QA Contact: GenadiC
URL:
Whiteboard:
Depends On: 1819129
Blocks: 1808797
 
Reported: 2020-02-28 16:34 UTC by Maysa Macedo
Modified: 2020-07-13 17:22 UTC
CC: 2 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1808797
Environment:
Last Closed: 2020-07-13 17:22:21 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-network-operator pull 508 0 None closed Bug 1808498: Ensure no API LB recreation happens upon Octavia upgrade 2021-02-21 06:55:00 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:22:54 UTC

Description Maysa Macedo 2020-02-28 16:34:23 UTC
Description of problem:

When upgrading Octavia from OSP13 to OSP16, the CNO checks whether tagging is supported and, if it is, looks for an API load balancer carrying the cluster tag. However, the existing LB was created on OSP13, which does not support tagging, so the identifier was stored in the description instead. Finding no tagged LB, the CNO tries to create a new API LB and fails because the VIP address is already in use.
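
For context, a minimal sketch (in Go, since the CNO is written in Go) of the lookup-and-adopt behaviour the operator needs here. The type and helper names are illustrative assumptions, not the actual cluster-network-operator code; see the linked PR 508 for the real fix.

package main

import (
	"fmt"
	"strings"
)

// LoadBalancer is a minimal stand-in for an Octavia load balancer record.
type LoadBalancer struct {
	ID          string
	Description string
	Tags        []string
}

// ensureAPILoadBalancer returns the existing API LB, adopting (tagging) it if
// it was created before tag support (OSP13) and is only identifiable by its
// description. Only when no LB matches at all should a new one be created.
func ensureAPILoadBalancer(lbs []LoadBalancer, clusterTag string, tagsSupported bool) (*LoadBalancer, bool) {
	for i := range lbs {
		lb := &lbs[i]
		// OSP16-style LB: identified by tag.
		for _, t := range lb.Tags {
			if t == clusterTag {
				return lb, false
			}
		}
		// OSP13-style LB: the identifier lives in the description instead.
		if strings.Contains(lb.Description, clusterTag) {
			if tagsSupported {
				lb.Tags = append(lb.Tags, clusterTag) // adopt and tag, don't recreate
			}
			return lb, false
		}
	}
	return nil, true // nothing found; creating a new LB is legitimate
}

func main() {
	lbs := []LoadBalancer{{
		ID:          "d787e30b-79be-4383-b998-96f7f83465f2",
		Description: "openshiftClusterID=ostest-xk585",
	}}
	lb, create := ensureAPILoadBalancer(lbs, "openshiftClusterID=ostest-xk585", true)
	fmt.Println(lb.ID, "create:", create) // existing LB is reused and tagged, no recreation
}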

CNO logs:
2020/02/28 16:21:14 Failed to reconcile platform networking resources: failed to create OpenShift API loadbalancer: failed to create LB: Internal Server Error

Octavia logs:
Neutron server returns request_ids: ['req-b0e62a7e-26f7-4310-9749-ddf971dab7dc']: octavia_lib.api.drivers.exceptions.DriverError: IP address 172.30.0.1 already allocated in subnet 5b87bf2e-3038-4d3d-a610-ab8a936d50ac
Neutron server returns request_ids: ['req-b0e62a7e-26f7-4310-9749-ddf971dab7dc']
2020-02-28 16:04:20.575 32 ERROR octavia.api.drivers.utils Traceback (most recent call last):                                                                                                                       
2020-02-28 16:04:20.575 32 ERROR octavia.api.drivers.utils   File "/usr/lib/python3.6/site-packages/octavia/network/drivers/neutron/allowed_address_pairs.py", line 450, in allocate_vip                      
2020-02-28 16:04:20.575 32 ERROR octavia.api.drivers.utils     new_port = self.neutron_client.create_port(port)                                                                                                     
2020-02-28 16:04:20.575 32 ERROR octavia.api.drivers.utils   File "/usr/lib/python3.6/site-packages/neutronclient/v2_0/client.py", line 803, in create_port                                                         
2020-02-28 16:04:20.575 32 ERROR octavia.api.drivers.utils     return self.post(self.ports_path, body=body)                                                                                                         
2020-02-28 16:04:20.575 32 ERROR octavia.api.drivers.utils   File "/usr/lib/python3.6/site-packages/neutronclient/v2_0/client.py", line 359, in post                                                                
2020-02-28 16:04:20.575 32 ERROR octavia.api.drivers.utils     headers=headers, params=params)                                                                                                                      
2020-02-28 16:04:20.575 32 ERROR octavia.api.drivers.utils   File "/usr/lib/python3.6/site-packages/neutronclient/v2_0/client.py", line 294, in do_request                                                          
2020-02-28 16:04:20.575 32 ERROR octavia.api.drivers.utils     self._handle_fault_response(status_code, replybody, resp)                                                                                            
2020-02-28 16:04:20.575 32 ERROR octavia.api.drivers.utils   File "/usr/lib/python3.6/site-packages/neutronclient/v2_0/client.py", line 269, in _handle_fault_response                                              
2020-02-28 16:04:20.575 32 ERROR octavia.api.drivers.utils     exception_handler_v20(status_code, error_body)                                                                                                       
2020-02-28 16:04:20.575 32 ERROR octavia.api.drivers.utils   File "/usr/lib/python3.6/site-packages/neutronclient/v2_0/client.py", line 93, in exception_handler_v20                                            
2020-02-28 16:04:20.575 32 ERROR octavia.api.drivers.utils     request_ids=request_ids)                                                                                                                             
2020-02-28 16:04:20.575 32 ERROR octavia.api.drivers.utils neutronclient.common.exceptions.IpAddressAlreadyAllocatedClient: IP address 172.30.0.1 already allocated in subnet 5b87bf2e-3038-4d3d-a610-ab8a936d50ac  
2020-02-28 16:04:20.575 32 ERROR octavia.api.drivers.utils Neutron server returns request_ids: ['req-b0e62a7e-26f7-4310-9749-ddf971dab7dc']                                                                         


Version-Release number of selected component (if applicable):
Upgrade Octavia OSP13 to Octavia OSP16


Comment 3 Jon Uriarte 2020-04-02 09:42:16 UTC
Verified in 4.5.0-0.nightly-2020-04-02-004321 on top of OSP 16 RHOS_TRUNK-16.0-RHEL-8-20200324.n.0 compose.

After a successful 4.5.0-0.nightly-2020-04-02-004321 installation, the following steps were performed to
reproduce the scenario described in this BZ:

Add a description to the API LB and remove the tag (to mimic an LB created by an OSP 13 deployment):

$ openstack loadbalancer list
+--------------------------------------+-------------------------------------+----------------------------------+----------------+---------------------+----------+
| id                                   | name                                | project_id                       | vip_address    | provisioning_status | provider |
+--------------------------------------+-------------------------------------+----------------------------------+----------------+---------------------+----------+
...
| d787e30b-79be-4383-b998-96f7f83465f2 | ostest-xk585-kuryr-api-loadbalancer | bb444bffd5f64283a8ddc9897b149829 | 172.30.0.1     | ACTIVE              | amphora  |
+--------------------------------------+-------------------------------------+----------------------------------+----------------+---------------------+----------+

$ openstack loadbalancer set --description 'openshiftClusterID=ostest-xk585' ostest-xk585-kuryr-api-loadbalancer                                                                                                   

$ openstack loadbalancer show d787e30b-79be-4383-b998-96f7f83465f2                                                                                                              
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| admin_state_up      | True                                 |
| created_at          | 2020-04-02T08:14:44                  |
| description         | openshiftClusterID=ostest-xk585      |
| flavor_id           | None                                 |
| id                  | d787e30b-79be-4383-b998-96f7f83465f2 |
| listeners           | 2362221d-77ff-4612-b964-cab9bc5a31d0 |
| name                | ostest-xk585-kuryr-api-loadbalancer  |
| operating_status    | DEGRADED                             |
| pools               | df885ed7-f4d3-4f98-91ab-ae0dbf6f71ab |
| project_id          | bb444bffd5f64283a8ddc9897b149829     |
| provider            | amphora                              |
| provisioning_status | ACTIVE                               |
| updated_at          | 2020-04-02T09:35:23                  |
| vip_address         | 172.30.0.1                           |
| vip_network_id      | 0195014d-4745-41dd-b4cf-46b3752d86bd |
| vip_port_id         | 8c2dd26c-33eb-48bc-9d94-50bf4fd5e69c |
| vip_qos_policy_id   | None                                 |
| vip_subnet_id       | 88b4cbb6-16f7-4613-86e2-c3c55b443929 |
+---------------------+--------------------------------------+

The tag needs to be removed from the DB:
[root@controller-0 heat-admin]# podman exec -uroot -it galera-bundle-podman-0 mysql
MariaDB [(none)]> use octavia
MariaDB [octavia]> select * from tags where resource_id='d787e30b-79be-4383-b998-96f7f83465f2';
+--------------------------------------+---------------------------------+
| resource_id                          | tag                             |
+--------------------------------------+---------------------------------+
| d787e30b-79be-4383-b998-96f7f83465f2 | openshiftClusterID=ostest-xk585 |
+--------------------------------------+---------------------------------+

MariaDB [octavia]> delete from tags where resource_id='d787e30b-79be-4383-b998-96f7f83465f2';
Query OK, 1 row affected (0.002 sec)


Now restart the CNO - it starts and detects the API LB as if it had been created on OSP 13:
$ oc -n openshift-network-operator delete pod network-operator-cc7649f7-7stdx
$ oc -n openshift-network-operator get pods
NAME                              READY   STATUS    RESTARTS   AGE
network-operator-cc7649f7-vmdmm   1/1     Running   0          25s

Check the CNO logs - it detects the existing API LB by its description, keeps it, and tags it:
2020/04/02 09:39:14 Creating OpenShift API loadbalancer with IP 172.30.0.1
2020/04/02 09:39:14 Detected Octavia API v2.13.0
2020/04/02 09:39:14 Tagging existing loadbalancer API d787e30b-79be-4383-b998-96f7f83465f2
2020/04/02 09:39:14 OpenShift API loadbalancer d787e30b-79be-4383-b998-96f7f83465f2 present
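
The "Detected Octavia API v2.13.0" line above corresponds to the CNO's tag-support probe. A rough sketch of such a microversion check follows; the v2.5 threshold for load-balancer tags and the function name are assumptions for illustration, not taken from the CNO source:

package main

import (
	"fmt"
	"strconv"
	"strings"
)

// octaviaTagSupport reports whether the detected Octavia API version is at
// least the version assumed here to have introduced load-balancer tags (2.5).
func octaviaTagSupport(detected string) bool {
	parts := strings.SplitN(strings.TrimPrefix(detected, "v"), ".", 3)
	if len(parts) < 2 {
		return false
	}
	major, err1 := strconv.Atoi(parts[0])
	minor, err2 := strconv.Atoi(parts[1])
	if err1 != nil || err2 != nil {
		return false
	}
	return major > 2 || (major == 2 && minor >= 5)
}

func main() {
	fmt.Println(octaviaTagSupport("v2.13.0")) // true: tags available, as in the log above
	fmt.Println(octaviaTagSupport("v2.0"))    // false: fall back to the description (OSP13 behaviour)
}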

Check that the tag has been added in the DB (as it would be on OSP 16):

MariaDB [octavia]> select * from tags where resource_id='d787e30b-79be-4383-b998-96f7f83465f2';
+--------------------------------------+---------------------------------+
| resource_id                          | tag                             |
+--------------------------------------+---------------------------------+
| d787e30b-79be-4383-b998-96f7f83465f2 | openshiftClusterID=ostest-xk585 |
+--------------------------------------+---------------------------------+

Comment 5 errata-xmlrpc 2020-07-13 17:22:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

