Bug 2004040

Summary: Load Balancers move to ERROR after triggering delete
Product: Red Hat OpenStack
Reporter: Maysa Macedo <mdemaced>
Component: python-ovn-octavia-provider
Assignee: Terry Wilson <twilson>
Status: CLOSED DUPLICATE
QA Contact: Eran Kuris <ekuris>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 16.2 (Train)
CC: gthiemon, itbrown, ksambor
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-10-05 14:00:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Maysa Macedo 2021-09-14 12:16:16 UTC
Description of problem:

Load balancers often move to ERROR status after a load balancer delete is triggered, which leaves leftover resources behind after an OpenShift cluster is destroyed.

[root@mecha-central octavia]# cat octavia.log |grep 665e8dc9-f71e-44ac-92e5-dc319e559f32
2021-09-14 11:20:39.844 11 INFO octavia.api.v2.controllers.load_balancer [req-ee7f4949-77cc-4b1b-b503-587ef29c2c28 - 86c07f2bcc514e6ca0d4e20aee8e1d32 - default default] Sending delete Load Balancer 665e8dc9-f71e-44ac-92e5-dc319e559f32 to provider ovn
2021-09-14 11:20:42.949 11 DEBUG networking_ovn.octavia.ovn_driver [-] Updating status to octavia: {'loadbalancers': [{'id': '665e8dc9-f71e-44ac-92e5-dc319e559f32', 'provisioning_status': 'ERROR', 'operating_status': 'ERROR'}]} _update_status_to_octavia /usr/lib/python3.6/site-packages/networking_ovn/octavia/ovn_driver.py:503

Version-Release number of selected component (if applicable):
Octavia 16.2_20210811.1

sh-4.4# rpm -qa |grep ovn
ovn-2021-21.06.0-17.el8fdp.x86_64
rhosp-ovn-2021-4.el8ost.1.noarch
ovn-2021-host-21.06.0-17.el8fdp.x86_64
rhosp-ovn-host-2021-4.el8ost.1.noarch

Comment 2 Gregory Thiemonge 2021-09-14 12:45:49 UTC
At first glance, this looks similar to BZ 1983607. The following exception was caught by the Octavia driver agent:

2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver [-] Exception occurred during deletion of loadbalancer: RuntimeError: dictionary changed size during iteration
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver Traceback (most recent call last):
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver   File "/usr/lib/python3.6/site-packages/networking_ovn/octavia/ovn_driver.py", line 1081, in lb_delete
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver     status = self._lb_delete(loadbalancer, ovn_lb, status)
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver   File "/usr/lib/python3.6/site-packages/networking_ovn/octavia/ovn_driver.py", line 1136, in _lb_delete
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver     for ls in self._find_lb_in_table(ovn_lb, 'Logical_Switch'):
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver   File "/usr/lib/python3.6/site-packages/networking_ovn/octavia/ovn_driver.py", line 464, in _find_lb_in_table
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver     return [item for item in self.ovn_nbdb_api.tables[table].rows.values()
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver   File "/usr/lib/python3.6/site-packages/networking_ovn/octavia/ovn_driver.py", line 465, in <listcomp>
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver     if lb in item.load_balancer]
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver   File "/usr/lib64/python3.6/site-packages/ovs/db/idl.py", line 1042, in __getattr__
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver     dlist = datum.as_list()
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver   File "/usr/lib64/python3.6/site-packages/ovs/db/data.py", line 430, in as_list
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver     return [k.value for k in self.values.keys()]
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver   File "/usr/lib64/python3.6/site-packages/ovs/db/data.py", line 430, in <listcomp>
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver     return [k.value for k in self.values.keys()]
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver RuntimeError: dictionary changed size during iteration
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver
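
For readers unfamiliar with this error, here is a minimal, single-threaded sketch of how Python raises it. It is not the driver or ovs code: the function names are hypothetical, and the insert inside the loop only stands in for whatever changes the dictionary while as_list() iterates over it (presumably a concurrent update, which is an assumption).

```python
def iterate_live(values):
    """Iterate over the live key view while the dict grows."""
    seen = []
    for key in values.keys():
        seen.append(key)
        # Stand-in for a concurrent writer adding an entry mid-iteration.
        values[key + "-copy"] = None
    return seen


def iterate_snapshot(values):
    """Copy the keys first, so later inserts do not affect the loop."""
    seen = []
    for key in list(values.keys()):
        seen.append(key)
        values[key + "-copy"] = None
    return seen


if __name__ == "__main__":
    try:
        iterate_live({"lb1": None, "lb2": None})
    except RuntimeError as exc:
        print("live view failed:", exc)
    print("snapshot:", iterate_snapshot({"lb1": None, "lb2": None}))
```

Snapshotting the keys avoids the error in this single-threaded demo only. With a real concurrent writer, the copy itself would still need a lock or a retry, and this sketch makes no claim about how BZ 1983607 was actually fixed.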

Comment 3 Maysa Macedo 2021-09-14 15:01:37 UTC
Thanks for taking a look.

Those tracebacks seem to be related to another LB, not the one in ERROR state (665e8dc9-f71e-44ac-92e5-dc319e559f32). And if the traceback corresponds to what was fixed in OSP 17, maybe we could backport it?

Comment 5 Gregory Thiemonge 2021-10-04 10:41:09 UTC
In the logs, we have:

2021-09-14 11:20:42.949 11 DEBUG networking_ovn.octavia.ovn_driver [-] Updating status to octavia: {'loadbalancers': [{'id': '665e8dc9-f71e-44ac-92e5-dc319e559f32', 'provisioning_status': 'ERROR', 'operating_status': 'ERROR'}]} _update_status_to_octavia /usr/lib/python3.6/site-packages/networking_ovn/octavia/ovn_driver.py:503

In the OVN provider, the only way the provisioning_status and the operating_status end up in ERROR after a load balancer deletion is when an exception is caught in lb_delete:

https://opendev.org/openstack/networking-ovn/src/branch/stable/train/networking_ovn/octavia/ovn_driver.py#L1085-L1089

And it should always be paired with an exception in the logs.
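
A rough sketch of that pattern, for illustration only. This is not the code at the link above: the function signature, the do_delete callback (standing in for _lb_delete) and the DELETED/OFFLINE success values are assumptions.

```python
import logging

LOG = logging.getLogger(__name__)


def lb_delete(loadbalancer, do_delete):
    """Hypothetical stand-in for the driver's lb_delete handler."""
    # Assumed success status; the real values may differ.
    status = {"loadbalancers": [{"id": loadbalancer["id"],
                                 "provisioning_status": "DELETED",
                                 "operating_status": "OFFLINE"}]}
    try:
        status = do_delete(loadbalancer, status)
    except Exception:
        # Any exception is logged and turns both statuses into ERROR,
        # so an ERROR status update should have a matching traceback.
        LOG.exception("Exception occurred during deletion of loadbalancer")
        status = {"loadbalancers": [{"id": loadbalancer["id"],
                                     "provisioning_status": "ERROR",
                                     "operating_status": "ERROR"}]}
    return status


def failing_delete(loadbalancer, status):
    """Simulates the failure seen in the traceback."""
    raise RuntimeError("dictionary changed size during iteration")


if __name__ == "__main__":
    print(lb_delete({"id": "665e8dc9-f71e-44ac-92e5-dc319e559f32"},
                    failing_delete))
```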

So I still think that this LB in ERROR is related to the "RuntimeError: dictionary changed size during iteration" exception: we have only one exception in the logs and only one resource in ERROR, so they are related.
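
The same check can be scripted. The sketch below is an assumption-laden helper, not an existing tool: the regular expressions are derived only from the two log lines quoted in this bug and may not match other Octavia log formats.

```python
import re

# Patterns inferred from the log lines quoted in this bug (assumption).
STATUS_ERROR_RE = re.compile(
    r"'id': '([0-9a-f-]{36})', 'provisioning_status': 'ERROR'")
EXCEPTION_RE = re.compile(
    r" ERROR \S+ \[-\] Exception occurred during (.+?): ")

SAMPLE = [
    "2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver [-] "
    "Exception occurred during deletion of loadbalancer: RuntimeError: "
    "dictionary changed size during iteration",
    "2021-09-14 11:20:42.949 11 DEBUG networking_ovn.octavia.ovn_driver [-] "
    "Updating status to octavia: {'loadbalancers': [{'id': "
    "'665e8dc9-f71e-44ac-92e5-dc319e559f32', 'provisioning_status': 'ERROR', "
    "'operating_status': 'ERROR'}]}",
]


def summarize(lines):
    """Return (ids reported in ERROR, operations that raised)."""
    errored_ids = []
    failed_operations = []
    for line in lines:
        match = STATUS_ERROR_RE.search(line)
        if match:
            errored_ids.append(match.group(1))
        match = EXCEPTION_RE.search(line)
        if match:
            failed_operations.append(match.group(1))
    return errored_ids, failed_operations


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

To run it against a real log, pass the open file object to summarize() instead of SAMPLE.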

Comment 6 Gregory Thiemonge 2021-10-05 14:00:32 UTC
Marked as duplicate of BZ 2006852 for the reason that I explained in comment 5 ^^

*** This bug has been marked as a duplicate of bug 2006852 ***