Bug 2004040 - Load Balancers move to ERROR after triggering delete
Summary: Load Balancers move to ERROR after triggering delete
Keywords:
Status: CLOSED DUPLICATE of bug 2006852
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-ovn-octavia-provider
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Terry Wilson
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-09-14 12:16 UTC by Maysa Macedo
Modified: 2022-06-23 12:30 UTC (History)
3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-05 14:00:32 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-9583 0 None None None 2022-06-23 12:30:17 UTC

Description Maysa Macedo 2021-09-14 12:16:16 UTC
Description of problem:

Load balancers often move to ERROR status after a load-balancer delete is triggered, which leads to leftovers after the OpenShift cluster is destroyed.

[root@mecha-central octavia]# cat octavia.log |grep 665e8dc9-f71e-44ac-92e5-dc319e559f32
2021-09-14 11:20:39.844 11 INFO octavia.api.v2.controllers.load_balancer [req-ee7f4949-77cc-4b1b-b503-587ef29c2c28 - 86c07f2bcc514e6ca0d4e20aee8e1d32 - default default] Sending delete Load Balancer 665e8dc9-f71e-44ac-92e5-dc319e559f32 to provider ovn
2021-09-14 11:20:42.949 11 DEBUG networking_ovn.octavia.ovn_driver [-] Updating status to octavia: {'loadbalancers': [{'id': '665e8dc9-f71e-44ac-92e5-dc319e559f32', 'provisioning_status': 'ERROR', 'operating_status': 'ERROR'}]} _update_status_to_octavia /usr/lib/python3.6/site-packages/networking_ovn/octavia/ovn_driver.py:503

Version-Release number of selected component (if applicable):
Octavia 16.2_20210811.1

sh-4.4# rpm -qa |grep ovn
ovn-2021-21.06.0-17.el8fdp.x86_64
rhosp-ovn-2021-4.el8ost.1.noarch
ovn-2021-host-21.06.0-17.el8fdp.x86_64
rhosp-ovn-host-2021-4.el8ost.1.noarch

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Gregory Thiemonge 2021-09-14 12:45:49 UTC
At first glance, it looks similar to BZ 1983607; the following exception was caught by the octavia driver agent:

2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver [-] Exception occurred during deletion of loadbalancer: RuntimeError: dictionary changed size during iteration
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver Traceback (most recent call last):
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver   File "/usr/lib/python3.6/site-packages/networking_ovn/octavia/ovn_driver.py", line 1081, in lb_delete
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver     status = self._lb_delete(loadbalancer, ovn_lb, status)
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver   File "/usr/lib/python3.6/site-packages/networking_ovn/octavia/ovn_driver.py", line 1136, in _lb_delete
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver     for ls in self._find_lb_in_table(ovn_lb, 'Logical_Switch'):
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver   File "/usr/lib/python3.6/site-packages/networking_ovn/octavia/ovn_driver.py", line 464, in _find_lb_in_table
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver     return [item for item in self.ovn_nbdb_api.tables[table].rows.values()
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver   File "/usr/lib/python3.6/site-packages/networking_ovn/octavia/ovn_driver.py", line 465, in <listcomp>
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver     if lb in item.load_balancer]
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver   File "/usr/lib64/python3.6/site-packages/ovs/db/idl.py", line 1042, in __getattr__
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver     dlist = datum.as_list()
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver   File "/usr/lib64/python3.6/site-packages/ovs/db/data.py", line 430, in as_list
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver     return [k.value for k in self.values.keys()]
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver   File "/usr/lib64/python3.6/site-packages/ovs/db/data.py", line 430, in <listcomp>
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver     return [k.value for k in self.values.keys()]
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver RuntimeError: dictionary changed size during iteration
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver
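The root cause of this traceback can be reproduced in isolation: Python 3 raises "RuntimeError: dictionary changed size during iteration" whenever a dict is mutated while it is being iterated, and the OVSDB IDL updates its row dictionaries from a notification thread while `_find_lb_in_table` iterates over them. A minimal sketch of the failure mode and the usual fix (the function names mirror the driver, but this is illustrative code, not the actual ovn_driver.py):

```python
def find_lb_in_table(rows, lb):
    # Mirrors the pattern in the traceback: iterating the live dict view
    # directly. If the IDL thread adds or removes a row mid-iteration,
    # Python raises "RuntimeError: dictionary changed size during iteration".
    return [item for item in rows.values() if lb in item["load_balancer"]]


def find_lb_in_table_safe(rows, lb):
    # The usual fix for this class of bug: take a snapshot copy with
    # list() first, so the iteration is immune to concurrent modification.
    return [item for item in list(rows.values()) if lb in item["load_balancer"]]
```

A snapshot copy is cheap relative to an OVSDB transaction and sidesteps the race without locking, which is why it is the typical remedy for IDL-table iteration bugs.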

Comment 3 Maysa Macedo 2021-09-14 15:01:37 UTC
Thanks for taking a look.

Those tracebacks seem to be related to another LB, not the one in ERROR state (665e8dc9-f71e-44ac-92e5-dc319e559f32). And if the traceback corresponds to what was fixed on OSP 17, maybe we could backport it?

Comment 5 Gregory Thiemonge 2021-10-04 10:41:09 UTC
In the logs, we have:

2021-09-14 11:20:42.949 11 DEBUG networking_ovn.octavia.ovn_driver [-] Updating status to octavia: {'loadbalancers': [{'id': '665e8dc9-f71e-44ac-92e5-dc319e559f32', 'provisioning_status': 'ERROR', 'operating_status': 'ERROR'}]} _update_status_to_octavia /usr/lib/python3.6/site-packages/networking_ovn/octavia/ovn_driver.py:503

The only case in which the provisioning_status and the operating_status are set to ERROR after a load balancer deletion in the OVN provider is when an exception is caught in lb_delete:

https://opendev.org/openstack/networking-ovn/src/branch/stable/train/networking_ovn/octavia/ovn_driver.py#L1085-L1089

And it should always be paired with an exception in the logs.

So I still think that this LB in ERROR is related to the "RuntimeError: dictionary changed size during iteration" exception (we have only one exception in the logs and only one resource in ERROR, so they are related).
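The error-handling pattern comment 5 describes (see the linked ovn_driver.py lines) can be sketched as follows: the delete entry point only downgrades the status to ERROR/ERROR when the inner delete raises, which is why an ERROR status in the log always pairs with a logged exception. Names here are illustrative, not the exact driver code:

```python
def lb_delete(loadbalancer, delete_fn, log_fn):
    # Optimistic result: on success the LB is reported DELETED/OFFLINE.
    status = {"loadbalancers": [{"id": loadbalancer["id"],
                                 "provisioning_status": "DELETED",
                                 "operating_status": "OFFLINE"}]}
    try:
        delete_fn(loadbalancer)
    except Exception as e:
        # The only path that produces ERROR/ERROR: an exception during the
        # actual delete, logged alongside the status update.
        log_fn("Exception occurred during deletion of loadbalancer: %s" % e)
        status["loadbalancers"][0].update(
            provisioning_status="ERROR", operating_status="ERROR")
    return status
```

Under this structure, "one exception in the logs, one resource in ERROR" is exactly the correlation comment 5 relies on to tie the ERROR status to the RuntimeError.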

Comment 6 Gregory Thiemonge 2021-10-05 14:00:32 UTC
Marked as duplicate of BZ 2006852 for the reason that I explained in comment 5 ^^

*** This bug has been marked as a duplicate of bug 2006852 ***

