Bug 2004040
| Summary: | Load Balancers move to ERROR after triggering delete | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Maysa Macedo <mdemaced> |
| Component: | python-ovn-octavia-provider | Assignee: | Terry Wilson <twilson> |
| Status: | CLOSED DUPLICATE | QA Contact: | Eran Kuris <ekuris> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 16.2 (Train) | CC: | gthiemon, itbrown, ksambor |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Last Closed: | 2021-10-05 14:00:32 UTC | Type: | Bug |
At first glance, it looks similar to BZ 1983607. The following exception was caught by the octavia driver agent:

2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver [-] Exception occurred during deletion of loadbalancer: RuntimeError: dictionary changed size during iteration
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver Traceback (most recent call last):
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver File "/usr/lib/python3.6/site-packages/networking_ovn/octavia/ovn_driver.py", line 1081, in lb_delete
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver status = self._lb_delete(loadbalancer, ovn_lb, status)
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver File "/usr/lib/python3.6/site-packages/networking_ovn/octavia/ovn_driver.py", line 1136, in _lb_delete
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver for ls in self._find_lb_in_table(ovn_lb, 'Logical_Switch'):
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver File "/usr/lib/python3.6/site-packages/networking_ovn/octavia/ovn_driver.py", line 464, in _find_lb_in_table
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver return [item for item in self.ovn_nbdb_api.tables[table].rows.values()
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver File "/usr/lib/python3.6/site-packages/networking_ovn/octavia/ovn_driver.py", line 465, in <listcomp>
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver if lb in item.load_balancer]
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver File "/usr/lib64/python3.6/site-packages/ovs/db/idl.py", line 1042, in __getattr__
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver dlist = datum.as_list()
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver File "/usr/lib64/python3.6/site-packages/ovs/db/data.py", line 430, in as_list
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver return [k.value for k in self.values.keys()]
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver File "/usr/lib64/python3.6/site-packages/ovs/db/data.py", line 430, in <listcomp>
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver return [k.value for k in self.values.keys()]
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver RuntimeError: dictionary changed size during iteration
2021-09-14 11:20:40.148 11 ERROR networking_ovn.octavia.ovn_driver

Thanks for taking a look. Those tracebacks seem related to another LB, not the one in ERROR state, 665e8dc9-f71e-44ac-92e5-dc319e559f32. And if the traceback corresponds to what was fixed in OSP 17, maybe we could backport it?

In the logs, we have:
2021-09-14 11:20:42.949 11 DEBUG networking_ovn.octavia.ovn_driver [-] Updating status to octavia: {'loadbalancers': [{'id': '665e8dc9-f71e-44ac-92e5-dc319e559f32', 'provisioning_status': 'ERROR', 'operating_status': 'ERROR'}]} _update_status_to_octavia /usr/lib/python3.6/site-packages/networking_ovn/octavia/ovn_driver.py:503
In the OVN provider, the provisioning_status and the operating_status end up in ERROR after a load balancer deletion only when an exception is raised in lb_delete:
https://opendev.org/openstack/networking-ovn/src/branch/stable/train/networking_ovn/octavia/ovn_driver.py#L1085-L1089
It should always be paired with an exception in the logs.
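The pattern described above can be sketched roughly as follows. This is a simplified illustration, not the networking-ovn code: `lb_delete_sketch`, the `delete_in_ovn` callback, and the string constants are hypothetical stand-ins, and the success status values are an assumption. Only the shape of the ERROR status dictionary and the log message text come from the logs in this report.

```python
import logging

LOG = logging.getLogger(__name__)

# Hypothetical constants for illustration only.
DELETED = "DELETED"
OFFLINE = "OFFLINE"
ERROR = "ERROR"


def lb_delete_sketch(loadbalancer, delete_in_ovn):
    """Return the status dict that would be reported back to Octavia.

    `delete_in_ovn` is a hypothetical callable standing in for the
    OVN northbound DB work done during a load balancer deletion.
    """
    try:
        delete_in_ovn(loadbalancer)
        # Assumed success values; not confirmed by this report.
        status = {"loadbalancers": [{"id": loadbalancer["id"],
                                     "provisioning_status": DELETED,
                                     "operating_status": OFFLINE}]}
    except Exception:
        # Any exception is logged and converted into an ERROR status,
        # so an ERROR status is always paired with a logged traceback.
        LOG.exception("Exception occurred during %s",
                      "deletion of loadbalancer")
        status = {"loadbalancers": [{"id": loadbalancer["id"],
                                     "provisioning_status": ERROR,
                                     "operating_status": ERROR}]}
    return status
```

With a structure like this, the load balancer whose deletion raised is the one reported in ERROR, which is why a single traceback and a single ERROR resource in the same time window point to each other.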
So I still think that this LB in ERROR is related to the "RuntimeError: dictionary changed size during iteration" exception: there is only one exception in the logs and only one resource in ERROR, so they are related.
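For readers unfamiliar with this RuntimeError, the following minimal sketch reproduces it in a single thread. It is not the ovs or networking-ovn code, and the function names are hypothetical. In the traceback the dictionary is presumably modified by another thread while `as_list` iterates over it; here the mutation is done inside the loop to make the failure deterministic. The snapshot variant shows one generic way to avoid the error and is not claimed to be the fix tracked in BZ 2006852.

```python
def keys_while_mutating(values):
    """Iterate over a dict while its size changes mid-iteration."""
    seen = []
    for key in values.keys():
        seen.append(key)
        # Simulates a concurrent writer adding an entry during iteration.
        values["extra-" + key] = None
    return seen


def keys_from_snapshot(values):
    """Iterate over a copy of the keys, so later inserts are harmless."""
    seen = []
    for key in list(values):
        seen.append(key)
        values["extra-" + key] = None
    return seen


if __name__ == "__main__":
    try:
        keys_while_mutating({"a": 1, "b": 2})
    except RuntimeError as exc:
        print("RuntimeError:", exc)
    print(keys_from_snapshot({"a": 1, "b": 2}))
```

With real threads, a snapshot alone may not be enough, and the reader and writer would typically also need to be serialized, for example with a lock.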
Marked as a duplicate of BZ 2006852 for the reason explained in comment 5.

*** This bug has been marked as a duplicate of bug 2006852 ***
Description of problem:
Load balancers often move to ERROR status after a load-balancer delete is triggered, which leaves leftovers after an OpenShift cluster is destroyed.

[root@mecha-central octavia]# cat octavia.log |grep 665e8dc9-f71e-44ac-92e5-dc319e559f32
2021-09-14 11:20:39.844 11 INFO octavia.api.v2.controllers.load_balancer [req-ee7f4949-77cc-4b1b-b503-587ef29c2c28 - 86c07f2bcc514e6ca0d4e20aee8e1d32 - default default] Sending delete Load Balancer 665e8dc9-f71e-44ac-92e5-dc319e559f32 to provider ovn
2021-09-14 11:20:42.949 11 DEBUG networking_ovn.octavia.ovn_driver [-] Updating status to octavia: {'loadbalancers': [{'id': '665e8dc9-f71e-44ac-92e5-dc319e559f32', 'provisioning_status': 'ERROR', 'operating_status': 'ERROR'}]} _update_status_to_octavia /usr/lib/python3.6/site-packages/networking_ovn/octavia/ovn_driver.py:503

Version-Release number of selected component (if applicable):
Octavia 16.2_20210811.1

sh-4.4# rpm -qa |grep ovn
ovn-2021-21.06.0-17.el8fdp.x86_64
rhosp-ovn-2021-4.el8ost.1.noarch
ovn-2021-host-21.06.0-17.el8fdp.x86_64
rhosp-ovn-host-2021-4.el8ost.1.noarch