Description of problem:

I was trying to reproduce BZ1947290 on my environment, so I executed the corresponding downstream CI job. That issue was not reproduced, but a new one was: creation and modification of network resources fails roughly one in three times. That made many tempest tests fail:
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-network-networking-ovn-update-16.1_director-rhel-virthost-3cont_2comp_2net-ipv4-geneve-composable/59//artifact/tempest-results/tempest-results-neutron.3.html

This did not happen after OSP was installed, nor after the OSP update; it started failing after the overcloud reboot. I checked that PUT and GET network requests handled by controller-1 and controller-2 succeed, while those received by controller-0 fail with errors like this:

2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers [req-12556105-1f67-4856-8517-318fc54d5421 0feb356857864cdfaaed7e75d4f4eb30 a11ad50cb4164588a9cd4fb31d5da24c - default default] Mechanism driver 'ovn' failed in create_network_postcommit: KeyError: UUID('6581bdeb-f365-43dc-8564-78e8df28451a')
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers Traceback (most recent call last):
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 111, in transaction
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     yield self._nested_txns_map[cur_thread_id]
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers KeyError: 140443610447040
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers During handling of the above exception, another exception occurred:
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers Traceback (most recent call last):
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/managers.py", line 477, in _call_on_drivers
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     getattr(driver.obj, method_name)(context)
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/networking_ovn/ml2/mech_driver.py", line 390, in create_network_postcommit
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     self._ovn_client.create_network(network)
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/networking_ovn/common/ovn_client.py", line 1618, in create_network
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     self.create_provnet_port(network['id'], segment, txn=txn)
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     next(self.gen)
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/networking_ovn/ovsdb/impl_idl_ovn.py", line 183, in transaction
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     yield t
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     next(self.gen)
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 119, in transaction
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     del self._nested_txns_map[cur_thread_id]
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 69, in __exit__
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     self.result = self.commit()
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 62, in commit
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     raise result.ex
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 128, in run
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     txn.results.put(txn.do_commit())
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 123, in do_commit
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     self.post_commit(txn)
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 70, in post_commit
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     command.post_commit(txn)
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 90, in post_commit
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     row = self.api.tables[self.table_name].rows[real_uuid]
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib64/python3.6/collections/__init__.py", line 991, in __getitem__
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     raise KeyError(key)
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers KeyError: UUID('6581bdeb-f365-43dc-8564-78e8df28451a')
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers
2021-04-09 09:06:32.935 28 ERROR neutron.plugins.ml2.plugin [req-12556105-1f67-4856-8517-318fc54d5421 0feb356857864cdfaaed7e75d4f4eb30 a11ad50cb4164588a9cd4fb31d5da24c - default default] mechanism_manager.create_network_postcommit failed, deleting network 'edac5837-b637-43b7-ba5f-081995021eca': neutron.plugins.ml2.common.exceptions.MechanismDriverError
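To make the failing step in the last frames of the traceback easier to see, here is a minimal, self-contained Python sketch of the post-commit pattern it goes through. This is a simplified model, not the actual ovsdbapp code; FakeTable, FakeIdl and CreateNetworkCommand are illustrative names. The point is that the command result is built from a plain dict lookup in the local IDL row cache, so if the cache no longer holds the row (for example because the connection to the OVN NB database was reset around the reboot), the lookup raises exactly this kind of KeyError(UUID(...)):

import uuid

class FakeTable:
    def __init__(self):
        self.rows = {}  # rows keyed by real (server-assigned) UUID

class FakeIdl:
    def __init__(self):
        self.tables = {'Logical_Switch': FakeTable()}

class CreateNetworkCommand:
    def __init__(self, api, table_name):
        self.api = api
        self.table_name = table_name

    def post_commit(self, real_uuid):
        # The command result is built from the cached row here. If a
        # reconnection between commit and post-commit emptied the cache,
        # this plain dict access raises KeyError(real_uuid) -- the
        # KeyError: UUID('6581bdeb-...') seen in the traceback above.
        return self.api.tables[self.table_name].rows[real_uuid]

idl = FakeIdl()
cmd = CreateNetworkCommand(idl, 'Logical_Switch')
row_uuid = uuid.uuid4()
idl.tables['Logical_Switch'].rows.clear()  # simulate cache reset on reconnect
try:
    cmd.post_commit(row_uuid)
except KeyError as e:
    print('post_commit failed:', e)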
Something I noticed only in the controller-0 neutron server logs is that, after the networker nodes were rebooted (~2021-04-08 18:07), the OVN driver kept retrying the following command for 10 minutes:

SetLRouterPortInLSwitchPortCommand(lswitch_port=7fc05a31-57a2-4145-b6b8-6bdda8af06c0, lrouter_port=lrp-7fc05a31-57a2-4145-b6b8-6bdda8af06c0, is_gw_port=True, if_exists=True, lsp_address=router)

The first attempt is at 2021-04-08 18:07:40.451 and the last at 2021-04-08 18:17:28.254:

[root@controller-0 ~]# zgrep -c "SetLRouterPortInLSwitchPortCommand(lswitch_port=7fc05a31-57a2-4145-b6b8-6bdda8af06c0, lrouter_port=lrp-7fc05a31-57a2-4145-b6b8-6bdda8af06c0, is_gw_port=True, if_exists=True, lsp_address=router)" /var/log/containers/neutron/server.log.6.gz
76932

Log files can be found here:
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-networking-ovn-update-16.1_director-rhel-virthost-3cont_2comp_2net-ipv4-geneve-composable/59/

Version-Release number of selected component (if applicable):
RHOS-16.1-RHEL-8-20210323.n.0

How reproducible:
Only reproduced once.

Steps to Reproduce:
1. Reboot the overcloud nodes.
2. Try to create some network resources:
   for i in {0..9}; do openstack network create n$i; done
This is worked around in upstream neutron by ignoring the KeyError: the failing lookup is performed solely to build a return value that we don't use. The root issue is fixed in python-ovs by this patch:
https://patchwork.ozlabs.org/project/openvswitch/patch/20210901161526.237479-1-twilson@redhat.com/
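For reference, the shape of that neutron-side workaround is roughly the following. This is a hedged sketch under the assumption stated above (only an unused return value depends on the lookup); the function and variable names are illustrative, not the actual upstream identifiers:

def lookup_result_row(tables, table_name, real_uuid):
    """Return the cached row if it is still present, else None.

    'tables' mirrors the IDL's table -> rows mapping; all names here
    are illustrative, not the real neutron/ovsdbapp code.
    """
    try:
        return tables[table_name].rows[real_uuid]
    except KeyError:
        # The row disappeared from the local cache (e.g. after a
        # reconnection). The result is not used by the caller, so
        # swallowing the error instead of failing the whole
        # transaction is safe here.
        return None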
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.9 bug fix and enhancement advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8795