Bug 1947790 - POST and PUT requests responded with status 500 after overcloud reboot
Summary: POST and PUT requests responded with status 500 after overcloud reboot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-networking-ovn
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z9
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Jakub Libosvar
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks: 1986341
 
Reported: 2021-04-09 09:23 UTC by Eduardo Olivares
Modified: 2022-12-07 20:25 UTC
CC: 10 users

Fixed In Version: python-networking-ovn-7.3.1-1.20220125210409.4e24f4c.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1986341
Environment:
Last Closed: 2022-12-07 20:24:45 UTC
Target Upstream Version:
Embargoed:




Links
Launchpad 1938766 (last updated 2021-09-01 20:26:58 UTC)
OpenStack gerrit 806938, MERGED: Fix neutron_pg_drop-related startup issues (last updated 2022-01-24 16:02:53 UTC)
Red Hat Issue Tracker OSP-2114 (last updated 2021-11-17 09:37:24 UTC)
Red Hat Product Errata RHBA-2022:8795 (last updated 2022-12-07 20:25:37 UTC)

Description Eduardo Olivares 2021-04-09 09:23:39 UTC
Description of problem:
I was trying to reproduce BZ1947290 on my environment, so I executed the corresponding downstream CI job. That issue did not reproduce, but a new one did.

Creation and modification of network elements fails about one in three times, which made many tempest tests fail: https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-network-networking-ovn-update-16.1_director-rhel-virthost-3cont_2comp_2net-ipv4-geneve-composable/59//artifact/tempest-results/tempest-results-neutron.3.html

This did not happen after OSP was installed, nor after the OSP update; it started failing only after the overcloud reboot.


I checked the POST and PUT network requests: those handled by controller-1 and controller-2 succeed, while those received by controller-0 fail with errors like this:
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers [req-12556105-1f67-4856-8517-318fc54d5421 0feb356857864cdfaaed7e75d4f4eb30 a11ad50cb4164588a9cd4fb31d5da24c - default default] Mechanism driver 'ovn' failed in create_network_postcommit: KeyError: UUID('6581bdeb-f365-43dc-8564-78e8df28451a')
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers Traceback (most recent call last):
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 111, in transaction
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     yield self._nested_txns_map[cur_thread_id]
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers KeyError: 140443610447040
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers During handling of the above exception, another exception occurred:
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers Traceback (most recent call last):
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/managers.py", line 477, in _call_on_drivers
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     getattr(driver.obj, method_name)(context)
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/networking_ovn/ml2/mech_driver.py", line 390, in create_network_postcommit
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     self._ovn_client.create_network(network)
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/networking_ovn/common/ovn_client.py", line 1618, in create_network
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     self.create_provnet_port(network['id'], segment, txn=txn)
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     next(self.gen)
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/networking_ovn/ovsdb/impl_idl_ovn.py", line 183, in transaction
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     yield t
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     next(self.gen)
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 119, in transaction
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     del self._nested_txns_map[cur_thread_id]
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 69, in __exit__
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     self.result = self.commit()
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 62, in commit
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     raise result.ex
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 128, in run
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     txn.results.put(txn.do_commit())
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 123, in do_commit
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     self.post_commit(txn)
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 70, in post_commit
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     command.post_commit(txn)
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 90, in post_commit
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     row = self.api.tables[self.table_name].rows[real_uuid]
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib64/python3.6/collections/__init__.py", line 991, in __getitem__
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers     raise KeyError(key)
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers KeyError: UUID('6581bdeb-f365-43dc-8564-78e8df28451a')
2021-04-09 09:06:32.934 28 ERROR neutron.plugins.ml2.managers
2021-04-09 09:06:32.935 28 ERROR neutron.plugins.ml2.plugin [req-12556105-1f67-4856-8517-318fc54d5421 0feb356857864cdfaaed7e75d4f4eb30 a11ad50cb4164588a9cd4fb31d5da24c - default default] mechanism_manager.create_network_postcommit failed, deleting network 'edac5837-b637-43b7-ba5f-081995021eca': neutron.plugins.ml2.common.exceptions.MechanismDriverError
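
To make the two stacked KeyErrors easier to read, here is a minimal sketch (my simplification, not the actual ovsdbapp source) of the per-thread nested-transaction bookkeeping in ovsdbapp's api.transaction(). The first KeyError (the thread id, 140443610447040) is the normal "no transaction open on this thread yet" probe; the second (the UUID) is the real failure, raised while the outermost transaction commits and post_commit() looks up the just-created row:

import contextlib
import threading


class Transaction:
    """Stand-in for an OVSDB transaction; the real one commits on exit."""
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Real ovsdbapp commits here, then post_commit() looks the new row
        # up by UUID -- the lookup that raised "KeyError: UUID(...)" above.
        return False


class API:
    def __init__(self):
        # thread id -> transaction currently open on that thread, so that
        # nested "with api.transaction():" blocks reuse the outer one.
        self._nested_txns_map = {}

    @contextlib.contextmanager
    def transaction(self):
        cur_thread_id = threading.get_ident()
        try:
            # Re-enter the transaction already open on this thread. The
            # benign "KeyError: 140443610447040" in the log is this lookup
            # failing because no transaction was open yet.
            yield self._nested_txns_map[cur_thread_id]
        except KeyError:
            with Transaction() as txn:
                self._nested_txns_map[cur_thread_id] = txn
                try:
                    yield txn
                finally:
                    del self._nested_txns_map[cur_thread_id]


api = API()
with api.transaction() as outer:
    with api.transaction() as inner:
        assert inner is outer  # the nested block reused the outer txn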



Something I noticed only in controller-0's neutron server logs is that, after the networker nodes were rebooted (~2021-04-08 18:07), OVN kept retrying the following command for about 10 minutes:
SetLRouterPortInLSwitchPortCommand(lswitch_port=7fc05a31-57a2-4145-b6b8-6bdda8af06c0, lrouter_port=lrp-7fc05a31-57a2-4145-b6b8-6bdda8af06c0, is_gw_port=True, if_exists=True, lsp_address=router)
The first attempt is at 2021-04-08 18:07:40.451
The last attempt is at 2021-04-08 18:17:28.254
[root@controller-0 ~]# zgrep -c "SetLRouterPortInLSwitchPortCommand(lswitch_port=7fc05a31-57a2-4145-b6b8-6bdda8af06c0, lrouter_port=lrp-7fc05a31-57a2-4145-b6b8-6bdda8af06c0, is_gw_port=True, if_exists=True, lsp_address=router)" /var/log/containers/neutron/server.log.6.gz 
76932
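
For scale: 76932 attempts over the ~588 seconds between the first and last timestamps works out to roughly 130 retries per second, which looks like a tight retry loop rather than periodic maintenance.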

Log files can be found here: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-networking-ovn-update-16.1_director-rhel-virthost-3cont_2comp_2net-ipv4-geneve-composable/59/

Version-Release number of selected component (if applicable):
RHOS-16.1-RHEL-8-20210323.n.0

How reproducible:
only reproduced once

Steps to Reproduce:
1. reboot overcloud nodes
2. try to create some network resources: for i in {0..9}; do openstack network create n$i; done
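
If it helps to script step 2 and count failures, here is a sketch using openstacksdk (the clouds.yaml entry name "overcloud" and the cleanup of successful creates are my assumptions; adjust to the environment):

import openstack
from openstack import exceptions

conn = openstack.connect(cloud='overcloud')  # assumed clouds.yaml entry

failures = 0
for i in range(10):
    try:
        net = conn.network.create_network(name='n%d' % i)
        conn.network.delete_network(net)  # clean up successful creates
    except exceptions.SDKException as exc:
        failures += 1
        print('network n%d failed: %s' % (i, exc))

print('%d/10 create requests failed' % failures)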

Comment 3 Terry Wilson 2021-09-01 20:26:58 UTC
This is worked around in upstream neutron by ignoring the KeyError: the row lookup that raises it exists solely to build a return value that we don't use. The underlying issue is fixed in python-ovs by this patch: https://patchwork.ozlabs.org/project/openvswitch/patch/20210901161526.237479-1-twilson@redhat.com/
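
The shape of that workaround, sketched and simplified (an illustration, not the actual neutron/ovsdbapp patch): by the time the row is looked up the commit itself has already succeeded, so a missing row can be tolerated and logged instead of failing the whole request.

import logging

LOG = logging.getLogger(__name__)


def lookup_result_row(api, table_name, real_uuid):
    """Fetch the committed row for a command's return value, if present."""
    try:
        return api.tables[table_name].rows[real_uuid]
    except KeyError:
        # The transaction committed; only the convenience return value is
        # unavailable (the IDL may not have processed the update yet).
        LOG.debug("Row %s not found in table %s after commit; "
                  "returning the UUID instead", real_uuid, table_name)
        return real_uuid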

Comment 21 errata-xmlrpc 2022-12-07 20:24:45 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.9 bug fix and enhancement advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8795

