Bug 2026576 - Neutron ovs agent constantly restarting on some compute nodes
Summary: Neutron ovs agent constantly restarting on some compute nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 16.1 (Train)
Hardware: All
OS: All
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Slawek Kaplonski
QA Contact: Candido Campos
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-11-25 07:47 UTC by Aman Gupta
Modified: 2022-12-19 03:14 UTC
CC List: 8 users

Fixed In Version: openstack-neutron-15.2.1-1.20211115153406.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-24 10:56:13 UTC
Target Upstream Version:
Embargoed:




Links
System                 | ID             | Private | Priority | Status | Summary                        | Last Updated
Launchpad              | 1881424        | 0       | None     | None   | None                           | 2021-11-25 08:48:39 UTC
OpenStack gerrit       | 732523         | 0       | None     | MERGED | Make NeutronOvsdbIdl singleton | 2021-11-25 08:48:39 UTC
Red Hat Issue Tracker  | OSP-11047      | 0       | None     | None   | None                           | 2021-11-25 07:48:21 UTC
Red Hat Product Errata | RHSA-2022:0990 | 0       | None     | None   | None                           | 2022-03-24 10:56:43 UTC
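The linked upstream change is titled "Make NeutronOvsdbIdl singleton". As a generic illustration of that pattern only (this is not the upstream patch, and `SharedIdl` is a hypothetical name), a thread-safe singleton in Python could look like the sketch below: every construction returns the same object, so all callers share one connection and one local cache instead of each holding its own copy. That this sharing is what matters for this bug is inferred from the patch title alone.

```python
import threading


class SharedIdl:
    """Hypothetical stand-in for an OVSDB IDL connection wrapper.

    Every construction returns the same instance, so all callers share one
    connection and one local cache rather than keeping separate copies.
    """

    _instance = None
    _lock = threading.Lock()

    def __new__(cls, *args, **kwargs):
        # Double-checked locking: only the first caller creates the instance.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
                    cls._instance._initialized = False
        return cls._instance

    def __init__(self, connection):
        # __init__ runs on every construction; only set up state once.
        if self._initialized:
            return
        self.connection = connection
        self.cache = {}
        self._initialized = True
```

Because later constructions return the existing instance, arguments passed after the first call (for example a different connection string) are ignored in this sketch.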

Description Aman Gupta 2021-11-25 07:47:03 UTC
Description of problem:
Neutron ovs agent constantly restarting on some compute nodes

Version-Release number of selected component (if applicable):
RHOSP 16.1.6


How reproducible:
11 of 54 compute nodes show the issue. The container restarts every ~6-7 minutes.

Steps to Reproduce:
1. 
2.
3.

Actual results:

We are seeing the neutron ovs agent container constantly restarting on some of our compute hosts. This is disrupting Heat stack deployments and VM creation.
The container logs the error below. This looks similar to https://bugs.launchpad.net/neutron/+bug/1881424, but not all hosts are affected.

2021-11-24 10:10:33.442 779286 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-32ec105f-74bd-4fab-9566-f975380afa71 - - - - -] rpc_loop doing a full sync.
2021-11-24 10:10:33.443 779286 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-32ec105f-74bd-4fab-9566-f975380afa71 - - - - -] Agent rpc_loop - iteration:1 started
2021-11-24 10:10:33.446 779286 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-32ec105f-74bd-4fab-9566-f975380afa71 - - - - -] Physical bridge br-ex was just re-created.
2021-11-24 10:10:33.447 779286 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-32ec105f-74bd-4fab-9566-f975380afa71 - - - - -] Physical bridge br-ex was just re-created.
2021-11-24 10:10:33.448 779286 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-32ec105f-74bd-4fab-9566-f975380afa71 - - - - -] Mapping physical network datacentre to bridge br-ex
2021-11-24 10:10:33.448 779286 ERROR ovsdbapp.backend.ovs_idl.command [req-32ec105f-74bd-4fab-9566-f975380afa71 - - - - -] Error executing command: ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Bridge with name=br-ex
2021-11-24 10:10:33.448 779286 ERROR ovsdbapp.backend.ovs_idl.command Traceback (most recent call last):
2021-11-24 10:10:33.448 779286 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 39, in execute
2021-11-24 10:10:33.448 779286 ERROR ovsdbapp.backend.ovs_idl.command     self.run_idl(None)
2021-11-24 10:10:33.448 779286 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 215, in run_idl
2021-11-24 10:10:33.448 779286 ERROR ovsdbapp.backend.ovs_idl.command     record = self.api.lookup(self.table, self.record)
2021-11-24 10:10:33.448 779286 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 172, in lookup
2021-11-24 10:10:33.448 779286 ERROR ovsdbapp.backend.ovs_idl.command     return self._lookup(table, record)
2021-11-24 10:10:33.448 779286 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 215, in _lookup
2021-11-24 10:10:33.448 779286 ERROR ovsdbapp.backend.ovs_idl.command     row = idlutils.row_by_value(self, rl.table, rl.column, record)
2021-11-24 10:10:33.448 779286 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 130, in row_by_value
2021-11-24 10:10:33.448 779286 ERROR ovsdbapp.backend.ovs_idl.command     raise RowNotFound(table=table, col=column, match=match)
2021-11-24 10:10:33.448 779286 ERROR ovsdbapp.backend.ovs_idl.command ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Bridge with name=br-ex
2021-11-24 10:10:33.448 779286 ERROR ovsdbapp.backend.ovs_idl.command
2021-11-24 10:10:33.472 779286 ERROR neutron.agent.common.async_process [-] Error received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: None
2021-11-24 10:10:33.482 779286 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_oskenapp [req-32ec105f-74bd-4fab-9566-f975380afa71 - - - - -] Agent main thread died of an exception: TypeError: int() can't convert non-string with explicit base
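The traceback above ends with `row_by_value` raising `RowNotFound` for `Bridge` with `name=br-ex`, and the agent then dies with a Python `TypeError`. The sketch below is a simplified, self-contained model, not ovsdbapp itself: `RowNotFound`, `row_by_value`, and the dict-based cache are hypothetical stand-ins named after the traceback. It shows how a by-name lookup against a client-side cache fails when the row is absent from that cache. It also shows that the final `TypeError` message is exactly what Python's built-in `int()` produces when given a non-string value together with an explicit base; which value actually reached `int()` in the agent is not established by this report.

```python
class RowNotFound(Exception):
    """Hypothetical stand-in for ovsdbapp's idlutils.RowNotFound."""

    def __init__(self, table, col, match):
        super().__init__(f"Cannot find {table} with {col}={match}")
        self.table, self.col, self.match = table, col, match


def row_by_value(cache, table, column, match):
    """Return the first cached row of `table` whose `column` equals `match`.

    Only the local cache is consulted, so a row that is missing from this
    cache is reported as not found.
    """
    for row in cache.get(table, []):
        if row.get(column) == match:
            return row
    raise RowNotFound(table=table, col=column, match=match)


def explicit_base_error(value, base=10):
    """Return the TypeError message int() raises for `value` with a base, if any."""
    try:
        int(value, base)
    except TypeError as exc:
        return str(exc)
    return None
```

For example, with a cache that only knows `br-int`, looking up `br-ex` raises `RowNotFound("Cannot find Bridge with name=br-ex")`, and `explicit_base_error(None)` returns the same text as the last log line.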


Expected results:

The neutron-ovs-agent should run stably without restarting.

Additional info:
Sosreports are on supportshell under Case ID 03088584.

Problematic compute node sosreport:
0020-sosreport-overcloud-computesriovg9pci-3-03088584-2021-11-24-mogxlyr.tar.xz

Sosreport from a compute node that is working fine:
0060-sosreport-overcloud-computesriov-5-03088584-2021-11-24-vwbehgj.tar.xz

Comment 14 errata-xmlrpc 2022-03-24 10:56:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenStack Platform 16.1 (openstack-neutron) security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0990

