Bug 2177658

Summary: [OSP16.2] Inconsistency between chassis and chassis_private databases
Product: Red Hat OpenStack
Component: python-networking-ovn
Version: 16.2 (Train)
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Matsvei Hauryliuk <mhauryli>
Assignee: Fernando Royo <froyo>
QA Contact: Eran Kuris <ekuris>
CC: apevec, dhill, froyo, jlibosva, lhh, majopela, mariel, mblue, pgrist, ralonsoh, scohen
Keywords: Triaged
Target Milestone: z6
Target Release: 16.2 (Train on RHEL 8.4)
Hardware: Unspecified
OS: Other
Fixed In Version: python-networking-ovn-7.4.2-2.20220409154881.el8ost
Doc Type: If docs needed, set a value
Type: Bug
Clones: 2214548 (view as bug list)
Bug Blocks: 2214548
Last Closed: 2023-11-08 19:18:31 UTC

Description Matsvei Hauryliuk 2023-03-13 09:43:47 UTC
Description of problem:
The client is trying to deploy VMs on a particular compute node, but the operation fails.

The following traceback can be found in the logs:

2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc [req-d0b73883-67aa-4be4-a9b6-77705ccc1171 c29260b8c93b4f999c4744bee5776360 07639f0089f341ff9edb87c809dc7c4b - - -] Exception while dispatching port events: 'Chassis_Private' object has no attribute 'hostname': AttributeError: 'Chassis_Private' object has no attribute 'hostname'
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc Traceback (most recent call last):
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/ovo_rpc.py", line 133, in dispatch_events
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	self._resource_push_api.push(context, [obj], rpc_event)
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/api/rpc/handlers/resources_rpc.py", line 245, in push
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	self._push(context, resource_type, type_resources, event_type)
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/api/rpc/handlers/resources_rpc.py", line 251, in _push
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	for version in version_manager.get_resource_versions(resource_type):
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/api/rpc/callbacks/version_manager.py", line 250, in get_resource_versions
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	return _get_cached_tracker().get_resource_versions(resource_type)
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/api/rpc/callbacks/version_manager.py", line 226, in get_resource_versions
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	self._check_expiration()
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/api/rpc/callbacks/version_manager.py", line 222, in _check_expiration
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	self._update_consumer_versions()
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/api/rpc/callbacks/version_manager.py", line 211, in _update_consumer_versions
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	neutron_plugin.get_agents_resource_versions(new_tracker)
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/db/agents_db.py", line 468, in get_agents_resource_versions
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	for agent in self._get_agents_considered_for_versions():
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/db/agents_db.py", line 455, in _get_agents_considered_for_versions
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	filters={'admin_state_up': [True]})
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/networking_ovn/ml2/mech_driver.py", line 1076, in fn
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	return op(results, new_method(*args, _driver=self, **kwargs))
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/networking_ovn/ml2/mech_driver.py", line 1140, in get_agents
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	agent_dict = agent.as_dict()
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/neutron_agent.py", line 60, in as_dict
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	'host': self.chassis.hostname,
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc AttributeError: 'Chassis_Private' object has no attribute 'hostname'
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc

This looks like an inconsistency between the Chassis and Chassis_Private tables in the OVN Southbound DB. Restarting the "tripleo_neutron_api.service" container on all controller nodes remediates the issue temporarily, but a permanent solution is needed because the issue recurs regularly.
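
As a purely illustrative sketch (not the actual networking-ovn code), the snippet below models how the failing line in neutron_agent.py ('host': self.chassis.hostname) raises AttributeError when the agent's chassis attribute resolves to a Chassis_Private row, which carries no hostname column, instead of the matching Chassis row. All class and variable names here are hypothetical:

```python
# Simplified, hypothetical models of the two OVN Southbound tables involved.
class Chassis:
    def __init__(self, name, hostname):
        self.name = name
        self.hostname = hostname      # only the Chassis table carries a hostname column


class ChassisPrivate:
    def __init__(self, name, chassis=None):
        self.name = name
        self.chassis = chassis        # reference to the matching Chassis row, if it exists


def agent_host(chassis_row):
    # Equivalent to the failing line in neutron_agent.py:
    #     'host': self.chassis.hostname
    # If chassis_row ends up being a ChassisPrivate object (because the
    # Chassis and Chassis_Private tables are out of sync), the attribute
    # does not exist and AttributeError is raised.
    return chassis_row.hostname


# Consistent case: the Chassis_Private entry points at a real Chassis row.
healthy = ChassisPrivate(name="compute-0",
                         chassis=Chassis("compute-0", "compute-0.example.com"))
print(healthy.chassis.hostname)       # works when the two tables agree

# Inconsistent case: a Chassis_Private entry without a Chassis counterpart.
orphan = ChassisPrivate(name="compute-1")
try:
    agent_host(orphan)
except AttributeError as exc:
    print(exc)   # 'ChassisPrivate' object has no attribute 'hostname'
```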

One possible scenario we considered was high load on the deployment causing this. To eliminate that possibility, we asked the client to increase the probing interval on all nodes:
ovs-vsctl set open . external_ids:ovn-remote-probe-interval=180000

This didn't solve the issue either.

I need your help with the root cause analysis (RCA) of this issue.

Version-Release number of selected component (if applicable):
ovn-15.5.0-2.20220216005905.4f55857.el8ost.noarch
OSP 16.2.3

How reproducible:
Happens regularly.

Steps to Reproduce:
1. Deploy a VM on a compute node.

Actual results:
The VM deployment fails; neutron-server logs the AttributeError traceback shown above.

Expected results:
Active VM deployed on a compute node.

Additional info:
Attached to the case are sosreports from the three controllers and from the compute node where the issue occurred, the output of "openstack server show" for the instance, and the ovnnb_db.db and ovnsb_db.db files.

Comment 14 Vadim Khitrin 2023-10-08 08:04:08 UTC
Compose `RHOS-16.2-RHEL-8-20231005.n.3` includes the container with this RPM.

Comment 15 Vadim Khitrin 2023-10-16 08:56:58 UTC
Verified on compose `RHOS-16.2-RHEL-8-20231005.n.3`; the empty diff below shows that the chassis references in Chassis_Private match the Chassis UUIDs:
```
[root@controller-0 containers]# ovn-sbctl find chassis | grep _uuid | awk '{print $3}' | sort > chassis.txt
[root@controller-0 containers]# ovn-sbctl find chassis_private | grep chassis | awk '{print $3}' | sort > private_chassis.txt
[root@controller-0 containers]# diff chassis.txt private_chassis.txt
```

Comment 22 errata-xmlrpc 2023-11-08 19:18:31 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.2.6 (Train) bug fix and enhancement advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6307