Bug 2177658

Summary: [OSP16.2] Inconsistency between chassis and chassis_private databases
Product: Red Hat OpenStack Reporter: Matsvei Hauryliuk <mhauryli>
Component: python-networking-ovn    Assignee: Fernando Royo <froyo>
Status: MODIFIED --- QA Contact: Eran Kuris <ekuris>
Severity: medium Docs Contact:
Priority: medium    
Version: 16.2 (Train)    CC: apevec, dhill, froyo, jlibosva, lhh, majopela, pgrist, ralonsoh, scohen
Target Milestone: z6    Keywords: Triaged
Target Release: 16.2 (Train on RHEL 8.4)   
Hardware: Unspecified   
OS: Other   
Whiteboard:
Fixed In Version: python-networking-ovn-7.4.2-2.20220409154877.el8osttrunk    Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2214548 (view as bug list) Environment:
Last Closed:    Type: Bug
Bug Depends On:    
Bug Blocks: 2214548    

Description Matsvei Hauryliuk 2023-03-13 09:43:47 UTC
Description of problem:
The customer is trying to deploy VMs on a particular compute node, but the operation fails.

The following traceback can be found in the logs:

2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc [req-d0b73883-67aa-4be4-a9b6-77705ccc1171 c29260b8c93b4f999c4744bee5776360 07639f0089f341ff9edb87c809dc7c4b - - -] Exception while dispatching port events: 'Chassis_Private' object has no attribute 'hostname': AttributeError: 'Chassis_Private' object has no attribute 'hostname'
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc Traceback (most recent call last):
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/ovo_rpc.py", line 133, in dispatch_events
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	self._resource_push_api.push(context, [obj], rpc_event)
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/api/rpc/handlers/resources_rpc.py", line 245, in push
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	self._push(context, resource_type, type_resources, event_type)
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/api/rpc/handlers/resources_rpc.py", line 251, in _push
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	for version in version_manager.get_resource_versions(resource_type):
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/api/rpc/callbacks/version_manager.py", line 250, in get_resource_versions
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	return _get_cached_tracker().get_resource_versions(resource_type)
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/api/rpc/callbacks/version_manager.py", line 226, in get_resource_versions
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	self._check_expiration()
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/api/rpc/callbacks/version_manager.py", line 222, in _check_expiration
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	self._update_consumer_versions()
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/api/rpc/callbacks/version_manager.py", line 211, in _update_consumer_versions
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	neutron_plugin.get_agents_resource_versions(new_tracker)
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/db/agents_db.py", line 468, in get_agents_resource_versions
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	for agent in self._get_agents_considered_for_versions():
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/db/agents_db.py", line 455, in _get_agents_considered_for_versions
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	filters={'admin_state_up': [True]})
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/networking_ovn/ml2/mech_driver.py", line 1076, in fn
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	return op(results, new_method(*args, _driver=self, **kwargs))
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/networking_ovn/ml2/mech_driver.py", line 1140, in get_agents
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	agent_dict = agent.as_dict()
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/neutron_agent.py", line 60, in as_dict
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	'host': self.chassis.hostname,
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc AttributeError: 'Chassis_Private' object has no attribute 'hostname'
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc

This looks like an inconsistency between the Chassis and Chassis_Private tables in the Southbound DB. Restarting the "tripleo_neutron_api.service" container on all controller nodes remediates the issue temporarily; however, a permanent solution is needed because the issue recurs regularly.
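
For context, the failing access in networking_ovn/agent/neutron_agent.py assumes the agent object is always backed by a Chassis row, which has a "hostname" column; the Chassis_Private table has no such column, only a weak "chassis" reference back to the Chassis row. The sketch below only illustrates the kind of defensive lookup that would avoid the AttributeError when the two tables get out of sync (the helper name and fallback value are made up for the example; this is not the patch shipped in the Fixed In Version):

# Illustration only, not the shipped fix. Assumes the standard OVN SB schema,
# where Chassis_Private.chassis is a possibly-empty reference to a Chassis row.
def _resolve_hostname(record):
    """Return the hostname whether 'record' is a Chassis or Chassis_Private row."""
    hostname = getattr(record, 'hostname', None)
    if hostname is not None:
        return hostname
    # Chassis_Private row: follow the weak reference back to Chassis, if present.
    chassis_refs = getattr(record, 'chassis', None) or []
    if chassis_refs:
        return chassis_refs[0].hostname
    # Inconsistent DB: the Chassis_Private row exists but the Chassis row is gone.
    return ''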

One of the possible scenarios we considered was high load on the deployment causing this. To rule out that possibility, we asked the client to increase the probing interval on all nodes:
ovs-vsctl set open . external_ids:ovn-remote-probe-interval=180000

This didn't solve the issue either.
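
To confirm the suspected mismatch directly in the Southbound DB, the two tables can be diffed by chassis name. The snippet below is a small diagnostic sketch, not something taken from the case; it assumes ovn-sbctl can reach the Southbound DB from wherever it is run (for example inside the ovn-dbs container on a controller) and uses only the standard "list" command:

# Diagnostic sketch: compare the name column of Chassis and Chassis_Private.
import subprocess

def _names(table):
    out = subprocess.run(
        ['ovn-sbctl', '--bare', '--columns=name', 'list', table],
        stdout=subprocess.PIPE, universal_newlines=True, check=True).stdout
    return {line.strip() for line in out.splitlines() if line.strip()}

chassis = _names('Chassis')
private = _names('Chassis_Private')
print('In Chassis_Private but missing from Chassis:', sorted(private - chassis))
print('In Chassis but missing from Chassis_Private:', sorted(chassis - private))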

I need your help with the RCA of this issue.

Version-Release number of selected component (if applicable):
ovn-15.5.0-2.20220216005905.4f55857.el8ost.noarch
OSP 16.2.3

How reproducible:
Happens regularly.

Steps to Reproduce:
1. Deploy a VM on a compute node.

Actual results:
VM deployment fails; the traceback above appears in the neutron server logs.

Expected results:
The VM is deployed on the compute node and becomes ACTIVE.

Additional info:
Attached to the case are sosreports from the 3 controllers and from the compute node where the issue occurred, the output of "openstack server show" for the instance, and the ovnnb_db.db and ovnsb_db.db files.