Bug 2177658 - [OSP16.2] Inconsistency between chassis and chassis_private databases
Summary: [OSP16.2] Inconsistency between chassis and chassis_private databases
Keywords:
Status: MODIFIED
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-networking-ovn
Version: 16.2 (Train)
Hardware: Unspecified
OS: Other
Priority: medium
Severity: medium
Target Milestone: z6
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: Fernando Royo
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks: 2214548
 
Reported: 2023-03-13 09:43 UTC by Matsvei Hauryliuk
Modified: 2023-07-07 09:41 UTC
9 users

Fixed In Version: python-networking-ovn-7.4.2-2.20220409154877.el8osttrunk
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2214548 (view as bug list)
Environment:
Last Closed:
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-23051 0 None None None 2023-03-13 09:45:11 UTC
Red Hat Knowledge Base (Solution) 6958605 0 None None None 2023-07-04 17:49:50 UTC

Description Matsvei Hauryliuk 2023-03-13 09:43:47 UTC
Description of problem:
The client is trying to deploy VMs on a particular compute node, but the operation fails.

The following traceback could be found in the logs:

2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc [req-d0b73883-67aa-4be4-a9b6-77705ccc1171 c29260b8c93b4f999c4744bee5776360 07639f0089f341ff9edb87c809dc7c4b - - -] Exception while dispatching port events: 'Chassis_Private' object has no attribute 'hostname': AttributeError: 'Chassis_Private' object has no attribute 'hostname'
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc Traceback (most recent call last):
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/plugins/ml2/ovo_rpc.py", line 133, in dispatch_events
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	self._resource_push_api.push(context, [obj], rpc_event)
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/api/rpc/handlers/resources_rpc.py", line 245, in push
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	self._push(context, resource_type, type_resources, event_type)
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/api/rpc/handlers/resources_rpc.py", line 251, in _push
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	for version in version_manager.get_resource_versions(resource_type):
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/api/rpc/callbacks/version_manager.py", line 250, in get_resource_versions
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	return _get_cached_tracker().get_resource_versions(resource_type)
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/api/rpc/callbacks/version_manager.py", line 226, in get_resource_versions
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	self._check_expiration()
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/api/rpc/callbacks/version_manager.py", line 222, in _check_expiration
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	self._update_consumer_versions()
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/api/rpc/callbacks/version_manager.py", line 211, in _update_consumer_versions
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	neutron_plugin.get_agents_resource_versions(new_tracker)
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/db/agents_db.py", line 468, in get_agents_resource_versions
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	for agent in self._get_agents_considered_for_versions():
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/neutron/db/agents_db.py", line 455, in _get_agents_considered_for_versions
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	filters={'admin_state_up': [True]})
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/networking_ovn/ml2/mech_driver.py", line 1076, in fn
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	return op(results, new_method(*args, _driver=self, **kwargs))
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/networking_ovn/ml2/mech_driver.py", line 1140, in get_agents
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	agent_dict = agent.as_dict()
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc   File "/usr/lib/python3.6/site-packages/networking_ovn/agent/neutron_agent.py", line 60, in as_dict
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc 	'host': self.chassis.hostname,
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc AttributeError: 'Chassis_Private' object has no attribute 'hostname'
2023-01-29 00:32:55.059 24 ERROR neutron.plugins.ml2.ovo_rpc
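The failure point is `'host': self.chassis.hostname` in neutron_agent.py: a Chassis_Private row carries no hostname column of its own, only a (possibly empty) reference to its matching Chassis row, so when the agent ends up holding a Chassis_Private object the attribute lookup raises. A minimal sketch of the situation, using simplified stand-in classes and a hypothetical defensive lookup (these are illustrations, not the real OVSDB IDL objects or the actual fix):

```python
# Simplified stand-ins for the two Southbound rows involved
# (hypothetical; the real objects come from the OVSDB Python IDL).
class Chassis:
    def __init__(self, name, hostname):
        self.name = name
        self.hostname = hostname          # Chassis rows carry the hostname

class ChassisPrivate:
    def __init__(self, name, chassis=None):
        self.name = name
        # In the real schema this is a reference to the matching Chassis
        # row; when the tables fall out of sync it can be missing.
        self.chassis = chassis

def agent_host(row):
    """Hypothetical defensive lookup: tolerate being handed a
    Chassis_Private row instead of a Chassis row."""
    hostname = getattr(row, 'hostname', None)
    if hostname is not None:
        return hostname                   # normal case: a Chassis row
    ref = getattr(row, 'chassis', None)
    if ref is not None:
        return ref.hostname               # follow the reference
    return None                           # inconsistent DBs: nothing to read

# The traceback corresponds to doing `row.hostname` directly on a
# Chassis_Private row, which raises AttributeError.
```

With the reference intact, `agent_host(ChassisPrivate('c1', Chassis('c1', 'compute-0')))` yields the hostname; with the reference gone it degrades to `None` instead of the AttributeError seen above.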

This looks like an inconsistency between the Chassis and Chassis_Private tables in the Southbound DB. Restarting the "tripleo_neutron_api.service" container on all controller nodes remediates the issue temporarily, but a permanent solution is needed because the issue recurs regularly.
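One way to check for the suspected mismatch is to compare the `name` column of the two Southbound tables, e.g. via `ovn-sbctl --columns=name list Chassis` and `ovn-sbctl --columns=name list Chassis_Private`, and diff the resulting name sets. A rough sketch, assuming the standard `name : value` line format of `ovn-sbctl list` output (the helper names are ours; `table_names` must run on a node with Southbound DB access):

```python
import re
import subprocess

def table_names(table):
    """Return the set of row names in a Southbound table
    (requires ovn-sbctl and access to the SB DB)."""
    out = subprocess.check_output(
        ['ovn-sbctl', '--columns=name', 'list', table],
        universal_newlines=True)
    return parse_names(out)

def parse_names(output):
    """Parse 'name : value' lines from ovn-sbctl list output;
    string values may or may not be printed quoted."""
    names = set()
    for line in output.splitlines():
        m = re.match(r'\s*name\s*:\s*"?([^"]+?)"?\s*$', line)
        if m:
            names.add(m.group(1))
    return names

def report_mismatch(chassis, chassis_private):
    """Names present in one table but not the other."""
    return {'only_in_Chassis': sorted(chassis - chassis_private),
            'only_in_Chassis_Private': sorted(chassis_private - chassis)}
```

On a healthy deployment both lists come back empty; any name on only one side points at the kind of inconsistency the traceback above stumbles over.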

One possible scenario we considered was that high load on the deployment is causing this. To rule that out, we asked the client to increase the probing interval on all nodes:
ovs-vsctl set open . external_ids:ovn-remote-probe-interval=180000

This didn't solve the issue either.

We need your help with the root-cause analysis (RCA) of this issue.

Version-Release number of selected component (if applicable):
ovn-15.5.0-2.20220216005905.4f55857.el8ost.noarch
OSP 16.2.3

How reproducible:
Happens regularly.

Steps to Reproduce:
1. Deploy a VM on a compute node.

Actual results:


Expected results:
Active VM deployed on a compute node.

Additional info:
Attached to the case are sosreports from the 3 controllers and from the compute node where the issue took place, as well as the output of "openstack server show" for the instance and the ovnnb_db.db and ovnsb_db.db files.

