Bug 2233797
Summary: | Neutron agent list request succeeds despite not contacting OVN DBs after controllers reboot | | |
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Eduardo Olivares <eolivare> |
Component: | python-neutron-lib | Assignee: | Rodolfo Alonso <ralonsoh> |
Status: | CLOSED ERRATA | QA Contact: | Eduardo Olivares <eolivare> |
Severity: | urgent | Priority: | urgent |
Version: | 17.1 (Wallaby) | CC: | apevec, chrisw, egarciar, ekuris, froyo, jjoyce, jschluet, lhh, mariel, mburns, mtomaska, prgutier, ralonsoh, scohen, ykarel |
Target Milestone: | z3 | Keywords: | AutomationBlocker, Triaged |
Target Release: | 17.1 | Hardware: | Unspecified |
OS: | Unspecified | Type: | Bug |
Fixed In Version: | python-neutron-lib-2.10.3-17.1.20230706110935.el9ost | Doc Type: | No Doc Update |
Clone Of: | | : | 2252947 |
Last Closed: | 2024-05-22 20:39:31 UTC | Bug Blocks: | 2252947 |
Description
Eduardo Olivares
2023-08-23 13:13:17 UTC
Hello:

After several retries trying to reproduce the error with the corresponding logs, we have managed to find the problem. The empty list returned by "openstack agent list" was happening randomly in the Neutron APIs: the same Neutron API returned either the correct result or an empty list. After adding new logs, we found that the issue specifically affected some API workers, after some hard reboots.

In the OVN mech driver, the method "get_agents" is patched to call both:
* The OVN mech driver "get_agents" method. This method retrieves the OVN agent list stored in a local cache (no DB access).
* The extension "agents" "get_agents" method. This method retrieves the Neutron DB "agents" list.

The patched method returns the combined results. This method is patched in [1].

The problem we found in the testing environment was that, sometimes, the "_setup_hash_ring" method called before [2] fails [3]. The OVN mech driver is loaded, but several methods (for example, "get_agents") are not correctly patched.

ACTIONS:
* Any possible error in the "_setup_hash_ring" call needs to be handled properly.
* Log a message at the end of the "post_fork_initialize" method to check that this event method has finished properly.

Regards.
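The failure mode described above can be sketched in a few lines of Python. This is a toy model, not the Neutron code: "AgentsExtension", "SketchDriver", "start_worker" and the "handle_errors" flag are hypothetical names, the Neutron DB "agents" list is assumed to be empty, and the worker is assumed to keep serving requests after its initialization event fails. Catching the error and continuing is only one possible way to handle it; the actual fix may differ.

```python
import logging

LOG = logging.getLogger("sketch")


class AgentsExtension:
    """Hypothetical stand-in for the "agents" extension (Neutron DB)."""

    def get_agents(self):
        # Assumption: the Neutron DB "agents" list is empty in this sketch.
        return []


class SketchDriver:
    """Toy model of a mech driver; not the real OVN mech driver."""

    def __init__(self, extension, hash_ring_fails=False, handle_errors=False):
        self.extension = extension
        self.hash_ring_fails = hash_ring_fails
        self.handle_errors = handle_errors

    def _setup_hash_ring(self):
        if self.hash_ring_fails:
            raise RuntimeError("hash ring setup failed")

    def get_agents(self):
        # OVN agents come from a local cache (no DB access).
        return ["ovn-controller"]

    def _patch_get_agents(self):
        original = self.extension.get_agents

        def combined():
            # Patched method: OVN agents plus Neutron DB agents.
            return self.get_agents() + original()

        self.extension.get_agents = combined

    def post_fork_initialize(self):
        try:
            self._setup_hash_ring()
        except RuntimeError:
            if not self.handle_errors:
                raise  # reported behaviour: the patching below never runs
            LOG.exception("hash ring setup failed; continuing initialization")
        self._patch_get_agents()
        # Final log line, so a complete initialization is visible in the logs.
        LOG.info("post_fork_initialize finished")


def start_worker(hash_ring_fails, handle_errors):
    """Initialize one simulated API worker and return its agent list."""
    extension = AgentsExtension()
    driver = SketchDriver(extension, hash_ring_fails, handle_errors)
    try:
        driver.post_fork_initialize()
    except RuntimeError:
        # Assumption: the failure is logged and the worker keeps serving.
        LOG.error("post_fork_initialize aborted")
    return extension.get_agents()
```

In this model, a worker whose hash ring setup fails without error handling answers with the unpatched method and returns an empty list, while healthy workers return the OVN agents. That matches the random behaviour seen across API workers.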
[1] https://github.com/openstack/neutron/blob/c33805919eb89f5ea6d8b54d4142d4c829d9a2a0/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py#L355
[2] https://github.com/openstack/neutron/blob/c33805919eb89f5ea6d8b54d4142d4c829d9a2a0/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py#L349
[3] https://paste.opendev.org/show/bqzDPR5TukLq9d1GIcnz/

Raised priority to high since neutron CI testing is blocked because of this

Verified on:
RHOS-17.1-RHEL-9-20240415.n.1
openstack-neutron-18.6.1-17.1.20231025110810.el9ost.noarch
python3-neutron-lib-2.10.3-17.1.20230706110935.el9ost.noarch

After executing the test that reboots the 3/3 controllers 100 times, the output obtained with the command `openstack network agent list` was consistent. The following script was used:

```
$ cat run.sh
#!/bin/bash
set -ex
export OS_CLOUD=overcloud

for i in {0..99}; do
    tox -e faults -- tobiko/tests/faults/ha/test_cloud_recovery.py::DisruptTripleoNodesTest::test_z99_hard_reboot_controllers_recovery | tee report/test_z99_hard_reboot_controllers_recovery-$i.log
    for j in {0..9}; do
        AGENTS=`openstack network agent list -f value`
        NAGENTS=`echo $AGENTS | grep -o " True True " | wc -l`
        if [ "$NAGENTS" != "7" ]; then
            echo "Wrong number of agents! $NAGENTS"
            echo "$AGENTS"
            exit 1
        fi
    done
done
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 17.1.3 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:2741