Bug 2269548
| Summary: | Tempest test tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOps.test_server_connectivity_resize fails randomly | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Slawek Kaplonski <skaplons> |
| Component: | python-oslo-db | Assignee: | Michael Bayer <mbayer> |
| Status: | CLOSED ERRATA | QA Contact: | Nobody <nobody> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | ||
| Version: | 16.2 (Train) | CC: | alifshit, apevec, dasmith, dciabrin, eglynn, jhakimra, jparker, kchamart, lhh, mariel, mbayer, mwitt, sbauza, sgordon, vromanso |
| Target Milestone: | z4 | Keywords: | Automation, Patch, Triaged |
| Target Release: | 17.1 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | python-oslo-db-8.5.2-17.1.20240820150750.26fd6fb.el9ost | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2024-11-21 09:39:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Slawek Kaplonski
2024-03-14 15:18:34 UTC
I did some web searching and found a few past issues and a patch that I think look related:

* https://bugzilla.redhat.com/show_bug.cgi?id=2004173 (Many neutron errors "wsrep aborted transaction" after a controller was rebooted)
* https://bugs.launchpad.net/oslo.db/+bug/1648818 (WSREP deadlock is not properly wrapped)
* https://review.opendev.org/c/openstack/oslo.db/+/409194 (exc_filters: fix deadlock detection for percona xtradb cluster)

I think what's happening here is that Nova is not retrying on deadlock because the deadlock error from sqlalchemy is not being matched/detected in oslo.db.

In Nova there are certain DB queries that are decorated with @retry_on_deadlock, and action_event_start (seen in the traceback) is one of them [1]:

    @oslo_db_api.wrap_db_retry(max_retries=5, retry_on_deadlock=True)
    @pick_context_manager_writer
    def action_event_start(context, values):
        ...

But Nova appears not to be retrying when the deadlock occurs because oslo.db isn't detecting the error "wsrep aborted transaction" [2] and is subsequently not raising its DBDeadlock exception, which is needed by the retry decorator [3].

Based on this, I think we need a fix in oslo.db to detect this variant of deadlock error so that those using the @retry_on_deadlock decorator get the expected behavior of retries.

[1] https://github.com/openstack/nova/blob/45e5d213f86dba618f9460f8a860b742723b13f4/nova/db/main/api.py#L3972
[2] https://github.com/openstack/oslo.db/blob/3a94baa0e207a5f85c0f2143d5e3f1dceaf8d994/oslo_db/sqlalchemy/exc_filters.py#L60-L90
[3] https://github.com/openstack/oslo.db/blob/3a94baa0e207a5f85c0f2143d5e3f1dceaf8d994/oslo_db/api.py#L84

Changing the component to oslo.db as that's where the fix is.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (RHOSP 17.1.4 bug fix and enhancement advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:9974
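For readers following the analysis in this report, the interaction between error-message matching and the retry decorator can be illustrated with a small standalone sketch. This is not oslo.db or Nova code: `RawDBError`, `DBDeadlock`, `DEADLOCK_PATTERNS`, `is_deadlock`, `retry_on_deadlock` and `flaky_write` are all hypothetical stand-ins, and the pattern list is only an assumption about what such a filter might match. It shows why a message that no pattern recognises would never be retried, and how adding a "wsrep aborted transaction" pattern changes that.

```python
import functools
import re
import time


class RawDBError(Exception):
    """Hypothetical stand-in for a low-level driver error."""

    def __init__(self, message):
        super().__init__(message)
        self.message = message


class DBDeadlock(Exception):
    """Hypothetical stand-in for a wrapped, retryable deadlock error."""


# Illustrative patterns only. The second entry models the message variant
# discussed in this report; removing it makes the demo below fail fast.
DEADLOCK_PATTERNS = [
    re.compile(r"Deadlock found when trying to get lock", re.IGNORECASE),
    re.compile(r"wsrep aborted transaction", re.IGNORECASE),
]


def is_deadlock(exc):
    """Return True if the error message matches a known deadlock pattern."""
    return any(p.search(exc.message) for p in DEADLOCK_PATTERNS)


def retry_on_deadlock(max_retries=5, delay=0.0):
    """Retry the wrapped call while it fails with a recognised deadlock."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attempt = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except RawDBError as raw:
                    if not is_deadlock(raw):
                        # Unrecognised errors propagate without any retry.
                        raise
                    if attempt >= max_retries:
                        raise DBDeadlock(str(raw)) from raw
                    attempt += 1
                    time.sleep(delay)

        return wrapper

    return decorator


# Demo: a write that hits the wsrep deadlock twice, then succeeds.
calls = {"n": 0}


@retry_on_deadlock(max_retries=5)
def flaky_write():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RawDBError("wsrep aborted transaction")
    return "ok"


result = flaky_write()
```

In this sketch the retry decision depends entirely on the message being recognised, which mirrors the reasoning above: if the filter does not know the "wsrep aborted transaction" text, the decorator sees an ordinary error and gives up on the first failure.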