Bug 2269548
Summary: | Tempest test tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOps.test_server_connectivity_resize fails randomly | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Slawek Kaplonski <skaplons> |
Component: | python-oslo-db | Assignee: | Michael Bayer <mbayer> |
Status: | POST --- | QA Contact: | Nobody <nobody> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 16.2 (Train) | CC: | apevec, dasmith, dciabrin, eglynn, jhakimra, kchamart, lhh, mbayer, mwitt, sbauza, sgordon, vromanso |
Target Milestone: | z4 | Keywords: | Automation, Patch, Triaged |
Target Release: | 17.1 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Last Closed: | Type: | Bug | |
Description
Slawek Kaplonski
2024-03-14 15:18:34 UTC
I did some web searching and found a few past issues and a patch that I think look related:

* https://bugzilla.redhat.com/show_bug.cgi?id=2004173 (Many neutron errors "wsrep aborted transaction" after a controller was rebooted)
* https://bugs.launchpad.net/oslo.db/+bug/1648818 (WSREP deadlock is not properly wrapped)
* https://review.opendev.org/c/openstack/oslo.db/+/409194 (exc_filters: fix deadlock detection for percona xtradb cluster)

I think what's happening here is that Nova is not retrying on deadlock because the deadlock error from sqlalchemy is not being matched/detected in oslo.db.

In Nova there are certain DB queries that are decorated with @retry_on_deadlock, and action_event_start (seen in the traceback) is one of them [1]:

    @oslo_db_api.wrap_db_retry(max_retries=5, retry_on_deadlock=True)
    @pick_context_manager_writer
    def action_event_start(context, values):
        ...

But Nova appears not to be retrying when the deadlock occurs, because oslo.db isn't detecting the error "wsrep aborted transaction" [2] and consequently isn't raising its DBDeadlock exception, which the retry decorator needs [3].

Based on this, I think we need a fix in oslo.db to detect this variant of the deadlock error, so that callers using the @retry_on_deadlock decorator get the expected retry behavior.

[1] https://github.com/openstack/nova/blob/45e5d213f86dba618f9460f8a860b742723b13f4/nova/db/main/api.py#L3972
[2] https://github.com/openstack/oslo.db/blob/3a94baa0e207a5f85c0f2143d5e3f1dceaf8d994/oslo_db/sqlalchemy/exc_filters.py#L60-L90
[3] https://github.com/openstack/oslo.db/blob/3a94baa0e207a5f85c0f2143d5e3f1dceaf8d994/oslo_db/api.py#L84

Changing the component to oslo.db as that's where the fix is.
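
To illustrate the mechanism described above, here is a minimal, self-contained sketch of how a message-based deadlock filter and a retry decorator interact. It does not use oslo.db or Nova code: `RawDBError`, `DBDeadlock`, `translate`, `retry_on_deadlock`, `DEADLOCK_PATTERNS`, and the exact error string are all hypothetical stand-ins chosen for this example. The point is only that a retry wrapper keyed on a deadlock exception never fires if the filter fails to recognize the error message.

```python
import functools
import re


class RawDBError(Exception):
    """Hypothetical stand-in for an untranslated driver/SQLAlchemy error."""


class DBDeadlock(Exception):
    """Hypothetical stand-in for the deadlock exception the retry logic expects."""


# Assumed patterns for illustration only. Without the second entry, the
# "wsrep aborted transaction" variant would not be treated as a deadlock.
DEADLOCK_PATTERNS = [
    re.compile(r"Deadlock found"),
    re.compile(r"wsrep aborted transaction"),
]


def translate(exc):
    """Return a DBDeadlock if the message matches a known pattern, else exc."""
    message = str(exc)
    if any(pattern.search(message) for pattern in DEADLOCK_PATTERNS):
        return DBDeadlock(message)
    return exc


def retry_on_deadlock(max_retries=5):
    """Retry the wrapped call only when the error translates to DBDeadlock."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attempts = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except RawDBError as exc:
                    translated = translate(exc)
                    if not isinstance(translated, DBDeadlock):
                        raise  # unrecognized error: no retry
                    if attempts >= max_retries:
                        raise translated from exc
                    attempts += 1
        return wrapper
    return decorator


calls = {"count": 0}


@retry_on_deadlock(max_retries=5)
def action_event_start_demo():
    """Fails twice with the (assumed) wsrep message, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RawDBError("(1213, 'Deadlock: wsrep aborted transaction')")
    return "ok"


result = action_event_start_demo()
```

If the `wsrep aborted transaction` pattern is removed from `DEADLOCK_PATTERNS`, the first failure propagates immediately as `RawDBError`, which mirrors the no-retry behavior suspected in this bug.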