Bug 2269548 - Tempest test tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOps.test_server_connectivity_resize fails randomly
Summary: Tempest test tempest.scenario.test_network_advanced_server_ops.TestNetworkAdv...
Keywords:
Status: POST
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-oslo-db
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
medium
high
Target Milestone: z4
: 17.1
Assignee: Michael Bayer
QA Contact: Nobody
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2024-03-14 15:18 UTC by Slawek Kaplonski
Modified: 2024-03-25 20:16 UTC (History)
12 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Launchpad 2057987 0 None None None 2024-03-15 02:19:34 UTC
OpenStack gerrit 913279 0 None NEW exc_filters: Handle OperationalError for MariaDB/Galera 2024-03-15 02:19:34 UTC
Red Hat Issue Tracker OSP-31617 0 None None None 2024-03-14 15:26:38 UTC

Description Slawek Kaplonski 2024-03-14 15:18:34 UTC
We noticed that scenario test tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOps.test_server_connectivity_resize is failing randomly in downstream CI. Examples of the failure:

http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-neutron-16.2_director-rhel-virthost-3cont_2comp-ml2ovs-ipv6-all-vxlan-tempest/49/test_results/tempest-results-neutron.1.html where failure was like below:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tempest/common/utils/__init__.py", line 90, in wrapper
    return f(*func_args, **func_kwargs)
  File "/usr/lib/python3.6/site-packages/tempest/scenario/test_network_advanced_server_ops.py", line 196, in test_server_connectivity_resize
    'VERIFY_RESIZE')
  File "/usr/lib/python3.6/site-packages/tempest/common/waiters.py", line 78, in wait_for_server_status
    raise exceptions.BuildErrorException(server_id=server_id)
tempest.exceptions.BuildErrorException: Server edba0ed0-d6e1-4b1d-bc35-3a8d89d397f0 failed to build and is in ERROR status
 
Unfortunately there are no logs other than test results from this run.

But there are other 2 examples of similar failures of that test:
* http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-neutron-16.2_director-rhel-virthost-3cont_2comp-ml2ovs-ipv6-all-vxlan-tempest/54/test_results/tempest-results-neutron.1.html
* http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-neutron-16.2_director-rhel-virthost-3cont_2comp-ml2ovs-ipv6-all-vxlan-tempest/46/test_results/tempest-results-neutron.1.html

where failure was like below:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tempest/common/utils/__init__.py", line 90, in wrapper
    return f(*func_args, **func_kwargs)
  File "/usr/lib/python3.6/site-packages/tempest/scenario/test_network_advanced_server_ops.py", line 196, in test_server_connectivity_resize
    'VERIFY_RESIZE')
  File "/usr/lib/python3.6/site-packages/tempest/common/waiters.py", line 96, in wait_for_server_status
    raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: (TestNetworkAdvancedServerOps:test_server_connectivity_resize) Server 9460eca7-30ac-4ffe-9ef7-aa7e9b241f3b failed to reach VERIFY_RESIZE status and task state "None" within the required time (300 s). Current status: ACTIVE. Current task state: None. 



After checking nova-compute logs from of of those runs (http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-neutron-16.2_director-rhel-virthost-3cont_2comp-ml2ovs-ipv6-all-vxlan-tempest/54/compute-0/var/log/containers/nova/nova-compute.log.gz) I found out some potentially related deadlock there. It is in http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-neutron-16.2_director-rhel-virthost-3cont_2comp-ml2ovs-ipv6-all-vxlan-tempest/54/compute-0/var/log/containers/nova/nova-compute.log.gz and looks like below:

2024-03-13 22:15:12.943 7 INFO nova.compute.manager [req-184848ab-aa09-4c01-8b19-d44cb5c5e0a1 0e2b931589ca40d4ba7c9d22e3678513 18a715f33c0e4b79bdc8b3265d3ae94a - default default] [instance: 0cf6061f-007f-4b31-afa2-3f4e9b5a739f] Successfully reverted task state from resize_prep on failure for instance.
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server [req-184848ab-aa09-4c01-8b19-d44cb5c5e0a1 0e2b931589ca40d4ba7c9d22e3678513 18a715f33c0e4b79bdc8b3265d3ae94a - default default] Exception during message handling: oslo_messaging.rpc.client.RemoteError: Remote error: OperationalError (pymysql.err.OperationalError) (1213, 'Deadlock: wsrep aborted transaction')
[SQL: UPDATE instance_actions SET updated_at=%(updated_at)s WHERE instance_actions.id = %(instance_actions_id)s]
[parameters: {'updated_at': datetime.datetime(2024, 3, 13, 22, 15, 12, 801841), 'instance_actions_id': 426}]
(Background on this error at: http://sqlalche.me/e/e3q8)
['Traceback (most recent call last):\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1244, in _execute_context\n    cursor, statement, parameters, context\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute\n    cursor.execute(statement, parameters)\n', '  File "/usr/lib/python3.6/site-packages/pymysql/cursors.py", line 163, in execute\n    result = self._query(query)\n', '  File "/usr/lib/python3.6/site-packages/pymysql/cursors.py", line 321, in _query\n    conn.query(q)\n', '  File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 505, in query\n    self._affected_rows = self._read_query_result(unbuffered=unbuffered)\n', '  File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 724, in _read_query_result\n    result.read()\n', '  File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 1069, in read\n    first_packet = self.connection._read_packet()\n', '  File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 676, in _read_packet\n    packet.raise_for_error()\n', '  File "/usr/lib/python3.6/site-packages/pymysql/protocol.py", line 223, in raise_for_error\n    err.raise_mysql_exception(self._data)\n', '  File "/usr/lib/python3.6/site-packages/pymysql/err.py", line 107, in raise_mysql_exception\n    raise errorclass(errno, errval)\n', "pymysql.err.OperationalError: (1213, 'Deadlock: wsrep aborted transaction')\n", '\nThe above exception was the direct cause of the following exception:\n\n', 'Traceback (most recent call last):\n', '  File "/usr/lib/python3.6/site-packages/nova/conductor/manager.py", line 135, in _object_dispatch\n    return getattr(target, method)(*args, **kwargs)\n', '  File "/usr/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 184, in wrapper\n    result = fn(cls, context, *args, **kwargs)\n', '  File "/usr/lib/python3.6/site-packages/nova/objects/instance_action.py", line 201, in event_start\n    db_event = db.action_event_start(context, values)\n', '  File "/usr/lib/python3.6/site-packages/nova/db/api.py", line 1691, in action_event_start\n    return IMPL.action_event_start(context, values)\n', '  File "/usr/lib/python3.6/site-packages/oslo_db/api.py", line 154, in wrapper\n    ectxt.value = e.inner_exc\n', '  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__\n    self.force_reraise()\n', '  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise\n    six.reraise(self.type_, self.value, self.tb)\n', '  File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise\n    raise value\n', '  File "/usr/lib/python3.6/site-packages/oslo_db/api.py", line 142, in wrapper\n    return f(*args, **kwargs)\n', '  File "/usr/lib/python3.6/site-packages/nova/db/sqlalchemy/api.py", line 224, in wrapped\n    return f(context, *args, **kwargs)\n', '  File "/usr/lib/python3.6/site-packages/nova/db/sqlalchemy/api.py", line 5290, in action_event_start\n    action.save(context.session)\n', '  File "/usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/models.py", line 50, in save\n    session.flush()\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 2451, in flush\n    self._flush(objects)\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 2589, in _flush\n    transaction.rollback(_capture_exception=True)\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__\n    compat.reraise(exc_type, exc_value, exc_tb)\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 129, in reraise\n    raise value\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 2549, in _flush\n    flush_context.execute()\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute\n    rec.execute(self)\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/unitofwork.py", line 589, in execute\n    uow,\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 236, in save_obj\n    update,\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 978, in _emit_update_statements\n    statement, multiparams\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 988, in execute\n    return meth(self, multiparams, params)\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/sql/elements.py", line 287, in _execute_on_connection\n    return connection._execute_clauseelement(self, multiparams, params)\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1107, in _execute_clauseelement\n    distilled_params,\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context\n    e, statement, parameters, cursor, context\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1464, in _handle_dbapi_exception\n    util.raise_from_cause(newraise, exc_info)\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 383, in raise_from_cause\n    reraise(type(exception), exception, tb=exc_tb, cause=cause)\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 128, in reraise\n    raise value.with_traceback(tb)\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1244, in _execute_context\n    cursor, statement, parameters, context\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute\n    cursor.execute(statement, parameters)\n', '  File "/usr/lib/python3.6/site-packages/pymysql/cursors.py", line 163, in execute\n    result = self._query(query)\n', '  File "/usr/lib/python3.6/site-packages/pymysql/cursors.py", line 321, in _query\n    conn.query(q)\n', '  File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 505, in query\n    self._affected_rows = self._read_query_result(unbuffered=unbuffered)\n', '  File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 724, in _read_query_result\n    result.read()\n', '  File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 1069, in read\n    first_packet = self.connection._read_packet()\n', '  File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 676, in _read_packet\n    packet.raise_for_error()\n', '  File "/usr/lib/python3.6/site-packages/pymysql/protocol.py", line 223, in raise_for_error\n    err.raise_mysql_exception(self._data)\n', '  File "/usr/lib/python3.6/site-packages/pymysql/err.py", line 107, in raise_mysql_exception\n    raise errorclass(errno, errval)\n', "sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (1213, 'Deadlock: wsrep aborted transaction')\n[SQL: UPDATE instance_actions SET updated_at=%(updated_at)s WHERE instance_actions.id = %(instance_actions_id)s]\n[parameters: {'updated_at': datetime.datetime(2024, 3, 13, 22, 15, 12, 801841), 'instance_actions_id': 426}]\n(Background on this error at: http://sqlalche.me/e/e3q8)\n"].
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 274, in dispatch
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 79, in wrapped
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server     function_name, call_dict, binary, tb)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server     self.force_reraise()
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server     raise value
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 69, in wrapped
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server     return f(self, context, *args, **kw)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 191, in decorated_function
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server     "Error: %s", e, instance=instance)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server     self.force_reraise()
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server     raise value
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 161, in decorated_function
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/compute/utils.py", line 1386, in decorated_function
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server     with EventReporter(context, event_name, host, instance_uuid):
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/compute/utils.py", line 1356, in __enter__
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server     host=self.host)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 177, in wrapper
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server     args, kwargs)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/conductor/rpcapi.py", line 241, in object_class_action_versions
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server     args=args, kwargs=kwargs)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/client.py", line 181, in call
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server     transport_options=self.transport_options)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/transport.py", line 129, in _send
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server     transport_options=transport_options)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 674, in send
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server     transport_options=transport_options)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 664, in _send
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server     raise result
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server oslo_messaging.rpc.client.RemoteError: Remote error: OperationalError (pymysql.err.OperationalError) (1213, 'Deadlock: wsrep aborted transaction')
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server [SQL: UPDATE instance_actions SET updated_at=%(updated_at)s WHERE instance_actions.id = %(instance_actions_id)s]
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server [parameters: {'updated_at': datetime.datetime(2024, 3, 13, 22, 15, 12, 801841), 'instance_actions_id': 426}]
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server (Background on this error at: http://sqlalche.me/e/e3q8)
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server ['Traceback (most recent call last):\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1244, in _execute_context\n    cursor, statement, parameters, context\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute\n    cursor.execute(statement, parameters)\n', '  File "/usr/lib/python3.6/site-packages/pymysql/cursors.py", line 163, in execute\n    result = self._query(query)\n', '  File "/usr/lib/python3.6/site-packages/pymysql/cursors.py", line 321, in _query\n    conn.query(q)\n', '  File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 505, in query\n    self._affected_rows = self._read_query_result(unbuffered=unbuffered)\n', '  File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 724, in _read_query_result\n    result.read()\n', '  File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 1069, in read\n    first_packet = self.connection._read_packet()\n', '  File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 676, in _read_packet\n    packet.raise_for_error()\n', '  File "/usr/lib/python3.6/site-packages/pymysql/protocol.py", line 223, in raise_for_error\n    err.raise_mysql_exception(self._data)\n', '  File "/usr/lib/python3.6/site-packages/pymysql/err.py", line 107, in raise_mysql_exception\n    raise errorclass(errno, errval)\n', "pymysql.err.OperationalError: (1213, 'Deadlock: wsrep aborted transaction')\n", '\nThe above exception was the direct cause of the following exception:\n\n', 'Traceback (most recent call last):\n', '  File "/usr/lib/python3.6/site-packages/nova/conductor/manager.py", line 135, in _object_dispatch\n    return getattr(target, method)(*args, **kwargs)\n', '  File "/usr/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 184, in wrapper\n    result = fn(cls, context, *args, **kwargs)\n', '  File "/usr/lib/python3.6/site-packages/nova/objects/instance_action.py", line 201, in event_start\n    db_event = db.action_event_start(context, values)\n', '  File "/usr/lib/python3.6/site-packages/nova/db/api.py", line 1691, in action_event_start\n    return IMPL.action_event_start(context, values)\n', '  File "/usr/lib/python3.6/site-packages/oslo_db/api.py", line 154, in wrapper\n    ectxt.value = e.inner_exc\n', '  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__\n    self.force_reraise()\n', '  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise\n    six.reraise(self.type_, self.value, self.tb)\n', '  File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise\n    raise value\n', '  File "/usr/lib/python3.6/site-packages/oslo_db/api.py", line 142, in wrapper\n    return f(*args, **kwargs)\n', '  File "/usr/lib/python3.6/site-packages/nova/db/sqlalchemy/api.py", line 224, in wrapped\n    return f(context, *args, **kwargs)\n', '  File "/usr/lib/python3.6/site-packages/nova/db/sqlalchemy/api.py", line 5290, in action_event_start\n    action.save(context.session)\n', '  File "/usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/models.py", line 50, in save\n    session.flush()\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 2451, in flush\n    self._flush(objects)\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 2589, in _flush\n    transaction.rollback(_capture_exception=True)\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__\n    compat.reraise(exc_type, exc_value, exc_tb)\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 129, in reraise\n    raise value\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 2549, in _flush\n    flush_context.execute()\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute\n    rec.execute(self)\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/unitofwork.py", line 589, in execute\n    uow,\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 236, in save_obj\n    update,\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/persistence.py", line 978, in _emit_update_statements\n    statement, multiparams\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 988, in execute\n    return meth(self, multiparams, params)\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/sql/elements.py", line 287, in _execute_on_connection\n    return connection._execute_clauseelement(self, multiparams, params)\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1107, in _execute_clauseelement\n    distilled_params,\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context\n    e, statement, parameters, cursor, context\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1464, in _handle_dbapi_exception\n    util.raise_from_cause(newraise, exc_info)\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 383, in raise_from_cause\n    reraise(type(exception), exception, tb=exc_tb, cause=cause)\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 128, in reraise\n    raise value.with_traceback(tb)\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1244, in _execute_context\n    cursor, statement, parameters, context\n', '  File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute\n    cursor.execute(statement, parameters)\n', '  File "/usr/lib/python3.6/site-packages/pymysql/cursors.py", line 163, in execute\n    result = self._query(query)\n', '  File "/usr/lib/python3.6/site-packages/pymysql/cursors.py", line 321, in _query\n    conn.query(q)\n', '  File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 505, in query\n    self._affected_rows = self._read_query_result(unbuffered=unbuffered)\n', '  File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 724, in _read_query_result\n    result.read()\n', '  File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 1069, in read\n    first_packet = self.connection._read_packet()\n', '  File "/usr/lib/python3.6/site-packages/pymysql/connections.py", line 676, in _read_packet\n    packet.raise_for_error()\n', '  File "/usr/lib/python3.6/site-packages/pymysql/protocol.py", line 223, in raise_for_error\n    err.raise_mysql_exception(self._data)\n', '  File "/usr/lib/python3.6/site-packages/pymysql/err.py", line 107, in raise_mysql_exception\n    raise errorclass(errno, errval)\n', "sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (1213, 'Deadlock: wsrep aborted transaction')\n[SQL: UPDATE instance_actions SET updated_at=%(updated_at)s WHERE instance_actions.id = %(instance_actions_id)s]\n[parameters: {'updated_at': datetime.datetime(2024, 3, 13, 22, 15, 12, 801841), 'instance_actions_id': 426}]\n(Background on this error at: http://sqlalche.me/e/e3q8)\n"].
2024-03-13 22:15:12.950 7 ERROR oslo_messaging.rpc.server

Comment 1 melanie witt 2024-03-15 02:19:34 UTC
I did some web searching and found a few past issues and patch that I think look related:

* https://bugzilla.redhat.com/show_bug.cgi?id=2004173 (Many neutron errors "wsrep aborted transaction" after a controller was rebooted)

* https://bugs.launchpad.net/oslo.db/+bug/1648818 (WSREP deadlock is not properly wrapped)

* https://review.opendev.org/c/openstack/oslo.db/+/409194 (exc_filters: fix deadlock detection for percona xtradb cluster)

I think what's happening here is that Nova is not retrying on deadlock because the deadlock error from sqlalchemy is not being matched/detected in oslo.db.

In Nova there are certain DB queries that are decorated with @retry_on_deadlock and action_event_start (seen in the traceback) is one of them [1]:

    @oslo_db_api.wrap_db_retry(max_retries=5, retry_on_deadlock=True)
    @pick_context_manager_writer
    def action_event_start(context, values):
        ...

But Nova appears to not be retrying when the deadlock occurs because oslo.db isn't detecting the error "wsrep aborted transaction" [2] and is subsequently not raising its DBDeadlock exception, which is needed by the retry decorator [3].

Based on this, I think we need a fix in oslo.db to detect this variant of deadlock error so that those using the @retry_on_deadlock decorator get the expected behavior of retries.

[1] https://github.com/openstack/nova/blob/45e5d213f86dba618f9460f8a860b742723b13f4/nova/db/main/api.py#L3972
[2] https://github.com/openstack/oslo.db/blob/3a94baa0e207a5f85c0f2143d5e3f1dceaf8d994/oslo_db/sqlalchemy/exc_filters.py#L60-L90
[3] https://github.com/openstack/oslo.db/blob/3a94baa0e207a5f85c0f2143d5e3f1dceaf8d994/oslo_db/api.py#L84

Comment 3 melanie witt 2024-03-25 20:16:43 UTC
Changing the component to oslo.db as that's where the fix is.


Note You need to log in before you can comment on or make changes to this bug.