Bug 1109577

Summary: DB locking errors when deleting multiple instances that have floating IPs associated with them.
Product: Red Hat OpenStack Reporter: Lee Yarwood <lyarwood>
Component: openstack-neutron Assignee: Ihar Hrachyshka <ihrachys>
Status: CLOSED ERRATA QA Contact: yfried
Severity: high Docs Contact:
Priority: high    
Version: 5.0 (RHEL 7) CC: chrisw, lpeer, mwagner, nyechiel, oblaut, rdekel, slong, twillber, yeylon
Target Milestone: rc   
Target Release: 5.0 (RHEL 7)   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: openstack-neutron-2014.1-37.el7ost Doc Type: Bug Fix
Doc Text:
Previously, when deleting a port, a notification was issued from within a database transaction, which could cause the current green thread to yield. When control switched to another green thread that attempted to access a database row already locked by the first thread, a deadlock occurred; after a delay (50 seconds), the database library returned an OperationalError due to a lock wait timeout. With this update, the notification that could trigger the green thread switch is issued outside the database transaction, so the deadlock no longer occurs in this scenario.
Story Points: ---
Clone Of:
Clones: 1117818 (view as bug list) Environment:
Last Closed: 2014-07-24 17:23:52 UTC Type: Bug
Bug Depends On:    
Bug Blocks: 1117818    
Attachments:
Example server.log (flags: none)

Description Lee Yarwood 2014-06-15 16:59:22 UTC
Description of problem:

I've been able to reproduce this by deleting just 10 instances, each with its own internal and floating IP associated with it.

# nova list
+--------------------------------------+--------------------------------------------+--------+------------+-------------+-----------------------------------+
| ID                                   | Name                                       | Status | Task State | Power State | Networks                          |
+--------------------------------------+--------------------------------------------+--------+------------+-------------+-----------------------------------+
| 02aad532-8abc-491a-a879-4a364f8bec5b | batch-02aad532-8abc-491a-a879-4a364f8bec5b | ACTIVE | -          | Running     | internal=10.10.10.38, 192.2.0.111 |
| 04e080be-9173-44da-8c47-0806d76c8a25 | batch-04e080be-9173-44da-8c47-0806d76c8a25 | ACTIVE | -          | Running     | internal=10.10.10.40, 192.2.0.110 |
| 5fbae796-7037-4539-970e-1c0b6d922f2a | batch-5fbae796-7037-4539-970e-1c0b6d922f2a | ACTIVE | -          | Running     | internal=10.10.10.39, 192.2.0.108 |
| 91c5095d-7319-438c-a167-f4501ea4bb6e | batch-91c5095d-7319-438c-a167-f4501ea4bb6e | ACTIVE | -          | Running     | internal=10.10.10.33, 192.2.0.103 |
| aaedd2cc-beb8-4d6d-8285-a76cfd05daa2 | batch-aaedd2cc-beb8-4d6d-8285-a76cfd05daa2 | ACTIVE | -          | Running     | internal=10.10.10.41, 192.2.0.109 |
| d4d63f23-fa78-40a9-a19e-46543ce874df | batch-d4d63f23-fa78-40a9-a19e-46543ce874df | ACTIVE | -          | Running     | internal=10.10.10.36, 192.2.0.105 |
| e9d1d2d9-0034-4714-8f43-c1b8c452723b | batch-e9d1d2d9-0034-4714-8f43-c1b8c452723b | ACTIVE | -          | Running     | internal=10.10.10.35, 192.2.0.104 |
| ebd7b729-1586-4827-a7bf-053787811b60 | batch-ebd7b729-1586-4827-a7bf-053787811b60 | ACTIVE | -          | Running     | internal=10.10.10.42, 192.2.0.107 |
| f1b1b73d-9f5f-437d-a53b-7acae472b758 | batch-f1b1b73d-9f5f-437d-a53b-7acae472b758 | ACTIVE | -          | Running     | internal=10.10.10.37, 192.2.0.106 |
| fd604530-fb61-4770-98af-f535c80fc2b7 | batch-fd604530-fb61-4770-98af-f535c80fc2b7 | ACTIVE | -          | Running     | internal=10.10.10.34, 192.2.0.102 |
+--------------------------------------+--------------------------------------------+--------+------------+-------------+-----------------------------------+

2014-06-15 12:50:41.052 15781 ERROR neutron.openstack.common.rpc.amqp [-] Exception during message handling
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp Traceback (most recent call last):
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/amqp.py", line 462, in _process_data
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp     **args)
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp   File "/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 45, in dispatch
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp     neutron_ctxt, version, method, namespace, **kwargs)
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp     result = getattr(proxyobj, method)(ctxt, **kwargs)
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp   File "/usr/lib/python2.7/site-packages/neutron/db/l3_rpc_base.py", line 111, in update_floatingip_statuses
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp     status)
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp   File "/usr/lib/python2.7/site-packages/neutron/db/l3_db.py", line 684, in update_floatingip_status
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp     fip_query.update({'status': status}, synchronize_session=False)
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2690, in update
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp     update_op.exec_()
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 816, in exec_
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp     self._do_exec()
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 913, in _do_exec
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp     update_stmt, params=self.query._params)
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/db/sqlalchemy/session.py", line 597, in _wrap
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp     return f(*args, **kwargs)
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/db/sqlalchemy/session.py", line 840, in execute
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp     return super(Session, self).execute(*args, **kwargs)
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 934, in execute
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp     clause, params or {})
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 662, in execute
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp     params)
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 761, in _execute_clauseelement
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp     compiled_sql, distilled_params
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 874, in _execute_context
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp     context)
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1024, in _handle_dbapi_exception
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp     exc_info
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp   File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 196, in raise_from_cause
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp     reraise(type(exception), exception, tb=exc_tb)
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 867, in _execute_context
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp     context)
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 324, in do_execute
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp     cursor.execute(statement, parameters)
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp   File "/usr/lib64/python2.7/site-packages/MySQLdb/cursors.py", line 174, in execute
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp     self.errorhandler(self, exc, value)
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp   File "/usr/lib64/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp     raise errorclass, errorvalue
2014-06-15 12:50:41.052 15781 TRACE neutron.openstack.common.rpc.amqp OperationalError: (OperationalError) (1205, 'Lock wait timeout exceeded; try restarting transaction') 'UPDATE floatingips SET status=%s WHERE floatingips.id = %s' ('ACTIVE', 'a030bb1e-31f0-42d7-84fc-520856f0ee66')

# nova list
+--------------------------------------+--------------------------------------------+--------+------------+-------------+-----------------------------------+
| ID                                   | Name                                       | Status | Task State | Power State | Networks                          |
+--------------------------------------+--------------------------------------------+--------+------------+-------------+-----------------------------------+
| 91c5095d-7319-438c-a167-f4501ea4bb6e | batch-91c5095d-7319-438c-a167-f4501ea4bb6e | ERROR  | deleting   | Running     | internal=10.10.10.33, 192.2.0.103 |
| d4d63f23-fa78-40a9-a19e-46543ce874df | batch-d4d63f23-fa78-40a9-a19e-46543ce874df | ERROR  | deleting   | Running     | internal=10.10.10.36, 192.2.0.105 |
| e9d1d2d9-0034-4714-8f43-c1b8c452723b | batch-e9d1d2d9-0034-4714-8f43-c1b8c452723b | ERROR  | deleting   | Running     | internal=10.10.10.35, 192.2.0.104 |
| ebd7b729-1586-4827-a7bf-053787811b60 | batch-ebd7b729-1586-4827-a7bf-053787811b60 | ERROR  | deleting   | Running     | internal=10.10.10.42, 192.2.0.107 |
| f1b1b73d-9f5f-437d-a53b-7acae472b758 | batch-f1b1b73d-9f5f-437d-a53b-7acae472b758 | ERROR  | deleting   | Running     | internal=10.10.10.37, 192.2.0.106 |
| fd604530-fb61-4770-98af-f535c80fc2b7 | batch-fd604530-fb61-4770-98af-f535c80fc2b7 | ERROR  | deleting   | Running     | internal=10.10.10.34, 192.2.0.102 |
+--------------------------------------+--------------------------------------------+--------+------------+-------------+-----------------------------------+

Version-Release number of selected component (if applicable):
# rpm -qa | grep neutron
openstack-neutron-openvswitch-2014.1-26.el7ost.noarch
python-neutronclient-2.3.4-2.el7ost.noarch
openstack-neutron-2014.1-26.el7ost.noarch
python-neutron-2014.1-26.el7ost.noarch
openstack-neutron-ml2-2014.1-26.el7ost.noarch

How reproducible:
Always.

Steps to Reproduce:
1. Create >= 10 instances and associate a floating IP to each one.
2. Mass delete these instances through either the CLI or the UI.

Actual results:
A number of instances are not deleted and enter an ERROR state due to database lock wait timeouts.

Expected results:
All instances deleted and floating IPs freed.

Additional info:

Comment 1 Lee Yarwood 2014-06-15 17:01:21 UTC
Created attachment 908931 [details]
Example server.log

Comment 3 lpeer 2014-06-17 14:22:20 UTC
Removing the blocker flag as this is a known upstream issue.
Adding an exception to track this as high priority.

Comment 4 Ihar Hrachyshka 2014-06-18 12:05:21 UTC
The issue occurs when a port delete results in a floating IP disassociation, which causes the L3 notifier to be invoked inside a database transaction. The notification yields the green thread while the floating IP row is still locked, and another green thread (the L3 agent updating the floating IP status) then tries to update the same row and blocks until the lock wait times out.

To fix the issue, we should make sure we don't send the floating IP disassociation notification while inside a database transaction.
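
A minimal sketch of that pattern, not the actual Neutron patch (_do_disassociate and l3_rpc_notifier are hypothetical stand-ins), assuming an SQLAlchemy session on the request context:

def disassociate_floatingips(context, port_id, l3_rpc_notifier):
    with context.session.begin(subtransactions=True):
        # Clear the port association on the floatingips rows; this takes
        # row locks that are held until the transaction commits.
        router_ids = _do_disassociate(context, port_id)  # hypothetical helper

        # BAD: notifying here performs I/O and yields the green thread while
        # the rows are still locked, so the L3 agent's status update on the
        # same row blocks until the lock wait timeout (50 seconds) fires.
        # l3_rpc_notifier.routers_updated(context, router_ids)

    # GOOD: notify only after the transaction has committed and the row
    # locks have been released.
    l3_rpc_notifier.routers_updated(context, router_ids)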

The issue can be reproduced as follows:
1. create multiple ports:
$ for i in `seq 40`; do neutron port-create --name port$i private; done

2. associate a floating IP to each of the ports.
$ for i in `seq 40`; do neutron floatingip-create --port_id `neutron port-show port$i|grep '| id' | awk '{print $4}'` public; done

3. delete ports in parallel.
$ for i in `seq 40`; do neutron port-delete port$i; done

While doing so, I hit at least 3 lock wait timeouts.
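
A hypothetical client-side variant of step 3, firing the deletions concurrently instead of via a sequential shell loop; a sketch using python-neutronclient and plain threads (credentials and the auth URL are placeholders):

import threading

from neutronclient.v2_0 import client


def make_client():
    # Placeholder credentials; adjust for the environment under test.
    return client.Client(username='admin', password='secret',
                         tenant_name='admin',
                         auth_url='http://controller:5000/v2.0')


def delete_port(port_id):
    # One client per thread to avoid sharing a connection across threads.
    make_client().delete_port(port_id)


neutron = make_client()
targets = [p['id'] for p in neutron.list_ports()['ports']
           if p['name'].startswith('port')]

threads = [threading.Thread(target=delete_port, args=(pid,)) for pid in targets]
for t in threads:
    t.start()
for t in threads:
    t.join()

Each port delete triggers a floating IP disassociation and an L3 notification, which is where the lock contention shows up.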

Comment 5 Ihar Hrachyshka 2014-07-09 13:01:40 UTC
On a related note, Red Hat has started pushing upstream to switch to another MySQL client library that should not exhibit these and similar deadlocks: https://review.openstack.org/#/c/104905/

Comment 7 yfried 2014-07-16 10:02:50 UTC
On RHEL7

rpm -qa | grep neutron
python-neutronclient-2.3.4-2.el7ost.noarch
python-neutron-2014.1.1-2.el7ost.noarch
openstack-neutron-2014.1.1-2.el7ost.noarch
openstack-neutron-openvswitch-2014.1.1-2.el7ost.noarch

[root@puma45 ~(keystone_admin)]# nova boot --flavor 1 --image cirros --num-instances 10 --nic net-id=1db71fbf-1f97-4e34-b960-8a7b1936a583 test_vm
[root@puma45 ~(keystone_admin)]# nova list
+--------------------------------------+----------------------------------------------+--------+------------+-------------+-------------------+
| ID                                   | Name                                         | Status | Task State | Power State | Networks          |
+--------------------------------------+----------------------------------------------+--------+------------+-------------+-------------------+
| 168189d7-5af3-4902-8985-08da0d739f2d | test_vm-168189d7-5af3-4902-8985-08da0d739f2d | BUILD  | spawning   | NOSTATE     | private=10.0.0.11 |
| 3e8dcc30-0282-496e-ae59-80054457df82 | test_vm-3e8dcc30-0282-496e-ae59-80054457df82 | BUILD  | spawning   | NOSTATE     | private=10.0.0.12 |
| 531bed85-a898-48a9-b36e-67560f0429e9 | test_vm-531bed85-a898-48a9-b36e-67560f0429e9 | BUILD  | spawning   | NOSTATE     | private=10.0.0.9  |
| 563c3070-8596-4e43-8d8c-770696700037 | test_vm-563c3070-8596-4e43-8d8c-770696700037 | BUILD  | spawning   | NOSTATE     | private=10.0.0.10 |
| 66b7a853-78f0-4a32-bd24-ec8b414394b9 | test_vm-66b7a853-78f0-4a32-bd24-ec8b414394b9 | BUILD  | spawning   | NOSTATE     | private=10.0.0.2  |
| 6c92ec99-387c-45a1-b1c1-4f7cc807e4c5 | test_vm-6c92ec99-387c-45a1-b1c1-4f7cc807e4c5 | BUILD  | spawning   | NOSTATE     | private=10.0.0.7  |
| 76702532-d2e9-4e7a-89a2-e8c8f5d9d635 | test_vm-76702532-d2e9-4e7a-89a2-e8c8f5d9d635 | BUILD  | spawning   | NOSTATE     | private=10.0.0.5  |
| a3393005-3b42-42a1-bfb2-a978a1c2fd1e | test_vm-a3393005-3b42-42a1-bfb2-a978a1c2fd1e | BUILD  | spawning   | NOSTATE     | private=10.0.0.8  |
| b9d70aaa-3d2e-45e2-bb8e-fdd2b22fc9fd | test_vm-b9d70aaa-3d2e-45e2-bb8e-fdd2b22fc9fd | BUILD  | spawning   | NOSTATE     | private=10.0.0.3  |
| d07de2cf-bd6e-4b09-90af-7c194e90785e | test_vm-d07de2cf-bd6e-4b09-90af-7c194e90785e | BUILD  | spawning   | NOSTATE     | private=10.0.0.4  |
+--------------------------------------+----------------------------------------------+--------+------------+-------------+-------------------+

[root@puma45 ~(keystone_admin)]# for port in `neutron port-list --device_owner compute:none | awk '/subnet/ {print $2}'`; do neutron floatingip-create public --port-id $port; done
[root@puma45 ~(keystone_admin)]# nova list
+--------------------------------------+----------------------------------------------+--------+------------+-------------+---------------------------------+
| ID                                   | Name                                         | Status | Task State | Power State | Networks                        |
+--------------------------------------+----------------------------------------------+--------+------------+-------------+---------------------------------+
| 168189d7-5af3-4902-8985-08da0d739f2d | test_vm-168189d7-5af3-4902-8985-08da0d739f2d | ACTIVE | -          | Running     | private=10.0.0.11, 10.35.166.6  |
| 3e8dcc30-0282-496e-ae59-80054457df82 | test_vm-3e8dcc30-0282-496e-ae59-80054457df82 | ACTIVE | -          | Running     | private=10.0.0.12, 10.35.166.9  |
| 531bed85-a898-48a9-b36e-67560f0429e9 | test_vm-531bed85-a898-48a9-b36e-67560f0429e9 | ACTIVE | -          | Running     | private=10.0.0.9, 10.35.166.12  |
| 563c3070-8596-4e43-8d8c-770696700037 | test_vm-563c3070-8596-4e43-8d8c-770696700037 | ACTIVE | -          | Running     | private=10.0.0.10, 10.35.166.10 |
| 66b7a853-78f0-4a32-bd24-ec8b414394b9 | test_vm-66b7a853-78f0-4a32-bd24-ec8b414394b9 | ACTIVE | -          | Running     | private=10.0.0.2, 10.35.166.5   |
| 6c92ec99-387c-45a1-b1c1-4f7cc807e4c5 | test_vm-6c92ec99-387c-45a1-b1c1-4f7cc807e4c5 | ACTIVE | -          | Running     | private=10.0.0.7, 10.35.166.8   |
| 76702532-d2e9-4e7a-89a2-e8c8f5d9d635 | test_vm-76702532-d2e9-4e7a-89a2-e8c8f5d9d635 | ACTIVE | -          | Running     | private=10.0.0.5, 10.35.166.13  |
| a3393005-3b42-42a1-bfb2-a978a1c2fd1e | test_vm-a3393005-3b42-42a1-bfb2-a978a1c2fd1e | ACTIVE | -          | Running     | private=10.0.0.8, 10.35.166.4   |
| b9d70aaa-3d2e-45e2-bb8e-fdd2b22fc9fd | test_vm-b9d70aaa-3d2e-45e2-bb8e-fdd2b22fc9fd | ACTIVE | -          | Running     | private=10.0.0.3, 10.35.166.7   |
| d07de2cf-bd6e-4b09-90af-7c194e90785e | test_vm-d07de2cf-bd6e-4b09-90af-7c194e90785e | ACTIVE | -          | Running     | private=10.0.0.4, 10.35.166.11  |
+--------------------------------------+----------------------------------------------+--------+------------+-------------+---------------------------------+


[root@puma45 ~(keystone_admin)]# nova delete `nova list | awk '/ACTIVE/ {print $2}'`
[root@puma45 ~(keystone_admin)]# nova list
+----+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+----+------+--------+------------+-------------+----------+
+----+------+--------+------------+-------------+----------+

Comment 8 yfried 2014-07-16 10:03:43 UTC
[root@puma45 ~(keystone_admin)]# neutron floatingip-list 
+--------------------------------------+------------------+---------------------+---------+
| id                                   | fixed_ip_address | floating_ip_address | port_id |
+--------------------------------------+------------------+---------------------+---------+
| 0422aa67-6e54-4507-8e38-25011e24b02c |                  | 10.35.166.10        |         |
| 04684362-dd91-4157-a9f5-0962c4a97264 |                  | 10.35.166.9         |         |
| 0686309f-4ee7-4f8c-9c41-2ceddf56ed21 |                  | 10.35.166.8         |         |
| 2005c94c-9160-431d-8c5f-d36903b34f3b |                  | 10.35.166.7         |         |
| 9288b138-35df-4cc0-9d31-0e537a78eebf |                  | 10.35.166.13        |         |
| b1699375-259d-4ea9-9759-21a8693dc04f |                  | 10.35.166.4         |         |
| dde5b717-b452-45fc-8cc7-fd09a05b2166 |                  | 10.35.166.11        |         |
| ed04b256-10c6-4e4f-8f96-d887e12fb792 |                  | 10.35.166.12        |         |
| ef86c585-9359-4199-a7e2-2a311f9a0735 |                  | 10.35.166.5         |         |
| f9d81e29-96e2-4bf9-953d-068c40038e9f |                  | 10.35.166.6         |         |
+--------------------------------------+------------------+---------------------+---------+

Comment 10 errata-xmlrpc 2014-07-24 17:23:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0936.html