Bug 1306690

Summary:	"StaleDataError: UPDATE statement on table 'floatingips' expected to update 1 row(s); 0 were matched" on removing instance
Product:	Red Hat OpenStack	Reporter:	Vadim Rutkovsky <vrutkovs>
Component:	openstack-neutron	Assignee:	Ihar Hrachyshka <ihrachys>
Status:	CLOSED ERRATA	QA Contact:	Toni Freger <tfreger>
Severity:	unspecified	Docs Contact:
Priority:	medium
Version:	7.0 (Kilo)	CC:	amuller, chrisw, ihrachys, nyechiel, srevivo
Target Milestone:	async	Keywords:	ZStream
Target Release:	7.0 (Kilo)
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	openstack-neutron-2015.1.4-12.el7ost	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2017-02-15 13:50:20 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Vadim Rutkovsky 2016-02-11 15:21:46 UTC

Description of problem:
Instance stays in Error state after delete, see neutron logs below.

Upstream issue: https://bugs.launchpad.net/neutron/+bug/1513242


Version-Release number of selected component (if applicable):
openstack-neutron-common-2015.1.2-6.el7ost.noarch
python-neutronclient-2.4.0-2.el7ost.noarch
openstack-neutron-ml2-2015.1.2-6.el7ost.noarch
openstack-neutron-openvswitch-2015.1.2-6.el7ost.noarch
python-neutron-2015.1.2-6.el7ost.noarch
openstack-neutron-2015.1.2-6.el7ost.noarch

How reproducible:
~10% of all 

Steps to Reproduce:
Spawn and delete instance, rinse, repeat

Actual results:
2016-01-28 05:44:15.391 60669 ERROR neutron.api.v2.resource [req-432b001f-2b68-4a16-887f-4ee20e08df5d ] delete failed
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource Traceback (most recent call last):
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/site-packages/neutron/api/v2/resource.py", line 83, in resource
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource     result = method(request=request, **args)
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/site-packages/neutron/api/v2/base.py", line 490, in delete
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource     obj_deleter(request.context, id, **kwargs)
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 131, in wrapper
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource     return f(*args, **kwargs)
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/plugin.py", line 1272, in delete_port
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource     context, id, do_notify=False)
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/site-packages/neutron/db/l3_dvr_db.py", line 274, in disassociate_floatingips
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource     do_notify=do_notify)
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/site-packages/neutron/db/l3_db.py", line 1361, in disassociate_floatingips
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource     context, port_id)
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource   File "/usr/lib/python2.7/site-packages/neutron/db/l3_db.py", line 1088, in disassociate_floatingips
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource     'router_id': None})
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 490, in __exit__
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource     self.rollback()
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource     compat.reraise(exc_type, exc_value, exc_tb)
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 487, in __exit__
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource     self.commit()
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 392, in commit
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource     self._prepare_impl()
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 372, in _prepare_impl
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource     self.session.flush()
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2004, in flush
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource     self._flush(objects)
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2122, in _flush
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource     transaction.rollback(_capture_exception=True)
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource     compat.reraise(exc_type, exc_value, exc_tb)
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2086, in _flush
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource     flush_context.execute()
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 373, in execute
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource     rec.execute(self)
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 532, in execute
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource     uow
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 170, in save_obj
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource     mapper, table, update)
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 712, in _emit_update_statements
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource     (table.description, len(records), rows))
2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource StaleDataError: UPDATE statement on table 'floatingips' expected to update 1 row(s); 0 were matched.

Expected results:
Instances are removed as expected

Additional info:

Comment 1 Assaf Muller 2016-06-04 20:46:46 UTC

Any idea about this one? The upstream bug expired, it's not clear if this was fixed in OSP8+ and if so is there a patch that we can backport?

Comment 2 Ihar Hrachyshka 2016-06-07 11:56:01 UTC

Assaf, I spent some time comparing Kilo and Liberty code (since the upstream reporter claimed Liberty is fixed for him). Nothing that would stand out, expect maybe a set of retry decorators applied to base api entry points, starting from https://review.openstack.org/#/c/191540/ That said, it does not seem that StaleDataError is handled by the decorators, though I may miss something since I am not that fluent with oslo.db code.

We probably want to backport those anyway since they drastically enhances API stability for deadlocks.

Now, in upstream Liberty we also switched to another mysql driver (pymysql), and that could also contribute to the issue disappearing for the reporter when Liberty code is used. We definitely cannot change the driver in OSP7.

Should I go and backport retry fixes?

Comment 3 Assaf Muller 2016-10-03 13:34:54 UTC

(In reply to Ihar Hrachyshka from comment #2)
> Assaf, I spent some time comparing Kilo and Liberty code (since the upstream
> reporter claimed Liberty is fixed for him). Nothing that would stand out,
> expect maybe a set of retry decorators applied to base api entry points,
> starting from https://review.openstack.org/#/c/191540/ That said, it does
> not seem that StaleDataError is handled by the decorators, though I may miss
> something since I am not that fluent with oslo.db code.
> 
> We probably want to backport those anyway since they drastically enhances
> API stability for deadlocks.
> 
> Now, in upstream Liberty we also switched to another mysql driver (pymysql),
> and that could also contribute to the issue disappearing for the reporter
> when Liberty code is used. We definitely cannot change the driver in OSP7.
> 
> Should I go and backport retry fixes?

The retry decorators have come up in multiple contexts, I think that the answer is yes, we want them in OSP 7 as well.

Comment 4 Ihar Hrachyshka 2017-01-06 16:58:38 UTC

I think the fix is: https://review.openstack.org/#/c/192063/ and we backported it into the latest package.

Comment 7 Ihar Hrachyshka 2017-02-07 14:41:31 UTC

Steps to reproduce for the bug are not clear, as is often the case with race conditions. You can try to repeat creating and deleting instances (with floating IPs) while also deleting the floating IPs in parallel.

From automation perspective, maybe it's just a matter of regression and stability tests using tempest.

Comment 9 Toni Freger 2017-02-14 12:37:37 UTC

The code is verified on latest OSP7 - openstack-neutron-2015.1.4-12.el7ost.noarch

Comment 11 errata-xmlrpc 2017-02-15 13:50:20 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0277.html