Hide Forgot
Description of problem: Instance stays in Error state after delete, see neutron logs below. Upstream issue: https://bugs.launchpad.net/neutron/+bug/1513242 Version-Release number of selected component (if applicable): openstack-neutron-common-2015.1.2-6.el7ost.noarch python-neutronclient-2.4.0-2.el7ost.noarch openstack-neutron-ml2-2015.1.2-6.el7ost.noarch openstack-neutron-openvswitch-2015.1.2-6.el7ost.noarch python-neutron-2015.1.2-6.el7ost.noarch openstack-neutron-2015.1.2-6.el7ost.noarch How reproducible: ~10% of all Steps to Reproduce: Spawn and delete instance, rinse, repeat Actual results: 2016-01-28 05:44:15.391 60669 ERROR neutron.api.v2.resource [req-432b001f-2b68-4a16-887f-4ee20e08df5d ] delete failed 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource Traceback (most recent call last): 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron/api/v2/resource.py", line 83, in resource 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource result = method(request=request, **args) 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron/api/v2/base.py", line 490, in delete 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource obj_deleter(request.context, id, **kwargs) 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 131, in wrapper 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource return f(*args, **kwargs) 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/plugin.py", line 1272, in delete_port 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource context, id, do_notify=False) 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron/db/l3_dvr_db.py", line 274, in disassociate_floatingips 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource do_notify=do_notify) 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron/db/l3_db.py", line 1361, in disassociate_floatingips 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource context, port_id) 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron/db/l3_db.py", line 1088, in disassociate_floatingips 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource 'router_id': None}) 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 490, in __exit__ 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource self.rollback() 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__ 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource compat.reraise(exc_type, exc_value, exc_tb) 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 487, in __exit__ 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource self.commit() 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 392, in commit 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource self._prepare_impl() 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 372, in _prepare_impl 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource self.session.flush() 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2004, in flush 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource self._flush(objects) 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2122, in _flush 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource transaction.rollback(_capture_exception=True) 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__ 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource compat.reraise(exc_type, exc_value, exc_tb) 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2086, in _flush 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource flush_context.execute() 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 373, in execute 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource rec.execute(self) 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 532, in execute 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource uow 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 170, in save_obj 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource mapper, table, update) 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 712, in _emit_update_statements 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource (table.description, len(records), rows)) 2016-01-28 05:44:15.391 60669 TRACE neutron.api.v2.resource StaleDataError: UPDATE statement on table 'floatingips' expected to update 1 row(s); 0 were matched. Expected results: Instances are removed as expected Additional info:
Any idea about this one? The upstream bug expired, it's not clear if this was fixed in OSP8+ and if so is there a patch that we can backport?
Assaf, I spent some time comparing Kilo and Liberty code (since the upstream reporter claimed Liberty is fixed for him). Nothing that would stand out, expect maybe a set of retry decorators applied to base api entry points, starting from https://review.openstack.org/#/c/191540/ That said, it does not seem that StaleDataError is handled by the decorators, though I may miss something since I am not that fluent with oslo.db code. We probably want to backport those anyway since they drastically enhances API stability for deadlocks. Now, in upstream Liberty we also switched to another mysql driver (pymysql), and that could also contribute to the issue disappearing for the reporter when Liberty code is used. We definitely cannot change the driver in OSP7. Should I go and backport retry fixes?
(In reply to Ihar Hrachyshka from comment #2) > Assaf, I spent some time comparing Kilo and Liberty code (since the upstream > reporter claimed Liberty is fixed for him). Nothing that would stand out, > expect maybe a set of retry decorators applied to base api entry points, > starting from https://review.openstack.org/#/c/191540/ That said, it does > not seem that StaleDataError is handled by the decorators, though I may miss > something since I am not that fluent with oslo.db code. > > We probably want to backport those anyway since they drastically enhances > API stability for deadlocks. > > Now, in upstream Liberty we also switched to another mysql driver (pymysql), > and that could also contribute to the issue disappearing for the reporter > when Liberty code is used. We definitely cannot change the driver in OSP7. > > Should I go and backport retry fixes? The retry decorators have come up in multiple contexts, I think that the answer is yes, we want them in OSP 7 as well.
I think the fix is: https://review.openstack.org/#/c/192063/ and we backported it into the latest package.
Steps to reproduce for the bug are not clear, as is often the case with race conditions. You can try to repeat creating and deleting instances (with floating IPs) while also deleting the floating IPs in parallel. From automation perspective, maybe it's just a matter of regression and stability tests using tempest.
The code is verified on latest OSP7 - openstack-neutron-2015.1.4-12.el7ost.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2017-0277.html