Bug 1373258

Summary: Race condition in L3HA while creating and deleting routers.
Product: Red Hat OpenStack
Reporter: John Schwarz <jschwarz>
Component: openstack-neutron
Assignee: Assaf Muller <amuller>
Status: CLOSED WONTFIX
QA Contact: Toni Freger <tfreger>
Severity: high
Docs Contact:
Priority: unspecified
Version: 7.0 (Kilo)
CC: amuller, chrisw, ggillies, jschwarz, nyechiel, srevivo
Target Milestone: ---
Keywords: ZStream
Target Release: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Last Closed: 2017-04-26 21:41:03 UTC
Type: Bug

Description John Schwarz 2016-09-05 15:39:27 UTC
While creating and deleting an HA router, a StaleDataError can occur.

This happens starting with openstack-neutron-2015.1.4-3.el7ost (following the fix for bz 1281254).

The trace is:
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher     executor_callback))
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher     executor_callback)
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher     result = func(ctxt, **new_args)
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/neutron/api/rpc/handlers/l3_rpc.py", line 78, in sync_routers
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher     self.l3plugin.auto_schedule_routers(context, host, router_ids)
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/neutron/db/l3_agentschedulers_db.py", line 511, in auto_schedule_routers
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher     self, context, host, router_ids)
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/neutron/scheduler/l3_agent_scheduler.py", line 157, in auto_schedule_routers
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher     plugin, context, l3_agent)
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/neutron/scheduler/l3_agent_scheduler.py", line 343, in schedule_ha_routers_to_additional_agent
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher     agent)
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/neutron/scheduler/l3_agent_scheduler.py", line 321, in create_ha_router_binding
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher     port_binding.l3_agent_id = agent['id']
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 490, in __exit__
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher     self.rollback()
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher     compat.reraise(exc_type, exc_value, exc_tb)
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 487, in __exit__
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher     self.commit()
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 392, in commit
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher     self._prepare_impl()
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 372, in _prepare_impl
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher     self.session.flush()
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2004, in flush
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher     self._flush(objects)
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2122, in _flush
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher     transaction.rollback(_capture_exception=True)
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher     compat.reraise(exc_type, exc_value, exc_tb)
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2086, in _flush
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher     flush_context.execute()
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 373, in execute
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher     rec.execute(self)
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 532, in execute
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher     uow
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 170, in save_obj
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher     mapper, table, update)
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 712, in _emit_update_statements
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher     (table.description, len(records), rows))
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher StaleDataError: UPDATE statement on table 'ha_router_agent_port_bindings' expected to update 1 row(s); 0 were matched.
2016-09-05 06:52:35.646 22198 TRACE oslo_messaging.rpc.dispatcher 
2016-09-05 06:52:35.647 22198 ERROR oslo_messaging._drivers.common [req-4c2a51b3-df0b-4638-8ecd-facba58686d2 ] Returning exception UPDATE statement on table 'ha_router_agent_port_bindings' expected to update 1 row(s); 0 were matched. to caller
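
The error at the bottom of the trace means an UPDATE matched zero rows where one was expected. A minimal sketch of that mechanism follows, using the standard-library sqlite3 module rather than Neutron or SQLAlchemy code. The table name and the l3_agent_id column come from the trace; the port_id and router_id columns, the StaleRowError class, and the function names are assumptions made for illustration. It only shows how a delete landing between the scheduler's read and its write could leave the UPDATE with nothing to match.

```python
import sqlite3


class StaleRowError(Exception):
    """Hypothetical stand-in for SQLAlchemy's StaleDataError."""


def update_binding(conn, port_id, agent_id):
    # Emit the UPDATE, then compare the matched row count with the
    # single row the caller believes still exists.
    cur = conn.execute(
        "UPDATE ha_router_agent_port_bindings "
        "SET l3_agent_id = ? WHERE port_id = ?",
        (agent_id, port_id),
    )
    if cur.rowcount != 1:
        raise StaleRowError(
            "expected to update 1 row(s); %d were matched" % cur.rowcount
        )


def demo():
    conn = sqlite3.connect(":memory:")
    # Simplified schema; column names other than l3_agent_id are assumed.
    conn.execute(
        "CREATE TABLE ha_router_agent_port_bindings ("
        "port_id TEXT PRIMARY KEY, router_id TEXT, l3_agent_id TEXT)"
    )
    conn.execute(
        "INSERT INTO ha_router_agent_port_bindings VALUES (?, ?, NULL)",
        ("port-1", "router-1"),
    )

    # Scheduler side: read a port binding that has no agent yet.
    port_id = conn.execute(
        "SELECT port_id FROM ha_router_agent_port_bindings "
        "WHERE l3_agent_id IS NULL"
    ).fetchone()[0]

    # Delete side: the router is removed before the scheduler writes.
    conn.execute(
        "DELETE FROM ha_router_agent_port_bindings WHERE router_id = ?",
        ("router-1",),
    )

    # Scheduler side again: the row it read is gone, so nothing matches.
    try:
        update_binding(conn, port_id, "agent-1")
    except StaleRowError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(demo())
```

In this sketch both sides share one connection and run in sequence; in the reported bug the read and the delete would come from separate requests, which is what makes the failure timing-dependent.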

Comment 2 Assaf Muller 2017-01-13 19:08:40 UTC
Status update?

Comment 3 John Schwarz 2017-01-15 09:33:35 UTC
The patch [1] was supposed to solve it, but the backport proved to be non-trivial (Jakub had some comments against the current implementation that required quite a bit of work). Do we still want to fix this?

[1]: https://code.engineering.redhat.com/gerrit/#/c/83440/

Comment 4 Assaf Muller 2017-01-15 18:55:21 UTC
(In reply to John Schwarz from comment #3)
> The patch [1] was supposed to solve it, but the backport proved to be
> non-trivial (Jakub had some comments against the current implementation that
> required quite a bit of work). Do we still want to fix this?
> 
> [1]: https://code.engineering.redhat.com/gerrit/#/c/83440/

Yes, we do.

Comment 5 Assaf Muller 2017-04-26 21:41:03 UTC
Fixed in later OSP versions; the backport proved to be too complicated.