Bug 1510162

Summary: Bug in L3 agent code while cleaning up a router namespace
Product: Red Hat OpenStack Reporter: Brian Haley <bhaley>
Component: openstack-neutron Assignee: Brian Haley <bhaley>
Status: CLOSED ERRATA QA Contact: Toni Freger <tfreger>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.0 (Kilo) CC: akaris, amuller, bhaley, chrisw, ihrachys, jjoyce, nyechiel, pneedle, ragiman, sclewis, srevivo, tfreger
Target Milestone: zstream Keywords: Triaged, ZStream
Target Release: 7.0 (Kilo)   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: openstack-neutron-2015.1.4-26.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1508091 Environment:
Last Closed: 2017-12-05 10:47:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1508091    
Bug Blocks: 1510157, 1510159    

Comment 3 Toni Freger 2017-11-26 07:04:16 UTC
Brian,

I ran the Rally benchmark test (creation and deletion of 30 routers, 3 concurrent iterations) on version openstack-neutron-2015.1.4-26.el7ost.noarch.

You can find the test here - https://github.com/openstack/rally/blob/793735c152a573d72391a8ac21e2d908b631195a/samples/tasks/scenarios/neutron/create-and-delete-routers.json


2017-11-26 06:14:13.156 15052 ERROR neutron.agent.l3.ha_router [-] Unable to process HA router 59fad7c2-d393-464f-820b-334927047e64 without HA port
2017-11-26 06:14:13.156 15052 TRACE neutron.agent.l3.ha_router None
2017-11-26 06:14:13.156 15052 TRACE neutron.agent.l3.ha_router
2017-11-26 06:14:13.157 15052 ERROR neutron.agent.l3.agent [-] Error while initializing router 59fad7c2-d393-464f-820b-334927047e64
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent Traceback (most recent call last):
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 335, in _router_added
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent     ri.initialize(self.process_monitor)
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 83, in initialize
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent     raise Exception(msg)
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent Exception: Unable to process HA router 59fad7c2-d393-464f-820b-334927047e64 without HA port
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent
2017-11-26 06:14:13.157 15052 ERROR neutron.agent.l3.agent [-] Error while deleting router 59fad7c2-d393-464f-820b-334927047e64
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent Traceback (most recent call last):
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 342, in _router_added
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent     ri.delete(self)
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 359, in delete
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent     self.destroy_state_change_monitor(self.process_monitor)
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent AttributeError: 'HaRouter' object has no attribute 'process_monitor'
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent
2017-11-26 06:14:13.157 15052 ERROR neutron.agent.l3.agent [-] Failed to process compatible router '59fad7c2-d393-464f-820b-334927047e64'
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent Traceback (most recent call last):
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 509, in _process_router_update
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent     self._process_router_if_compatible(router)
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 450, in _process_router_if_compatible
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent     self._process_added_router(router)
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 455, in _process_added_router
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent     self._router_added(router['id'], router)
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 345, in _router_added
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent     router_id)
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent     six.reraise(self.type_, self.value, self.tb)
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 335, in _router_added
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent     ri.initialize(self.process_monitor)
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 83, in initialize
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent     raise Exception(msg)
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent Exception: Unable to process HA router 59fad7c2-d393-464f-820b-334927047e64 without HA port
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent

Comment 5 Brian Haley 2017-11-27 20:03:59 UTC
Hi Toni,

The first backtrace in Comment #3 looks like another bug in this code path that would be present in all releases.  self.process_monitor is only initialized in a super() call from the HA router initialize code.  In this case initialize() failed early and super() was never called.  I need to open an upstream bug and propose a change there.  This would have been triggered even without the new code from what I can tell and was just a race condition waiting to happen.
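
To make the failure mode in the first backtrace concrete, here is a minimal, self-contained sketch. It does not use the real neutron classes: RouterInfo, HaRouter, GuardedHaRouter, try_add_router and the "ha_port" key are simplified, hypothetical stand-ins that only mirror the call order seen in the traceback. The guard in GuardedHaRouter is one possible approach and is an assumption, not necessarily what the upstream patch does.

```python
class RouterInfo(object):
    """Hypothetical stand-in for the base router class."""

    def __init__(self, router_id, router):
        self.router_id = router_id
        self.router = router

    def initialize(self, process_monitor):
        # The attribute is only created here, not in __init__.
        self.process_monitor = process_monitor

    def delete(self):
        pass


class HaRouter(RouterInfo):
    """Mirrors the failing path: initialize() can raise before super()."""

    def initialize(self, process_monitor):
        if not self.router.get("ha_port"):  # "ha_port" key is an assumption
            raise Exception("Unable to process HA router %s without HA port"
                            % self.router_id)
        super().initialize(process_monitor)

    def destroy_state_change_monitor(self, process_monitor):
        pass

    def delete(self):
        # AttributeError here if initialize() failed before super() ran.
        self.destroy_state_change_monitor(self.process_monitor)
        super().delete()


class GuardedHaRouter(HaRouter):
    """One possible guard (assumption, not the upstream fix)."""

    def __init__(self, router_id, router):
        super().__init__(router_id, router)
        self.process_monitor = None  # always defined, even if init fails

    def delete(self):
        if self.process_monitor is not None:
            self.destroy_state_change_monitor(self.process_monitor)
        RouterInfo.delete(self)


def try_add_router(router_cls, router):
    """Imitates the add-then-clean-up flow; returns names of errors hit."""
    errors = []
    ri = router_cls("59fad7c2", router)
    try:
        ri.initialize(process_monitor=object())
    except Exception as exc:
        errors.append(type(exc).__name__)
        try:
            ri.delete()  # cleanup after the failed initialize()
        except AttributeError as exc:
            errors.append(type(exc).__name__)
    return errors
```

With a router dict that has no HA port, try_add_router(HaRouter, {}) hits both the original exception and the secondary AttributeError during cleanup, while try_add_router(GuardedHaRouter, {}) only reports the original exception.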

The second backtrace in Comment #4 is possibly something new, or could have been fixed upstream already as it looks familiar.  Since it's unrelated I guess I wouldn't necessarily hold things for it.

Let me look at the other bug updates you posted to see if the trace is similar.

Comment 7 Brian Haley 2017-11-28 16:38:48 UTC
Hi Scott,

The second issue (from Comment #4) is unrelated to the changes, so I would consider it new to OSP7.

The first issue (from Comment #3) is related to the changes, but is actually a new bug - i.e. fixing one bug uncovered another.  I am fine with merging this small change and the related one for https://bugzilla.redhat.com/show_bug.cgi?id=1496916, since they make the original failure more recoverable and do not fill the log files unnecessarily.

Hopefully Toni will agree.

Comment 9 Brian Haley 2017-11-30 21:55:01 UTC
Scott,

I think we should ship this as-is and I can fix any new bugs going forward.

Toni,

I opened https://bugs.launchpad.net/neutron/+bug/1735557 and have a patch up to fix the other l3-agent issue.  I'm not sure if you have opened a downstream bug for this yet.  I will need to take a look at the other issue you found as time permits.

Comment 10 Toni Freger 2017-12-04 16:22:54 UTC
Since functionality wasn't damaged, it is reasonable to work on the new bugs separately and to move this one to verified.

Comment 13 errata-xmlrpc 2017-12-05 10:47:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3381