Brian, I ran the Rally benchmark test (creation and deletion of 30 routers, 3 concurrent iterations) on version openstack-neutron-2015.1.4-26.el7ost.noarch.
You can find the test here: https://github.com/openstack/rally/blob/793735c152a573d72391a8ac21e2d908b631195a/samples/tasks/scenarios/neutron/create-and-delete-routers.json

2017-11-26 06:14:13.156 15052 ERROR neutron.agent.l3.ha_router [-] Unable to process HA router 59fad7c2-d393-464f-820b-334927047e64 without HA port
2017-11-26 06:14:13.156 15052 TRACE neutron.agent.l3.ha_router None
2017-11-26 06:14:13.156 15052 TRACE neutron.agent.l3.ha_router
2017-11-26 06:14:13.157 15052 ERROR neutron.agent.l3.agent [-] Error while initializing router 59fad7c2-d393-464f-820b-334927047e64
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent Traceback (most recent call last):
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 335, in _router_added
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent     ri.initialize(self.process_monitor)
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 83, in initialize
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent     raise Exception(msg)
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent Exception: Unable to process HA router 59fad7c2-d393-464f-820b-334927047e64 without HA port
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent
2017-11-26 06:14:13.157 15052 ERROR neutron.agent.l3.agent [-] Error while deleting router 59fad7c2-d393-464f-820b-334927047e64
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent Traceback (most recent call last):
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 342, in _router_added
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent     ri.delete(self)
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 359, in delete
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent     self.destroy_state_change_monitor(self.process_monitor)
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent AttributeError: 'HaRouter' object has no attribute 'process_monitor'
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent
2017-11-26 06:14:13.157 15052 ERROR neutron.agent.l3.agent [-] Failed to process compatible router '59fad7c2-d393-464f-820b-334927047e64'
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent Traceback (most recent call last):
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 509, in _process_router_update
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent     self._process_router_if_compatible(router)
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 450, in _process_router_if_compatible
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent     self._process_added_router(router)
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 455, in _process_added_router
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent     self._router_added(router['id'], router)
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 345, in _router_added
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent     router_id)
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent     six.reraise(self.type_, self.value, self.tb)
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 335, in _router_added
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent     ri.initialize(self.process_monitor)
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 83, in initialize
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent     raise Exception(msg)
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent Exception: Unable to process HA router 59fad7c2-d393-464f-820b-334927047e64 without HA port
2017-11-26 06:14:13.157 15052 TRACE neutron.agent.l3.agent
Hi Toni, The first backtrace in Comment #3 looks like another bug in this code path that would be present in all releases. self.process_monitor is only initialized in a super() call from the HA router initialize code. In this case initialize() failed early and super() was never called. I need to open an upstream bug and propose a change there. This would have been triggered even without the new code from what I can tell and was just a race condition waiting to happen. The second backtrace in Comment #4 is possibly something new, or could have been fixed upstream already as it looks familiar. Since it's unrelated I guess I wouldn't necessarily hold things for it. Let me look at the other bug updates you posted to see if the trace is similar.
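For anyone following along, the failure mode described above can be reproduced in isolation with a minimal sketch. The class and method names mirror the ones in the traceback, but the bodies are simplified stand-ins and not the actual Neutron code. The `delete_guarded` method and the `reproduce` helper are hypothetical, and the `getattr` guard is only one possible defensive approach, not necessarily what the upstream patch does.

```python
class RouterInfo(object):
    """Simplified stand-in for the base router class (not Neutron's code)."""

    def initialize(self, process_monitor):
        # The attribute is only ever assigned here, in the base class.
        self.process_monitor = process_monitor


class HaRouter(RouterInfo):
    """Simplified stand-in for the HA router class in the traceback."""

    def __init__(self, router_id, ha_port=None):
        self.router_id = router_id
        self.ha_port = ha_port
        self.destroyed_monitors = []

    def initialize(self, process_monitor):
        if not self.ha_port:
            # Early failure: super().initialize() is never reached, so
            # self.process_monitor is never assigned.
            raise Exception("Unable to process HA router %s without HA port"
                            % self.router_id)
        super(HaRouter, self).initialize(process_monitor)

    def destroy_state_change_monitor(self, process_monitor):
        # Placeholder that just records which monitor it was given.
        self.destroyed_monitors.append(process_monitor)

    def delete(self):
        # Same attribute access as the failing line in the traceback.
        self.destroy_state_change_monitor(self.process_monitor)

    def delete_guarded(self):
        # Hypothetical guard: skip cleanup if initialize() never completed.
        process_monitor = getattr(self, "process_monitor", None)
        if process_monitor is not None:
            self.destroy_state_change_monitor(process_monitor)


def reproduce():
    """Run the failing sequence and return what happened at each step."""
    ri = HaRouter("59fad7c2-d393-464f-820b-334927047e64", ha_port=None)
    init_error = None
    delete_error = None
    try:
        ri.initialize(object())
    except Exception as exc:
        init_error = str(exc)
    try:
        ri.delete()
    except AttributeError as exc:
        delete_error = str(exc)
    ri.delete_guarded()  # does not raise, and does nothing
    return init_error, delete_error, ri.destroyed_monitors
```

Calling `reproduce()` walks through the same sequence as the log: `initialize()` fails because there is no HA port, the cleanup `delete()` then fails with an `AttributeError`, and the guarded variant returns quietly.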
Hi Scott, The second issue (from Comment #4) is unrelated to the changes, so I would consider it new to OSP7. The first issue (from Comment #3) is related to the changes, but is actually a new bug, i.e. fixing one bug uncovered another. I am fine with merging this small change and the one for https://bugzilla.redhat.com/show_bug.cgi?id=1496916. They are related, since they make the original failure more recoverable and do not fill the log files unnecessarily. Hopefully Toni will agree.
Scott, I think we should ship this as-is and I can fix any new bugs going forward. Toni, I opened https://bugs.launchpad.net/neutron/+bug/1735557 and have a patch up to fix the other l3-agent issue. I'm not sure whether you have opened a downstream bug for this yet. I will need to take a look at the other issue you found as time permits.
Since functionality wasn't damaged, it is reasonable to work on the new bugs separately and to move this one to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3381