Bug 1496916
Summary: | Error while deleting router - infinite log loop | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Andreas Karis <akaris> | ||||
Component: | openstack-neutron | Assignee: | Ihar Hrachyshka <ihrachys> | ||||
Status: | CLOSED ERRATA | QA Contact: | Toni Freger <tfreger> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 7.0 (Kilo) | CC: | amuller, bhaley, chrisw, jlibosva, nyechiel, sclewis, srevivo, stphilli, tfreger | ||||
Target Milestone: | zstream | Keywords: | Triaged, ZStream | ||||
Target Release: | 7.0 (Kilo) | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | openstack-neutron-2015.1.4-24.el7ost | Doc Type: | If docs needed, set a value | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2017-12-05 10:47:49 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Andreas Karis
2017-09-28 18:35:24 UTC
About the L3 agent router deletion issue: ~~~ l3-agent.log:2017-09-26 15:01:36.318 7399 ERROR neutron.agent.l3.agent [-] Error while deleting router 12f8b6b7-ce4d-492c-93b6-06c6f2e3a894 l3-agent.log:2017-09-26 15:01:36.323 7399 ERROR neutron.agent.l3.agent [-] Error while deleting router 0c780871-ad14-40f8-8e09-7ee786abf532 ~~~ ~~~ l3-agent.log:2017-09-26 15:01:36.323 7399 ERROR neutron.agent.l3.agent [-] Error while deleting router 0c780871-ad14-40f8-8e09-7ee786abf532 l3-agent.log-2017-09-26 15:01:36.323 7399 TRACE neutron.agent.l3.agent Traceback (most recent call last): l3-agent.log-2017-09-26 15:01:36.323 7399 TRACE neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 355, in _safe_router_removed l3-agent.log-2017-09-26 15:01:36.323 7399 TRACE neutron.agent.l3.agent self._router_removed(router_id) l3-agent.log-2017-09-26 15:01:36.323 7399 TRACE neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 372, in _router_removed l3-agent.log-2017-09-26 15:01:36.323 7399 TRACE neutron.agent.l3.agent ri.delete(self) l3-agent.log-2017-09-26 15:01:36.323 7399 TRACE neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 355, in delete l3-agent.log-2017-09-26 15:01:36.323 7399 TRACE neutron.agent.l3.agent self.destroy_state_change_monitor(self.process_monitor) l3-agent.log-2017-09-26 15:01:36.323 7399 TRACE neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 320, in destroy_state_change_monitor l3-agent.log-2017-09-26 15:01:36.323 7399 TRACE neutron.agent.l3.agent pm = self._get_state_change_monitor_process_manager() l3-agent.log-2017-09-26 15:01:36.323 7399 TRACE neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 291, in _get_state_change_monitor_process_manager l3-agent.log-2017-09-26 15:01:36.323 7399 TRACE neutron.agent.l3.agent default_cmd_callback=self._get_state_change_monitor_callback()) l3-agent.log-2017-09-26 15:01:36.323 7399 TRACE neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 294, in _get_state_change_monitor_callback l3-agent.log-2017-09-26 15:01:36.323 7399 TRACE neutron.agent.l3.agent ha_device = self.get_ha_device_name() l3-agent.log-2017-09-26 15:01:36.323 7399 TRACE neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 138, in get_ha_device_name l3-agent.log-2017-09-26 15:01:36.323 7399 TRACE neutron.agent.l3.agent return (HA_DEV_PREFIX + self.ha_port['id'])[:self.driver.DEV_NAME_LEN] l3-agent.log-2017-09-26 15:01:36.323 7399 TRACE neutron.agent.l3.agent TypeError: 'NoneType' object has no attribute '__getitem__' l3-agent.log-2017-09-26 15:01:36.323 7399 TRACE neutron.agent.l3.agent l3-agent.log-2017-09-26 15:01:36.324 7399 INFO neutron.agent.l3.agent [-] Finished a router update for 0c780871-ad14-40f8-8e09-7ee786abf532 ~~~ This is failing here: ~~~ 30 HA_DEV_PREFIX = 'ha-' (...) 135 136 def get_ha_device_name(self): 137 return (HA_DEV_PREFIX + self.ha_port['id'])[:self.driver.DEV_NAME_LEN] 138 ~~~ So this is getting a substring from ha-<port ID>, and give the error message. self.ha_port['id'] returns None, and the substring operation fails: ~~~ >>> a = None >>> a[:2] Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'NoneType' object has no attribute '__getitem__' ~~~ The installed RPMs are: ~~~ cat installed-rpms | grep neutron openstack-neutron-2015.1.4-16.el7ost.noarch Mon Jul 31 20:49:33 2017 openstack-neutron-common-2015.1.4-16.el7ost.noarch Mon Jul 31 20:49:32 2017 openstack-neutron-fwaas-2015.1.2-1.el7ost.noarch Thu Feb 23 15:54:20 2017 openstack-neutron-lbaas-2015.1.4-1.el7ost.noarch Wed Jul 26 17:37:42 2017 openstack-neutron-ml2-2015.1.4-16.el7ost.noarch Mon Jul 31 20:49:34 2017 openstack-neutron-openvswitch-2015.1.4-16.el7ost.noarch Mon Jul 31 20:49:34 2017 python-neutron-2015.1.4-16.el7ost.noarch Mon Jul 31 20:49:31 2017 python-neutron-fwaas-2015.1.2-1.el7ost.noarch Thu Feb 23 15:54:19 2017 python-neutron-lbaas-2015.1.4-1.el7ost.noarch Wed Jul 26 17:35:47 2017 python-neutronclient-2.4.0-2.el7ost.noarch Thu Feb 23 15:52:56 2017 ~~~ Compared to a lab: ~~~ ---> Package openstack-neutron.noarch 0:2015.1.4-14.el7ost will be updated ---> Package openstack-neutron.noarch 0:2015.1.4-21.el7ost will be an update ---> Package openstack-neutron-common.noarch 0:2015.1.4-14.el7ost will be updated ---> Package openstack-neutron-common.noarch 0:2015.1.4-21.el7ost will be an update ---> Package openstack-neutron-ml2.noarch 0:2015.1.4-14.el7ost will be updated ---> Package openstack-neutron-ml2.noarch 0:2015.1.4-21.el7ost will be an update ---> Package openstack-neutron-openvswitch.noarch 0:2015.1.4-14.el7ost will be updated ---> Package openstack-neutron-openvswitch.noarch 0:2015.1.4-21.el7ost will be an update ~~~ There are several bugzillas and upstream bugs about this: https://bugzilla.redhat.com/show_bug.cgi?id=1285882 https://bugs.launchpad.net/neutron/+bug/1606844 https://bugs.launchpad.net/neutron/+bug/1605546 https://bugs.launchpad.net/neutron/+bug/1533441 Additionally to the above, there is this warning message (right after the stack trace): ~~~ 2017-09-26 15:01:36.325 7399 WARNING neutron.agent.linux.iptables_manager [-] Tried to remove rule that was not there: 'INPUT' '-m mar k --mark 0x1 -j ACCEPT' True False 2017-09-26 15:01:36.325 7399 WARNING neutron.agent.linux.iptables_manager [-] Tried to remove rule that was not there: 'INPUT' '-p tcp -m tcp --dport 9697 -j DROP' True False 2017-09-26 15:01:36.325 7399 WARNING neutron.agent.linux.iptables_manager [-] Tried to remove rule that was not there: 'PREROUTING' '- d 169.254.169.254/32 -p tcp -m tcp --dport 80 -j MARK --set-xmark 0x1/0xffffffff' True False 2017-09-26 15:01:36.326 7399 WARNING neutron.agent.linux.iptables_manager [-] Tried to remove rule that was not there: 'PREROUTING' '- d 169.254.169.254/32 -i qr-+ -p tcp -m tcp --dport 80 -j REDIRECT --to-port 9697' True False ~~~ There is https://bugs.launchpad.net/neutron/+bug/1606844 Which was fixed with https://review.openstack.org/#/c/354406/ on Aug 11, 2016 I picked one example file which was changed to see if this was already in OSP 7: https://review.openstack.org/#/c/354406/1/neutron/agent/l3/namespaces.py But it doesn't seem so (logical, OSP 7 is kilo, and upstream only fixes the bug until OSP 8): ~~~ [root@undercloud-7 ~]# rpm -qf /usr/lib/python2.7/site-packages/neutron/agent/l3/namespaces.py python-neutron-2015.1.4-21.el7ost.noarch [root@undercloud-7 ~]# egrep -v '^#' /usr/lib/python2.7/site-packages/neutron/agent/l3/namespaces.py | head -n 10 from oslo_log import log as logging from neutron.agent.linux import ip_lib from neutron.i18n import _LE LOG = logging.getLogger(__name__) NS_PREFIX = 'qrouter-' INTERNAL_DEV_PREFIX = 'qr-' [root@undercloud-7 ~]# ~~~ ~~~ [root@undercloud-7 ~]# grep check_ns_exi /usr/lib/python2.7/site-packages/neutron/agent/l3/namespaces.py [root@undercloud-7 ~]# ~~~ Thanks, Andreas Easiest steps to reproduce would be: - create a router - wait until l3 agent creates its namespace - remove the namespace using 'ip netns' tool - remove the router via neutron api - observe that l3 agent gets into the error loop trying to remove the namespace for the router that no longer exists Created attachment 1359133 [details]
l3-agent.log
Hi Toni, The traceback here seems similar to that from my comment in https://bugzilla.redhat.com/show_bug.cgi?id=1510162 My comment from there: "The backtrace looks like another bug in this code path that would be present in all releases. self.process_monitor is only initialized in a super() call from the HA router initialize code. In this case initialize() failed early and super() was never called. I need to open an upstream bug and propose a change there. This would have been triggered even without the new code from what I can tell and was just a race condition waiting to happen." For that reason I feel this is a new problem. The failure is related to the changes, but is actually a new bug - i.e. fixing one bug uncovered another. I am fine with this small change and the one for https://bugzilla.redhat.com/show_bug.cgi?id=1510162 merging which are related since they make the original failure more recoverable and do not fill the log files unnecessarily. Hopefully Toni will agree. Since functionality wasn't damaged it reasonable to work on new bugs and to move this one to verified. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3381 |