Hide Forgot
+++ This bug was initially created as a clone of Bug #1505005 +++ What problem/issue/behavior are you having trouble with? What do you expect to see? After updating to the latest OSP 9 packages, the disk was filled with traceback by the Neutron l3-agent log. The router had been deleted prior to the update and node restart, so it seems we were stuck in an odd edge case/race condition here. Expected behavior is that l3-agent cleans up namespaces and HA ports without filling logs with traceback. Where are you experiencing the behavior? What environment? RHEL 7.4, OSP 9 openstack-neutron-8.4.0-3.el7ost.noarch openstack-neutron-openvswitch-8.4.0-3.el7ost.noarch When does the behavior occur? Frequently? Repeatedly? At certain times? The traceback loop started after the node (and thus l3-agent) was rebooted. What information can you provide around timeframes and the business impact? The customer would like to perform the minor OSP 9 update in their prod environment, but is concerned about the impact (disk filing, neutron slowdown) of this bug, and would like to see the fix backported before moving forward. --- Additional comment from Andreas Karis on 2017-10-20 18:51:19 EDT --- [root@798007-controller01 tmp]# yum history packages-info python-neutron Loaded plugins: product-id, rhnplugin, search-disabled-repos, subscription-manager This system is receiving updates from RHN Classic or Red Hat Satellite. Transaction ID : 50 Begin time : Thu Oct 12 16:07:55 2017 Package : python-neutron-1:8.4.0-3.el7ost.noarch State : Updated Size : 9,213,330 Build host : x86-041.build.eng.bos.redhat.com Build time : Thu Aug 10 01:39:46 2017 Packager : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla> Vendor : Red Hat, Inc. License : ASL 2.0 URL : http://launchpad.net/neutron/ Source RPM : openstack-neutron-8.4.0-3.el7ost.src.rpm Commit Time : Wed Aug 9 12:00:00 2017 Committer : Jason Joyce <jjoyce> Reason : dep Command Line : -q -y update From repo : rhel-x86_64-server-7-ost-9 Installed by : root <root> Changed by : System <unset> Transaction ID : 50 Begin time : Thu Oct 12 16:07:55 2017 Package : python-neutron-1:8.4.0-6.el7ost.noarch State : Update Size : 9,217,452 Build host : x86-019.build.eng.bos.redhat.com Build time : Tue Sep 19 16:01:21 2017 Packager : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla> Vendor : Red Hat, Inc. License : ASL 2.0 URL : http://launchpad.net/neutron/ Source RPM : openstack-neutron-8.4.0-6.el7ost.src.rpm Commit Time : Tue Sep 19 12:00:00 2017 Committer : Ihar Hrachyshka <ihrachys> Reason : dep Command Line : -y upgrade openstack-neutron-8.4.0-6.el7ost.noarch.rpm openstack-neutron-bgp-dragent-8.4.0-6.el7ost.noarch.rpm openstack-neutron-common-8.4.0-6.el7ost.noarch.rpm openstack-neutron-linuxbridge-8. 4.0-6.el7ost.noarch.rpm openstack-neutron-macvtap-agent-8.4.0-6.el7ost.noarch.rpm openstack-neutron-metering-agent-8.4.0-6.el7ost.noarch.rpm openstack-neutron-ml2-8.4.0-6.el7ost.noarch.rpm openstack-neutron-open vswitch-8.4.0-6.el7ost.noarch.rpm openstack-neutron-rpc-server-8.4.0-6.el7ost.noarch.rpm openstack-neutron-sriov-nic-agent-8.4.0-6.el7ost.noarch.rpm python-neutron-8.4.0-6.el7ost.noarch.rpm python-neutron-tests- 8.4.0-6.el7ost.noarch.rpm From repo : /python-neutron-8.4.0-6.el7ost.noarch Installed by : root <root> Changed by : <rack> Transaction ID : 49 Begin time : Tue Sep 26 03:06:34 2017 Package : python-neutron-1:8.1.2-13.el7ost.noarch State : Updated Size : 9,170,445 Build host : x86-039.build.eng.bos.redhat.com Build time : Thu Nov 24 12:26:17 2016 Packager : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla> Vendor : Red Hat, Inc. License : ASL 2.0 URL : http://launchpad.net/neutron/ Source RPM : openstack-neutron-8.1.2-13.el7ost.src.rpm Commit Time : Thu Nov 24 12:00:00 2016 Committer : Ihar Hrachyshka <ihrachys> Reason : dep Command Line : -q -y update From repo : rhel-x86_64-server-7-ost-9 Installed by : root <root> Changed by : System <unset> Transaction ID : 49 Begin time : Tue Sep 26 03:06:34 2017 Package : python-neutron-1:8.4.0-3.el7ost.noarch State : Update Size : 9,213,330 Build host : x86-041.build.eng.bos.redhat.com Build time : Thu Aug 10 01:39:46 2017 Packager : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla> Vendor : Red Hat, Inc. License : ASL 2.0 URL : http://launchpad.net/neutron/ Source RPM : openstack-neutron-8.4.0-3.el7ost.src.rpm Commit Time : Wed Aug 9 12:00:00 2017 Committer : Jason Joyce <jjoyce> Reason : dep Command Line : -q -y update From repo : rhel-x86_64-server-7-ost-9 Installed by : root <root> Changed by : System <unset> Transaction ID : 46 Begin time : Sat Dec 31 23:09:32 2016 Package : python-neutron-1:8.1.2-4.el7ost.noarch State : Updated Size : 9,146,986 Build host : x86-019.build.eng.bos.redhat.com Build time : Mon Aug 15 11:40:12 2016 Packager : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla> Vendor : Red Hat, Inc. License : ASL 2.0 URL : http://launchpad.net/neutron/ Source RPM : openstack-neutron-8.1.2-4.el7ost.src.rpm Commit Time : Mon Aug 15 12:00:00 2016 Committer : Jakub Libosvar <jlibosva> Reason : dep Command Line : -y -q update From repo : rhel-x86_64-server-7-ost-9 Installed by : root <root> Changed by : System <unset> Transaction ID : 46 Begin time : Sat Dec 31 23:09:32 2016 Package : python-neutron-1:8.1.2-13.el7ost.noarch State : Update Size : 9,170,445 Build host : x86-039.build.eng.bos.redhat.com Build time : Thu Nov 24 12:26:17 2016 Packager : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla> Vendor : Red Hat, Inc. License : ASL 2.0 URL : http://launchpad.net/neutron/ Source RPM : openstack-neutron-8.1.2-13.el7ost.src.rpm Commit Time : Thu Nov 24 12:00:00 2016 Committer : Ihar Hrachyshka <ihrachys> Reason : dep Command Line : -q -y update From repo : rhel-x86_64-server-7-ost-9 Installed by : root <root> Changed by : System <unset> Transaction ID : 43 Begin time : Wed Sep 21 20:06:38 2016 Package : python-neutron-1:7.0.4-3.el7ost.noarch State : Updated Size : 8,096,833 Build host : x86-030.build.eng.bos.redhat.com Build time : Wed May 4 12:35:17 2016 Packager : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla> Vendor : Red Hat, Inc. License : ASL 2.0 URL : http://launchpad.net/neutron/ Source RPM : openstack-neutron-7.0.4-3.el7ost.src.rpm Commit Time : Thu Apr 21 12:00:00 2016 Committer : Nir Magnezi <nmagnezi> Reason : dep Command Line : -v -y install python-psutil python-debtcollector plotnetcfg sos python-networking-cisco python-UcsSdk device-mapper-multipath python-networking-bigswitch openstack-neutron-bigswitch-lldp opensta ck-neutron-bigswitch-agent From repo : rhos-8.0-signed Installed by : root <root> Transaction ID : 43 Begin time : Wed Sep 21 20:06:38 2016 Package : python-neutron-1:8.1.2-4.el7ost.noarch State : Update Size : 9,146,986 Build host : x86-019.build.eng.bos.redhat.com Build time : Mon Aug 15 11:40:12 2016 Packager : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla> Vendor : Red Hat, Inc. License : ASL 2.0 URL : http://launchpad.net/neutron/ Source RPM : openstack-neutron-8.1.2-4.el7ost.src.rpm Commit Time : Mon Aug 15 12:00:00 2016 Committer : Jakub Libosvar <jlibosva> Reason : dep Command Line : -y -q update From repo : rhel-x86_64-server-7-ost-9 Installed by : root <root> Changed by : System <unset> Transaction ID : 5 Begin time : Fri Jun 3 17:25:28 2016 Package : python-neutron-1:7.0.4-3.el7ost.noarch State : Dep-Install Size : 8,096,833 Build host : x86-030.build.eng.bos.redhat.com Build time : Wed May 4 12:35:17 2016 Packager : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla> Vendor : Red Hat, Inc. License : ASL 2.0 URL : http://launchpad.net/neutron/ Source RPM : openstack-neutron-7.0.4-3.el7ost.src.rpm Commit Time : Thu Apr 21 12:00:00 2016 Committer : Nir Magnezi <nmagnezi> Reason : dep Command Line : -v -y install python-psutil python-debtcollector plotnetcfg sos python-networking-cisco python-UcsSdk device-mapper-multipath python-networking-bigswitch openstack-neutron-bigswitch-lldp opensta ck-neutron-bigswitch-agent From repo : rhos-8.0-signed Installed by : root <root> history packages-info --- Additional comment from Andreas Karis on 2017-10-20 18:51:29 EDT --- according to that history: September 26th, upgrade of neutron components from 8.1.2-13 (version without fix) to 8.4.0-3 (version with fix) ~~~ Transaction ID : 50 Begin time : Thu Oct 12 16:07:55 2017 Package : python-neutron-1:8.4.0-3.el7ost.noarch (...) Transaction ID : 50 Begin time : Thu Oct 12 16:07:55 2017 Package : python-neutron-1:8.4.0-6.el7ost.noarch (...) Transaction ID : 49 Begin time : Tue Sep 26 03:06:34 2017 Package : python-neutron-1:8.1.2-13.el7ost.noarch (...) Transaction ID : 49 Begin time : Tue Sep 26 03:06:34 2017 Package : python-neutron-1:8.4.0-3.el7ost.noarch (...) ~~~ --- Additional comment from Andreas Karis on 2017-10-20 18:52:03 EDT --- I gunzip'ed all the files and checked for logs that have the error message: ~~~ [akaris@collab-shell neutron]$ grep 'Error while deleting' *l3* -l l3-agent.log-20161126 l3-agent.log-20161127 l3-agent.log-20161128 l3-agent.log-20161129 l3-agent.log-20161225 l3-agent.log-20161226 l3-agent.log-20161227 l3-agent.log-20161228 l3-agent.log-20161229 l3-agent.log-20161230 l3-agent.log-20161231 l3-agent.log-20170101 l3-agent.log-20170927 ~~~ With the last occurrence at: ~~~ [akaris@collab-shell neutron]$ grep 'Error while' !$ | tail -n1 grep 'Error while' l3-agent.log-20170927 | tail -n1 2017-09-27 03:42:42.365 25246 ERROR neutron.agent.l3.agent [-] Error while deleting router 653d1e49-bdd3-4e8d-8899-cf4912995aaf ~~~ However, we do have a little gap here: ~~~ [akaris@collab-shell neutron]$ head l3-agent.log -n2 2017-09-28 15:30:30.839 20914 INFO oslo_rootwrap.client [req-3eab93f7-39ad-48c6-99ab-ff22154862c7 - - - - -] Stopping rootwrap daemon process with pid=21622 2017-09-28 16:20:03.456 25111 INFO neutron.common.config [-] Logging enabled! ~~~ ~~~ [akaris@collab-shell neutron]$ zcat l3-agent.log-20170927.gz | tail -n2 cat l3-agent.log-20170928 2017-09-27 03:42:42.365 25246 ERROR neutron.agent.l3.agent TypeError: 'NoneType' object has no attribute '__getitem__' 2017-09-27 03:42:42.365 25246 ERROR neutron.agent.l3.agent [akaris@collab-shell neutron]$ cat l3-agent.log-20170928 [akaris@collab-shell neutron]$ ~~~ --- Additional comment from Andreas Karis on 2017-10-20 18:52:27 EDT --- Begin time : Tue Sep 26 03:06:34 2017 upgrade from Package : python-neutron-1:8.1.2-13.el7ost.noarch to Package : python-neutron-1:8.4.0-3.el7ost.noarch https://review.openstack.org/#/c/353010/2 is commit 3f02209b77bb5d0718818e73dcb3aa28ea542580 Which was part of this rebase: https://bugzilla.redhat.com/show_bug.cgi?id=1392743 --- openstack-neutron-8.3.0-2 (...) 3f02209 L3 agent: check router namespace existence before delete (...) --- Additional comment from Andreas Karis on 2017-10-20 18:52:43 EDT --- That controller was rebooted a few hours after the updated neutron package was installed (from /var/log/messages-20171001.gz) : Sep 26 08:13:32 798007-controller01 kernel: Linux version 3.10.0-693.2.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Sat Sep 9 03:55:24 EDT 2017 Sep 26 08:13:32 798007-controller01 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.2.2.el7.x86_64 root=LABEL=img-rootfs ro console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet LANG=en_US.UTF-8 Also the spec file for the rpm mandates a service restart. --- Additional comment from Andreas Karis on 2017-10-20 18:58:14 EDT --- The customer wants to know why the issue was still seen at 2017-09-27 03:42:42.365 When the controller was rebooted at: Sep 26 08:13:32 Additionally, this should be fixed immediately after an upgrade given that the scripts will restart the services after an upgrade: ~~~ [root@undercloud-6 ~]# rpm -q --scripts openstack-neutron | tail -n 40 if [ $1 -eq 0 ] ; then # Package removal, not upgrade systemctl --no-reload disable neutron-ovs-cleanup.service > /dev/null 2>&1 || : systemctl stop neutron-ovs-cleanup.service > /dev/null 2>&1 || : fi if [ $1 -eq 0 ] ; then # Package removal, not upgrade systemctl --no-reload disable neutron-linuxbridge-cleanup.service > /dev/null 2>&1 || : systemctl stop neutron-linuxbridge-cleanup.service > /dev/null 2>&1 || : fi postuninstall scriptlet (using /bin/sh): systemctl daemon-reload >/dev/null 2>&1 || : if [ $1 -ge 1 ] ; then # Package upgrade, not uninstall systemctl try-restart neutron-dhcp-agent.service >/dev/null 2>&1 || : fi systemctl daemon-reload >/dev/null 2>&1 || : if [ $1 -ge 1 ] ; then # Package upgrade, not uninstall systemctl try-restart neutron-l3-agent.service >/dev/null 2>&1 || : fi systemctl daemon-reload >/dev/null 2>&1 || : if [ $1 -ge 1 ] ; then # Package upgrade, not uninstall systemctl try-restart neutron-metadata-agent.service >/dev/null 2>&1 || : fi systemctl daemon-reload >/dev/null 2>&1 || : if [ $1 -ge 1 ] ; then # Package upgrade, not uninstall systemctl try-restart neutron-server.service >/dev/null 2>&1 || : fi ~~~ --- Additional comment from Andreas Karis on 2017-10-20 18:59:39 EDT --- Sos reports and neutron logs can be found on: http://collab-shell.usersys.redhat.com/01942118/ I asked the customer to provide a tar archive of the entire /var/log from this controller as well and will pull this as soon as we have it. --- Additional comment from Jakub Libosvar on 2017-10-23 09:31:23 EDT --- It might be a known issue Brian worked on, assigning it to him to triage. --- Additional comment from on 2017-10-24 11:01:25 EDT --- Customer uploaded additional logs in anticipation of further reuqests they can be found on collab-shell http://collab-shell.usersys.redhat.com/01942118/ --- Additional comment from Brian Haley on 2017-10-25 11:49:44 EDT --- Here's the whole traceback: 2017-09-27 00:13:16.276 25246 ERROR neutron.agent.l3.agent [-] Error while deleting router 653d1e49-bdd3-4e8d-8899-cf4912995aaf 2017-09-27 00:13:16.276 25246 ERROR neutron.agent.l3.agent Traceback (most recent call last): 2017-09-27 00:13:16.276 25246 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 381, in _safe_router_removed 2017-09-27 00:13:16.276 25246 ERROR neutron.agent.l3.agent self._router_removed(router_id) 2017-09-27 00:13:16.276 25246 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 399, in _router_removed 2017-09-27 00:13:16.276 25246 ERROR neutron.agent.l3.agent ri.delete(self) 2017-09-27 00:13:16.276 25246 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 414, in delete 2017-09-27 00:13:16.276 25246 ERROR neutron.agent.l3.agent self.destroy_state_change_monitor(self.process_monitor) 2017-09-27 00:13:16.276 25246 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 356, in destroy_state_change_monitor 2017-09-27 00:13:16.276 25246 ERROR neutron.agent.l3.agent pm = self._get_state_change_monitor_process_manager() 2017-09-27 00:13:16.276 25246 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 327, in _get_state_change_monitor_process_manager 2017-09-27 00:13:16.276 25246 ERROR neutron.agent.l3.agent default_cmd_callback=self._get_state_change_monitor_callback()) 2017-09-27 00:13:16.276 25246 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 330, in _get_state_change_monitor_callback 2017-09-27 00:13:16.276 25246 ERROR neutron.agent.l3.agent ha_device = self.get_ha_device_name() 2017-09-27 00:13:16.276 25246 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 168, in get_ha_device_name 2017-09-27 00:13:16.276 25246 ERROR neutron.agent.l3.agent return (HA_DEV_PREFIX + self.ha_port['id'])[:self.driver.DEV_NAME_LEN] 2017-09-27 00:13:16.276 25246 ERROR neutron.agent.l3.agent TypeError: 'NoneType' object has no attribute '__getitem__' So self.ha_port is None. I also see other traces where self.keepalived_manager is None as well. This could be related to https://bugs.launchpad.net/neutron/+bug/1726370 which was just opened the other day, I will try to make progress on that this week. --- Additional comment from Andreas Karis on 2017-10-31 12:57:24 EDT --- Hi, I'm just checking in to see if you could make any progress on https://bugs.launchpad.net/neutron/+bug/1726370 and if you think that this bugzilla is related. Thanks, Andreas --- Additional comment from Brian Haley on 2017-10-31 13:05:10 EDT --- I sent out an upstream review the other day, but Anil found an issue with it, I'll send an update today. --- Additional comment from Andreas Karis on 2017-11-09 18:56:38 EST --- Hi, I hate that I have to ask again, but where are things standing? It seems that a fix was released for Pike -- are we getting a backport for newton? Sorry to be harassing you, I can also simply watch the upstream bug for further details if that's best. - Andreas --- Additional comment from Brian Haley on 2017-11-09 20:48:05 EST --- Yes, the upstream change to master just merged, it took longer than I thought for us to decide on the best solution. Now that the Pike and Ocata changes were proposed I can cherry-pick them tomorrow to the OSP branches. --- Additional comment from Andreas Karis on 2017-11-10 10:57:15 EST --- Thanks for the info and the great work! --- Additional comment from Brian Haley on 2017-11-10 14:42:47 EST --- This bug was against OSP10, I'll create an 11 and 9 one to track the other backports.
Toni, The arping tracebacks are "normal" since the process is running asynchronously, and creating/destroying routers could trigger it since the interface could go away: WARNING neutron.agent.linux.ip_lib [-] Interface qr-beb01b7d-ac might have been deleted concurrently: ProcessExecutionError: Exit code: 2; Stdin: ; Stdout: ; Stderr: arping: Device qr-beb01b7d-ac not available. The ovs agent trace doesn't look familiar: WARNING oslo_rootwrap.client [req-8380325e-48a5-4c8e-9f45-1e5975fe7c60 - - - - -] Leaving behind already spawned process with pid 117570, root should kill it if it's still there (I can't): error: [Errno 32] Broken pipe Doesn't look related to the change. I'd have to get more context for the server error, but since the change is mainly on the agent side and doesn't change what messages the agent sends to the server, I'm not sure it's related.
I looked at the logs for the second error, and it looks like that warning is from oslo.rootwrap, the OVS agent had a problem communicating with it (broken pipe) so started a new one. so it seems like a possible event that the agent recovered from gracefully. The initial rootwrap daemon was started about 5 hours previous: openvswitch-agent.log:2017-11-30 08:40:02.308 117309 INFO oslo_rootwrap.client [req-adce3282-330c-4b34-ab23-9a52662f3824 - - - - -] Spawned new rootwrap daemon process with pid=117570 openvswitch-agent.log:2017-11-30 13:28:36.665 117309 WARNING oslo_rootwrap.client [req-8380325e-48a5-4c8e-9f45-1e5975fe7c60 - - - - -] Leaving behind already spawned process with pid 117570, root should kill it if it's still there (I can't): error: [Errno 32] Broken pipe And I don't see any issues in the OVS agent log regarding having a problem running rootwrap after it re-spawned it, so don't think it's an issue.
Moving to verified, according to Brian all raised warnings are correct behavior.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462