Description of problem:

One of our customers has reported stale namespace entries in their OSP-13 environment, due to which the original namespaces are not accessible. I see a similar problem in my lab environment too, and a controller reboot seems to be a possible workaround, but the customer is looking for the root cause and a permanent fix. Here is an example from my environment:

~~~
[heat-admin@overcloud-controller-0 ~]$ sudo ip netns
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
qrouter-2d96d97f-a5ba-42f6-8f2d-dcff61fa8d71 (id: 2)
RTNETLINK answers: Invalid argument
qdhcp-09188fe6-70bc-4f76-b12b-bcb425a86acf
RTNETLINK answers: Invalid argument
qrouter-71e1cdce-fe55-4806-a33a-2374f7efa62f
[heat-admin@overcloud-controller-0 ~]$ sudo -i
[root@overcloud-controller-0 ~]#
[root@overcloud-controller-0 ~]#
[root@overcloud-controller-0 ~]#
[root@overcloud-controller-0 ~]# docker ps^C
[root@overcloud-controller-0 ~]# docker exec -it -u root neutron_l3_agent /bin/bash
()[root@overcloud-controller-0 /]#
()[root@overcloud-controller-0 /]# neutron-netns-cleanup
2019-10-04 11:37:03.571 960732 INFO neutron.common.config [-] Logging enabled!
2019-10-04 11:37:03.572 960732 INFO neutron.common.config [-] /usr/bin/neutron-netns-cleanup version 12.0.6
2019-10-04 11:37:03.574 960732 INFO oslo.privsep.daemon [-] Running privsep helper: ['sudo', 'privsep-helper', '--privsep_context', 'neutron.privileged.default', '--privsep_sock_path', '/tmp/tmpHUf_ML/privsep.sock']
2019-10-04 11:37:04.660 960732 INFO oslo.privsep.daemon [-] Spawned new privsep daemon via rootwrap
2019-10-04 11:37:04.581 960978 INFO oslo.privsep.daemon [-] privsep daemon starting
2019-10-04 11:37:04.597 960978 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0
2019-10-04 11:37:04.606 960978 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_NET_ADMIN|CAP_SYS_ADMIN/CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_NET_ADMIN|CAP_SYS_ADMIN/none
2019-10-04 11:37:04.606 960978 INFO oslo.privsep.daemon [-] privsep daemon running as pid 960978
2019-10-04 11:37:04.929 960732 ERROR neutron.agent.linux.utils [-] Exit code: 1; Stdin: ; Stdout: ; Stderr: setting the network namespace "qdhcp-09188fe6-70bc-4f76-b12b-bcb425a86acf" failed: Invalid argument
[root@overcloud-controller-0 /]# exit
exit
[root@overcloud-controller-0 ~]# ip netns
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
qrouter-2d96d97f-a5ba-42f6-8f2d-dcff61fa8d71 (id: 2)
RTNETLINK answers: Invalid argument
qdhcp-09188fe6-70bc-4f76-b12b-bcb425a86acf
RTNETLINK answers: Invalid argument
qrouter-71e1cdce-fe55-4806-a33a-2374f7efa62f
~~~

Post reboot:

~~~
[heat-admin@overcloud-controller-0 ~]$ sudo ip netns list
qrouter-2d96d97f-a5ba-42f6-8f2d-dcff61fa8d71 (id: 1)
qdhcp-7e1e0798-610f-43b9-88e2-9bcb36dc7264 (id: 0)
~~~

Version-Release number of selected component (if applicable): OSP13

Neutron packages from the customer environment:

~~~
[ravsingh@supportshell 02487105]$ less 10-sosreport-1007834-controller03-02487105-2019-10-04-evuikva.tar.xz/sosreport-1007834-controller03-02487105-2019-10-04-evuikva/installed-rpms |grep -i neutron
openstack-neutron-12.0.6-10.el7ost.noarch                                   Sat Sep 14 05:47:44 2019
openstack-neutron-common-12.0.6-10.el7ost.noarch                            Sat Sep 14 05:47:43 2019
openstack-neutron-l2gw-agent-12.0.2-0.20180412115803.a9f8009.el7ost.noarch  Wed Aug  7 00:05:23 2019
openstack-neutron-lbaas-12.0.1-0.20181019202917.b9b6b6a.el7ost.noarch       Sat Sep 14 05:47:58 2019
openstack-neutron-lbaas-ui-4.0.1-0.20181115043347.7f2010d.el7ost.noarch     Wed Aug  7 00:05:42 2019
openstack-neutron-linuxbridge-12.0.6-10.el7ost.noarch                       Sat Sep 14 05:47:59 2019
openstack-neutron-metering-agent-12.0.6-10.el7ost.noarch                    Sat Sep 14 05:47:59 2019
openstack-neutron-ml2-12.0.6-10.el7ost.noarch                               Sat Sep 14 05:47:44 2019
openstack-neutron-openvswitch-12.0.6-10.el7ost.noarch                       Sat Sep 14 05:47:59 2019
openstack-neutron-sriov-nic-agent-12.0.6-10.el7ost.noarch                   Sat Sep 14 05:47:59 2019
puppet-neutron-12.4.1-7.el7ost.noarch                                       Wed Aug  7 00:09:03 2019
python2-neutronclient-6.7.0-1.el7ost.noarch                                 Wed Aug  7 00:03:57 2019
python2-neutron-lib-1.13.0-1.el7ost.noarch                                  Wed Aug  7 00:03:59 2019
python-neutron-12.0.6-10.el7ost.noarch                                      Sat Sep 14 05:47:43 2019
python-neutron-lbaas-12.0.1-0.20181019202917.b9b6b6a.el7ost.noarch          Sat Sep 14 05:47:44 2019
~~~

Packages from my lab environment:

~~~
[root@overcloud-controller-1 ~]# rpm -qa | grep -i neutron
python2-neutronclient-6.7.0-1.el7ost.noarch
python-neutron-lbaas-12.0.1-0.20181019202915.b9b6b6a.el7ost.noarch
openstack-neutron-metering-agent-12.0.5-11.el7ost.noarch
openstack-neutron-lbaas-ui-4.0.1-0.20181115043347.7f2010d.el7ost.noarch
python-neutron-12.0.5-11.el7ost.noarch
openstack-neutron-12.0.5-11.el7ost.noarch
openstack-neutron-l2gw-agent-12.0.2-0.20180412115803.a9f8009.el7ost.noarch
openstack-neutron-ml2-12.0.5-11.el7ost.noarch
openstack-neutron-openvswitch-12.0.5-11.el7ost.noarch
openstack-neutron-sriov-nic-agent-12.0.5-11.el7ost.noarch
python2-neutron-lib-1.13.0-1.el7ost.noarch
puppet-neutron-12.4.1-5.el7ost.noarch
openstack-neutron-common-12.0.5-11.el7ost.noarch
openstack-neutron-lbaas-12.0.1-0.20181019202915.b9b6b6a.el7ost.noarch
openstack-neutron-linuxbridge-12.0.5-11.el7ost.noarch
~~~

Do let us know if further inputs are required. A sosreport from the controller is available on the case.

How reproducible:
Reproduced in lab.

Steps to Reproduce:
1.
2.
3.

Actual results:
Stale namespace entries.

Expected results:
There should not be stale entries.

Additional info:
Although it doesn't seem to be an urgent issue, the customer wants a resolution on priority, so I am setting the BZ priority to match the case priority.
kernel-3.10.0-1062.1.1.el7.x86_64 Sat Sep 14 05:48:05 2019
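For context while triaging, one possible way to see which of the listed namespaces are actually stale is to compare the `ip netns` output with the files and bind mounts under /var/run/netns. This is only a diagnostic sketch, assuming the default iproute2 layout on these controllers; the namespace name below is just the stale qdhcp one copied from the example output above.

~~~
# Namespace handles are kept as files under /var/run/netns; a stale entry
# still has a file/mount here even though the namespace itself is gone.
ls -l /var/run/netns
mount | grep /run/netns

# Trying to enter a stale namespace fails, matching the errors shown above
# (namespace name taken from the example output).
ip netns exec qdhcp-09188fe6-70bc-4f76-b12b-bcb425a86acf ip addr
~~~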
(In reply to Ravi Singh from comment #0)
> How reproducible:
> reproduced in lab
>
> Steps to Reproduce:
> 1.
> 2.
> 3.

Please provide instructions on how to reproduce. Devel can't fix something if we don't describe how to actually get the system into an error state.
Hi Jamie,

I am not sure how to reproduce this; once the customer reported the issue, I saw the same in my environment and opened this BZ. Please let me know if you want to take a look at my environment.
I suggest you continue to work on it, so that you can provide a set of steps which reproduce the issue from boot.
Possible workaround for this issue: restart the L3 agent, delete the stale namespaces (the ones that `ip netns` lists without an "(id: N)" suffix), then restart the agent again:

~~~
docker restart neutron_l3_agent
sleep 2
for i in $(ip netns 2>/dev/null | grep -v "id:" | sort); do
    docker exec -it -u 0 neutron_l3_agent ip netns delete $i
    sleep 1
done
ip netns
docker restart neutron_l3_agent
ip netns
~~~
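After running it, a quick sanity check (a suggestion, not from the case notes) is to confirm the listing now only shows live namespaces, as in the post-reboot output earlier:

~~~
# Expect only namespaces with an "(id: N)" suffix and no
# "RTNETLINK answers: Invalid argument" lines; the qrouter-/qdhcp-
# namespaces that are still needed are recreated by the L3/DHCP agents.
ip netns
~~~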
CU update:

~~~
Hi Tim, David,

We've been able to reproduce the issue without CrowdStrike installed. So we can disregard that as being a factor for now. Simply running the tempest neutron test scenarios is enough to see the namespaces go stale [1]. We also updated to z9 last night, it's the same issue on that release.

[1] tempest run -r neutron
~~~
In TripleO, we've worked around this by creating a service that is started on boot and creates a placeholder namespace, which ensures the netns folders are created with the expected shared (mount) nature. This patch landed in the Stein timeframe (https://review.rdoproject.org/r/#/c/17078/) in the paunch packaging. The patch assumes updated versions of iproute2 and pyroute2 that contain the fixes for the shared nature. I've proposed a backport of this patch to upstream Queens/Rocky, which already have the updated iproute packaging. I've taken this bug over to get the paunch patch landed. We need https://bugzilla.redhat.com/show_bug.cgi?id=1771556 before the paunch patch will work correctly.
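For illustration, a minimal sketch of the idea behind that boot-time service: create a throwaway namespace early so /run/netns exists with shared mount propagation before any agent container starts. The unit name, ordering, and namespace name below are placeholders for illustration, not the actual content of the paunch patch, and the sketch assumes the fixed iproute2/pyroute2 mentioned above.

~~~
# /etc/systemd/system/netns-placeholder.service  (hypothetical example)
[Unit]
Description=Create a placeholder network namespace so /run/netns is initialized early
Before=docker.service

[Service]
Type=oneshot
RemainAfterExit=yes
# With a fixed iproute2, "ip netns add" creates /run/netns and mounts it
# shared, so namespaces added later stay visible to containers that
# bind-mount it.
ExecStart=/usr/sbin/ip netns add placeholder

[Install]
WantedBy=multi-user.target
~~~

Enabled once (`systemctl enable netns-placeholder.service`), this would run on every boot; the actual fix remains the paunch patch plus BZ 1771556 referenced above.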
Changing the status to MODIFIED because we are blocked from verifying it.
*** Bug 1571321 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:4335