Bug 1758547 - Stale namespace entries & neutron-netns-cleanup is failing
Summary: Stale namespace entries & neutron-netns-cleanup is failing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-paunch
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z10
Target Release: 13.0 (Queens)
Assignee: Rodolfo Alonso
QA Contact: Eran Kuris
URL:
Whiteboard:
Duplicates: 1571321
Depends On: 1760041 1771556 1772400
Blocks: 1771563
 
Reported: 2019-10-04 13:32 UTC by Ravi Singh
Modified: 2024-06-13 22:15 UTC (History)

Fixed In Version: python-paunch-2.5.0-8.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1771563 (view as bug list)
Environment:
Last Closed: 2019-12-20 16:13:16 UTC
Target Upstream Version:
Embargoed:



Links
System                             ID              Private  Priority  Status  Summary  Last Updated
Launchpad                          1616268         0        None      None    None     2019-10-04 14:18:54 UTC
RDO                                23669           0        None      None    None     2019-11-12 15:30:54 UTC
RDO                                23676           0        None      None    None     2019-11-12 22:28:56 UTC
Red Hat Issue Tracker              OSP-3354        0        None      None    None     2022-08-23 16:23:51 UTC
Red Hat Knowledge Base (Solution)  4484981         0        None      None    None     2019-10-08 16:12:33 UTC
Red Hat Product Errata             RHBA-2019:4335  0        None      None    None     2019-12-20 16:14:02 UTC

Internal Links: 1760041

Description Ravi Singh 2019-10-04 13:32:47 UTC
Description of problem:

One of our customers (cu) has reported stale namespace entries in their OSP-13 env, which make the original namespace entries inaccessible.
I see a similar problem in my lab env too, and a controller reboot seems to be a possible workaround (WA).
But the cu is looking for the root cause and a permanent fix.

Here is an example from my env:

~~~
[heat-admin@overcloud-controller-0 ~]$ sudo ip netns
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
qrouter-2d96d97f-a5ba-42f6-8f2d-dcff61fa8d71 (id: 2)
RTNETLINK answers: Invalid argument
qdhcp-09188fe6-70bc-4f76-b12b-bcb425a86acf
RTNETLINK answers: Invalid argument
qrouter-71e1cdce-fe55-4806-a33a-2374f7efa62f
[heat-admin@overcloud-controller-0 ~]$ sudo -i
[root@overcloud-controller-0 ~]# 
[root@overcloud-controller-0 ~]# 
[root@overcloud-controller-0 ~]# 
[root@overcloud-controller-0 ~]# docker ps^C
[root@overcloud-controller-0 ~]# docker exec -it -u root neutron_l3_agent /bin/bash
()[root@overcloud-controller-0 /]# 
()[root@overcloud-controller-0 /]# neutron-netns-cleanup
2019-10-04 11:37:03.571 960732 INFO neutron.common.config [-] Logging enabled!
2019-10-04 11:37:03.572 960732 INFO neutron.common.config [-] /usr/bin/neutron-netns-cleanup version 12.0.6
2019-10-04 11:37:03.574 960732 INFO oslo.privsep.daemon [-] Running privsep helper: ['sudo', 'privsep-helper', '--privsep_context', 'neutron.privileged.default', '--privsep_sock_path', '/tmp/tmpHUf_ML/privsep.sock']
2019-10-04 11:37:04.660 960732 INFO oslo.privsep.daemon [-] Spawned new privsep daemon via rootwrap
2019-10-04 11:37:04.581 960978 INFO oslo.privsep.daemon [-] privsep daemon starting
2019-10-04 11:37:04.597 960978 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0
2019-10-04 11:37:04.606 960978 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_NET_ADMIN|CAP_SYS_ADMIN/CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_NET_ADMIN|CAP_SYS_ADMIN/none
2019-10-04 11:37:04.606 960978 INFO oslo.privsep.daemon [-] privsep daemon running as pid 960978
2019-10-04 11:37:04.929 960732 ERROR neutron.agent.linux.utils [-] Exit code: 1; Stdin: ; Stdout: ; Stderr: setting the network namespace "qdhcp-09188fe6-70bc-4f76-b12b-bcb425a86acf" failed: Invalid argument

[root@overcloud-controller-0 /]# exit
exit
[root@overcloud-controller-0 ~]# ip netns
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
qrouter-2d96d97f-a5ba-42f6-8f2d-dcff61fa8d71 (id: 2)
RTNETLINK answers: Invalid argument
qdhcp-09188fe6-70bc-4f76-b12b-bcb425a86acf
RTNETLINK answers: Invalid argument
qrouter-71e1cdce-fe55-4806-a33a-2374f7efa62f
~~~

Post reboot:-

~~~
[heat-admin@overcloud-controller-0 ~]$ sudo ip netns list
qrouter-2d96d97f-a5ba-42f6-8f2d-dcff61fa8d71 (id: 1)
qdhcp-7e1e0798-610f-43b9-88e2-9bcb36dc7264 (id: 0)
~~~

Version-Release number of selected component (if applicable):
OSP13

Neutron packages from cu env

~~~
[ravsingh@supportshell 02487105]$ less 10-sosreport-1007834-controller03-02487105-2019-10-04-evuikva.tar.xz/sosreport-1007834-controller03-02487105-2019-10-04-evuikva/installed-rpms  |grep -i neutron
openstack-neutron-12.0.6-10.el7ost.noarch                   Sat Sep 14 05:47:44 2019
openstack-neutron-common-12.0.6-10.el7ost.noarch            Sat Sep 14 05:47:43 2019
openstack-neutron-l2gw-agent-12.0.2-0.20180412115803.a9f8009.el7ost.noarch Wed Aug  7 00:05:23 2019
openstack-neutron-lbaas-12.0.1-0.20181019202917.b9b6b6a.el7ost.noarch Sat Sep 14 05:47:58 2019
openstack-neutron-lbaas-ui-4.0.1-0.20181115043347.7f2010d.el7ost.noarch Wed Aug  7 00:05:42 2019
openstack-neutron-linuxbridge-12.0.6-10.el7ost.noarch       Sat Sep 14 05:47:59 2019
openstack-neutron-metering-agent-12.0.6-10.el7ost.noarch    Sat Sep 14 05:47:59 2019
openstack-neutron-ml2-12.0.6-10.el7ost.noarch               Sat Sep 14 05:47:44 2019
openstack-neutron-openvswitch-12.0.6-10.el7ost.noarch       Sat Sep 14 05:47:59 2019
openstack-neutron-sriov-nic-agent-12.0.6-10.el7ost.noarch   Sat Sep 14 05:47:59 2019
puppet-neutron-12.4.1-7.el7ost.noarch                       Wed Aug  7 00:09:03 2019
python2-neutronclient-6.7.0-1.el7ost.noarch                 Wed Aug  7 00:03:57 2019
python2-neutron-lib-1.13.0-1.el7ost.noarch                  Wed Aug  7 00:03:59 2019
python-neutron-12.0.6-10.el7ost.noarch                      Sat Sep 14 05:47:43 2019
python-neutron-lbaas-12.0.1-0.20181019202917.b9b6b6a.el7ost.noarch Sat Sep 14 05:47:44 2019
~~~



Packages from my lab env

~~~
[root@overcloud-controller-1 ~]# rpm -qa | grep -i neutron
python2-neutronclient-6.7.0-1.el7ost.noarch
python-neutron-lbaas-12.0.1-0.20181019202915.b9b6b6a.el7ost.noarch
openstack-neutron-metering-agent-12.0.5-11.el7ost.noarch
openstack-neutron-lbaas-ui-4.0.1-0.20181115043347.7f2010d.el7ost.noarch
python-neutron-12.0.5-11.el7ost.noarch
openstack-neutron-12.0.5-11.el7ost.noarch
openstack-neutron-l2gw-agent-12.0.2-0.20180412115803.a9f8009.el7ost.noarch
openstack-neutron-ml2-12.0.5-11.el7ost.noarch
openstack-neutron-openvswitch-12.0.5-11.el7ost.noarch
openstack-neutron-sriov-nic-agent-12.0.5-11.el7ost.noarch
python2-neutron-lib-1.13.0-1.el7ost.noarch
puppet-neutron-12.4.1-5.el7ost.noarch
openstack-neutron-common-12.0.5-11.el7ost.noarch
openstack-neutron-lbaas-12.0.1-0.20181019202915.b9b6b6a.el7ost.noarch
openstack-neutron-linuxbridge-12.0.5-11.el7ost.noarch
~~~

Do let us know if further inputs are required.
Sosreport from controller is available on case.

How reproducible:
reproduced in lab

Steps to Reproduce:
1.
2.
3.

Actual results:
Stale namespace entries

Expected results:

Should not have stale entries.

Additional info:

Although this doesn't seem to be an urgent issue, the cu wants a resolution on priority.
So I am setting the BZ priority to match the case priority.

Comment 1 David Hill 2019-10-04 14:16:44 UTC
kernel-3.10.0-1062.1.1.el7.x86_64                           Sat Sep 14 05:48:05 2019

Comment 5 Jamie Bainbridge 2019-10-07 00:26:25 UTC
(In reply to Ravi Singh from comment #0)
> How reproducible:
> reproduced in lab
> 
> Steps to Reproduce:
> 1.
> 2.
> 3.

Please provide instructions on how to reproduce.

Devel can't fix something if we don't describe how to actually get the system into an error state.

Comment 6 Ravi Singh 2019-10-07 05:53:07 UTC
Hi Jamie,

I am not sure how to reproduce this. After the customer reported the issue, I saw the same in my environment and opened this BZ.
Please let me know if you want to have a look at my env.

Comment 7 Jamie Bainbridge 2019-10-07 06:06:42 UTC
I suggest you continue to work on it, so that you can provide a set of steps which reproduce the issue from boot.

Comment 18 David Hill 2019-10-09 12:47:17 UTC
Possible workaround for this issue:

docker restart neutron_l3_agent; sleep 2; for i in $(ip netns 2>/dev/null | grep -v "id:" | sort); do docker exec -it -u 0 neutron_l3_agent ip netns delete $i; sleep 1; done; ip netns; docker restart neutron_l3_agent ; ip netns
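
The one-liner above can also be written as a short script, which is easier to review before running it on a controller. This is only a sketch of the same steps, assuming the `neutron_l3_agent` container name used in this report. The function names `stale_netns_candidates` and `cleanup_stale_netns` are illustrative and not part of any package. As in the one-liner, a namespace is treated as a candidate when `ip netns` prints it without an `(id: N)` suffix, which is a heuristic rather than a definitive test for staleness.

```shell
#!/bin/bash
# Sketch of the workaround from comment 18, split into reviewable steps.
# Function names are hypothetical helpers, not part of neutron or paunch.

# Read `ip netns` output on stdin and print the namespace names that have no
# "(id: N)" suffix. This mirrors `grep -v "id:"` in the one-liner; RTNETLINK
# error lines are dropped too, in case stderr was merged into the input.
stale_netns_candidates() {
    grep -v 'id:' | grep -v '^RTNETLINK' | awk 'NF {print $1}' | sort
}

# Restart the L3 agent container, delete each candidate from inside it,
# then restart the container again and show the resulting namespace list.
cleanup_stale_netns() {
    local container="${1:-neutron_l3_agent}"
    local ns
    docker restart "$container"
    sleep 2
    for ns in $(ip netns 2>/dev/null | stale_netns_candidates); do
        # -it is omitted here: no interactive terminal is needed in a loop.
        docker exec -u 0 "$container" ip netns delete "$ns"
        sleep 1
    done
    docker restart "$container"
    ip netns
}

# Usage (as root on the affected controller):
#   cleanup_stale_netns neutron_l3_agent
```

Running `ip netns 2>/dev/null | stale_netns_candidates` on its own first shows which namespaces would be deleted, without changing anything.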

Comment 46 Cristian Muresanu 2019-11-09 17:18:48 UTC
CU update
~~~

Hi Tim, David,

We've been able to reproduce the issue without CrowdStrike installed. So we can disregard that as being a factor for now.

Simply running the tempest neutron test scenarios is enough to see the namespaces go stale [1]. We also updated to z9 last night, it's the same issue on that release.

[1]
tempest run -r neutron
~~~

Comment 57 Alex Schultz 2019-11-12 15:30:26 UTC
In tripleo, we've worked around this by creating a service, started on boot, that creates a placeholder namespace to ensure the folders are created with the shared nature. This patch landed in the Stein timeframe (https://review.rdoproject.org/r/#/c/17078/) in the paunch packaging. It assumes updated versions of iproute2 and pyroute2, which have the fixes for the shared nature. I've proposed a backport of this patch for upstream Queens/Rocky, which already have the updated iproute packaging. I'm taking this bug over to get the paunch patch landed. We need https://bugzilla.redhat.com/show_bug.cgi?id=1771556 before the paunch patch will work correctly.
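
To see whether the namespace directory currently has the shared nature on a given host, the optional fields in `/proc/self/mountinfo` can be inspected. The sketch below is an assumption about how one might check this by hand. It is not taken from the paunch patch, the function name `mount_propagation` is hypothetical, and `/run/netns` is assumed to be the directory used by `ip netns` on these hosts.

```shell
#!/bin/bash
# Hypothetical helper: report the propagation state of a mount point.
# In a mountinfo line, field 5 is the mount point and the optional fields
# run from field 7 up to the "-" separator; a shared mount carries a
# "shared:N" tag among them.
mount_propagation() {
    local mount_point="${1:-/run/netns}"          # assumed default path
    local mountinfo="${2:-/proc/self/mountinfo}"
    awk -v mp="$mount_point" '
        $5 == mp {
            found = 1
            state = "not-shared"
            for (i = 7; i <= NF && $i != "-"; i++)
                if ($i ~ /^shared:/) state = "shared"
        }
        END {
            if (!found) { print "not-mounted"; exit 1 }
            print state
        }
    ' "$mountinfo"
}

# Usage:
#   mount_propagation /run/netns
```

The function prints `shared`, `not-shared`, or `not-mounted`. The last case also returns a non-zero exit status, so the call can be used in a boot-time check.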

Comment 73 Eran Kuris 2019-11-20 14:00:30 UTC
Changing to MODIFIED because we are blocked from verifying it.

Comment 80 Nate Johnston 2019-12-17 18:10:19 UTC
*** Bug 1571321 has been marked as a duplicate of this bug. ***

Comment 81 Rodolfo Alonso 2019-12-17 18:13:01 UTC
*** Bug 1571321 has been marked as a duplicate of this bug. ***

Comment 83 errata-xmlrpc 2019-12-20 16:13:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:4335

