
Bug 1103461

Summary: When deleting router/networks, netns should be deleted by default.
Product: Red Hat OpenStack Reporter: Toni Freger <tfreger>
Component: kernel Assignee: Denys Vlasenko <dvlasenk>
Status: CLOSED WORKSFORME QA Contact: Ofer Blaut <oblaut>
Severity: medium Docs Contact:
Priority: high    
Version: 5.0 (RHEL 7) CC: amuller, chrisw, dvlasenk, jliberma, jshortt, kimi.zhang, lhh, lwang, majopela, mangelajo, mschuppe, nyechiel, oblaut, psimerda, rhos-maint, rkhan, rsussman, sputhenp, tfreger, yeylon
Target Milestone: --- Keywords: ZStream
Target Release: 5.0 (RHEL 7)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-10-14 18:58:19 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 743661, 1198800    

Description Toni Freger 2014-06-01 05:01:13 UTC
Even when the router/networks are deleted the netns still exists.
Netns should be deleted by default.

Comment 3 Ihar Hrachyshka 2014-06-02 08:39:33 UTC
openstack-netns-cleanup is not currently packaged as a service (ova-cleanup is though).

Comment 4 Ihar Hrachyshka 2014-06-02 08:40:54 UTC
Sorry, ovs-cleanup, not ova-cleanup.

Comment 5 Assaf Muller 2014-06-09 15:40:11 UTC
@Miguel: I thought we use netns-cleanup for our downstream HA solutions? Directly via the CLI then?

Comment 6 Assaf Muller 2014-06-09 15:45:31 UTC
Actually, scratch that, Ihar confused me :)

We don't need netns-cleanup; we just need to see whether everything works properly when these two configuration values are changed:

l3_agent.ini:
router_delete_namespaces = True

dhcp_agent.ini:
dhcp_delete_namespaces = True

@Ofer:
Instead of waiting for a downstream-only patch that changes these two values to True by default, I think the best course of action is for QA/you to change these two values on the network node in your deployment and see whether everything still works. If so, I'll send the patch. Note that I suspect that RHELOSP 5 on RHEL 6.5 will fail the verification, and RHELOSP 5 on RHEL 7 will pass! Could I ask you to test the proposed change on both platforms? The results would influence which branch(es) we push the patch to.
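
A minimal sketch of how the two settings could be checked on a network node before testing. The option names come from this comment; the file locations under /etc/neutron, the [DEFAULT] section, and the helper name namespace_deletion_enabled are assumptions made for illustration, not something confirmed in this bug.

```python
import configparser

# Option names are taken from the comment above. The paths and the use of
# the [DEFAULT] section are assumptions for this sketch.
EXPECTED = {
    "/etc/neutron/l3_agent.ini": "router_delete_namespaces",
    "/etc/neutron/dhcp_agent.ini": "dhcp_delete_namespaces",
}


def namespace_deletion_enabled(ini_text, option):
    """Return True if `option` is set to a true value in [DEFAULT]."""
    # Interpolation is disabled so that literal '%' characters in other
    # options do not raise errors while parsing.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    # A missing option is treated as False (namespace deletion disabled).
    return parser.getboolean("DEFAULT", option, fallback=False)


if __name__ == "__main__":
    for path, option in EXPECTED.items():
        try:
            with open(path) as handle:
                enabled = namespace_deletion_enabled(handle.read(), option)
        except OSError as exc:
            print(f"{path}: cannot read ({exc})")
            continue
        print(f"{path}: {option} = {enabled}")
```

The parsing is kept separate from the file access so the check can be run against a copy of the configuration as well as on the node itself.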

Comment 7 Toni Freger 2014-06-10 12:03:44 UTC
On RHOS 5 on RHEL 7 the bug still reproduces, even though router_delete_namespaces (l3_agent.ini) and dhcp_delete_namespaces (dhcp_agent.ini) are set to True.

The error trace is attached:

/var/log/neutron/l3_agent.log

2014-06-10 14:05:33.291 23580 ERROR neutron.agent.l3_agent [-] Failed trying to delete namespace: qrouter-4937b089-5a8a-4766-a6e6-86bac1554d07
2014-06-10 14:05:33.291 23580 TRACE neutron.agent.l3_agent Traceback (most recent call last):
2014-06-10 14:05:33.291 23580 TRACE neutron.agent.l3_agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3_agent.py", line 301, in _destroy_router_namespace
2014-06-10 14:05:33.291 23580 TRACE neutron.agent.l3_agent     ns_ip.netns.delete(namespace)
2014-06-10 14:05:33.291 23580 TRACE neutron.agent.l3_agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 450, in delete
2014-06-10 14:05:33.291 23580 TRACE neutron.agent.l3_agent     self._as_root('delete', name, use_root_namespace=True)
2014-06-10 14:05:33.291 23580 TRACE neutron.agent.l3_agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 217, in _as_root
2014-06-10 14:05:33.291 23580 TRACE neutron.agent.l3_agent     kwargs.get('use_root_namespace', False))
2014-06-10 14:05:33.291 23580 TRACE neutron.agent.l3_agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 70, in _as_root
2014-06-10 14:05:33.291 23580 TRACE neutron.agent.l3_agent     namespace)
2014-06-10 14:05:33.291 23580 TRACE neutron.agent.l3_agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 81, in _execute
2014-06-10 14:05:33.291 23580 TRACE neutron.agent.l3_agent     root_helper=root_helper)
2014-06-10 14:05:33.291 23580 TRACE neutron.agent.l3_agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 76, in execute
2014-06-10 14:05:33.291 23580 TRACE neutron.agent.l3_agent     raise RuntimeError(m)
2014-06-10 14:05:33.291 23580 TRACE neutron.agent.l3_agent RuntimeError: 
2014-06-10 14:05:33.291 23580 TRACE neutron.agent.l3_agent Command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'delete', 'qrouter-4937b089-5a8a-4766-a6e6-86bac1554d07']
2014-06-10 14:05:33.291 23580 TRACE neutron.agent.l3_agent Exit code: 1
2014-06-10 14:05:33.291 23580 TRACE neutron.agent.l3_agent Stdout: ''
2014-06-10 14:05:33.291 23580 TRACE neutron.agent.l3_agent Stderr: 'Cannot remove namespace file "/var/run/netns/qrouter-4937b089-5a8a-4766-a6e6-86bac1554d07": Device or resource busy\n'
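
To see how often this happens and for which namespaces, the agent log can be scanned for the ERROR line shown above. The following is a rough sketch that relies only on the message text in this trace; the helper name failed_namespaces is hypothetical.

```python
import re

# Matches the ERROR message format seen in the trace above.
FAILED_DELETE = re.compile(r"Failed trying to delete namespace: (\S+)")


def failed_namespaces(log_lines):
    """Return namespace names from failed-delete lines, in first-seen order."""
    seen = []
    for line in log_lines:
        match = FAILED_DELETE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

It accepts any iterable of lines, so an open log file can be passed directly.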

Comment 8 Miguel Angel Ajo 2014-06-10 12:10:18 UTC
Ok, this error

2014-06-10 14:05:33.291 23580 TRACE neutron.agent.l3_agent Stderr: 'Cannot remove namespace file "/var/run/netns/qrouter-4937b089-5a8a-4766-a6e6-86bac1554d07": Device or resource busy\n'

means that we're missing one or more of the netns patches in the RHEL 7 kernel or in iproute.

I believe this is not Neutron-specific, but rather an iproute problem or a missing kernel backport.

Comment 9 lpeer 2014-08-13 12:46:08 UTC
*** Bug 1060340 has been marked as a duplicate of this bug. ***

Comment 10 Miguel Angel Ajo 2014-08-13 12:58:26 UTC
We are still sometimes seeing "Device or resource busy" when trying to delete a namespace. We suspect that iproute itself or the kernel is missing a netns-related fix.

What conditions (other than failures) could trigger a "Device or resource busy" for an "ip netns delete"?

Comment 11 Pavel Šimerda (pavlix) 2014-09-01 09:51:38 UTC
The iproute tool is very simple and any error like that is almost certainly coming from the kernel.

Comment 12 Toni Freger 2015-03-02 08:58:30 UTC
Hi,

Any updates regarding this bug?

Thanks,
Toni

Comment 15 Denys Vlasenko 2015-09-24 19:03:11 UTC
This should be happening when "ip netns delete NAME" is run. The iproute2 code that handles this operation is straightforward:

static int on_netns_del(char *nsname, void *arg)
{
        char netns_path[PATH_MAX];
        snprintf(netns_path, sizeof(netns_path), "%s/%s", NETNS_RUN_DIR, nsname);
        umount2(netns_path, MNT_DETACH);
        if (unlink(netns_path) < 0) {
                fprintf(stderr, "Cannot remove namespace file \"%s\": %s\n",
                        netns_path, strerror(errno));
                return -1;
        }
        return 0;
}

So, indeed, "The iproute tool is very simple and any error like that is almost certainly coming from the kernel" is true.
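
Since the message is printed when unlink() fails, one way to gather data for a reproducer is to check whether the namespace file is still a mount point in some process's mount namespace. On older kernels that condition is commonly associated with EBUSY from unlink(), but this bug never confirmed a cause, so treat it as an assumption. The sketch below reads /proc/<pid>/mountinfo; the helper names are hypothetical, and the match is on the "/netns/NAME" suffix because /var/run may be a symlink to /run.

```python
import glob


def mounts_namespace(mountinfo_text, ns_name):
    """Return True if any mount point in the text ends with /netns/<ns_name>."""
    suffix = "/netns/" + ns_name
    for line in mountinfo_text.splitlines():
        fields = line.split()
        # The fifth field of a mountinfo line is the mount point.
        if len(fields) > 4 and fields[4].endswith(suffix):
            return True
    return False


def pids_holding_mount(ns_name):
    """Return PIDs whose mount namespace still shows the netns bind mount."""
    pids = []
    for path in glob.glob("/proc/[0-9]*/mountinfo"):
        try:
            with open(path) as handle:
                text = handle.read()
        except OSError:
            # The process exited or its mountinfo is not readable.
            continue
        if mounts_namespace(text, ns_name):
            pids.append(int(path.split("/")[2]))
    return sorted(pids)
```

Running pids_holding_mount("qrouter-...") as root right after a failed delete, together with the exact kernel version, would give something concrete to attach to a report.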

Comment 20 Denys Vlasenko 2015-10-14 18:58:19 UTC
OK. Closing this as WORKSFORME per Assaf's suggestion.

Please reopen if you have a reproducer.
Please specify exact kernel version.