Description of problem:

neutron-dhcp-agent can start several child processes per network:

 * dnsmasq
 * neutron-ns-metadata-proxy (for isolated networks only)

It also creates a qdhcp-xxxx namespace for every tenant network it handles.

When shutting down the service:

# service neutron-dhcp-agent stop

the namespaces are left behind:

[root@neutron ~]# ip netns | grep qdhcp
qdhcp-0a3e917c-04c5-4ea2-b21b-20beb367c8e3
qdhcp-0318c4e7-71c5-4f18-b8dd-f1fd8fb7fff2

The dnsmasq processes are left running as well:

[root@neutron ~]# ps fax | grep dnsmasq
 4682 pts/0 S+ 0:00 \_ grep dnsmasq
 4439 ? S 0:00 dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=tap320ee63c-c6 --except-interface=lo --pid-file=/var/lib/neutron/dhcp/0a3e917c-04c5-4ea2-b21b-20beb367c8e3/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/0a3e917c-04c5-4ea2-b21b-20beb367c8e3/host --dhcp-optsfile=/var/lib/neutron/dhcp/0a3e917c-04c5-4ea2-b21b-20beb367c8e3/opts --leasefile-ro --dhcp-range=tag0,192.168.99.0,static,120s --dhcp-lease-max=256 --conf-file= --domain=openstacklocal
 4441 ? S 0:00 dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=tap3ce8b56f-18 --except-interface=lo --pid-file=/var/lib/neutron/dhcp/0318c4e7-71c5-4f18-b8dd-f1fd8fb7fff2/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/0318c4e7-71c5-4f18-b8dd-f1fd8fb7fff2/host --dhcp-optsfile=/var/lib/neutron/dhcp/0318c4e7-71c5-4f18-b8dd-f1fd8fb7fff2/opts --leasefile-ro --dhcp-range=tag0,192.168.50.0,static,120s --dhcp-lease-max=256 --conf-file= --domain=openstacklocal

Also the ovs ports are left behind:

[root@neutron ~]# ovs-vsctl show
ca839344-5f3e-44c2-a241-93ea60538ed3
    Bridge br-int
        Port "tap320ee63c-c6"
            Interface "tap320ee63c-c6" <<<<<<<<<<<<<<
                type: internal
        Port br-int
            Interface br-int
                type: internal
        Port "tap3ce8b56f-18"
            Interface "tap3ce8b56f-18" <<<<<<<<<<<<<<
                type: internal
    Bridge br-tun
        Port br-tun
            Interface br-tun
                type: internal
    ovs_version: "1.11.0"

Version-Release number of selected component (if applicable):

openstack-neutron-2013.2-16.el6ost.noarch
or
openstack-neutron-2014.1-0.1.b1.el6.noarch

How reproducible:

Always.

Steps to Reproduce:
1. Start the service.
2. Set up some tenant networks, with a VM connected to them so the services and namespaces are actually created.
3. Stop the service.

Actual results:

dnsmasq or neutron-ns-metadata-proxy child processes are left.
qdhcp-* network namespaces are left.
ovs tap ports for the dnsmasq server are left.

Expected results:

All child processes are terminated.
qdhcp-* network namespaces are cleaned up.
The ovs ports are cleaned up.

Additional info:

In a simple installation this wouldn't be a problem, but for HA setups we need to stop services on one node and start them on a different one, without the two interfering with each other. In this situation, an unmanaged dnsmasq is left connected to the network, along with a namespace metadata proxy.
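To make the "Actual results" check repeatable, here is a minimal Python sketch that extracts the leftover qdhcp-* namespaces and the dnsmasq tap interfaces from captured command output. It only parses text in the shape shown in this report; the function names are hypothetical and are not part of neutron.

```python
import re
from typing import Dict, List


def dhcp_namespaces(ip_netns_output: str) -> List[str]:
    """Return the qdhcp-* namespace names found in `ip netns` output."""
    names = []
    for line in ip_netns_output.splitlines():
        fields = line.split()
        # Some iproute2 versions append "(id: N)"; the name is the first field.
        if fields and fields[0].startswith("qdhcp-"):
            names.append(fields[0])
    return names


def dnsmasq_taps(ps_output: str) -> Dict[str, str]:
    """Map network UUID -> tap interface for each dnsmasq line in `ps` output.

    Assumes the command-line layout quoted in this report, where the network
    UUID appears in the --pid-file path under /var/lib/neutron/dhcp/.
    """
    taps = {}
    for line in ps_output.splitlines():
        iface = re.search(r"--interface=(tap\S+)", line)
        net = re.search(
            r"--pid-file=/var/lib/neutron/dhcp/([0-9a-f-]{36})/pid", line
        )
        if iface and net:
            taps[net.group(1)] = iface.group(1)
    return taps


def leftovers(ip_netns_output: str, ps_output: str) -> Dict[str, str]:
    """Map each leftover qdhcp namespace to its dnsmasq tap ('' if none)."""
    taps = dnsmasq_taps(ps_output)
    return {
        ns: taps.get(ns[len("qdhcp-"):], "")
        for ns in dhcp_namespaces(ip_netns_output)
    }
```

After "service neutron-dhcp-agent stop", the expected result is an empty dictionary from leftovers(); with the output quoted above it would instead list both namespaces with their tap ports.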
The following upstream bug is related: https://bugs.launchpad.net/neutron/+bug/1115999

It seems that these settings are left in place by design, and that we should call neutron-netns-cleanup from the neutron-l3-agent init.d script at stop, or right after the l3-agent stops. However, it takes parameters; I'm checking them.
Launchpad bug #1115999 prevents the metadata proxies inside namespaces (qdhcp or qrouter) from being cleaned up properly. It needs to be fixed before we can have a workaround here.
Launchpad bug #1273095 prevents us from properly selecting which kind of namespace we want to clean up (dhcp or l3-agent).
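As an illustration of the selection that is missing, a minimal Python sketch follows. It assumes the only distinguishing feature is the name prefix seen in the "ip netns" output in this report (qdhcp- for the DHCP agent, qrouter- for the L3 agent); the function and the "dhcp"/"l3" keys are hypothetical and do not reflect the actual neutron-netns-cleanup options.

```python
from typing import Iterable, List

# Prefixes as they appear in the `ip netns` output quoted in this report.
# The "dhcp" / "l3" keys are made up for this sketch.
PREFIXES = {"dhcp": "qdhcp-", "l3": "qrouter-"}


def select_namespaces(names: Iterable[str], agent: str) -> List[str]:
    """Return only the namespaces that belong to the given agent type."""
    if agent not in PREFIXES:
        raise ValueError("unknown agent type: %r" % (agent,))
    prefix = PREFIXES[agent]
    return [name for name in names if name.startswith(prefix)]
```

With such a filter, a cleanup run for the DHCP agent would leave the qrouter-* namespaces of a still-running L3 agent untouched, and vice versa.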
It seems that neutron-netns-cleanup is broken:

1) It doesn't have an /etc/init.d/neutron-netns-cleanup script, as ovs has.

2) It fails on invocation:

# neutron-netns-cleanup --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/dhcp_agent.ini --debug --force
2014-02-03 11:31:36.046 4848 INFO neutron.common.config [-] Logging enabled!
2014-02-03 11:31:36.046 4848 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'list'] execute /usr/lib/python2.6/site-packages/neutron/agent/linux/utils.py:43
2014-02-03 11:31:36.193 4848 DEBUG neutron.agent.linux.utils [-]
Command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'list']
Exit code: 0
Stdout: 'qdhcp-f742e733-672e-4e76-8003-034185564a90\nqdhcp-b38c071a-508d-44bb-8359-c2e694bb6f9b\nqdhcp-7bfe4972-8c59-4f0b-8634-3420acd75844\nqdhcp-e1fde457-5fb8-4965-b64b-248da99f7a8d\nqrouter-7c82790b-e1e6-4451-a772-fb4f39117d5a\n'
Stderr: '' execute /usr/lib/python2.6/site-packages/neutron/agent/linux/utils.py:60
Error importing interface driver 'neutron.agent.linux.interface.OVSInterfaceDriver': no such option: ovs_use_veth
There is another bug in iproute (rhbz#1062685) that prevents netns deletion from working. A fix exists and has been tested.
Running

/etc/init.d/neutron-netns-forced-cleanup start

cleans up the network namespaces and all internal iptables rules and interfaces. The fix is provided in this repo:

http://file.rdu.redhat.com/~majopela/neutron-ha-fixes-bz-1051028-and-36-cleanup/

neutron needs to be patched (netns_cleanup script).
For patches and scripts, please refer to https://bugzilla.redhat.com/attachment.cgi?bugid=1051036
When the neutron-netns-cleanup init.d script is installed (via pacemaker, or via normal init.d script installation), it behaves as follows: during startup it cleans up any empty namespaces (those with no resources inside: processes, ports, etc.), and when stopped it forces a cleanup of all resources.

Stop should happen under three conditions:

1) When the node is taken out of the cluster.
2) When the neutron-agent resources are taken off the node.
3) If the neutron-netns-cleanup script is installed as a service, it will clean up all network namespaces during reboot/poweroff/halt, or when leaving the programmed runlevels.
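The start/stop behaviour described above can be summarised with a small Python sketch. This is not the init.d script itself; it only models the decision, assuming we already know for each namespace whether it still contains resources. The function name and the shape of the input dictionary are hypothetical.

```python
from typing import Dict, List


def namespaces_to_delete(action: str, has_resources: Dict[str, bool]) -> List[str]:
    """Pick the namespaces the cleanup would remove for a given action.

    has_resources maps a namespace name to True when it still contains
    processes, ports, etc., and to False when it is empty.
    """
    if action == "start":
        # On start only empty namespaces are removed.
        return sorted(ns for ns, used in has_resources.items() if not used)
    if action == "stop":
        # On stop the cleanup is forced: everything goes, in use or not.
        return sorted(has_resources)
    raise ValueError("action must be 'start' or 'stop', got %r" % (action,))
```

The asymmetry is the point of the design: start is safe to run next to live agents, while stop is meant for the moment the node gives up its neutron-agent resources.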
I have verified that "service neutron-netns-cleanup stop" cleans up the namespaces that are in use. The stop conditions are related to the HA setup, not to the script itself.

openstack-neutron-2013.2.3-4.el6ost.noarch

[root@puma05 ~]# ip netns
qdhcp-a76e98a5-7ae3-4f91-b721-4f81cebcfa6f
qdhcp-6dcaa203-e61a-4003-a1fe-95d60853516f
qrouter-15ef1247-b52a-43fc-bfa2-27478dbfe1f3
[root@puma05 ~]# service neutron-netns-cleanup stop
[root@puma05 ~]# ip netns
[root@puma05 ~]#
[root@puma05 ~]#
[root@puma05 ~]# service neutron-netns-cleanup start
[root@puma05 ~]# ip netns
[root@puma05 ~]# openstack-status
== neutron services ==
neutron-server: inactive (disabled on boot)
neutron-dhcp-agent: active
neutron-l3-agent: active
neutron-metadata-agent: active
neutron-lbaas-agent: inactive (disabled on boot)
neutron-openvswitch-agent: active
== Support services ==
openvswitch: active
messagebus: active
[root@puma05 ~]# service neutron-dhcp-agent restart
Stopping neutron-dhcp-agent: [ OK ]
Starting neutron-dhcp-agent: [ OK ]
[root@puma05 ~]# service neutron-l3-agent restart
Stopping neutron-l3-agent: [ OK ]
Starting neutron-l3-agent: [ OK ]
[root@puma05 ~]# ip netns
qdhcp-a76e98a5-7ae3-4f91-b721-4f81cebcfa6f
qdhcp-6dcaa203-e61a-4003-a1fe-95d60853516f
[root@puma05 ~]# ip netns
qdhcp-a76e98a5-7ae3-4f91-b721-4f81cebcfa6f
qdhcp-6dcaa203-e61a-4003-a1fe-95d60853516f
qrouter-15ef1247-b52a-43fc-bfa2-27478dbfe1f3
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2014-0516.html