Bug 1051028 - neutron-dhcp-agent doesn't clean after itself when service is shut down
Summary: neutron-dhcp-agent doesn't clean after itself when service is shut down
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 4.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z4
Target Release: 4.0
Assignee: Miguel Angel Ajo
QA Contact: Ofer Blaut
URL:
Whiteboard:
Depends On: 1062685 1173435
Blocks: 1051036 RHEL-OSP_Neutron_HA
 
Reported: 2014-01-09 15:10 UTC by Miguel Angel Ajo
Modified: 2022-07-09 06:16 UTC (History)

Fixed In Version: openstack-neutron-2013.2.2-9.el6ost
Doc Type: Bug Fix
Doc Text:
Cause: The neutron DHCP agent is known not to clean up its resources (network namespaces, dnsmasq processes, etc.) when the service is stopped. This is intentional: it allows the agent to be upgraded without service disruption.
Consequence: When trying to remove a node from the cluster and stopping its services, the DHCP services and resources will remain active, but will get updated as soon as there are changes to the served tenant networks.
Fix: Added the neutron-netns-cleanup init script to allow cleanup of the DHCP service resources as needed.
Result: The resources can now be cleaned up by running the script.
Clone Of:
: 1051036 (view as bug list)
Environment:
Last Closed: 2014-05-29 20:18:15 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1115999 0 None None None Never
Launchpad 1273095 0 None None None Never
Red Hat Product Errata RHSA-2014:0516 0 normal SHIPPED_LIVE Moderate: openstack-neutron security, bug fix, and enhancement update 2014-05-30 00:15:59 UTC

Description Miguel Angel Ajo 2014-01-09 15:10:20 UTC
Description of problem:

  neutron-dhcp-agent can start several child processes per network: 
   * dnsmasq
   * neutron-ns-metadata-proxy (for isolated networks only)

  and it does create a qdhcp-xxxx namespace for every tenant network
  it handles.

  When shutting down the service:

# service neutron-dhcp-agent stop

  The namespaces are left behind:

[root@neutron ~]#  ip netns | grep qdhcp
qdhcp-0a3e917c-04c5-4ea2-b21b-20beb367c8e3
qdhcp-0318c4e7-71c5-4f18-b8dd-f1fd8fb7fff2


[root@neutron ~]# ps fax | grep dnsmasq
 4682 pts/0    S+     0:00          \_ grep dnsmasq
 4439 ?        S      0:00 dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=tap320ee63c-c6 --except-interface=lo --pid-file=/var/lib/neutron/dhcp/0a3e917c-04c5-4ea2-b21b-20beb367c8e3/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/0a3e917c-04c5-4ea2-b21b-20beb367c8e3/host --dhcp-optsfile=/var/lib/neutron/dhcp/0a3e917c-04c5-4ea2-b21b-20beb367c8e3/opts --leasefile-ro --dhcp-range=tag0,192.168.99.0,static,120s --dhcp-lease-max=256 --conf-file= --domain=openstacklocal
 4441 ?        S      0:00 dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=tap3ce8b56f-18 --except-interface=lo --pid-file=/var/lib/neutron/dhcp/0318c4e7-71c5-4f18-b8dd-f1fd8fb7fff2/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/0318c4e7-71c5-4f18-b8dd-f1fd8fb7fff2/host --dhcp-optsfile=/var/lib/neutron/dhcp/0318c4e7-71c5-4f18-b8dd-f1fd8fb7fff2/opts --leasefile-ro --dhcp-range=tag0,192.168.50.0,static,120s --dhcp-lease-max=256 --conf-file= --domain=openstacklocal


Also the ovs ports are left behind:
[root@neutron ~]# ovs-vsctl show
ca839344-5f3e-44c2-a241-93ea60538ed3
    Bridge br-int
        Port "tap320ee63c-c6"
            Interface "tap320ee63c-c6" <<<<<<<<<<<<<<
                type: internal
        Port br-int
            Interface br-int
                type: internal
        Port "tap3ce8b56f-18"
            Interface "tap3ce8b56f-18" <<<<<<<<<<<<<<
                type: internal
    Bridge br-tun
        Port br-tun
            Interface br-tun
                type: internal
    ovs_version: "1.11.0"


Version-Release number of selected component (if applicable):
 openstack-neutron-2013.2-16.el6ost.noarch
or
 openstack-neutron-2014.1-0.1.b1.el6.noarch

How reproducible:
  
  Always.

Steps to Reproduce:
1. Start the service.
2. Set up some tenant networks, each with a VM connected, so that the services and namespaces are actually created.
3. Stop the service.

Actual results:

 dnsmasq or neutron-ns-metadata-proxy child processes are left.
 qdhcp-* network namespaces are left.
 ovs tap ports for the dnsmasq server are left.


Expected results:

 All child processes are terminated.
 qdhcp-* network namespaces are cleaned up.
 The ovs ports are cleaned up.

  
Additional info:
  
  In a simple installation this wouldn't be a problem, but for HA setups we need to stop services on one node and start them on a different one, without the two interfering with each other.

  In this situation, an unmanaged dnsmasq is left connected to the network, along with a namespace metadata proxy.
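
The three kinds of leftovers listed above can be checked with a few small filters. This is only a sketch: the function names are hypothetical and not part of neutron, and it assumes the integration bridge is named br-int as in the ovs-vsctl output shown earlier. Each filter reads command output on stdin, so it can be tried against saved output as well as a live system.

```shell
#!/bin/sh
# Hypothetical helpers for spotting resources left behind by neutron-dhcp-agent.

# Print qdhcp-* namespace names found on stdin (output of `ip netns`).
# Some iproute2 versions append "(id: N)", so only the first field is kept.
dhcp_namespaces() {
    awk '$1 ~ /^qdhcp-/ { print $1 }'
}

# Print PIDs of dnsmasq processes found on stdin (output of `ps -eo pid,args`).
dnsmasq_pids() {
    awk '$2 == "dnsmasq" || $2 ~ /\/dnsmasq$/ { print $1 }'
}

# Print tap* port names found on stdin (output of `ovs-vsctl list-ports br-int`).
tap_ports() {
    grep -E '^tap' || true
}

# Run all three checks against the live system (needs root).
report_leftovers() {
    echo "namespaces:"; ip netns | dhcp_namespaces
    echo "dnsmasq pids:"; ps -eo pid,args | dnsmasq_pids
    echo "ovs tap ports:"; ovs-vsctl list-ports br-int | tap_ports
}
```

Running report_leftovers after `service neutron-dhcp-agent stop` should list the same namespaces, processes and ports shown in the description if the problem is present.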

Comment 2 Miguel Angel Ajo 2014-01-10 20:42:39 UTC
This is related: it seems that the settings are left behind by design, and we should call neutron-netns-cleanup from the neutron-l3-agent init.d script at stop, or right after the l3-agent stops. However, it takes parameters; I'm checking them.


https://bugs.launchpad.net/neutron/+bug/1115999

Comment 3 Miguel Angel Ajo 2014-01-27 07:44:19 UTC
Launchpad bug #1115999 prevents the metadata proxies in namespaces (qdhcp or qrouter) from being cleaned up properly; it needs to be fixed before we can have a workaround here.

Comment 4 Miguel Angel Ajo 2014-01-27 08:12:43 UTC
Launchpad bug #1273095 prevents properly selecting which kind of namespace we want to clean up (dhcp or l3-agent).

Comment 5 Miguel Angel Ajo 2014-02-03 11:48:17 UTC
It seems that neutron-netns-cleanup is broken:

1) It doesn't have an /etc/init.d/neutron-netns-cleanup script, as ovs has.

2) It fails on invocation:
# neutron-netns-cleanup  --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/dhcp_agent.ini  --debug --force 
2014-02-03 11:31:36.046 4848 INFO neutron.common.config [-] Logging enabled!
2014-02-03 11:31:36.046 4848 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'list'] execute /usr/lib/python2.6/site-packages/neutron/agent/linux/utils.py:43
2014-02-03 11:31:36.193 4848 DEBUG neutron.agent.linux.utils [-] 
Command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'list']
Exit code: 0
Stdout: 'qdhcp-f742e733-672e-4e76-8003-034185564a90\nqdhcp-b38c071a-508d-44bb-8359-c2e694bb6f9b\nqdhcp-7bfe4972-8c59-4f0b-8634-3420acd75844\nqdhcp-e1fde457-5fb8-4965-b64b-248da99f7a8d\nqrouter-7c82790b-e1e6-4451-a772-fb4f39117d5a\n'
Stderr: '' execute /usr/lib/python2.6/site-packages/neutron/agent/linux/utils.py:60
Error importing interface driver 'neutron.agent.linux.interface.OVSInterfaceDriver': no such option: ovs_use_veth

Comment 6 Miguel Angel Ajo 2014-02-07 17:47:32 UTC
There is another bug in iproute rhbz#1062685 that prevents netns deletion from working. It has a fix, and it's tested.

Comment 7 Miguel Angel Ajo 2014-02-20 15:33:40 UTC

Running /etc/init.d/neutron-netns-forced-cleanup start
cleans up the network namespaces and all internal iptables rules and interfaces.

The fix is provided in this repo:
http://file.rdu.redhat.com/~majopela/neutron-ha-fixes-bz-1051028-and-36-cleanup/

neutron needs to be patched (netns_cleanup script).

Comment 8 Miguel Angel Ajo 2014-02-20 15:40:45 UTC
For patches and scripts, please refer to
https://bugzilla.redhat.com/attachment.cgi?bugid=1051036

Comment 11 Miguel Angel Ajo 2014-04-10 09:20:04 UTC
When the neutron-netns-cleanup init.d script is installed (via pacemaker, or via normal init.d script installation), it will clean up any empty namespaces (those with no resources inside: processes, ports, etc.) during startup, and it will forcibly clean up all resources when stopped.

Stop should happen under three conditions:
  1) When the node is taken out of the cluster.
  2) When the neutron-agent resources are taken off the node.
  3) If the neutron-netns-cleanup script is installed as a service, it will clean up all network namespaces during reboot/poweroff/halt or when leaving the programmed runlevels.
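
The start/stop behaviour described in this comment can be sketched roughly as below. This is an illustration, not the shipped init script: the function names and the CLEANUP_CMD variable are hypothetical, and the config files are simply those used in the invocation in comment 5. The only flags used are --config-file and --force, both of which appear in that invocation.

```shell
#!/bin/sh
# Sketch of the start/stop dispatch; not the init script shipped with the fix.

# Command to run; overridable so the dispatch can be exercised without root.
CLEANUP_CMD=${CLEANUP_CMD:-neutron-netns-cleanup}

# Invoke the cleanup tool with the config files from comment 5,
# followed by any extra arguments passed to this function.
run_cleanup() {
    "$CLEANUP_CMD" \
        --config-file /usr/share/neutron/neutron-dist.conf \
        --config-file /etc/neutron/neutron.conf \
        --config-file /etc/neutron/dhcp_agent.ini \
        "$@"
}

netns_cleanup_service() {
    case "$1" in
        start)
            # No --force: intended to remove only namespaces that are empty.
            run_cleanup
            ;;
        stop)
            # --force: also tear down namespaces that still hold resources.
            run_cleanup --force
            ;;
        *)
            echo "Usage: netns_cleanup_service {start|stop}" >&2
            return 2
            ;;
    esac
}
```

Setting CLEANUP_CMD=echo before calling netns_cleanup_service stop prints the command line that would be run, which is a cheap way to inspect the arguments before pointing it at a real node.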

Comment 12 Ofer Blaut 2014-04-22 08:58:25 UTC
I have verified that when "service neutron-netns-cleanup stop" is used, the network namespaces are cleaned up.

The stop conditions are HA-related, not specific to the script.

openstack-neutron-2013.2.3-4.el6ost.noarch


[root@puma05 ~]# ip netns
qdhcp-a76e98a5-7ae3-4f91-b721-4f81cebcfa6f
qdhcp-6dcaa203-e61a-4003-a1fe-95d60853516f
qrouter-15ef1247-b52a-43fc-bfa2-27478dbfe1f3

[root@puma05 ~]# service neutron-netns-cleanup stop
[root@puma05 ~]# ip netns
[root@puma05 ~]# 
[root@puma05 ~]# 
[root@puma05 ~]# service neutron-netns-cleanup start
[root@puma05 ~]# ip netns 
[root@puma05 ~]# openstack-status 
== neutron services ==
neutron-server:                         inactive  (disabled on boot)
neutron-dhcp-agent:                     active
neutron-l3-agent:                       active
neutron-metadata-agent:                 active
neutron-lbaas-agent:                    inactive  (disabled on boot)
neutron-openvswitch-agent:              active
== Support services ==
openvswitch:                            active
messagebus:                             active
[root@puma05 ~]# service neutron-dhcp-agent restart
Stopping neutron-dhcp-agent:                               [  OK  ]
Starting neutron-dhcp-agent:                               [  OK  ]
[root@puma05 ~]# service neutron-l3-agent restart
Stopping neutron-l3-agent:                                 [  OK  ]
Starting neutron-l3-agent:                                 [  OK  ]
[root@puma05 ~]# ip netns 
qdhcp-a76e98a5-7ae3-4f91-b721-4f81cebcfa6f
qdhcp-6dcaa203-e61a-4003-a1fe-95d60853516f
[root@puma05 ~]# ip netns 
qdhcp-a76e98a5-7ae3-4f91-b721-4f81cebcfa6f
qdhcp-6dcaa203-e61a-4003-a1fe-95d60853516f
qrouter-15ef1247-b52a-43fc-bfa2-27478dbfe1f3

Comment 14 errata-xmlrpc 2014-05-29 20:18:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0516.html

