Bug 1227635

Summary: DHCPNAK after neutron-dhcp-agent restart
Product: Red Hat OpenStack Reporter: Nir Magnezi <nmagnezi>
Component: openstack-neutron Assignee: Nir Magnezi <nmagnezi>
Status: CLOSED ERRATA QA Contact: Toni Freger <tfreger>
Severity: high Docs Contact:
Priority: high    
Version: 6.0 (Juno) CC: amedeo.salvati, chrisw, ihrachys, lhh, lpeer, nyechiel, oblaut, pablo.iranzo, sauchter, tfreger, yeylon
Target Milestone: z4 Keywords: ZStream
Target Release: 6.0 (Juno)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-neutron-2014.2.3-6.el7ost Doc Type: Bug Fix
Doc Text:
Previously, dnsmasq did not save lease information in persistent storage, so the lease information was lost whenever dnsmasq was restarted. This behavior resulted from the removal of the dnsmasq '--dhcp-script' option under BZ#1202392. As a result, instances were stuck in the network boot process for a long time, and DHCPNAK messages appeared in the dnsmasq log. This update addresses the issue by removing the authoritative option, so that NAKs are not sent in response to DHCPREQUESTs addressed to other servers. This change is expected to prevent dnsmasq from NAKing clients that renew leases issued before it was restarted or rescheduled; as a result, no DHCPNAK messages appear in the log files.
Story Points: ---
Clone Of: 1227633 Environment:
Last Closed: 2015-08-24 20:15:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Nir Magnezi 2015-06-03 08:16:38 UTC
+++ This bug was initially created as a clone of Bug #1227633 +++

Description of problem:
=======================

After rolling out a configuration change, we restarted the neutron-dhcp-agent service, and the dnsmasq log then started flooding with "DHCPNAK ... lease not found" messages.
dnsmasq replies with DHCPNAK to every DHCPREQUEST renewal from every VM, even though the MAC and IP pairs exist in the host files.
The flooding grows as more VMs start renewing; each VM keeps retrying until its lease expires, at which point it sends DHCPDISCOVER and reinitializes its IP.
The flooding gradually subsides as the VMs' leases expire and they send DHCPDISCOVER, to which dnsmasq responds properly with DHCPOFFER.
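To gauge the extent of the flooding, the NAKs can be counted per client. The sketch below is a hypothetical helper, not part of Neutron. Because the report elides the full DHCPNAK line ("DHCPNAK ... lease not found"), it assumes those lines follow the same shape as the DHCPACK lines in Comment 8, i.e. `DHCPNAK(<tap device>) <ip> <mac> lease not found`; adjust the regular expression if your dnsmasq logs differently.

```python
import re
from collections import Counter

# ASSUMPTION: the elided NAK lines look like
#   "... dnsmasq-dhcp[PID]: DHCPNAK(tapXXXX) <ip> <mac> lease not found"
NAK_RE = re.compile(
    r"dnsmasq-dhcp\[\d+\]: DHCPNAK\((?P<iface>[^)]+)\) "
    r"(?P<ip>\S+) (?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}) lease not found"
)


def count_lease_not_found(lines):
    """Count 'lease not found' DHCPNAKs per client MAC address."""
    counts = Counter()
    for line in lines:
        match = NAK_RE.search(line)
        if match:
            counts[match.group("mac")] += 1
    return counts


def report(path, top=10):
    """Print the clients that received the most NAKs in a syslog file."""
    with open(path) as log:
        for mac, n in count_lease_not_found(log).most_common(top):
            print(f"{mac}\t{n}")
```

Running `report("/var/log/messages")` on an affected agent should show a NAK count that keeps climbing for each renewing VM until its lease expires.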

Analysis:
=========
I noticed that the --leasefile-ro option is passed to dnsmasq when it is started by the neutron DHCP agent. According to the dnsmasq manual, this option disables the lease file and is meant to be used together with --dhcp-script, so that the script maintains the lease database elsewhere. However, the --dhcp-script option was removed when fixing bug 1202392.
As a result, dnsmasq does not save lease information in persistent storage, and the lease information is lost whenever dnsmasq is restarted.
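The condition described above can be checked mechanically against a dnsmasq argument list (for example, one read from /proc/<pid>/cmdline). The helper name `lease_db_disabled` is hypothetical, and the sketch recognizes only long-form options, which is how they appear in this report. It also only checks that a script is configured, not that the script actually stores leases.

```python
def lease_db_disabled(argv):
    """Return True if dnsmasq would keep no lease database at all.

    Simplified reading of the dnsmasq manual: --leasefile-ro disables the
    lease file, so leases can survive a restart only if a --dhcp-script
    is configured to store them elsewhere. Only long options are handled.
    """
    # Accept both "--opt=value" and "--opt value" spellings.
    flags = {arg.split("=", 1)[0] for arg in argv}
    return "--leasefile-ro" in flags and "--dhcp-script" not in flags
```

With the agent's command line after bug 1202392 (--leasefile-ro present, --dhcp-script removed), the helper returns True, which matches the lease loss described above.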

Solution:
=========
Simply replacing --leasefile-ro with --dhcp-leasefile=<path to dhcp runtime files>/lease would solve the problem (patch attached).
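A minimal sketch of the proposed substitution, applied to a dnsmasq argument list. The attached patch is not reproduced here. `use_persistent_leasefile` and `conf_dir` are hypothetical names, with `conf_dir` standing in for `<path to dhcp runtime files>`.

```python
import posixpath


def use_persistent_leasefile(argv, conf_dir):
    """Return a copy of argv with --leasefile-ro replaced by --dhcp-leasefile.

    conf_dir is a hypothetical stand-in for the per-network directory of
    DHCP runtime files; the "lease" file name follows the proposal.
    """
    # dnsmasq runs on Linux here, so join with POSIX separators explicitly.
    leasefile = posixpath.join(conf_dir, "lease")
    return [
        f"--dhcp-leasefile={leasefile}" if arg == "--leasefile-ro" else arg
        for arg in argv
    ]
```

The function builds a new list rather than mutating its input, so the caller's original command line stays available for comparison or logging.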

Comment 8 Toni Freger 2015-08-17 10:36:03 UTC
Tested on RHEL 7.1 with the 2015-08-13.1 puddle.


dhcp-common-4.2.5-36.el7.x86_64
dhcp-libs-4.2.5-36.el7.x86_64

openstack-neutron-2014.2.3-9.el7ost.noarch
openstack-neutron-common-2014.2.3-9.el7ost.noarch
python-neutronclient-2.3.9-1.el7ost.noarch
python-neutron-2014.2.3-9.el7ost.noarch
openstack-neutron-openvswitch-2014.2.3-9.el7ost.noarch

DHCP messages from the /var/log/messages file on each agent host:

Agent #1:

Aug 17 06:23:48 networker2 dnsmasq-dhcp[21035]: DHCPDISCOVER(tap29cff8bc-cc) fa:16:3e:a8:b9:1a
Aug 17 06:23:48 networker2 dnsmasq-dhcp[21035]: DHCPOFFER(tap29cff8bc-cc) 80.80.80.5 fa:16:3e:a8:b9:1a
Aug 17 06:23:48 networker2 dnsmasq-dhcp[21035]: DHCPREQUEST(tap29cff8bc-cc) 80.80.80.5 fa:16:3e:a8:b9:1a
Aug 17 06:23:48 networker2 dnsmasq-dhcp[21035]: DHCPACK(tap29cff8bc-cc) 80.80.80.5 fa:16:3e:a8:b9:1a host-80-80-80-5

Agent #2:

Aug 17 06:23:49 networker1 dnsmasq-dhcp[28502]: DHCPDISCOVER(tap6b8cb71c-b4) fa:16:3e:a8:b9:1a
Aug 17 06:23:49 networker1 dnsmasq-dhcp[28502]: DHCPOFFER(tap6b8cb71c-b4) 80.80.80.5 fa:16:3e:a8:b9:1a
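The two excerpts can also be checked programmatically. The hypothetical helper below groups dnsmasq DHCP messages by agent host for one client MAC, assuming the standard syslog prefix shown in the excerpts.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   "Aug 17 06:23:48 networker2 dnsmasq-dhcp[21035]: DHCPACK(tap...) <ip> <mac> <name>"
LINE_RE = re.compile(
    r"^\w{3} +\d+ [\d:]+ (?P<host>\S+) dnsmasq-dhcp\[\d+\]: "
    r"(?P<msg>DHCP[A-Z]+)\((?P<iface>[^)]+)\) (?P<rest>.*)$"
)


def messages_by_host(lines, mac):
    """Map each agent host to the DHCP message types it logged for a MAC."""
    seen = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        # The MAC appears as a separate token after the message and interface.
        if match and mac in match.group("rest").split():
            seen[match.group("host")].append(match.group("msg"))
    return dict(seen)
```

Applied to the two excerpts, networker2 shows the full DHCPDISCOVER/DHCPOFFER/DHCPREQUEST/DHCPACK exchange, while networker1 stops after DHCPOFFER and logs no DHCPNAK.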

Comment 10 errata-xmlrpc 2015-08-24 20:15:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1680.html