Bug 1227638

Summary: DHCPNAK after neutron-dhcp-agent restart
Product: Red Hat OpenStack
Reporter: Nir Magnezi <nmagnezi>
Component: openstack-neutron
Assignee: Nir Magnezi <nmagnezi>
Status: CLOSED ERRATA
QA Contact: Toni Freger <tfreger>
Severity: high
Priority: unspecified
Version: 5.0 (RHEL 7)
CC: chrisw, ihrachys, jschluet, lpeer, mlopes, nyechiel, oblaut, tfreger, yeylon, ykawada
Target Milestone: z5
Keywords: ZStream
Target Release: 5.0 (RHEL 7)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-neutron-2014.1.4-4.el7ost openstack-neutron-2014.1.4-5.el6ost
Doc Type: Bug Fix
Doc Text:
Previously, dnsmasq did not save lease information in persistent storage, and when it was restarted, the lease information was lost. This behavior was a result of the removal of the dnsmasq '--dhcp-script' option under BZ#1202392. As a result, instances were stuck in the network boot process for a long period of time. In addition, NACK messages were noted in the dnsmasq log. This update addresses this issue by removing the authoritative option, so that NAKs are not sent in response to DHCPREQUESTs to other servers. This change is expected to prevent dnsmasq from NAKing clients renewing leases issued before it was restarted/rescheduled, with the result that no DHCPNAK messages can be found in the log files.
Clone Of: 1227633
Last Closed: 2015-09-10 11:52:51 UTC
Type: Bug

Description Nir Magnezi 2015-06-03 08:19:48 UTC
+++ This bug was initially created as a clone of Bug #1227633 +++

Description of problem:
=======================

After rolling out a configuration change, we restarted the neutron-dhcp-agent service, and the dnsmasq logs started flooding with: DHCPNAK ... lease not found.
dnsmasq replies with DHCPNAK to every DHCPREQUEST renewal from every VM, even though the MAC and IP pairs exist in the host files.
The log flooding increases as more and more VMs start renewing; they keep retrying until their leases expire, at which point they send DHCPDISCOVER and re-initialize the IP.
The log flooding gradually disappears as the VMs' leases expire and they send DHCPDISCOVER, to which dnsmasq responds with a proper DHCPOFFER.
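
To gauge how many VMs are affected by the flood described above, the NAK lines can be tallied per client. The following is a minimal sketch, not part of the original report: it assumes log lines of the shape "DHCPNAK(<interface>) <ip> <mac> lease not found", and the sample lines, interface name, and addresses are hypothetical.

```python
import re
from collections import Counter

# Assumed line shape (illustrative only):
#   "... DHCPNAK(<iface>) <ip> <mac> lease not found"
NAK_RE = re.compile(
    r"DHCPNAK\((?P<iface>[^)]+)\)\s+"
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"(?P<mac>(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})\s+"
    r"lease not found"
)


def count_lease_not_found(lines):
    """Count 'lease not found' NAKs per (mac, ip) pair."""
    counts = Counter()
    for line in lines:
        match = NAK_RE.search(line)
        if match:
            # Normalise MAC case so one client is not counted twice.
            key = (match.group("mac").lower(), match.group("ip"))
            counts[key] += 1
    return counts


# Hypothetical sample lines, for illustration only.
SAMPLE_LINES = [
    "dnsmasq-dhcp[1234]: DHCPNAK(tap0) 10.0.0.5 fa:16:3e:00:00:01 lease not found",
    "dnsmasq-dhcp[1234]: DHCPNAK(tap0) 10.0.0.5 FA:16:3E:00:00:01 lease not found",
    "dnsmasq-dhcp[1234]: DHCPOFFER(tap0) 10.0.0.5 fa:16:3e:00:00:01",
]
```

Calling count_lease_not_found() on the lines of a real log file yields one counter entry per affected client; a count that keeps growing for the same pair indicates a VM that is still retrying its renewal.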

Analysis:
=========
I noticed that the option --leasefile-ro is used in the dnsmasq command line when dnsmasq is started by the neutron dhcp-agent. According to the dnsmasq manual, this option should be used together with --dhcp-script to customize the lease database. However, the option --dhcp-script was removed when fixing bug 1202392.
Because of this, dnsmasq does not save lease information in persistent storage, and when it is restarted, the lease information is lost.

Solution:
=========
Simply replacing --leasefile-ro with --dhcp-leasefile=<path to dhcp runtime files>/lease would solve the problem. (patch attached)
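
The option swap proposed above can be sketched as follows. This is an illustration only, not the attached patch and not neutron's actual code: the function name, its parameters, and the surrounding set of options are assumptions chosen for the example; only the two lease-related options come from this report.

```python
def build_dnsmasq_cmd(interface, pid_file, hosts_file, lease_file,
                      persist_leases=True):
    """Build a dnsmasq argument list (hypothetical helper)."""
    cmd = [
        "dnsmasq",
        "--no-hosts",
        "--no-resolv",
        "--bind-interfaces",
        f"--interface={interface}",
        f"--pid-file={pid_file}",
        f"--dhcp-hostsfile={hosts_file}",
    ]
    if persist_leases:
        # Proposed behaviour: dnsmasq writes leases to a file and can
        # reload them after a restart.
        cmd.append(f"--dhcp-leasefile={lease_file}")
    else:
        # Behaviour described in the analysis: no lease file is written,
        # so lease state is lost on restart unless a --dhcp-script
        # maintains the lease database.
        cmd.append("--leasefile-ro")
    return cmd
```

With persist_leases=True the resulting list contains --dhcp-leasefile=<lease_file> and no --leasefile-ro, which is the replacement suggested in the report.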

Comment 6 Toni Freger 2015-09-01 09:28:36 UTC
Verified via the suggested steps on RHEL 7:

python-neutron-2014.1.5-2.el7ost.noarch
openstack-neutron-openvswitch-2014.1.5-2.el7ost.noarch
python-neutronclient-2.3.4-3.el7ost.noarch
openstack-neutron-ml2-2014.1.5-2.el7ost.noarch
openstack-neutron-2014.1.5-2.el7ost.noarch

Comment 9 Ihar Hrachyshka 2015-09-09 09:07:00 UTC
dnsmasq won't assign the same IP to another VM since it uses static configuration based on fixed IPs of the ports. The only issue you get now is that when dnsmasq is restarted, it will initially reply with NAK on renew attempt, but then on next attempt it will lease the IP address correctly. So the issue is merely short downtime while instances request new lease (which will be the same as used before).
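
The sequence described in this comment can be illustrated with a toy model. It is a deliberate simplification and not dnsmasq's real logic: the class, method names, and reply strings are all hypothetical. The model assumes a server with a static MAC-to-IP map and an in-memory lease table that is emptied on restart.

```python
class ToyDhcpServer:
    """Toy server: static host map plus a lease table lost on restart."""

    def __init__(self, static_hosts):
        self.static_hosts = dict(static_hosts)  # mac -> fixed IP
        self.leases = {}                         # mac -> leased IP

    def restart(self):
        # Lease state is not persisted, so a restart forgets all leases.
        self.leases.clear()

    def discover(self, mac):
        ip = self.static_hosts.get(mac)
        return ("OFFER", ip) if ip else ("IGNORE", None)

    def request(self, mac, ip, renewing):
        if renewing:
            # A renewal is only accepted for a lease the server knows.
            if self.leases.get(mac) == ip:
                return ("ACK", ip)
            return ("NAK", None)
        # Request following an offer: honour the static mapping.
        if self.static_hosts.get(mac) == ip:
            self.leases[mac] = ip
            return ("ACK", ip)
        return ("NAK", None)


def client_renew(server, mac, ip):
    """Try to renew; on NAK, fall back to discover and request."""
    replies = []
    kind, _ = server.request(mac, ip, renewing=True)
    replies.append(kind)
    if kind == "NAK":
        kind, offered = server.discover(mac)
        replies.append(kind)
        if kind == "OFFER":
            kind, ip = server.request(mac, offered, renewing=False)
            replies.append(kind)
    return replies, ip
```

In this model a renewal after restart() produces the replies NAK, OFFER, ACK and ends with the same address as before, matching the short interruption described above; the real client and server state machines have more states and timers than shown here.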

To avoid the issue, you can indeed instruct your instances not to renew at all, but that may be a problem once you need to update your IP addresses for ports: those updates won't ever get to instances until they reboot.

Comment 11 errata-xmlrpc 2015-09-10 11:52:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1754.html