Bug 867441

Summary: dhcp4 problem with dnsmasq >= 2.61
Product: [Fedora] Fedora Reporter: Gene Czarcinski <gczarcinski>
Component: libvirt Assignee: Libvirt Maintainers <libvirt-maint>
Status: CLOSED NOTABUG QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 17 CC: berrange, clalancette, crobinso, eblake, itamar, jforbes, jyang, laine, libvirt-maint, veillard, virt-maint
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2012-10-24 12:48:07 EDT Type: Bug
Attachments:
Description Flags
add --interface to dnsmasq command line none

Description Gene Czarcinski 2012-10-17 10:15:24 EDT
Description of problem:

Recently, I have been working with the dnsmasq developer on fixing a couple of dhcp6-related problems.  In the process, we discovered a problem with the way libvirt runs dnsmasq as a dhcp4 server.  This issue became urgent for libvirt when dnsmasq in F17 was updated from 2.59 to 2.63.

First, to hit this problem, you must be running multiple dnsmasq instances, each one supporting a different IPv4 network.  Although --bind-interfaces is specified, it appears to have no effect unless --interface is also specified.  The obvious fix is to add --interface <dev> to the command line. The relevant information can be seen in this message:
http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2012q4/006418.html

Note: this issue does not affect dnsmasq's dhcp6 service, since that service does handle/filter all packets.

I am pasting the heart of the problem here:
----------------------------------------------
The problem arises when you have more than one instance of dnsmasq 
doing DHCP. Each instance is listening on *:67. Now, a packet arrives 
for port 67 on a particular interface. How is the kernel supposed to know 
which instance of dnsmasq to send it to? It can't and sometimes gets it 
wrong. This is normally masked because DHCP clients fall back to 
broadcast, and they get sent to _all_ the listeners, (check the bug 
report I referenced) but there are situations where this fails.

For DNS, with --bind-interfaces, there isn't a problem, because when 
dnsmasq is configured with --interface or --listen-address then port 53 
is bound to a particular address, not the wildcard address. DHCP always 
binds the wildcard address (there are some strange packets in a DHCP 
exchange that get missed otherwise.) As we've seen, --interface or 
--listen-address is an access control mechanism in the DHCP code: 
receive all packets and filter.

The change in 2.61 is that when dnsmasq is configured with exactly one 
--interface, it calls an obscure Linux-only socket option, 
SO_BINDTODEVICE on the DHCP socket (which is bound to *:67). That has 
the effect of getting the right packets to the right dnsmasq instance. 
It only works for exactly one --interface (otherwise, dnsmasq would have 
to start handling multiple DHCP sockets - a big change.)

The SO_BINDTODEVICE stuff only works with --interface, not 
--listen-address, hence the desirability of moving libvirt from 
--listen-address to --interface.

This stuff is all horrible, a legacy of the LSD-inspired Berkeley 
sockets API. dnsmasq was originally intended to be run as one daemon on 
a machine, handling multiple interfaces. Adapting to the 
one-dnsmasq-per-interface paradigm has been a long hard road.
---------------------------------------------------
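
For readers unfamiliar with the socket option mentioned in the quoted message, here is a minimal Python sketch of the technique: a UDP socket bound to the wildcard address on port 67 and then restricted to one interface with SO_BINDTODEVICE. This is not dnsmasq's code (dnsmasq is written in C). The function names bind_to_device and open_dhcp_socket are hypothetical, and the fallback value 25 is an assumption that holds on common Linux architectures such as x86.

```python
import socket

# SO_BINDTODEVICE is Linux-only. Fall back to 25 (its value on common Linux
# architectures such as x86) if this Python build lacks the constant.
SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)


def bind_to_device(sock, ifname):
    """Restrict an existing socket to traffic on one interface (hypothetical helper)."""
    # The interface name is passed as bytes; a trailing NUL is conventional.
    sock.setsockopt(socket.SOL_SOCKET, SO_BINDTODEVICE, ifname.encode() + b"\0")


def open_dhcp_socket(ifname, port=67):
    """Open a UDP socket on *:port that only receives packets from ifname.

    Illustrative only: on Linux this typically needs root privileges.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    bind_to_device(sock, ifname)
    sock.bind(("", port))  # wildcard address, as described for the DHCP socket
    return sock
```

With one such socket per instance (for example open_dhcp_socket("virbr0") in one process and open_dhcp_socket("virbr1") in another), each listener on *:67 only sees packets that arrived on its own interface.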
Comment 1 Gene Czarcinski 2012-10-18 16:54:57 EDT
Created attachment 629650 [details]
add --interface to dnsmasq command line

I have not had a chance yet to put this through the git process but this patch should correct the problem.  

Basically, as described by the dnsmasq developer: "The problem is that, without SO_BINDTODEVICE, there is no guarantee that the kernel will route DHCP (v4 or v6) packets to the correct instance of dnsmasq, when there is more than one."

The --interface parameter is added to the command line and nothing is removed.
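
As a rough illustration of what "added and nothing removed" means, here is a small Python sketch that assembles a per-network dnsmasq command line with and without --interface. It is not the attached patch and not libvirt's actual code. The function name dnsmasq_argv, the bridge name virbr0, and the addresses are assumptions chosen for the example, and a real libvirt command line carries more options than shown.

```python
def dnsmasq_argv(bridge, gateway, dhcp_start, dhcp_end, add_interface=True):
    """Build an illustrative dnsmasq argv for one virtual network.

    Hypothetical helper: only the options discussed in this bug are shown.
    """
    argv = [
        "dnsmasq",
        "--bind-interfaces",
        f"--listen-address={gateway}",
        f"--dhcp-range={dhcp_start},{dhcp_end}",
    ]
    if add_interface:
        # The proposed change: append --interface, keep everything else.
        argv.append(f"--interface={bridge}")
    return argv


# Example values are assumptions, not taken from the bug report.
before = dnsmasq_argv("virbr0", "192.168.122.1", "192.168.122.2",
                      "192.168.122.254", add_interface=False)
after = dnsmasq_argv("virbr0", "192.168.122.1", "192.168.122.2",
                     "192.168.122.254")
```

Here after is before plus one extra --interface=virbr0 argument at the end.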
Comment 2 Gene Czarcinski 2012-10-19 07:53:40 EDT
patch submitted to upstream git
Comment 3 Eric Blake 2012-10-19 12:17:05 EDT
(In reply to comment #2)
> patch submitted to upstream git

https://www.redhat.com/archives/libvir-list/2012-October/msg01042.html
Comment 4 Gene Czarcinski 2012-10-24 12:48:07 EDT
This is being closed since it turns out not to be a problem for libvirt.  Here is the final statement from Simon Kelley (the dnsmasq developer) about the problem:

http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2012q4/006445.html
-----------------------------------------------------------------------
OK, so this is vaguely embarrassing. Having checked the actual code,
rather than the changelog, I see that dnsmasq >=2.61 _already_ does the
right thing. Setting --bind-interfaces* and a single --listen-address
will cause the code to set SO_BINDTODEVICE on the DHCP socket(s).

So, there is not a problem with the existing libvirt command line.

Gene, apologies for sending you on a wild-goose chase with this.

* or bind-dynamic on 2.63 and later.

Cheers,

Simon.
--------------------------------------------------------------------------