Red Hat Bugzilla – Bug 850606
libvirtd should check for properly running dnsmasq on networks presumed "active" at startup (and start one if necessary)
Last modified: 2012-10-01 17:37:36 EDT
Description of problem:
When I start libvirtd with a "default" network, it does not spawn dnsmasq to serve DHCP requests.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Install libvirtd
2. Use the default network configuration
Libvirt starts, and the default network "starts" but dnsmasq isn't started to serve DHCP.
<bridge name="virbr0" />
<ip address="192.168.122.1" netmask="255.255.255.0">
  <dhcp>
    <range start="192.168.122.2" end="192.168.122.254" />
  </dhcp>
</ip>
:virsh net-info default
:ps aux | grep dnsmasq
root 22874 0.0 0.0 109248 884 pts/4 S+ 16:28 0:00 grep --color=auto dnsmasq
If I manually start dnsmasq it works, but it should be started automatically when libvirtd brings up the "default" network:
/usr/sbin/dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/libvirt/network/default.pid --conf-file= --except-interface lo --listen-address 192.168.122.1 --dhcp-range 192.168.122.2,192.168.122.254 --dhcp-leasefile=/var/lib/libvirt/dnsmasq/default.leases --dhcp-lease-max=253 --dhcp-no-override
Not sure if it matters, but it appears that CentOS 6.3 does the same thing.
I'm using "network" instead of "NetworkManager" if that's somehow related.
virsh net-destroy default; virsh net-start default
causes dnsmasq to restart. Working with laine on IRC, we found that if dnsmasq crashes or is killed while virbr0 is still present, restarting libvirtd will *NOT* restart dnsmasq.
dnsmasq is only started by libvirtd if it thinks the network is not already up, which it determines by seeing if the virbr0 device is present. Part of libvirtd turning up the networks should probably confirm that dnsmasq is running, and if not, start it.
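The check suggested above could look something like the following minimal shell sketch. It assumes the pid file path from the manual dnsmasq command line earlier in this report; `pidfile_alive` is a hypothetical helper, not something libvirt ships. It only confirms that a process with the recorded pid exists, not that the process is really dnsmasq, and it should be run as root, since `kill -0` also reports failure when the caller lacks permission to signal the process.

```shell
#!/bin/sh
# Hypothetical helper (not part of libvirt): succeed only if the pid file
# is readable and names a process that currently exists.
pidfile_alive() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1          # no pid file: nothing to check
    pid=$(cat "$pidfile") || return 1
    case $pid in
        ''|*[!0-9]*) return 1 ;;           # empty or non-numeric contents
    esac
    kill -0 "$pid" 2>/dev/null             # signal 0 only tests for existence
}

# Example: report on the "default" network's dnsmasq.
if pidfile_alive /var/run/libvirt/network/default.pid; then
    echo "dnsmasq for 'default' appears to be running"
else
    echo "dnsmasq for 'default' is not running"
fi
```

A check like this looks at the daemon itself, so it would catch the case where virbr0 is still present but dnsmasq has died.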
I agree we should be checking for dnsmasq and restarting it if needed (and probably giving it a SIGHUP even if it's there, just for good measure).
In our discussion on IRC, you figured out that you had run "/etc/init.d/dnsmasq restart" and that had killed dnsmasq. But that script doesn't exist on F16, because it has switched to using systemd (and when I run "service dnsmasq restart" or "systemctl restart dnsmasq.service", it fails and doesn't kill all of libvirtd's dnsmasq instances).
Was that original behavior only seen on CentOS, and you just verified the result on F16 by manually killing the dnsmasq processes? Or is there some other weird circumstance that causes dnsmasq processes to be killed?
It was originally seen on CentOS. I tried it on F16, but by forcibly killing dnsmasq first, to see whether it would restart.
Okay, so there isn't a separate "dnsmasq is mysteriously dying" bug on F16. That's good to know :-)
I've changed the summary of this BZ to more accurately reflect what's needed from libvirt.
Thanks for the report and extra investigation!
Upstream libvirt has been enhanced to restart radvd/dnsmasq if needed whenever libvirtd is restarted. It will also send a SIGHUP to all dnsmasq and radvd processes when libvirtd is restarted. The following two commits are required for this new behavior. I'm not sure how easily they will backport to the libvirt that's in F16 (which this BZ is filed against) or F17, but they will be in 0.10.2, which means they will automatically be in F18.
If the backport isn't trivial, we may want to consider marking this as CLOSED/NEXTRELEASE or CLOSED/UPSTREAM instead.
Author: Laine Stump <firstname.lastname@example.org>
Date: Sun Sep 16 21:22:27 2012 -0400
network: restart radvd/dnsmasq if needed when libvirtd is restarted
A user on IRC had accidentally killed all of his libvirt-started
dnsmasq instances (due to a buggy dnsmasq service script in Fedora
16), and had hoped that libvirtd would notice this on restart and
reload all the dnsmasq daemons (as it does with iptables
rules). Unfortunately this was not the case - as long as the network
object had a pid registered for dnsmasq and/or radvd, it assumed that
the processes were running.
This patch takes advantage of the new utility functions in
bridge_driver.c to do a "refresh" of all radvd and dnsmasq processes
started by libvirt each time libvirtd is restarted - this function
attempts to do a SIGHUP of each existing process, and if that fails,
it restarts the process, rebuilding all the associated config files
and commandline parameters in the process. This normally has no
effect, but will be useful in solving the occasional "odd situation"
without needing to take the drastic step of destroying/re-starting the
Author: Laine Stump <email@example.com>
Date: Mon Aug 20 00:59:46 2012 -0400
network: reorganize dnsmasq and radvd config file / startup
This patch splits the starting of dnsmasq and radvd into multiple
files, and adds new networkRefreshXX() and networkRestartXX()
functions for each. These new functions are currently commented out
because they won't be used until the next commit, and the compile options
require all static functions to be used.
networkRefreshXX() - rewrites any file-based config for dnsmasq/radvd,
and sends SIGHUP to the process to make it reread its config. If the
program isn't already running, it's just started.
networkRestartXX() - kills the given program, waits for it to exit
(see the comments in the function networkKillDaemon()), then calls
This commit is here mostly as a checkpoint to verify no change in
functional behavior after refactoring networkStartXX() functions to
fit in with these new functions.
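The refresh behavior described in these commit messages can be pictured with a rough shell sketch. This only illustrates the "send SIGHUP, and start the process if that fails" idea; it is not libvirt's C implementation, `refresh_daemon` is a hypothetical name, and the sketch skips the step of rewriting the config files that the commit message mentions.

```shell
#!/bin/sh
# Illustrative only: mimics the "SIGHUP it, or start it if that fails" idea.
# refresh_daemon is a hypothetical name, not a libvirt function.
# Usage: refresh_daemon PIDFILE COMMAND [ARGS...]
refresh_daemon() {
    pidfile=$1
    shift
    pid=
    if [ -r "$pidfile" ]; then
        pid=$(cat "$pidfile")
    fi
    case $pid in
        ''|*[!0-9]*) pid= ;;       # missing or malformed pid: treat as not running
    esac
    if [ -n "$pid" ] && kill -HUP "$pid" 2>/dev/null; then
        echo "reloaded"            # process exists and was asked to reread its config
    else
        "$@" && echo "started"     # signal failed: run the start command instead
    fi
}
```

Called with the pid file and the full dnsmasq command line from the original report as arguments, this would print "reloaded" while dnsmasq is alive and "started" after it has been killed.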
Amazingly, these patches apply cleanly to F16 maint. However, given the size of the changes, the (hopefully) rare nature of the issue, and the fact that there's a workaround (destroy, start), I don't plan on backporting these to the maintenance branches.
Moving to F18.
Aaaaand libvirt 0.10.2 is already in F18, so just closing as CURRENTRELEASE