Red Hat Bugzilla – Full Text Bug Listing
|Summary:||libvirtd should check for properly running dnsmasq on networks presumed "active" at startup (and start one if necessary)|
|Product:||[Fedora] Fedora||Reporter:||Scott Baker <scott>|
|Component:||libvirt||Assignee:||Laine Stump <laine>|
|Status:||CLOSED CURRENTRELEASE||QA Contact:||Fedora Extras Quality Assurance <extras-qa>|
|Version:||18||CC:||berrange, clalancette, crobinso, itamar, jforbes, jyang, laine, libvirt-maint, veillard, virt-maint|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2012-10-01 17:37:36 EDT||Type:||Bug|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
Description Scott Baker 2012-08-21 19:31:03 EDT
Description of problem:
When I start libvirtd with a "default" network, it does not spawn dnsmasq to serve DHCP requests

Version-Release number of selected component (if applicable):
libvirt-0.9.6.1-1.fc16.x86_64

How reproducible:
Easily

Steps to Reproduce:
1. Install libvirtd
2. Use the default network configuration

Actual results:
Libvirt starts, and the default network "starts" but dnsmasq isn't started to serve DHCP.

----------------------------------------------------------

:cat /etc/libvirt/qemu/networks/default.xml
<network>
  <name>default</name>
  <uuid>06974f33-83e7-4780-9532-bf3d7acefa7c</uuid>
  <bridge name="virbr0" />
  <mac address='52:54:00:CA:D3:5D'/>
  <forward/>
  <ip address="192.168.122.1" netmask="255.255.255.0">
    <dhcp>
      <range start="192.168.122.2" end="192.168.122.254" />
    </dhcp>
  </ip>
</network>

:virsh net-info default
Name            default
UUID            06974f33-83e7-4780-9532-bf3d7acefa7c
Active:         yes
Persistent:     yes
Autostart:      yes
Bridge:         virbr0

:ps aux | grep dnsmasq
root 22874 0.0 0.0 109248 884 pts/4 S+ 16:28 0:00 grep --color=auto dnsmasq

If I manually start dnsmasq it works, but it should start automatically with libvirtd turning up the "default" network

/usr/sbin/dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/libvirt/network/default.pid --conf-file= --except-interface lo --listen-address 192.168.122.1 --dhcp-range 192.168.122.2,192.168.122.254 --dhcp-leasefile=/var/lib/libvirt/dnsmasq/default.leases --dhcp-lease-max=253 --dhcp-no-override

Not sure if it matters, but it appears that Centos 6.3 does the same thing.
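For readers comparing the network XML with the manual dnsmasq command above, here is a rough Python sketch of how the two appear to line up. It is not libvirt's code: the function name dnsmasq_args is made up for illustration, the mapping is inferred only from the values in this report, and the uuid, mac and forward elements are left out because the command line does not seem to use them.

```python
import ipaddress
import xml.etree.ElementTree as ET

# Trimmed copy of the network definition from the report.
NETWORK_XML = """
<network>
  <name>default</name>
  <bridge name="virbr0" />
  <ip address="192.168.122.1" netmask="255.255.255.0">
    <dhcp>
      <range start="192.168.122.2" end="192.168.122.254" />
    </dhcp>
  </ip>
</network>
"""


def dnsmasq_args(network_xml):
    """Derive dnsmasq arguments from a network definition (illustrative only)."""
    net = ET.fromstring(network_xml)
    name = net.findtext("name")
    ip = net.find("ip")
    rng = ip.find("dhcp/range")
    start, end = rng.get("start"), rng.get("end")
    # Count the addresses in the inclusive range start..end.
    leases = int(ipaddress.IPv4Address(end)) - int(ipaddress.IPv4Address(start)) + 1
    return [
        "--strict-order", "--bind-interfaces",
        f"--pid-file=/var/run/libvirt/network/{name}.pid",
        "--conf-file=", "--except-interface", "lo",
        "--listen-address", ip.get("address"),
        "--dhcp-range", f"{start},{end}",
        f"--dhcp-leasefile=/var/lib/libvirt/dnsmasq/{name}.leases",
        f"--dhcp-lease-max={leases}",
        "--dhcp-no-override",
    ]


if __name__ == "__main__":
    print(" ".join(["/usr/sbin/dnsmasq"] + dnsmasq_args(NETWORK_XML)))
```

With the range 192.168.122.2 to 192.168.122.254 the lease count comes out to 253, which matches the --dhcp-lease-max value in the command above.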
Comment 1 Scott Baker 2012-08-22 11:08:00 EDT
I'm using "network" instead of "NetworkManager" if that's somehow related.
Comment 2 Scott Baker 2012-08-22 13:55:04 EDT
virsh net-destroy default; virsh net-start default causes dnsmasq to restart. Working with laine on IRC, if dnsmasq crashes or is killed, and virbr0 is still present, restarting libvirtd will *NOT* restart dnsmasq. dnsmasq is only started by libvirtd if it thinks the network is not already up, which it determines by seeing if the virbr0 device is present. Part of libvirtd turning up the networks should probably confirm that dnsmasq is running, and if not, start it.
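To make the proposed check concrete, here is a minimal Python sketch of "confirm that dnsmasq is running" based on the pid file. It only illustrates the idea and is not how libvirtd implements it; the helper names pid_is_alive and daemon_running are hypothetical, and the pid file path in the closing note is the one from the dnsmasq command line in the description.

```python
import os


def pid_is_alive(pid):
    """Return True if a process with this pid currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def daemon_running(pid_file):
    """Return True if pid_file names a live process.

    Caveat: a recycled pid could belong to an unrelated process, so a
    real check would also want to verify that it is actually dnsmasq.
    """
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable or garbled pid file
    return pid > 0 and pid_is_alive(pid)
```

For the network in this report the call would be daemon_running("/var/run/libvirt/network/default.pid"). The presence of virbr0 alone says nothing about the result.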
Comment 3 Laine Stump 2012-08-24 02:11:16 EDT
I agree we should be checking for dnsmasq and restarting it if needed (and probably giving it a SIGHUP even if it's there, just for good measure). An aside: In our discussion on IRC, you figured out that you had run "/etc/init.d/dnsmasq restart" and that had killed dnsmasq. But that script doesn't exist on F16, because it has switched to using systemd (and when I run "service dnsmasq restart" or "systemctl restart dnsmasq.service", it fails and doesn't kill all of libvirtd's dnsmasq instances). Was that original behavior only seen on CentOS, and you just verified the result on F16 by manually killing the dnsmasq processes? Or is there some other weird circumstance that causes dnsmasq processes to be killed?
Comment 4 Scott Baker 2012-08-24 11:09:12 EDT
It was originally seen on CentOS. I tried it on F16 by forcibly killing dnsmasq first, wanting to see if it would restart.
Comment 5 Laine Stump 2012-08-24 14:16:37 EDT
Okay, so there isn't a separate "dnsmasq is mysteriously dying" bug on F16. That's good to know :-) I've changed the summary of this BZ to more accurately reflect what's needed from libvirt. Thanks for the report and extra investigation!
Comment 6 Laine Stump 2012-09-23 14:21:10 EDT
Upstream libvirt has been enhanced to restart radvd/dnsmasq when needed when libvirtd is restarted. It will also send a SIGHUP to all dnsmasq and radvd processes when libvirtd is restarted.

The following two commits are required for this new behavior. I'm not sure how easily they will backport to the libvirt that's in F16 (which this BZ is filed against) or F17, but they will be in 0.10.2, which means they will automatically be in F18. If the backport isn't trivial, we may want to consider marking this as CLOSED/NEXTRELEASE or CLOSED/UPSTREAM instead.

commit 4cf974b67427e33e3ce38df4787cddd6e2822d67
Author: Laine Stump <firstname.lastname@example.org>
Date: Sun Sep 16 21:22:27 2012 -0400

    network: restart radvd/dnsmasq if needed when libvirtd is restarted

    A user on IRC had accidentally killed all of his libvirt-started dnsmasq instances (due to a buggy dnsmasq service script in Fedora 16), and had hoped that libvirtd would notice this on restart and reload all the dnsmasq daemons (as it does with iptables rules). Unfortunately this was not the case - as long as the network object had a pid registered for dnsmasq and/or radvd, it assumed that the processes were running.

    This patch takes advantage of the new utility functions in bridge_driver.c to do a "refresh" of all radvd and dnsmasq processes started by libvirt each time libvirtd is restarted - this function attempts to do a SIGHUP of each existing process, and if that fails, it restarts the process, rebuilding all the associated config files and commandline parameters in the process.

    This normally has no effect, but will be useful in solving the occasional "odd situation" without needing to take the drastic step of destroying/re-starting the network.
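The "SIGHUP, and if that fails, restart" logic from the commit message above can be sketched in a few lines of Python. This shows the control flow only and is not the C code in bridge_driver.c; refresh_daemon and the start callback are hypothetical names, with start standing in for whatever rebuilds the config files and launches the daemon.

```python
import os
import signal


def refresh_daemon(pid, start):
    """Ask a running daemon to reload, or start it if it is gone.

    pid   -- the registered pid, or None if nothing was recorded
    start -- hypothetical callback that rebuilds the config and
             launches the daemon
    Returns "reloaded" or "started".
    """
    if pid is not None and pid > 0:
        try:
            os.kill(pid, signal.SIGHUP)  # reread config if still running
            return "reloaded"
        except ProcessLookupError:
            pass  # a pid was registered, but the process no longer exists
    start()
    return "started"
```

The point is that a registered pid is treated as a hint and not as proof: a failed signal delivery is what triggers the restart.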
commit 1ce4922e720e125421b3f8061d0eb6fdd152c41a
Author: Laine Stump <email@example.com>
Date: Mon Aug 20 00:59:46 2012 -0400

    network: reorganize dnsmasq and radvd config file / startup

    This patch splits the starting of dnsmasq and radvd into multiple files, and adds new networkRefreshXX() and networkRestartXX() functions for each. These new functions are currently commented out because they won't be used until the next commit, and the compile options require all static functions to be used.

    networkRefreshXX() - rewrites any file-based config for dnsmasq/radvd, and sends SIGHUP to the process to make it reread its config. If the program isn't already running, it's just started.

    networkRestartXX() - kills the given program, waits for it to exit (see the comments in the function networkKillDaemon()), then calls networkStartXX().

    This commit is here mostly as a checkpoint to verify no change in functional behavior after refactoring networkStartXX() functions to fit in with these new functions.
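As a rough illustration of the kill / wait for exit / start sequence described for networkRestartXX(), here is a Python sketch. It does not reproduce networkKillDaemon(): the names, the SIGTERM-then-SIGKILL escalation and the timeouts are all assumptions made for the example.

```python
import os
import signal
import time


def wait_for_exit(pid, timeout=5.0, interval=0.05):
    """Poll until pid no longer exists; return False on timeout.

    A daemonized process is not our child, so polling works here. An
    unreaped child would keep "existing" as a zombie until waited on.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            os.kill(pid, 0)
        except ProcessLookupError:
            return True
        except PermissionError:
            pass  # still exists, owned by another user
        time.sleep(interval)
    return False


def restart_daemon(pid, start, timeout=5.0):
    """Kill the old daemon, wait for it to exit, then call start().

    start is a hypothetical callback that launches the new instance.
    """
    try:
        os.kill(pid, signal.SIGTERM)
        if not wait_for_exit(pid, timeout):
            os.kill(pid, signal.SIGKILL)  # escalate if it did not exit
            wait_for_exit(pid, timeout)
    except ProcessLookupError:
        pass  # it was already gone
    start()
```

Waiting before starting matters because the new instance binds the same listen address and writes the same pid file as the old one.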
Comment 7 Cole Robinson 2012-10-01 17:25:03 EDT
Amazingly these patches apply cleanly to F16 maint. However given the size of the changes, the (hopefully) rare occurrence of the issue, and the fact that there's a workaround (destroy, start), I don't plan on backporting these to the maintenance branches. Moving to F18.
Comment 8 Cole Robinson 2012-10-01 17:37:36 EDT
Aaaaand libvirt 0.10.2 is already in F18, so just closing as CURRENTRELEASE