Bug 1154953
| Field | Value |
|---|---|
| Summary | Virtual machine fails to start due to a problem with dnsmasq |
| Product | Red Hat Enterprise Linux 6 |
| Component | dnsmasq |
| Version | 6.7 |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | high |
| Keywords | Patch, Regression |
| Target Milestone | rc |
| Hardware | x86_64 |
| OS | Linux |
| Type | Bug |
| Fixed In Version | dnsmasq-2.48-17.el6 |
| Doc Type | Bug Fix |
| Last Closed | 2016-05-11 01:04:02 UTC |
| Reporter | Matthias Scheutz <matthias.scheutz> |
| Assignee | Pavel Šimerda (pavlix) <psimerda> |
| QA Contact | Jan Ščotka <jscotka> |
| CC | dmitry, dyuan, ftaylor, jdenemar, jscotka, matthias.scheutz, mzhan, psklenar, rbalakri, rmy, salmy, thomas.j.thompson, thozza, tlavigne, vvasilev, wonczak |
Description
Matthias Scheutz
2014-10-21 06:31:32 UTC
Are you sure you don't have a conflicting dnsmasq running on the host? What is the output of the "ps -fwC dnsmasq" command?

Not that I know of. The output of "ps -fwC dnsmasq" is just the header line, with no processes listed:

    UID        PID  PPID  C STIME TTY          TIME CMD

and the dnsmasq service is stopped. I should add that when I stop the DHCP server using "service dhcpd stop" and then reload libvirtd using "service libvirtd reload", I can start the VM using virt-manager and it works fine. However, it is then not possible to restart dhcpd: "service dhcpd start" always fails. So it seems that the latest update to libvirt or related packages somehow changed the port dhcpd is listening on in the virtual network when dnsmasq gets called as part of starting libvirtd. Is that possible? Please advise.

Any suggestions on how to resolve this? Right now, we cannot run the VM and the DHCP server at the same time as we used to. If I turn the DHCP server off and start the VM, this is what I get from "netstat -aunp":

    udp   0   0 192.168.122.1:53   0.0.0.0:*   3523/dnsmasq
    udp   0   0 192.168.0.254:53   0.0.0.0:*   1788/named
    udp   0   0 127.0.0.1:53       0.0.0.0:*   1788/named
    udp   0   0 0.0.0.0:67         0.0.0.0:*   3523/dnsmasq

I had the same problem, too (albeit on CentOS 6). Downgrading dnsmasq-2.48-14.el6.x86_64 to dnsmasq-2.48-13.el6.x86_64 solved the problem for me. While googling, I stumbled on an old FC18 bug report that described the exact same problem, https://bugzilla.redhat.com/show_bug.cgi?id=977555, which was fixed at that time. Maybe for some reason this bug got resurrected in the latest dnsmasq package.

Yup, downgrading to dnsmasq-2.48-13.el6.x86_64 worked, thanks for the tip Stephan!
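The conflict described above, where dhcpd cannot start once dnsmasq holds 0.0.0.0:67, can be illustrated outside libvirt. On Linux, a UDP socket may bind an address and port that another socket already holds only if both have opted in to sharing: either both set SO_REUSEADDR, or both set SO_REUSEPORT (under the same effective user ID). A socket that sets only one of the two options does not satisfy a peer that set only the other. The C sketch below shows the SO_REUSEADDR half of that rule; `bind_udp` and `bound_port` are hypothetical helpers, not code from dhcpd or dnsmasq, and the sketch binds 127.0.0.1 on a kernel-chosen port instead of 0.0.0.0:67 so it runs without root.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a UDP socket, optionally set SO_REUSEADDR,
   and bind it to 127.0.0.1:port (port 0 lets the kernel choose a free
   port). Returns the descriptor, or -1 with errno from the failing call. */
static int bind_udp(unsigned short port, int reuseaddr)
{
  struct sockaddr_in addr;
  int one = 1, saved;
  int fd = socket(AF_INET, SOCK_DGRAM, 0);

  if (fd == -1)
    return -1;

  /* Opt in to sharing the local address/port with other sockets that
     have also set SO_REUSEADDR. */
  if (reuseaddr &&
      setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) == -1)
    goto fail;

  memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

  /* Fails with EADDRINUSE if an existing socket holds the port and the
     two sockets have not both opted in to sharing it. */
  if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1)
    goto fail;

  return fd;

 fail:
  saved = errno;   /* close() could overwrite errno */
  close(fd);
  errno = saved;
  return -1;
}

/* Hypothetical helper: report the local port a bound socket received. */
static unsigned short bound_port(int fd)
{
  struct sockaddr_in addr;
  socklen_t len = sizeof(addr);

  if (getsockname(fd, (struct sockaddr *) &addr, &len) == -1)
    return 0;
  return ntohs(addr.sin_port);
}
```

In this sketch, once `first = bind_udp(0, 1)` holds a port, a second `bind_udp(bound_port(first), 1)` should succeed, while `bind_udp(bound_port(first), 0)` should fail with EADDRINUSE.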
For the RH developers: after the downgrade, when I run "netstat -aunp" I get:

    udp   0   0 192.168.122.1:53   0.0.0.0:*   2786/dnsmasq
    udp   0   0 192.168.0.254:53   0.0.0.0:*   1789/named
    udp   0   0 127.0.0.1:53       0.0.0.0:*   1789/named
    udp   0   0 0.0.0.0:67         0.0.0.0:*   2786/dnsmasq
    udp   0   0 0.0.0.0:67         0.0.0.0:*   2410/dhcpd

So, with the older version of dnsmasq, dhcpd can apparently listen on 0.0.0.0:67 as well. It would be great if this regression could be fixed upstream.

Is there an update on when this will be fixed?

An ETA would be appreciated from me as well. I hit this problem without running dhcpd; in my particular case, libvirtd can't start the virtual networks at all. Here's the traceback in virt-manager when attempting to start a network (it's nearly identical to the F18 bug, IIRC):

    Traceback (most recent call last):
      File "/usr/share/virt-manager/virtManager/asyncjob.py", line 44, in cb_wrapper
        callback(asyncjob, *args, **kwargs)
      File "/usr/share/virt-manager/virtManager/asyncjob.py", line 65, in tmpcb
        callback(*args, **kwargs)
      File "/usr/share/virt-manager/virtManager/network.py", line 82, in start
        self.net.create()
      File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2128, in create
        if ret == -1: raise libvirtError ('virNetworkCreate() failed', net=self)
    libvirtError: internal error Child process (/usr/sbin/dnsmasq --strict-order --pid-file=/var/run/libvirt/network/isolated_local.pid --conf-file= --except-interface lo --bind-interfaces --listen-address 192.168.100.1 --dhcp-option=3 --no-resolv --dhcp-range 192.168.100.128,192.168.100.254 --dhcp-leasefile=/var/lib/libvirt/dnsmasq/isolated_local.leases --dhcp-lease-max=127 --dhcp-no-override --dhcp-hostsfile=/var/lib/libvirt/dnsmasq/isolated_local.hostsfile --addn-hosts=/var/lib/libvirt/dnsmasq/isolated_local.addnhosts) unexpected exit status 2: dnsmasq: failed to set SO_REUSE{ADDR|PORT} on DHCP socket: Protocol not available

Note that downgrading dnsmasq is a viable workaround for me at this point, but I'd rather the repo had a fixed version...

(In reply to Dr. Stephan Wonczak from comment #6)
> I had the same problem, too (albeit on Centos 6). Here downgrading
> dnsmasq-2.48-14.el6.x86_64
> to
> dnsmasq-2.48-13.el6.x86_64
> solved the problem for me.

I have just compared the two versions, and they appear to differ only in the initscript. As far as I know, libvirt is not supposed to use the initscript at all, so the change is unlikely to affect it, unless I missed something.

(In reply to TJ from comment #10)
> An ETA would be appreciated from me as well. I hit this problem w/o running
> dhcpd, libvirtd can't start the virtual networks at all in my particular
> case.

Have you checked whether any daemons other than the dnsmasq instances started by libvirt are using the DHCP server port?

> Here's the traceback in virt manager when attempting to start a network
> (it's nearly identical to the F18 bug IIRC):
>
> /usr/sbin/dnsmasq --strict-order
> --pid-file=/var/run/libvirt/network/isolated_local.pid --conf-file=
> --except-interface lo --bind-interfaces --listen-address 192.168.100.1
> --dhcp-option=3 --no-resolv --dhcp-range 192.168.100.128,192.168.100.254
> --dhcp-leasefile=/var/lib/libvirt/dnsmasq/isolated_local.leases
> --dhcp-lease-max=127 --dhcp-no-override
> --dhcp-hostsfile=/var/lib/libvirt/dnsmasq/isolated_local.hostsfile
> --addn-hosts=/var/lib/libvirt/dnsmasq/isolated_local.addnhosts

Note the `--bind-interfaces` option, which is also referred to in the dnsmasq source code below.

> dnsmasq: failed to set SO_REUSE{ADDR|PORT} on DHCP socket: Protocol not
> available

This error line uniquely identifies the code where the failure happens:

      /* When bind-interfaces is set, there might be more than one dnmsasq
         instance binding port 67. That's OK if they serve different networks.
         Need to set REUSEADDR|REUSEPORT to make this posible.
         Handle the case that REUSEPORT is defined, but the kernel doesn't
         support it. This handles the introduction of REUSEPORT on Linux. */
      if (option_bool(OPT_NOWILD) || option_bool(OPT_CLEVERBIND))
        {
          int rc = 0;

    #ifdef SO_REUSEPORT
          if ((rc = setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &oneopt, sizeof(oneopt))) == -1 &&
              errno == ENOPROTOOPT)
            rc = 0;
    #endif

          if (rc != -1)
            rc = setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &oneopt, sizeof(oneopt));

          if (rc == -1)
            die(_("failed to set SO_REUSE{ADDR|PORT} on DHCP socket: %s"), NULL, EC_BADNET);
        }

From this source code, it looks like the message can only end in "Protocol not available" (ENOPROTOOPT) when `setsockopt(..., SO_REUSEADDR, ...)` fails with ENOPROTOOPT, unless I missed something. The SO_REUSEPORT call looks safe from ENOPROTOOPT to me, because that error is explicitly turned into success.

> Note that the downgrade in dnsmasq is a viable work around at this point for
> me, but I'd rather the repo have a fixed version...

Can you confirm that the downgrade actually helps? Do we have a simple reproducer not involving libvirt? Later I will try running the command above, as launched by libvirt, to see whether that alone reproduces the issue.

Here's how I reproduce the issue:

- Stop all virtual networks using virt-manager.
- Verify that nothing is using the DHCP ports with "sudo lsof -iUDP".
- Upgrade the dnsmasq package.
- Try to start a virtual network (via virt-manager); it fails as noted above.
- Downgrade dnsmasq.
- Retry starting the network. The networks start as expected.

The libvirt version is libvirt-0.10.2-46.el6.x86_64.

I'll see if I can reproduce the problem manually...

(In reply to TJ from comment #12)
> - Downgrade dnsmasq

Have you also tried just restarting dnsmasq instead of downgrading at this point?

We tried that; it does not work. The only thing that works is downgrading to dnsmasq.x86_64 0:2.48-13.el6. It would be great if this could be fixed.
Matthias

With a bit of stracing of the -13 and -14 versions of dnsmasq, we confirmed that the difference is not in the code but in the build environment, more specifically the version of the kernel headers.
The former version didn't detect support for SO_REUSEPORT at compile time and therefore uses SO_REUSEADDR, while the latter uses SO_REUSEPORT. The upstream solution is to use both SO_REUSEADDR and SO_REUSEPORT. The respective upstream commit follows:

    commit ffbad34b310ab2db6a686c85f5c0a0e52c0680c8
    Author: Simon Kelley <simon.uk>
    Date:   Wed Aug 14 15:53:57 2013 +0100

        Set SOREUSEADDR as well as SOREUSEPORT on DHCP sockets when both available.

    diff --git a/src/dhcp.c b/src/dhcp.c
    index 333a327..b95a4ba 100644
    --- a/src/dhcp.c
    +++ b/src/dhcp.c
    @@ -70,15 +70,15 @@ static int make_fd(int port)
              support it. This handles the introduction of REUSEPORT on Linux. */
           if (option_bool(OPT_NOWILD) || option_bool(OPT_CLEVERBIND))
             {
    -      int rc = -1, porterr = 0;
    +      int rc = 0;
     
     #ifdef SO_REUSEPORT
           if ((rc = setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &oneopt, sizeof(oneopt))) == -1 &&
    -          errno != ENOPROTOOPT)
    -        porterr = 1;
    +          errno == ENOPROTOOPT)
    +        rc = 0;
     #endif
     
    -      if (rc == -1 && !porterr)
    +      if (rc != -1)
             rc = setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &oneopt, sizeof(oneopt));
     
           if (rc == -1)
    diff --git a/src/dhcp6.c b/src/dhcp6.c
    index 17e03e5..89af7dd 100644
    --- a/src/dhcp6.c
    +++ b/src/dhcp6.c
    @@ -55,15 +55,15 @@ void dhcp6_init(void)
              support it. This handles the introduction of REUSEPORT on Linux. */
           if (option_bool(OPT_NOWILD) || option_bool(OPT_CLEVERBIND))
             {
    -      int rc = -1, porterr = 0;
    +      int rc = 0;
     
     #ifdef SO_REUSEPORT
           if ((rc = setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &oneopt, sizeof(oneopt))) == -1 &&
    -          errno != ENOPROTOOPT)
    -        porterr = 1;
    +          errno == ENOPROTOOPT)
    +        rc = 0;
     #endif
     
    -      if (rc == -1 && !porterr)
    +      if (rc != -1)
             rc = setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &oneopt, sizeof(oneopt));
     
           if (rc == -1)

*** Bug 1176224 has been marked as a duplicate of this bug. ***

Created attachment 1112206
Patch adapted to RHEL 6.
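The logic of the upstream commit above can also be exercised outside dnsmasq. The C sketch below copies the `+` side of the diff into a standalone function; `set_dhcp_reuse` and `probe_reuseport` are hypothetical names, not dnsmasq functions, and dnsmasq's `die()` is replaced by returning -1 so the caller can inspect errno. `probe_reuseport` is a minimal libvirt-free check of whether the running kernel accepts SO_REUSEPORT; glibc reports ENOPROTOOPT as "Protocol not available", the text in the libvirt error above. This is a sketch for experimentation, not the RHEL 6 patch itself.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical standalone version of the reuse-option logic from upstream
   commit ffbad34 (dhcp.c / dhcp6.c). Returns 0 on success, or -1 with
   errno set by the setsockopt() call that failed. */
static int set_dhcp_reuse(int fd)
{
  int oneopt = 1;
  int rc = 0;

#ifdef SO_REUSEPORT
  /* Only compiled in when the build headers define SO_REUSEPORT. A running
     kernel that predates the option rejects it with ENOPROTOOPT, which is
     treated as success here instead of being fatal. */
  if ((rc = setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &oneopt, sizeof(oneopt))) == -1 &&
      errno == ENOPROTOOPT)
    rc = 0;
#endif

  /* SO_REUSEADDR is set unless the SO_REUSEPORT call failed with some
     error other than ENOPROTOOPT. */
  if (rc != -1)
    rc = setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &oneopt, sizeof(oneopt));

  return rc;
}

/* Hypothetical reproducer without libvirt: returns 0 if the running kernel
   accepts SO_REUSEPORT, an errno value if socket() or setsockopt() fails
   (ENOPROTOOPT on kernels without the option), or -1 if the headers do
   not define SO_REUSEPORT at all. */
static int probe_reuseport(void)
{
#ifdef SO_REUSEPORT
  int one = 1, err = 0;
  int fd = socket(AF_INET, SOCK_DGRAM, 0);

  if (fd == -1)
    return errno;
  if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) == -1)
    err = errno;
  close(fd);
  return err;
#else
  return -1;
#endif
}
```

Because `#ifdef SO_REUSEPORT` is resolved at compile time, the same source can make different setsockopt() calls depending on the kernel headers it is built against, which is consistent with the -13/-14 difference found with strace.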
We just updated to dnsmasq.x86_64 0:2.48-16.el6_7 and kernel-2.6.32-573.18.1.el6.x86_64, but the problem is still there...

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0949.html