Bug 884957 - guest can not get NAT IP from dnsmasq-2.48-10
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: dnsmasq
Version: 6.4
Hardware: x86_64 Linux
Priority: high, Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Tomas Hozza
QA Contact: qe-baseos-daemons
Keywords: Patch
Duplicates: 886641 886682 887928 892448
Depends On:
Blocks: 804141 888457
 
Reported: 2012-12-07 02:26 EST by Huang Wenlong
Modified: 2013-10-20 17:46 EDT
CC List: 19 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
This bug was caused by a backported patch for Bug #882251. In the end I used a different approach, so this bug does not need to be documented. dnsmasq-2.48-10.el6.x86_64 was never distributed to customers.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-02-21 05:44:59 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments
domain xml (2.54 KB, text/plain)
2012-12-10 05:02 EST, Huang Wenlong
default network xml (366 bytes, text/plain)
2012-12-10 05:03 EST, Huang Wenlong
strace of the failing dnsmasq (23.85 KB, text/plain)
2012-12-11 14:15 EST, Laine Stump
sosreport and var log messages (1.73 MB, application/x-gzip)
2013-01-08 11:35 EST, IBM Bug Proxy
RHEL 6.4 guest xml (2.42 KB, text/plain)
2013-01-08 11:35 EST, IBM Bug Proxy
proposed patch (606 bytes, text/plain)
2013-01-09 11:51 EST, IBM Bug Proxy

Description Huang Wenlong 2012-12-07 02:26:30 EST
Description of problem:
guest can not get NAT IP from dnsmasq-2.48-10

Version-Release number of selected component (if applicable):
libvirt-0.10.2-11.el6.x86_64
dnsmasq-2.48-10.el6.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Start a guest attached to a NAT network.
2. Try to get an IP address from DHCP.
3. The guest cannot get an IP address; sometimes the dnsmasq process goes down.

# cat /var/log/messages | grep dnsmasq

Dec 7 02:13:57 intel-q9400-4-2 dnsmasq-dhcp[2853]: DHCPDISCOVER(virbr0) 52:54:00:b2:68:d7
Dec 7 02:13:57 intel-q9400-4-2 dnsmasq-dhcp[2853]: DHCPOFFER(virbr0) 192.168.122.19 52:54:00:b2:68:d7
Dec 7 02:13:57 intel-q9400-4-2 dnsmasq-dhcp[2853]: DHCPDISCOVER(virbr0) 52:54:00:b2:68:d7
Dec 7 02:13:57 intel-q9400-4-2 dnsmasq-dhcp[2853]: DHCPOFFER(virbr0) 192.168.122.19 52:54:00:b2:68:d7
Dec 7 02:13:57 intel-q9400-4-2 dnsmasq-dhcp[2853]: DHCPREQUEST(virbr0) 192.168.122.19 52:54:00:b2:68:d7
Dec 7 02:13:57 intel-q9400-4-2 dnsmasq-dhcp[2853]: DHCPACK(virbr0) 192.168.122.19 52:54:00:b2:68:d7
Dec 7 02:14:33 intel-q9400-4-2 dnsmasq[2853]: exiting on receipt of SIGTERM
Dec 7 02:14:33 intel-q9400-4-2 dnsmasq[3045]: started, version 2.48 cachesize 150
Dec 7 02:14:33 intel-q9400-4-2 dnsmasq[3045]: compile time options: IPv6 GNU-getopt DBus no-I18N DHCP TFTP
Dec 7 02:14:33 intel-q9400-4-2 dnsmasq[3045]: reading /etc/resolv.conf
Dec 7 02:14:33 intel-q9400-4-2 dnsmasq[3045]: using nameserver 172.16.52.28#53
Dec 7 02:14:33 intel-q9400-4-2 dnsmasq[3045]: using nameserver 10.68.5.11#53
Dec 7 02:14:33 intel-q9400-4-2 dnsmasq[3045]: using nameserver 10.66.127.10#53
Dec 7 02:14:33 intel-q9400-4-2 dnsmasq[3045]: read /etc/hosts - 2 addresses
Dec 7 02:14:33 intel-q9400-4-2 yum[3003]: Updated: dnsmasq-2.48-10.el6.x86_64
Dec 7 02:14:43 intel-q9400-4-2 dnsmasq[3119]: failed to bind listening socket for ::1: Address already in use
Dec 7 02:14:43 intel-q9400-4-2 dnsmasq[3119]: FAILED to start up

Actual results:
As in step 3: the guest does not get an IP address.

Expected results:
The guest gets an IP address.

Additional info:
libvirt-0.10.2-10.el6.x86_64 with dnsmasq-2.48-10.el6.x86_64: works
libvirt-0.10.2-11.el6.x86_64 with dnsmasq-2.48-9.el6.x86_64: works
libvirt-0.10.2-11.el6.x86_64 with dnsmasq-2.48-10.el6.x86_64: fails
Comment 1 Jiri Denemark 2012-12-10 04:50:17 EST
Could you attach the XML definitions for both domain and network used to trigger this bug?
Comment 2 Huang Wenlong 2012-12-10 05:01:59 EST
Hi Jiri,

I used the simplest XML in this case. I will attach the guest and network XML.

Wenlong
Comment 3 Huang Wenlong 2012-12-10 05:02:38 EST
Created attachment 660659 [details]
domain xml
Comment 4 Huang Wenlong 2012-12-10 05:03:06 EST
Created attachment 660660 [details]
default network xml
Comment 5 Huang Wenlong 2012-12-10 05:08:37 EST
The dnsmasq process does quit when the domain is started via libvirt,
but libvirt does not notice.

# ps -ef |grep dns
root     17366  8976  0 18:04 pts/0    00:00:00 grep dns


[root@intel-w3520-12-2 rpms]# virsh net-list 
Name                 State      Autostart     Persistent
--------------------------------------------------
default              active     yes           yes


Restarting libvirtd starts dnsmasq successfully:

[root@intel-w3520-12-2 rpms]# /etc/init.d/libvirtd restart
Stopping libvirtd daemon:                                  [  OK  ]
Starting libvirtd daemon:                                  [  OK  ]
[root@intel-w3520-12-2 rpms]# virsh net-list 
Name                 State      Autostart     Persistent
--------------------------------------------------
default              active     yes           yes

[root@intel-w3520-12-2 rpms]# ps -ef |grep dns
nobody   17499     1  0 18:05 ?        00:00:00 /usr/sbin/dnsmasq --strict-order --local=// --domain-needed --pid-file=/var/run/libvirt/network/default.pid --conf-file= --bind-dynamic --interface virbr0 --dhcp-range 192.168.122.2,192.168.122.254 --dhcp-leasefile=/var/lib/libvirt/dnsmasq/default.leases --dhcp-lease-max=253 --dhcp-no-override --dhcp-hostsfile=/var/lib/libvirt/dnsmasq/default.hostsfile --addn-hosts=/var/lib/libvirt/dnsmasq/default.addnhosts
root     17555  8976  0 18:05 pts/0    00:00:00 grep dns


Then start a guest, and the dnsmasq process quits.

tail /var/log/messages
Dec 10 18:07:06 intel-w3520-12-2 dnsmasq[17499]: failed to bind listening socket for ::1: Address already in use
Dec 10 18:07:06 intel-w3520-12-2 dnsmasq[17499]: FAILED to start up
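
Because "virsh net-list" keeps reporting the network as active after dnsmasq has exited, one way to cross-check is to test the PID recorded in the pid file that appears on the dnsmasq command line above. The following is only a minimal sketch, not something libvirt provides: the function name dnsmasq_alive is hypothetical, and a recycled PID could give a false positive.

```python
import errno
import os


def dnsmasq_alive(pidfile):
    """Return True if the PID recorded in pidfile names a running process."""
    try:
        with open(pidfile) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable, or malformed pid file: treat as not running.
        return False
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending anything
    except OSError as exc:
        # EPERM means the process exists but belongs to another user
        # (dnsmasq runs as "nobody" in the ps output above).
        return exc.errno == errno.EPERM
    return True
```

For the default network in this report, the call would be dnsmasq_alive("/var/run/libvirt/network/default.pid").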
Comment 6 Jiri Denemark 2012-12-10 10:22:57 EST
I think this is a bug in the --bind-dynamic implementation of dnsmasq-2.48-10.el6. The only difference between the dnsmasq command lines generated by libvirt-0.10.2-10.el6 and libvirt-0.10.2-11.el6 is that the former uses --bind-interfaces, while the latter uses --bind-dynamic when run with dnsmasq-2.48-10.el6.
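
To tell which of the two modes a running instance was started with, the command line from "ps -ef" can be inspected. This is a small illustrative sketch; the helper name bind_mode is hypothetical and it only looks for the two flags named in this comment.

```python
import shlex


def bind_mode(cmdline):
    """Return the bind flag used on a dnsmasq command line, or None."""
    args = shlex.split(cmdline)
    for flag in ("--bind-dynamic", "--bind-interfaces"):
        if flag in args:
            return flag
    return None
```

Applied to the command line shown in comment 5, this returns "--bind-dynamic".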
Comment 9 Laine Stump 2012-12-11 14:15:12 EST
Created attachment 661611 [details]
strace of the failing dnsmasq

This strace collected by Peter Krempa shows that dnsmasq is attempting to bind to "::1" twice - succeeding the first time, but failing the 2nd time.
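
The failure mode can be reproduced outside dnsmasq: binding a second socket to an address and port that is already bound fails with EADDRINUSE. The sketch below is an illustration only, not dnsmasq's code. It assumes IPv4 loopback and UDP instead of "::1" so that it also runs on hosts without IPv6, and the function name bind_twice is hypothetical.

```python
import errno
import socket


def bind_twice(host="127.0.0.1"):
    """Bind two UDP sockets to the same address and port.

    Returns the errno of the second bind, or 0 if it succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        first.bind((host, 0))        # port 0: let the kernel pick a free port
        addr = first.getsockname()   # the (host, port) pair actually bound
        try:
            second.bind(addr)        # same address and port, no SO_REUSEADDR
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print(errno.errorcode.get(bind_twice(), "OK"))
```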
Comment 15 RHEL Product and Program Management 2012-12-12 15:29:33 EST
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.
Comment 19 Dave Allan 2012-12-13 09:51:15 EST
*** Bug 886641 has been marked as a duplicate of this bug. ***
Comment 29 Eric Blake 2012-12-14 12:14:11 EST
dnsmasq --version already comes with a line that looks like:
Compile time options IPv6 GNU-getopt DBus no-I18N DHCP TFTP

Adding a new "option" to this line that appears only when the CVE is fixed will work for libvirt.
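
A caller could detect such a marker by tokenizing that line. The following is a rough sketch of the idea, not libvirt's actual detection code; the function names are hypothetical, and the name of the marker itself is not given in this comment, so it is passed in as a parameter.

```python
def compile_time_options(version_output):
    """Return the tokens on the 'Compile time options' line, or []."""
    prefix = "compile time options"
    for line in version_output.splitlines():
        if line.lower().startswith(prefix):
            # Accept the line with or without a colon after the prefix.
            return line[len(prefix):].lstrip(": ").split()
    return []


def has_option(version_output, token):
    """Return True if token is one of the compile-time option tokens."""
    return token in compile_time_options(version_output)
```

With the line quoted above, has_option(output, "DBus") is true, and a marker that is absent from the line yields false.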
Comment 37 Eric Blake 2012-12-18 11:20:26 EST
Unfortunately, the mere act of 'yum reinstall dnsmasq -y' for dnsmasq-2.48-11.el6.x86_64 runs a 'killall dnsmasq', which fries the dnsmasq instances being run by libvirtd.  This is unacceptable behavior, as it kills network connectivity of guests that libvirt is managing.  I'm moving this back to ASSIGNED to make sure we get that fixed (although it might be worth spawning into another BZ to have this one just track the CVE fix).
Comment 38 Eric Blake 2012-12-18 11:23:37 EST
In the same vein, even though 'chkconfig --list dnsmasq' on my system shows:
dnsmasq        	0:off	1:off	2:off	3:off	4:off	5:off	6:off

the act of upgrading dnsmasq started a global /usr/sbin/dnsmasq process with no command line arguments.  A global dnsmasq should only be started if the service is enabled, and not merely because a newer dnsmasq was installed.
Comment 39 Eric Blake 2012-12-18 11:24:24 EST
See bug 850944 for the issues mentioned in comments 37 and 38.
Comment 42 Greg Nichols 2012-12-19 13:55:43 EST
*** Bug 887928 has been marked as a duplicate of this bug. ***
Comment 44 Laine Stump 2012-12-20 15:05:21 EST
*** Bug 886682 has been marked as a duplicate of this bug. ***
Comment 45 Laine Stump 2013-01-08 11:30:01 EST
*** Bug 892448 has been marked as a duplicate of this bug. ***
Comment 46 IBM Bug Proxy 2013-01-08 11:35:50 EST
Created attachment 674948 [details]
sosreport and var log messages
Comment 47 IBM Bug Proxy 2013-01-08 11:35:59 EST
Created attachment 674949 [details]
RHEL 6.4 guest xml
Comment 48 IBM Bug Proxy 2013-01-08 22:41:11 EST
------- Comment From onmahaja@in.ibm.com 2013-01-09 03:33 EDT-------
I also observed this. I can sometimes see these lines in /var/log/messages:

Jan  8 09:51:23 localhost dnsmasq[2615]: failed to bind listening socket for ::1: Address already in use
Jan  8 09:51:23 localhost dnsmasq[2615]: FAILED to start up

But this also happens with other addresses assigned to virbr0, for example:

Jan  7 12:58:18 oc2826874472 dnsmasq[3098]: failed to bind listening socket for 192.168.254.1: Address already in use
Jan  7 12:58:18 oc2826874472 dnsmasq[3098]: FAILED to start up

Jan  8 15:26:29 oc2826874472 dnsmasq[3193]: failed to bind listening socket for 192.168.122.1 : Address already in use
Jan  8 15:26:29 oc2826874472 dnsmasq[3193]: FAILED to start up

Consequently, libvirt fails to start the dnsmasq daemon, and the guest DHCP queries go unanswered. There is clearly something wrong with the dnsmasq '--interface' option, which binds to the specified interface (in this case virbr0).

The dnsmasq daemon fails to start in src/network.c:create_bound_listeners().

Investigating the reasons ...
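
To see at a glance which addresses are affected, the bind failures can be pulled out of the log with a regular expression. This is just a sketch based on the message format in the excerpts above; the names BIND_FAILURE and bind_failures are hypothetical.

```python
import re

# Matches the dnsmasq bind-failure message quoted above. Whitespace before
# the colon is allowed because one of the excerpts contains it.
BIND_FAILURE = re.compile(
    r"dnsmasq\[(?P<pid>\d+)\]: failed to bind listening socket for "
    r"(?P<addr>\S+?)\s*: Address already in use"
)


def bind_failures(log_text):
    """Return (pid, address) pairs for each dnsmasq bind failure in log_text."""
    return [(int(m.group("pid")), m.group("addr"))
            for m in BIND_FAILURE.finditer(log_text)]
```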
Comment 49 IBM Bug Proxy 2013-01-08 23:01:50 EST
------- Comment From onmahaja@in.ibm.com 2013-01-09 03:58 EDT-------
As mentioned in comment #23, libvirt fails to start the dnsmasq daemon:

2013-01-07 01:58:34.930+0000: 15544: error : virCommandWait:2345 : internal error Child process (/usr/sbin/dnsmasq --strict-order --local=// --domain-needed --pid-file=/var/run/libvirt/network/default.pid --conf-file= --bind-dynamic --interface virbr0 --dhcp-range 192.168.122.2,192.168.122.254 --dhcp-leasefile=/var/lib/libvirt/dnsmasq/default.leases --dhcp-lease-max=253 --dhcp-no-override --dhcp-hostsfile=/var/lib/libvirt/dnsmasq/default.hostsfile --addn-hosts=/var/lib/libvirt/dnsmasq/default.addnhosts) unexpected exit status 2:
dnsmasq: failed to bind listening socket for ::1: Address already in use
Comment 51 IBM Bug Proxy 2013-01-09 11:51:42 EST
Created attachment 675724 [details]
proposed patch


------- Comment on attachment From onmahaja@in.ibm.com 2013-01-09 16:44 EDT-------


Note the following:

--interface=          : disables all interfaces except loopback
--interface=virbr0    : disables all interfaces except virbr0 and loopback

Hence:

# netstat -napt  | grep :53
tcp        0      0 127.0.0.1:53                0.0.0.0:*                   LISTEN      10417/dnsmasq       
tcp        0      0 192.168.122.1:53            0.0.0.0:*                   LISTEN      10417/dnsmasq       


excerpts from /var/log/messages :
Jan  9 11:02:34 oc2826874472 dnsmasq[5924]: failed to bind listening socket for 192.168.122.1: Address already in use
Jan  9 11:02:34 oc2826874472 dnsmasq[5924]: FAILED to start up


The attached patch makes libvirt invoke dnsmasq with "--except-interface lo", i.e., with the loopback interface disabled.


After restarting libvirt with this patch, libvirt invokes dnsmasq as:
/usr/sbin/dnsmasq --strict-order --local=// --domain-needed --pid-file=/var/run/libvirt/network/default.pid --conf-file= --bind-dynamic --except-interface lo --interface virbr0 --dhcp-range 192.168.122.2,192.168.122.254 --dhcp-leasefile=/var/lib/libvirt/dnsmasq/default.leases --dhcp-lease-max=253 --dhcp-no-override --dhcp-hostsfile=/var/lib/libvirt/dnsmasq/default.hostsfile --addn-hosts=/var/lib/libvirt/dnsmasq/default.addnhosts

and 

# netstat -napt  | grep :53
tcp        0      0 192.168.122.1:53            0.0.0.0:*                   LISTEN      308/dnsmasq         

and guests get DHCP-leased IPs.

Patch attached. Please share your comments.
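
The before/after comparison of listening addresses can be automated by parsing the netstat output. This is a sketch that assumes the "netstat -napt" column layout shown above (local address in the fourth column, state in the sixth, PID/program in the seventh); the function name listeners_on_port is hypothetical.

```python
def listeners_on_port(netstat_output, port=53):
    """Return (local_ip, program) for each LISTEN row on the given port."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        # Split on the last colon so the address part is kept whole.
        ip, _, local_port = fields[3].rpartition(":")
        if local_port == str(port):
            found.append((ip, fields[6]))
    return found
```

Without the patch this lists both 127.0.0.1 and 192.168.122.1 for dnsmasq; with it, only 192.168.122.1 remains.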
Comment 52 Laine Stump 2013-01-09 13:07:41 EST
I'm guessing you were redirected here from Bug 892448 (filed by IBM). Note that this BZ is in VERIFIED state, which means that it has already been fixed. You just need to update both dnsmasq and libvirt to at least the following versions:

  libvirt-0.10.2-13.el6.x86_64
  dnsmasq-2.48-12.el6.x86_64

This removes "--bind-dynamic" from dnsmasq (which libvirt automatically detects) and modifies libvirt to still allow networks using public addresses as long as dnsmasq was built to use SO_BINDTODEVICE (which is now indicated in dnsmasq's --version output).

You will then not need any other patch.
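
To check an installed package against these minimum versions, the name-version-release strings printed by "rpm -qa" can be compared numerically. The sketch below is a deliberate simplification and not RPM's full version comparison algorithm: it assumes a dotted numeric version and a release that starts with an integer, as in the package names in this report. The helper names are hypothetical.

```python
import re

# name-version-release, e.g. "dnsmasq-2.48-12.el6.x86_64"
NVR = re.compile(r"^(?P<name>.+)-(?P<version>[^-]+)-(?P<release>[^-]+)$")


def numeric_key(nvr):
    """Return (name, version tuple, leading release number) for an NVR string."""
    m = NVR.match(nvr)
    if m is None:
        raise ValueError("not a name-version-release string: %r" % nvr)
    try:
        version = tuple(int(part) for part in m.group("version").split("."))
    except ValueError:
        raise ValueError("non-numeric version in %r" % nvr)
    rel = re.match(r"\d+", m.group("release"))
    if rel is None:
        raise ValueError("release does not start with a number: %r" % nvr)
    return m.group("name"), version, int(rel.group())


def at_least(installed, minimum):
    """Return True if installed is the same package at minimum or newer."""
    name_i, ver_i, rel_i = numeric_key(installed)
    name_m, ver_m, rel_m = numeric_key(minimum)
    if name_i != name_m:
        raise ValueError("different packages: %s vs %s" % (name_i, name_m))
    return (ver_i, rel_i) >= (ver_m, rel_m)
```

For example, at_least("dnsmasq-2.48-10.el6.x86_64", "dnsmasq-2.48-12.el6.x86_64") is false, which flags the broken build from the original report.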
Comment 53 IBM Bug Proxy 2013-01-18 09:11:23 EST
------- Comment From nabharay@in.ibm.com 2013-01-18 14:00 EDT-------
Hi,

I upgraded to the latest RHEL 6.4 snap3 kernel and now the guests are able to get DHCP IPs.

[root@phx3 ~]# uname -a
Linux phx3.in.ibm.com 2.6.32-353.el6.x86_64 #1 SMP Mon Jan 7 15:35:17 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

[root@phx3 ~]# rpm -qa|grep dnsmasq
dnsmasq-2.48-13.el6.x86_64

[root@phx3 ~]# rpm -qa|grep libvirt
libvirt-0.10.2-15.el6.x86_64

This issue can be closed now.

Thanks,

Nabhajit Ray
Comment 54 errata-xmlrpc 2013-02-21 05:44:59 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0277.html
