Bug 734903 - having dnsmasq start by default and bind to all interfaces breaks libvirt bridging
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: dnsmasq
Version: 16
Hardware: Unspecified Unspecified
Priority: unspecified, Severity: high
Assigned To: Douglas Schilling Landgraf
QA Contact: Fedora Extras Quality Assurance
Whiteboard: RejectedBlocker AcceptedNTH
Duplicates: 735414
Blocks: F16Beta-accepted/F16BetaFreezeExcept
Reported: 2011-08-31 15:46 EDT by Adam Williamson
Modified: 2011-10-05 16:25 EDT
Fixed In Version: dnsmasq-2.58-2.fc16
Doc Type: Bug Fix
Last Closed: 2011-09-24 00:38:23 EDT


Description Adam Williamson 2011-08-31 15:46:42 EDT
It seems that dnsmasq's sysv to systemd migration results in it being started by default. This is a change: its sysv init file did not specify any runlevels, hence it wasn't started by default. Its systemd service file states:

[Install]
WantedBy=multi-user.target

so it does get started by default.

It's run with the sole parameter -s adam.localdomain , which results in it binding to all interfaces. This prevents libvirtd from using dnsmasq in setting up its default networking config, which stops you running any VMs unless you have a custom networking setup for libvirt. The error you get in /var/log/messages and /var/log/libvirt/libvirtd.log is:

12:35:15.839: 2764: error : virCommandWait:2172 : internal error Child process (/usr/sbin/dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/libvirt/network/default.pid --conf-file= --except-interface lo --listen-address 192.168.122.1 --dhcp-range 192.168.122.2,192.168.122.254 --dhcp-leasefile=/var/lib/libvirt/dnsmasq/default.leases --dhcp-lease-max=253 --dhcp-no-override) status unexpected: exit status 2

stopping the 'dnsmasq -s adam.localdomain' process and restarting libvirtd allowed it to run as intended.
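
To see which dnsmasq instance is in the way, it can help to tell the system-wide instance apart from the ones libvirt spawns. A minimal sketch, assuming libvirt's instances can be recognized by the --pid-file=/var/run/libvirt/network/ argument seen in the error above; the function names classify_dnsmasq and list_dnsmasq are made up for illustration:

```shell
# Hypothetical helper: label a dnsmasq command line as spawned by libvirt
# or not, based on the pid-file path libvirt passes (an assumption taken
# from the error message in this report).
classify_dnsmasq() {
    case "$1" in
        *--pid-file=/var/run/libvirt/network/*) echo "libvirt" ;;
        *) echo "other" ;;
    esac
}

# Print "PID<TAB>label<TAB>command line" for every running dnsmasq.
list_dnsmasq() {
    ps -C dnsmasq -o pid= -o args= | while read -r pid args; do
        printf '%s\t%s\t%s\n' "$pid" "$(classify_dnsmasq "$args")" "$args"
    done
}
```

Calling list_dnsmasq before starting libvirtd should show the 'dnsmasq -s ...' process labelled "other"; that is the one that was stopped here.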

dnsmasq probably shouldn't start by default, since I expect quite a lot of people have it installed simply because libvirtd needs it, and starting a 'standard' dnsmasq by default *interferes* with libvirtd. If it's going to start by default, it shouldn't bind all interfaces.

Proposing as a Beta blocker as this interferes with the stock Fedora virt setup: "The release must boot successfully as a virtual guest in a situation where the virtual host is running the same release (using Fedora's current preferred virtualization technology)".
Comment 1 Adam Williamson 2011-08-31 15:59:14 EDT
actually, just WantedBy isn't enough to make it get started by default, but it sure seems to be started by default.
Comment 2 Adam Williamson 2011-08-31 16:02:45 EDT
yup...if I downgrade dnsmasq, 'rm -f /etc/systemd/system/multi-user.target.wants/dnsmasq.service', upgrade dnsmasq, then '/etc/systemd/system/multi-user.target.wants/dnsmasq.service' is created: upgrading dnsmasq definitely results in it being enabled by default.
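
The check in this comment boils down to whether that symlink exists. A small sketch of the check, assuming the path quoted above; the function name unit_symlink_present and its optional root-directory argument are hypothetical, added only so the check can be tried against a scratch directory:

```shell
# Hypothetical helper: succeed if UNIT has a symlink under
# multi-user.target.wants, i.e. multi-user.target pulls it in at boot.
# $1 = unit name, $2 = optional filesystem root (defaults to the real /).
unit_symlink_present() {
    unit=$1
    root=${2:-}
    [ -L "$root/etc/systemd/system/multi-user.target.wants/$unit" ]
}

# Usage (not executed here):
#   unit_symlink_present dnsmasq.service && echo "enabled" || echo "not enabled"
```

Running it before and after the package upgrade should show the state flipping from "not enabled" to "enabled" with the bad build.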
Comment 3 Adam Williamson 2011-08-31 16:08:21 EDT
confirming that after a clean install of the old dnsmasq, it's disabled:

[root@adam build]# rpm -e --nodeps dnsmasq
[root@adam build]# yum --disablerepo=updates-testing install dnsmasq
...
Installed:
  dnsmasq.x86_64 0:2.52-2.fc15          
[root@adam build]# chkconfig --list dnsmasq

Note: This output shows SysV services only and does not include native
      systemd services. SysV configuration data might be overridden by native
      systemd configuration.

dnsmasq        	0:off	1:off	2:off	3:off	4:off	5:off	6:off

so the %triggerun in dnsmasq.spec which enables the systemd service on upgrade from < 2.52-3 is incorrect, I believe.
Comment 4 Jóhann B. Guðmundsson 2011-09-01 05:20:40 EDT
Note that this is an error on my part in the spec file (check post or triggerun).

Anyway, if people are worried about some kind of "conflicts" with libvirt, then I would say file an RFE against libvirt asking them to come up with a unit file, which can be a copy of dnsmasq.service with the libvirt-specific dnsmasq startup options plus one added line, Conflicts=dnsmasq.service, in the [Unit] section. When started, that service will stop dnsmasq.service and replace it with a running instance of their own.

That problem should not be solved here.
Comment 5 Adam Williamson 2011-09-02 14:09:42 EDT
Discussed at 2011-09-02 blocker review meeting. Agreed that this is not a blocker, but NTH: although it violates "The release must boot successfully as a virtual guest in a situation where the virtual host is running the same release (using Fedora's current preferred virtualization technology)", it likely only affects upgrades, and is very easy to work around.

It should be very easy to fix, just take the 'systemctl enable' line out of triggerun.
Comment 6 Dan Williams 2011-09-13 16:08:01 EDT
It'll break NetworkManager's internet connection sharing and local caching DNS functionality too.  There's no way that dnsmasq should be binding to all interfaces ever, unless the person configuring dnsmasq explicitly tells it to do so.  It certainly shouldn't be doing it by default.
Comment 7 Dan Williams 2011-09-13 16:14:54 EDT
(In reply to comment #4)
> Note that this is an error on my part in the spec file (check post or
> triggerun).
> 
> Anyway, if people are worried about some kind of "conflicts" with libvirt, then
> I would say file an RFE against libvirt asking them to come up with a unit file,
> which can be a copy of dnsmasq.service with the libvirt-specific dnsmasq
> startup options plus one added line, Conflicts=dnsmasq.service, in the [Unit]
> section. When started, that service will stop dnsmasq.service and replace it
> with a running instance of their own.
> 
> That problem should not be solved here.

I completely disagree.  The problem with a global dnsmasq configuration is that dnsmasq is used in two different general cases:

1) as a component of other services (libvirt and NetworkManager) that both generate their own configuration dynamically and supply it to dnsmasq either via a custom configuration file, or via the command line

2) standalone dnsmasq as a system service

Tailoring the systemd service configuration for #2, especially if that includes binding to all interfaces by default, breaks case #1, because these services aren't using the global dnsmasq configuration for very good reasons (i.e. they need specific options based on the interfaces they are spawning dnsmasq for, and those options are almost always not going to be in the global config).  The main problem is again that dnsmasq would be binding to all interfaces, which really isn't required.  It gets used for so many different things that if the user wants that behavior, they should be required to modify the global configuration to enable it.
Comment 8 Adam Williamson 2011-09-13 16:27:08 EDT
suggestion: split dnsmasq into two packages, dnsmasq-core and dnsmasq. dnsmasq-core has the binary, dnsmasq has the system service. virt-manager and NM depend on dnsmasq-core.
Comment 9 Eric Blake 2011-09-15 16:32:50 EDT
I hit this issue on a clean F16 install from beta RC1; I think we should reconsider the blocker status
Comment 10 Adam Williamson 2011-09-15 16:57:23 EDT
so, the problem is that nothing post 2.52-2 made stable, so RC1 has 2.52-2; you get that on install, then on post-install update you get 2.58-1, so you hit this bug (it gets enabled incorrectly on the update).

We could take 2.58-1 onto RC1 to ensure this doesn't happen, but then we'd be making sure people who upgrade from 15 hit this bug on upgrade, and it's just a silly workaround.

I think we should simply reject the 2.58-1 update, and Patrick, Doug or Johann should submit an update which doesn't incorrectly enable the service on update.

This is a very simple fix and has been sitting here for weeks, so if no-one does it soon, I'm gonna do it myself.
Comment 11 Douglas Schilling Landgraf 2011-09-15 22:00:41 EDT
Hello Adam,

commit c47619565117b0b269a449d7c903852c1ac81c47
Author: Adam Williamson <awilliam@redhat.com>
Date:   Thu Sep 15 18:54:05 2011 -0700

    do not enable service on upgrade, it was not enabled by default before


Thanks for this update. I have been busy these days and couldn't update the package myself.

Cheers
Douglas
Comment 12 Fedora Update System 2011-09-15 22:06:12 EDT
dnsmasq-2.58-2.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/dnsmasq-2.58-2.fc16
Comment 13 Fedora Update System 2011-09-17 15:34:34 EDT
Package dnsmasq-2.58-2.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing dnsmasq-2.58-2.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/dnsmasq-2.58-2.fc16
then log in and leave karma (feedback).
Comment 14 Marcelo Moreira de Mello 2011-09-23 14:44:09 EDT
 Hello, 

 This package solved the issue, but after I ran a complete update on my box, the issue came back: 


Error starting network 'privateDHCP': internal error Child process (/usr/sbin/dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/libvirt/network/privateDHCP.pid --conf-file= --except-interface lo --dhcp-option=3 --no-resolv --listen-address 192.168.69.1) status unexpected: exit status 2

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 44, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 65, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/network.py", line 82, in start
    self.net.create()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1452, in create
    if ret == -1: raise libvirtError ('virNetworkCreate() failed', net=self)
libvirtError: internal error Child process (/usr/sbin/dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/libvirt/network/privateDHCP.pid --conf-file= --except-interface lo --dhcp-option=3 --no-resolv --listen-address 192.168.69.1) status unexpected: exit status 2


notebook $> rpm -q dnsmasq libvirt libvirt-python libvirt-client virt-manager virt-manager-common; cat /etc/redhat-release  ; uname -a 
dnsmasq-2.58-2.fc16.x86_64
libvirt-0.9.4-1.fc16.x86_64
libvirt-python-0.9.4-1.fc16.x86_64
libvirt-client-0.9.4-1.fc16.x86_64
virt-manager-0.9.0-5.fc16.noarch
virt-manager-common-0.9.0-5.fc16.noarch


Fedora release 16 (Verne)

Linux notebook.mmello.local 3.1.0-0.rc6.git0.3.fc16.x86_64 #1 SMP Fri Sep 16 12:26:22 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
Comment 15 Adam Williamson 2011-09-23 14:52:38 EDT
marcelo: if you ever upgraded from 2.52 to 2.58-1, you would get the bug, and 2.58-2 would not fix it. 2.58-2 only ensures that people upgrading from 2.52 to 2.58-2 do not hit the bug, it would not turn the service off again if your previous release was 2.58-1.
Comment 16 Marcelo Moreira de Mello 2011-09-23 14:58:59 EDT
 Guys, 

 After restarting libvirtd manually, the virtual interfaces work as expected: 


virbr0    Link encap:Ethernet  HWaddr 52:54:00:93:75:45  
          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

virbr1    Link encap:Ethernet  HWaddr 52:54:00:3C:12:19  
          inet addr:192.168.69.1  Bcast:192.168.69.255  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

 
 Now, going through virt-manager and trying to stop virbr1 and start it again raises the error: 


Error starting network 'privateDHCP': internal error Child process (/usr/sbin/dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/libvirt/network/privateDHCP.pid --conf-file= --except-interface lo --dhcp-option=3 --no-resolv --listen-address 192.168.69.1) status unexpected: exit status 2

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 44, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 65, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/network.py", line 82, in start
    self.net.create()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1452, in create
    if ret == -1: raise libvirtError ('virNetworkCreate() failed', net=self)
libvirtError: internal error Child process (/usr/sbin/dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/libvirt/network/privateDHCP.pid --conf-file= --except-interface lo --dhcp-option=3 --no-resolv --listen-address 192.168.69.1) status unexpected: exit status 2

    After hitting the error, I followed these steps to work around it: 


      1) Close virt-manager GUI
   
  Then:  
   
[root@notebook ~]# ifconfig  | grep virbr1
[root@notebook ~]# /etc/init.d/libvirtd  stop
Stopping libvirtd (via systemctl):                         [  OK  ]
[root@notebook ~]# ifconfig  | grep virbr1
[root@notebook ~]# killall dnsmasq
[root@notebook ~]# killall dnsmasq
dnsmasq: no process found
[root@notebook ~]# /etc/init.d/libvirtd start
Starting libvirtd (via systemctl):                         [  OK  ]
[root@notebook ~]# ifconfig  | grep virbr1
virbr1    Link encap:Ethernet  HWaddr 52:54:00:3C:12:19  
[root@notebook ~]# ps aux | grep dnsma
nobody   10270  0.0  0.0  13088   576 ?        S    15:54   0:00 /usr/sbin/dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/libvirt/network/privateDHCP.pid --conf-file= --except-interface lo --dhcp-option=3 --no-resolv --listen-address 192.168.69.1
Comment 17 Douglas Schilling Landgraf 2011-09-23 15:05:50 EDT
Marcelo, that's expected; please see Adam's comment. I remember you saying that your environment had 2.52, then you went to 2.58-1, then 2.58-2.

Please confirm.

Cheers
Douglas
Comment 18 Marcelo Moreira de Mello 2011-09-23 15:18:11 EDT
  
  Hey guys, 

notebook $> sudo yum downgrade dnsmasq-2.52
notebook $> rpm -q dnsmasq
dnsmasq-2.52-2.fc15.x86_64

notebook $> sudo /etc/init.d/libvirtd stop
Stopping libvirtd (via systemctl):                         [  OK  ]

notebook $> sudo killall dnsmasq
notebook $> sudo killall dnsmasq
dnsmasq: no process found

notebook $> yum update dnsmasq
 { .. SNIP .. }
Running Transaction
  Updating   : dnsmasq-2.58-2.fc16.x86_64                                   1/2 
warning: /etc/dnsmasq.conf created as /etc/dnsmasq.conf.rpmnew
  Cleanup    : dnsmasq-2.52-2.fc15.x86_64                                   2/2 

Updated:
  dnsmasq.x86_64 0:2.58-2.fc16  
  
notebook $> rpm -q dnsmasq
dnsmasq-2.58-2.fc16.x86_64

notebook $> sudo /etc/init.d/libvirtd  start
Starting libvirtd (via systemctl):                         [  OK  ]

notebook $> ifconfig  | grep virbr
virbr0    Link encap:Ethernet  HWaddr 52:54:00:93:75:45  
virbr1    Link encap:Ethernet  HWaddr 52:54:00:3C:12:19  

 Afterwards, disabling and re-enabling the virtual network through the virt-manager GUI worked as expected. 
 
 Thank you guys for the heads-up. 
 
Best, 
mmello
Comment 19 Fedora Update System 2011-09-24 00:38:10 EDT
dnsmasq-2.58-2.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 20 Douglas Schilling Landgraf 2011-09-26 16:21:49 EDT
*** Bug 735414 has been marked as a duplicate of this bug. ***
Comment 21 Marek Michał Mazur 2011-09-28 16:25:53 EDT
Problems still persist.
Comment 22 Douglas Schilling Landgraf 2011-09-28 16:35:21 EDT
Marek,

      Could you please share the details with us? 
For example, which version of dnsmasq did you upgrade from? 

Don't comments #16 and #18 help?
https://bugzilla.redhat.com/show_bug.cgi?id=734903#c16
https://bugzilla.redhat.com/show_bug.cgi?id=734903#c18

Cheers
Douglas
Comment 23 Marek Michał Mazur 2011-09-28 16:46:43 EDT
Installed Packages
Name        : dnsmasq
Arch        : x86_64
Version     : 2.58
Release     : 2.fc16
Size        : 335 k
Repo        : installed
From repo   : updates-testing

causes:


Could not start virtual network "default": internal error Child process (/usr/sbin/dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/libvirt/network/default.pid --conf-file= --except-interface lo --listen-address 192.168.122.1 --dhcp-range 192.168.122.2,192.168.122.254 --dhcp-leasefile=/var/lib/libvirt/dnsmasq/default.leases --dhcp-lease-max=253 --dhcp-no-override) status unexpected: exit status 2

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/uihelpers.py", line 631, in validate_network
    virnet.create()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1563, in create
    if ret == -1: raise libvirtError ('virNetworkCreate() failed', net=self)
libvirtError: internal error Child process (/usr/sbin/dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/libvirt/network/default.pid --conf-file= --except-interface lo --listen-address 192.168.122.1 --dhcp-range 192.168.122.2,192.168.122.254 --dhcp-leasefile=/var/lib/libvirt/dnsmasq/default.leases --dhcp-lease-max=253 --dhcp-no-override) status unexpected: exit status 2
Comment 24 Adam Williamson 2011-09-29 05:41:28 EDT
marek: again, it will not be magically fixed if you had one of the 'bad' versions installed. once you get in that situation, you need to fix it manually. either do what marcelo did in comment #18, or simply do 'systemctl stop dnsmasq.service' then 'systemctl disable dnsmasq.service'.
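
The manual fix described here can be scripted. A minimal sketch, assuming the units are named dnsmasq.service and libvirtd.service; the run wrapper, the DRY_RUN variable and the function name fix_dnsmasq_conflict are hypothetical conveniences for previewing the commands, and the final libvirtd restart comes from the original description rather than from this comment:

```shell
# Hypothetical wrapper: with DRY_RUN=1, print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop the system-wide dnsmasq, keep it from starting at boot, then
# restart libvirtd so it can spawn its own dnsmasq instances again.
fix_dnsmasq_conflict() {
    run systemctl stop dnsmasq.service &&
    run systemctl disable dnsmasq.service &&
    run systemctl restart libvirtd.service
}
```

Setting DRY_RUN=1 before calling fix_dnsmasq_conflict only prints the three commands; run it for real as root.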
Comment 25 Yanko Kaneti 2011-10-05 05:29:40 EDT
dnsmasq.service has:

[Install]
WantedBy=multi-user.target

How is this fixed ?
Comment 26 Yanko Kaneti 2011-10-05 06:35:42 EDT
Apparently I was confused about the purpose of the install section. Please ignore the previous comment. Sorry for the noise.
Comment 27 Adam Williamson 2011-10-05 16:25:39 EDT
yeah, my initial comment had that wrong. I forgot that just having that stanza isn't enough for a systemd service to be enabled by default, it has to be done explicitly in package %post.
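
Put differently, the [Install] stanza only tells 'systemctl enable' where to create a symlink; nothing happens until something runs that command. A rough sketch of the mapping, assuming the /etc/systemd/system/<target>.wants/ layout seen in comment 2; the function name wanted_by_symlinks is hypothetical and the parsing is deliberately naive (it does not check that the line sits inside [Install]):

```shell
# Hypothetical helper: for a unit file, print the symlink path(s) that
# enabling the unit would be expected to create, one per WantedBy= target.
wanted_by_symlinks() {
    unit_file=$1
    unit_name=$(basename "$unit_file")
    # WantedBy= may list several targets separated by spaces.
    sed -n 's/^WantedBy=//p' "$unit_file" | tr ' ' '\n' |
    while read -r target; do
        [ -n "$target" ] && echo "/etc/systemd/system/$target.wants/$unit_name"
    done
}
```

If the printed path does not exist on disk, the unit is installed but not enabled, which is the state dnsmasq should have been left in after the upgrade.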
