Bug 871201 - If libvirt is restarted after updating dnsmasq or radvd packages, a subsequent "virsh net-destroy" will fail to kill the dnsmasq/radvd processes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Assignee: Laine Stump
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-10-29 22:05 UTC by Laine Stump
Modified: 2013-02-21 07:11 UTC

Fixed In Version: libvirt-0.10.2-7.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-02-21 07:11:15 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHSA-2013:0276
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: libvirt security, bug fix, and enhancement update
Last Updated: 2013-02-20 21:18:26 UTC

Description Laine Stump 2012-10-29 22:05:27 UTC
This behavior has been present "for a very long time" but became apparent recently when the dnsmasq package was updated.

The problem is that when libvirtd restarts, it re-reads the dnsmasq and radvd pidfiles, then does a sanity check on the pid it finds, including checking that the symbolic link in /proc/$pid/exe actually points to the same file as the path used by libvirt to execute the binary in the first place. If this fails, libvirt assumes that the process is no longer alive. But if the original binary has been replaced, the link in /proc is set to "$binarypath (deleted)" (it literally has the string " (deleted)" appended to it), so even if a new binary exists in the same location, attempts to resolve the link will fail.

In the end, not only is the old dnsmasq/radvd not terminated when the network is stopped, but a new dnsmasq can't be started when the network is being restarted.

libvirt needs to be more intelligent about determining the validity of the pid in the processes' pid files.
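
A minimal sketch of the kind of check described above, written in Python for illustration rather than in libvirt's own C. It assumes the Linux /proc layout; the helper names exe_matches and pid_runs_binary are hypothetical and are not libvirt functions. The idea is to accept the link target when it equals the expected path either exactly or with the literal " (deleted)" suffix appended.

```python
import os

DELETED_SUFFIX = " (deleted)"  # literal text appended to the /proc/<pid>/exe target


def exe_matches(link_target, binary_path):
    """Return True if link_target names binary_path, with or without the suffix."""
    if link_target == binary_path:
        return True
    return link_target == binary_path + DELETED_SUFFIX


def pid_runs_binary(pid, binary_path):
    """Hypothetical helper: compare /proc/<pid>/exe against binary_path."""
    try:
        target = os.readlink("/proc/%d/exe" % pid)
    except OSError:
        # No such process, no permission, or no /proc on this system.
        return False
    return exe_matches(target, binary_path)
```

A plain string comparison like this cannot tell a replaced binary apart from a file whose name really ends in " (deleted)"; the sketch ignores that corner case.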

Comment 1 Laine Stump 2012-10-30 18:36:15 UTC
Fix pushed upstream:

commit 7bafe009d93f8b26330d52dc3289643699cf74f0
Author: Laine Stump <laine>
Date:   Mon Oct 29 18:05:41 2012 -0400

    util: do a better job of matching up pids with their binaries

Comment 4 yanbing du 2012-11-06 07:40:33 UTC
Hi Laine,
  It still fails with libvirt-0.10.2-7.el6.x86_64.

My steps:
1. Upgrade dnsmasq from dnsmasq-2.48-6 to dnsmasq-2.48-7.
# rpm -Uvh dnsmasq-2.48-7.el6.x86_64.rpm 
Preparing...                ########################################### [100%]
   1:dnsmasq                ########################################### [100%]

2. Restart libvirtd
# service libvirtd restart
Stopping libvirtd daemon:                                  [  OK  ]
Starting libvirtd daemon:                                  [  OK  ]

3. Destroy and start a network
# virsh net-list 
Name                 State      Autostart     Persistent
--------------------------------------------------
default              active     no            yes
net-1                active     no            yes

# virsh net-destroy default
Network default destroyed

# virsh net-start default
error: Failed to start network default
error: internal error Child process (/usr/sbin/dnsmasq --strict-order --bind-interfaces --local=// --domain-needed --pid-file=/var/run/libvirt/network/default.pid --conf-file= --except-interface lo --listen-address 192.168.122.1 --dhcp-range 192.168.122.2,192.168.122.254 --dhcp-leasefile=/var/lib/libvirt/dnsmasq/default.leases --dhcp-lease-max=253 --dhcp-no-override --dhcp-hostsfile=/var/lib/libvirt/dnsmasq/default.hostsfile --addn-hosts=/var/lib/libvirt/dnsmasq/default.addnhosts) unexpected exit status 2: 
dnsmasq: failed to bind listening socket for 192.168.122.1: Address already in use

# ps aux|grep dnsmasq
nobody   26552  0.0  0.0  12876   708 ?        S    15:29   0:00 /usr/sbin/dnsmasq -s englab.nay.redhat.com
root     27214  0.0  0.0 103244   832 pts/3    S+   15:36   0:00 grep dnsmasq

If I kill the only dnsmasq process (26552), the 'default' network can start. After that, destroying and starting the 'net-1' network works without problems.
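
The "Address already in use" message is what bind() reports when another socket already holds the requested address and port. A minimal Python sketch of that situation, assuming the existing listener is bound to the wildcard address (an assumption; the output above does not show which address process 26552 is bound to). It uses an ephemeral UDP port on loopback instead of port 53 so that it can run unprivileged, and second_bind_errno is a hypothetical name.

```python
import errno
import socket


def second_bind_errno():
    """Bind a wildcard UDP socket, then try a specific address on the same port.

    Returns the errno of the failed second bind, or None if it succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        first.bind(("0.0.0.0", 0))            # wildcard address, kernel picks the port
        port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", port))  # specific address, same port
        except OSError as exc:
            return exc.errno
        return None
    finally:
        first.close()
        second.close()
```

On Linux the second bind should fail, so second_bind_errno() should compare equal to errno.EADDRINUSE.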

Comment 5 Laine Stump 2012-11-07 20:11:56 UTC
I just tested it with a local build of libvirt-0.10.2-7 and it worked properly. Can you perform this test for me?

1) reboot the system so everything is fresh

2) "ps -AlF | grep dnsmasq" to learn the pids of dnsmasq processes.

3) pick one of those pids ($pid) and run "ls -l /proc/$pid/exe"

4) copy the dnsmasq binary somewhere else, delete the original, then copy it back
   (to simulate the binary being replaced):

     cp -a /usr/sbin/dnsmasq /tmp
     rm /usr/sbin/dnsmasq
     cp -a /tmp/dnsmasq /usr/sbin

5) again look at the link in /proc: "ls -l /proc/$pid/exe"

I'm interested if that link will show

    /proc/10577/exe -> /usr/sbin/dnsmasq (deleted)

or something else. (I had assumed that the same "(deleted)" string was always added to the link, but possibly it adds something different in different locales?)
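
The manual steps above can also be scripted. A minimal sketch in Python, assuming a Linux /proc and using a private copy of the sleep binary in a temporary directory as a stand-in, so that /usr/sbin/dnsmasq is left alone. The function name is hypothetical.

```python
import os
import shutil
import subprocess
import tempfile


def exe_link_before_and_after_replace():
    """Return (before, after) targets of /proc/<pid>/exe, or None if unavailable."""
    original = shutil.which("sleep")
    if original is None or not os.path.isdir("/proc/self"):
        return None
    workdir = tempfile.mkdtemp()
    copy = os.path.join(workdir, "sleep")
    proc = None
    try:
        shutil.copy2(original, copy)
        proc = subprocess.Popen([copy, "30"])  # long-running stand-in process
        link = "/proc/%d/exe" % proc.pid
        before = os.readlink(link)
        os.remove(copy)                        # delete the running binary
        shutil.copy2(original, copy)           # put a fresh file at the same path
        after = os.readlink(link)
        return before, after
    except OSError:
        # For example, the temporary directory is mounted noexec.
        return None
    finally:
        if proc is not None:
            proc.kill()
            proc.wait()
        shutil.rmtree(workdir, ignore_errors=True)
```

Comparing the two returned strings shows whether the kernel appended anything to the link target after the file was replaced.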

Comment 6 yanbing du 2012-11-08 03:14:50 UTC
(In reply to comment #5)

Yes, you are right. After the final step:
# ls -l /proc/2704/exe
lrwxrwxrwx. 1 root root 0 Nov  8 11:02 /proc/2704/exe -> /usr/sbin/dnsmasq (deleted)

Then I restarted libvirtd and destroyed/started the 'default' network, and it works properly.
But if I upgrade the dnsmasq rpm package, the result is as described in comment 4:
# ps -AlF | grep dnsmasq
5 S nobody    3001     1  0  80   0 -  3220 poll_s   720   2 11:06 ?        00:00:00 /usr/sbin/dnsmasq --strict-order --bind-interfaces --local=// --domain-needed --pid-file=/var/run/libvirt/network/default.pid --conf-file= --except-interface lo --listen-address 192.168.122.1 --dhcp-range 192.168.122.2,192.168.122.254 --dhcp-leasefile=/var/lib/libvirt/dnsmasq/default.leases --dhcp-lease-max=253 --dhcp-no-override --dhcp-hostsfile=/var/lib/libvirt/dnsmasq/default.hostsfile --addn-hosts=/var/lib/libvirt/dnsmasq/default.addnhosts
0 S root      3031  2424  0  80   0 - 25811 pipe_w   828   0 11:07 pts/1    00:00:00 grep dnsmasq

# ls -l /proc/3001/exe
lrwxrwxrwx. 1 root root 0 Nov  8 11:08 /proc/3001/exe -> /usr/sbin/dnsmasq

# rpm -Uvh dnsmasq-2.48-7.el6.x86_64.rpm 
Preparing...                ########################################### [100%]
   1:dnsmasq                ########################################### [100%]

# ps -AlF | grep dnsmasq
5 S nobody    3074     1  0  80   0 -  3219 poll_s   704   1 11:08 ?        00:00:00 /usr/sbin/dnsmasq -s englab.nay.redhat.com
4 S root      3079  2424  0  80   0 - 25811 pipe_w   832   1 11:08 pts/1    00:00:00 grep dnsmasq

# ls -l /proc/3074/exe
lrwxrwxrwx. 1 root root 0 Nov  8 11:13 /proc/3074/exe -> /usr/sbin/dnsmasq

Comment 7 Laine Stump 2012-11-13 18:58:57 UTC
Sigh.

Yes, I see that now on my RHEL6 box as well. This is a problem with the packaging of dnsmasq - while upgrading the package, the initscript makes the following mistakes:

1) it terminates *all* dnsmasq processes, even those not started by the dnsmasq service

2) it starts dnsmasq with a default config, and the default config prevents anyone else from running dnsmasq, including libvirt.

So, this is a completely different problem, with the same cause as one I reported against RHEL a few months ago in Bug 850944. (During my search, I found a much older bug for the same problem, filed against (and fixed in) Fedora 11/12: Bug 547605).

I've updated Bug 850944 to point out this newly discovered problem and describe it in detail, and also written and successfully tested a patch against RHEL6 dnsmasq which I attached to that BZ. At this point, the libvirt code written in response to the current BZ is complete and correct (it can be tested by manually replacing /usr/sbin/dnsmasq as I outlined above), and the other problem with dnsmasq should be tracked in Bug 850944.
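
To illustrate the first mistake, here is a minimal Python sketch contrasting "every process named dnsmasq" with "only the pid recorded in the service's own pidfile". This is not the patch attached to Bug 850944, whose contents are not shown here; the function names and the sample process table are hypothetical.

```python
def pids_by_name(running, name="dnsmasq"):
    """Broad selection: every process whose binary is named `name`."""
    return sorted(pid for pid, argv in running.items()
                  if argv and argv[0].endswith("/" + name))


def pids_from_pidfile(pidfile_text, running, name="dnsmasq"):
    """Narrow selection: only the pid recorded in the service's own pidfile."""
    try:
        pid = int(pidfile_text.strip())
    except ValueError:
        return []  # empty or corrupt pidfile: select nothing
    argv = running.get(pid)
    if argv and argv[0].endswith("/" + name):
        return [pid]
    return []      # pid is gone or has been reused by another program


# Hypothetical process table: pid -> argv
running = {
    3001: ["/usr/sbin/dnsmasq", "--pid-file=/var/run/libvirt/network/default.pid"],
    3074: ["/usr/sbin/dnsmasq", "-s", "englab.nay.redhat.com"],
    3079: ["grep", "dnsmasq"],
}
```

With this table, pids_by_name(running) selects both dnsmasq instances, while pids_from_pidfile("3074\n", running) selects only the one that the pidfile names.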

Comment 8 yanbing du 2012-11-14 03:17:08 UTC
(In reply to comment #7)

Thanks Laine, for your detailed explanation.
As the dnsmasq problem will be tracked in bug 850944, we can move this bug to VERIFIED.

Comment 9 errata-xmlrpc 2013-02-21 07:11:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0276.html

