Bug 871201
| Summary: | If libvirt is restarted after updating dnsmasq or radvd packages, a subsequent "virsh net-destroy" will fail to kill the dnsmasq/radvd processes | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Laine Stump <laine> |
| Component: | libvirt | Assignee: | Laine Stump <laine> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 6.3 | CC: | acathrow, dallan, dyasny, dyuan, mzhan, rwu, thozza, weizhan, whuang, ydu, zhpeng |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | libvirt-0.10.2-7.el6 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2013-02-21 07:11:15 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Laine Stump
2012-10-29 22:05:27 UTC
Fix pushed upstream:
commit 7bafe009d93f8b26330d52dc3289643699cf74f0
Author: Laine Stump <laine>
Date: Mon Oct 29 18:05:41 2012 -0400
util: do a better job of matching up pids with their binaries
Hi Laine,

It still fails with libvirt-0.10.2-7.el6.x86_64. My steps:

1. Upgrade dnsmasq from dnsmasq-2.48-6 to dnsmasq-2.48-7.

    # rpm -Uvh dnsmasq-2.48-7.el6.x86_64.rpm
    Preparing... ########################################### [100%]
    1:dnsmasq ########################################### [100%]

2. Restart libvirtd.

    # service libvirtd restart
    Stopping libvirtd daemon: [ OK ]
    Starting libvirtd daemon: [ OK ]

3. Destroy and start a network.

    # virsh net-list
    Name       State    Autostart   Persistent
    --------------------------------------------------
    default    active   no          yes
    net-1      active   no          yes

    # virsh net-destroy default
    Network default destroyed

    # virsh net-start default
    error: Failed to start network default
    error: internal error Child process (/usr/sbin/dnsmasq --strict-order --bind-interfaces --local=// --domain-needed --pid-file=/var/run/libvirt/network/default.pid --conf-file= --except-interface lo --listen-address 192.168.122.1 --dhcp-range 192.168.122.2,192.168.122.254 --dhcp-leasefile=/var/lib/libvirt/dnsmasq/default.leases --dhcp-lease-max=253 --dhcp-no-override --dhcp-hostsfile=/var/lib/libvirt/dnsmasq/default.hostsfile --addn-hosts=/var/lib/libvirt/dnsmasq/default.addnhosts) unexpected exit status 2:
    dnsmasq: failed to bind listening socket for 192.168.122.1: Address already in use

    # ps aux|grep dnsmasq
    nobody 26552 0.0 0.0 12876 708 ? S 15:29 0:00 /usr/sbin/dnsmasq -s englab.nay.redhat.com
    root 27214 0.0 0.0 103244 832 pts/3 S+ 15:36 0:00 grep dnsmasq

If I kill the only dnsmasq process (26552), the 'default' network can start. After that, destroying and starting the 'net-1' network works without problems.

I just tested it with a local build of libvirt-0.10.2-7 and it worked properly. Can you perform this test for me?
1) reboot the system so everything is fresh
2) run "ps -AlF | grep dnsmasq" to learn the pids of the dnsmasq processes
3) pick one of those pids ($pid) and run "ls -l /proc/$pid/exe"
4) copy the dnsmasq binary somewhere else, delete the original, then copy it back (to simulate the binary being replaced):
    cp -a /usr/sbin/dnsmasq /tmp
    rm /usr/sbin/dnsmasq
    cp -a /tmp/dnsmasq /usr/sbin
5) look again at the link in /proc: "ls -l /proc/$pid/exe"
I'm interested in whether that link will show
    /proc/10577/exe -> /usr/sbin/dnsmasq (deleted)
or something else. (I had assumed that the same "(deleted)" string was always added to the link, but possibly it adds something different in different locales?)
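The comparison that the last step of this test exercises can be sketched as a small shell function. This is only an illustration, not the libvirt code from the upstream commit: the function name exe_matches is hypothetical, and the sketch assumes a replaced binary shows up with " (deleted)" appended to the /proc/$pid/exe link target, as in the example link above.

```shell
#!/bin/sh
# Illustrative sketch only; exe_matches is a hypothetical helper, not libvirt code.
# Succeeds when a /proc/<pid>/exe link target refers to the expected binary,
# either directly or as a replaced binary marked with the " (deleted)" suffix.
exe_matches() {
    link=$1
    expected=$2
    [ "$link" = "$expected" ] || [ "$link" = "$expected (deleted)" ]
}

# Example using the link target quoted in the test above.
if exe_matches "/usr/sbin/dnsmasq (deleted)" "/usr/sbin/dnsmasq"; then
    echo "pid still belongs to dnsmasq"
fi
```

On a live system the first argument would come from the link itself, for example exe_matches "$(readlink /proc/$pid/exe)" /usr/sbin/dnsmasq.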
(In reply to comment #5)

Yes, you are right. After the final step:

    # ls -l /proc/2704/exe
    lrwxrwxrwx. 1 root root 0 Nov 8 11:02 /proc/2704/exe -> /usr/sbin/dnsmasq (deleted)

Then I restarted libvirtd and destroyed/started the 'default' network, and it worked properly.

But if I upgrade the dnsmasq rpm package, the result is as described in comment 4.

    # ps -AlF | grep dnsmasq
    5 S nobody 3001 1 0 80 0 - 3220 poll_s 720 2 11:06 ? 00:00:00 /usr/sbin/dnsmasq --strict-order --bind-interfaces --local=// --domain-needed --pid-file=/var/run/libvirt/network/default.pid --conf-file= --except-interface lo --listen-address 192.168.122.1 --dhcp-range 192.168.122.2,192.168.122.254 --dhcp-leasefile=/var/lib/libvirt/dnsmasq/default.leases --dhcp-lease-max=253 --dhcp-no-override --dhcp-hostsfile=/var/lib/libvirt/dnsmasq/default.hostsfile --addn-hosts=/var/lib/libvirt/dnsmasq/default.addnhosts
    0 S root 3031 2424 0 80 0 - 25811 pipe_w 828 0 11:07 pts/1 00:00:00 grep dnsmasq

    # ls -l /proc/3001/exe
    lrwxrwxrwx. 1 root root 0 Nov 8 11:08 /proc/3001/exe -> /usr/sbin/dnsmasq

    # rpm -Uvh dnsmasq-2.48-7.el6.x86_64.rpm
    Preparing... ########################################### [100%]
    1:dnsmasq ########################################### [100%]

    # ps -AlF | grep dnsmasq
    5 S nobody 3074 1 0 80 0 - 3219 poll_s 704 1 11:08 ? 00:00:00 /usr/sbin/dnsmasq -s englab.nay.redhat.com
    4 S root 3079 2424 0 80 0 - 25811 pipe_w 832 1 11:08 pts/1 00:00:00 grep dnsmasq

    # ls -l /proc/3074/exe
    lrwxrwxrwx. 1 root root 0 Nov 8 11:13 /proc/3074/exe -> /usr/sbin/dnsmasq

Sigh.

Yes, I see that now on my RHEL6 box as well. This is a problem with the packaging of dnsmasq. While upgrading the package, the initscript makes the following mistakes:

1) It terminates *all* dnsmasq processes, even those not started by the dnsmasq service.
2) It starts dnsmasq with a default config, and the default config prevents anyone else from running dnsmasq, including libvirt.

So, this is a completely different problem, with the same cause as one I reported against RHEL a few months ago in Bug 850944. (During my search, I found a much older bug for the same problem, filed against (and fixed in) Fedora 11/12: Bug 547605.)

I've updated Bug 850944 to point out this newly discovered problem and describe it in detail, and I've also written and successfully tested a patch against RHEL6 dnsmasq, which I attached to that BZ. At this point, the libvirt code written in response to the current BZ is complete and correct (it can be tested by manually replacing /usr/sbin/dnsmasq as I outlined above), and the other problem with dnsmasq should be tracked in Bug 850944.

(In reply to comment #7)

Thanks, Laine, for your detailed explanation. As the dnsmasq problem will be tracked in Bug 850944, we can move this bug to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0276.html
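For illustration, the first initscript mistake described in comment 7 (terminating every dnsmasq process) can be contrasted with a stop routine that signals only the instance recorded in the service's own pid file. This is a minimal sketch under stated assumptions, not the patch attached to Bug 850944: the function name stop_own_instance is hypothetical, and the pid-file path is whatever the service itself writes, which this report does not state.

```shell
#!/bin/sh
# Illustrative sketch only; stop_own_instance is a hypothetical helper and is
# not the patch attached to Bug 850944.
# Sends SIGTERM only to the pid recorded in the given pid file, so dnsmasq
# instances started by others (for example by libvirt) are left running.
stop_own_instance() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    # Refuse anything that is not a plain decimal pid.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Refuse 0 and 1: "kill 0" would signal the whole process group.
    [ "$pid" -gt 1 ] || return 1
    kill "$pid" 2>/dev/null
}
```

A fuller version would also confirm that the recorded pid still runs the expected binary before signalling it, since a stale pid file can point at an unrelated process.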