Bug 673588
Summary: | libvirt can deadlock from double-closing qemu monitors | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Eric Blake <eblake> |
Component: | libvirt | Assignee: | Eric Blake <eblake> |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 6.1 | CC: | dallan, dyuan, eblake, jdenemar, kxiong, mjenner, vbian, xen-maint |
Target Milestone: | rc | ||
Target Release: | 6.1 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | libvirt-0.8.7-7.el6 | Doc Type: | Bug Fix |
Last Closed: | 2011-05-19 13:26:36 UTC | Type: | --- |
Description
Eric Blake
2011-01-28 20:23:41 UTC
Patches posted upstream:
http://post-office.corp.redhat.com/archives/rhvirt-patches/2011-January/msg01517.html

Not just deadlock, but also crash. Using these steps from Wen Congyang:

1. use gdb to debug libvirtd, and set a breakpoint in the function qemuConnectMonitor()
2. start a vm, and libvirtd will be stopped in qemuConnectMonitor()
3. kill -STOP $(cat /var/run/libvirt/qemu/<domain>.pid)
4. continue to run libvirtd in gdb, and libvirtd will be blocked in the function qemuMonitorSetCapabilities()
5. kill -9 $(cat /var/run/libvirt/qemu/<domain>.pid)
6. continue to run libvirtd in gdb

I saw libvirt crash:

11:12:44.882: 17952: error : qemuRemoveCgroup:335 : internal error Unable to find cgroup for windows_2008-32
11:12:44.882: 17952: warning : qemudShutdownVMDaemon:3109 : Failed to remove cgroup for windows_2008-32

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff0aaf700 (LWP 17950)]
0x0000003021675705 in malloc_consolidate () from /lib64/libc.so.6
(gdb) bt
#0 0x0000003021675705 in malloc_consolidate () from /lib64/libc.so.6
#1 0x0000003021677f38 in _int_free () from /lib64/libc.so.6
#2 0x00007ffff79e2d73 in virFree (ptrptr=0x7ffff0aae7a0) at util/memory.c:311
#3 0x000000000041dc75 in qemudClientMessageRelease (client=0x7fffec0012f0, msg=0x7fffe0014e10) at libvirtd.c:2065
#4 0x000000000041dd16 in qemudDispatchClientWrite (client=0x7fffec0012f0) at libvirtd.c:2095
#5 0x000000000041dfbe in qemudDispatchClientEvent (watch=8, fd=18, events=2, opaque=0x6fadb0) at libvirtd.c:2165
#6 0x00000000004189ee in virEventDispatchHandles (nfds=7, fds=0x7fffec0011b0) at event.c:467
#7 0x0000000000419082 in virEventRunOnce () at event.c:599
#8 0x000000000041e1c1 in qemudOneLoop () at libvirtd.c:2265

The third upstream patch has been reposted, and needs to be ACK'd and resubmitted to rhel:
https://www.redhat.com/archives/libvir-list/2011-February/msg00074.html

In POST; two patches per comment 1 and one more patch at:
http://post-office.corp.redhat.com/archives/rhvirt-patches/2011-February/msg00372.html

Back in POST, since 0.8.7-5.el6 is incomplete:
http://post-office.corp.redhat.com/archives/rhvirt-patches/2011-February/msg00963.html

checked with
libvirt-0.8.7-4.el6.x86_64.rpm --- reproducer
libvirt-0.8.7-7.el6.x86_64.rpm --- verification

from one terminal:
1. virsh start <domain>
2. kill -STOP $(cat /var/run/libvirt/qemu/<domain>.pid)
3. kill -9 $(cat /var/run/libvirt/qemu/<domain>.pid)

from the other terminal:
1. gdb libvirtd
(gdb) b qemuConnectMonitor
Breakpoint 1 at 0x434ff0: file qemu/qemu_driver.c, line 1246.
(gdb) r
start a vm, and libvirtd will be stopped in qemuConnectMonitor()
2. kill -STOP $(cat /var/run/libvirt/qemu/<domain>.pid)
3. continue to run libvirtd in gdb, and libvirtd will be blocked in the function qemuMonitorSetCapabilities()
(gdb) c
4. kill -9 $(cat /var/run/libvirt/qemu/<domain>.pid)

[reproducer]
tail -F /var/log/libvirt/qemu/domain.log
eges of VM to 107:107
char device redirected to /dev/pts/2
char device redirected to /dev/pts/4
Using CPU model "cpu64-rhel6"
2011-02-21 06:41:11.178: shutting down
2011-02-21 06:41:11.318: shutting down

NB: although I could see the guest being shut down twice, I didn't encounter the libvirt crash.

[verification]
red_worker_main: begin
handle_dev_input: start
2011-02-21 06:49:36.566: shutting down

In the first test, libvirtd never hung. In the second, libvirtd never crashed and the domain was shut down only once.

Please check whether the steps are correct. If so, I will set the bug status to VERIFIED; if not, I will retest to meet the exact request.

(In reply to comment #7)
> checked with
> libvirt-0.8.7-4.el6.x86_64.rpm --- reproducer
> libvirt-0.8.7-7.el6.x86_64.rpm --- verification
>
> 3. continue to run libvirtd in gdb, and libvirtd will be blocked in the
> function qemuMonitorSetCapabilities()
> (gdb) c
> 4. kill -9 $(cat /var/run/libvirt/qemu/<domain>.pid)

I believe the reason I saw libvirt crash in my testing after killing the qemu pid is that I also had virt-manager running at the same time, so that there were multi-threaded interactions competing for status about the domain. Also, the crash is dependent on race conditions, so I'm not sure how reliably it can be reproduced. But even if you don't see the crash:

> [reproducer]
> tail -F /var/log/libvirt/qemu/domain.log
> eges of VM to 107:107
> char device redirected to /dev/pts/2
> char device redirected to /dev/pts/4
> Using CPU model "cpu64-rhel6"
> 2011-02-21 06:41:11.178: shutting down
> 2011-02-21 06:41:11.318: shutting down

This is a valid detection of the bug,

> [verification]
> red_worker_main: begin
> handle_dev_input: start
> 2011-02-21 06:49:36.566: shutting down

and this is a valid verification that the bug was fixed.

> Please check whether the steps are correct. If so, I will set the bug status to
> VERIFIED; if not, I will retest to meet the exact request.

I'm satisfied with moving the bug to VERIFIED even if you can't reproduce the crash, although it would be nicer to get the crash reproducer as well. (I tried again today, but failed to reproduce things; then again, when I first reproduced the crash, it was while using upstream libvirt.git, and there may have been other interactions in upstream that are not present on any RHEL build that affect the likelihood of a crash.)

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html
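
The log comparison used for verification in this report (two "shutting down" lines on the unfixed build, one on the fixed build) could be checked mechanically. The following is only a rough sketch, not part of libvirt or of the QA procedure: the function names count_shutdowns and looks_double_closed are hypothetical, the line pattern is inferred from the log excerpts quoted in the comments, and the text passed in is assumed to cover a single guest run (a real domain log accumulates lines across runs).

```python
import re

# Pattern inferred from the excerpts quoted in this report, e.g.
# "2011-02-21 06:41:11.178: shutting down". This is an assumption about
# the log format, not a documented libvirt guarantee.
SHUTDOWN_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: shutting down\s*$"
)


def count_shutdowns(log_text):
    """Count timestamped 'shutting down' lines in the given log text."""
    return sum(1 for line in log_text.splitlines() if SHUTDOWN_RE.match(line))


def looks_double_closed(log_text):
    """Return True if a single guest run logged more than one shutdown.

    Assumes log_text covers exactly one run of the guest.
    """
    return count_shutdowns(log_text) > 1


if __name__ == "__main__":
    # Excerpts copied from the reproducer and verification logs above.
    reproducer = (
        'Using CPU model "cpu64-rhel6"\n'
        "2011-02-21 06:41:11.178: shutting down\n"
        "2011-02-21 06:41:11.318: shutting down\n"
    )
    verification = (
        "handle_dev_input: start\n"
        "2011-02-21 06:49:36.566: shutting down\n"
    )
    print(looks_double_closed(reproducer))    # True
    print(looks_double_closed(verification))  # False
```

A check like this only detects the duplicated shutdown; it says nothing about the race-dependent crash, which was not reliably reproducible according to the comments above.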