Bug 924756
| Summary: | libvirtd SIGABRT when shutting down a guest | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Julio Entrena Perez <jentrena> |
| Component: | libvirt | Assignee: | Eric Blake <eblake> |
| Status: | CLOSED DUPLICATE | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 6.4 | CC: | acathrow, ajia, dallan, dyasny, dyuan, eblake, jentrena, lyarwood, mzhan, pzhukov, rwu, whuang, ydu, zhwang |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-05-15 15:00:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 835616, 896690, 960054 | | |
Hi Julio,

Could you provide some clear steps to reproduce this issue? It is hard for me to reproduce it without steps. Thanks very much.

Wenlong

(In reply to comment #4)
> Could you provide some clear steps to reproduce this issue?

Not really: we're not sure how to trigger the condition.

I wonder if this upstream patch has any relation: https://www.redhat.com/archives/libvir-list/2013-March/msg01489.html

libvirt-0.10.2-18.el6_4.3.x86_64
vdsm-4.10.2-13.0.el6ev.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6_4.2.x86_64

I used one host with 118 VMs (host load average: 100.33, 92.45, 48.45), starting and shutting off VMs via RHEV-M; libvirtd is still running.

Wenlong

(In reply to comment #11)
> I use one host with 118 vms [...] libvirtd is still running.

Sorry! I missed a NOT: I can NOT reproduce this issue.

Do you have MALLOC_PERTURB_ set in the environment? If not, can you set it to a non-zero value, which will help glibc detect heap-smashing bugs closer to the point at which they happen? Bug 919057 describes what sounds to be a similar case of heap corruption triggered by a domain shutdown.

Was the domain being shut down transient or persistent?

This commit mentions a crash possible for transient domains, but it seems to focus on auto-destroy guests (those that go away when the virConnectPtr is closed) and might not be related to the setup you were using:
```
commit 7ccad0b16d12d7616c7c21b1359f6a55a9677521
Author: Daniel P. Berrange <berrange>
Date:   Thu Feb 28 12:18:48 2013 +0000

    Fix crash in QEMU auto-destroy with transient guests

    When the auto-destroy callback runs it is supposed to return
    NULL if the virDomainObjPtr is no longer valid. It was not
    doing this for transient guests, so we tried to virObjectUnlock
    a mutex which had been freed. This often led to a crash.

    Signed-off-by: Daniel P. Berrange <berrange>
```
(In reply to comment #15)
> was the domain being shut down transient or persistent?

VDSM (calling libvirtd here) only creates transient domains.

(In reply to comment #16)
> This commit mentions a crash possible for transient domains, but seems to
> focus on auto-destroy guests (those that go away when the virConnectPtr is
> closed) and might not be related to the setup you were using

I'm not entirely sure how to configure auto-destroy. Would the domain need to be started with the VIR_DOMAIN_START_AUTODESTROY flag? AFAICT VDSM doesn't set this.

Lee

Hmm - another upstream message about a race still present (and THIS one sounds more like what we are hitting with guest shutdown): https://www.redhat.com/archives/libvir-list/2013-April/msg00625.html

Bug 915353 describes a crash on shutdown; it was fixed in libvirt-0.10.2-18.el6_4.1. I'm starting to think that this particular fix is the one that solves the problem at hand.

*** This bug has been marked as a duplicate of bug 915353 ***
Description of problem:
libvirtd crashed around the same time a guest was shut down from RHEV-M.

```
Core was generated by `/usr/sbin/libvirtd --listen'.
Program terminated with signal 6, Aborted.
#0  0x00007f92202408a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007f92202408a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f9220242085 in abort () at abort.c:92
#2  0x00007f922027e7b7 in __libc_message (do_abort=2, fmt=0x7f9220365f80 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:198
#3  0x00007f92202840e6 in malloc_printerr (action=3, str=0x7f9220363f3e "corrupted double-linked list", ptr=<value optimized out>) at malloc.c:6311
#4  0x00007f92202844f0 in malloc_consolidate (av=0x7f91dc000020) at malloc.c:5181
#5  0x00007f9220286ba8 in _int_free (av=0x7f91dc000020, p=0x7f91dc126b30, have_lock=0) at malloc.c:5054
#6  0x00007f92227114b9 in virFree (ptrptr=0x7f91dc126850) at util/memory.c:309
#7  0x00007f92227295dd in virHashFree (table=0x7f91dc126850) at util/virhash.c:265
#8  0x00007f922275fd86 in virDomainSnapshotObjListFree (snapshots=0x7f91dc10f830) at conf/snapshot_conf.c:724
#9  0x00007f9222723dbb in virObjectUnref (anyobj=<value optimized out>) at util/virobject.c:139
#10 0x0000000000489ad2 in qemuMonitorDispose (obj=<value optimized out>) at qemu/qemu_monitor.c:248
#11 0x00007f9222723dbb in virObjectUnref (anyobj=<value optimized out>) at util/virobject.c:139
#12 0x00007f92227095a8 in virEventPollCleanupHandles () at util/event_poll.c:567
#13 0x00007f9222709c90 in virEventPollRunOnce () at util/event_poll.c:636
#14 0x00007f9222708b67 in virEventRunDefaultImpl () at util/event.c:247
#15 0x00007f92227f863d in virNetServerRun (srv=0x1d7cf60) at rpc/virnetserver.c:748
#16 0x00000000004235b7 in main (argc=<value optimized out>, argv=<value optimized out>) at libvirtd.c:1228
```

Version-Release number of selected component (if applicable):
libvirtd 0.10.2-18.el6
vdsm 4.10.2-1.6.el6
qemu-kvm 0.12.1.2-2.355.el6_4.1
qemu-img 0.12.1.2-2.355.el6_4.1
spice-server 0.12.0-12.el6

How reproducible:
Unknown.

Steps to Reproduce:
1. A guest was shut down from RHEV-M:

```
$ cat var/log/libvirt/qemu/i-web001.log | grep -B4 08\:34
inputs_connect: inputs channel client create
qemu: terminating on signal 15 from pid 12005
red_channel_client_disconnect: 0x7fdf7c298f80 (channel 0x7fdf7c21d670 type 4 id 0)
red_channel_client_disconnect: 0x7fdf7c2aa200 (channel 0x7fdf7c21d0b0 type 2 id 0)
2013-03-20 08:34:34.849+0000: shutting down
```

2. libvirtd crashed with SIGABRT:

```
$ xzgrep error var/log/libvirtd.log.3.xz | grep -v debug
2013-03-20 08:34:34.848+0000: 12005: error : qemuMonitorIORead:513 : Unable to read from monitor: Connection reset by peer
2013-03-20 08:34:34.848+0000: 12005: error : qemuAgentIO:642 : internal error End of file from monitor
2013-03-20 08:34:34.850+0000: 12005: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet140" not in key map
2013-03-20 08:34:34.852+0000: 12005: error : virNetDevGetIndex:653 : Unable to get index for interface vnet140: No such device
2013-03-20 08:34:34.982+0000: 12005: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet141" not in key map
2013-03-20 08:34:34.984+0000: 12005: error : virNetDevGetIndex:653 : Unable to get index for interface vnet141: No such device
```

/var/log/libvirtd.log.3.xz:

```
2013-03-20 08:34:35.362+0000: 12005: debug : virCgroupRemoveRecursively:727 : Removing cgroup /cgroup/freezer/libvirt/qemu/i-web001/
2013-03-20 08:34:35.378+0000: 12005: debug : virCgroupRemove:772 : Removing cgroup /cgroup/blkio/libvirt/qemu/i-web001/ and all child cgroups
2013-03-20 08:34:35.378+0000: 12005: debug : virCgroupRemoveRecursively:727 : Removing cgroup /cgroup/blkio/libvirt/qemu/i-web001/
2013-03-20 08:34:35.394+0000: 12005: debug : virObjectUnref:135 : OBJECT_UNREF: obj=0x7f91dc111420
2013-03-20 08:34:35.394+0000: 12005: debug : virObjectUnref:137 : OBJECT_DISPOSE: obj=0x7f91dc111420
2013-03-20 08:34:35.394+0000: 12005: debug : virObjectUnref:135 : OBJECT_UNREF: obj=0x7f91dc115bd0
2013-03-20 08:34:35.395+0000: 12005: debug : virObjectUnref:135 : OBJECT_UNREF: obj=0x7f91dc112c00
2013-03-20 08:34:35.395+0000: 12005: debug : virObjectUnref:137 : OBJECT_DISPOSE: obj=0x7f91dc112c00
2013-03-20 08:34:35.395+0000: 12005: debug : qemuMonitorDispose:246 : mon=0x7f91dc112c00
2013-03-20 08:34:35.395+0000: 12005: debug : virObjectUnref:135 : OBJECT_UNREF: obj=0x7f91dc115bd0
2013-03-20 08:34:35.395+0000: 12005: debug : virObjectUnref:137 : OBJECT_DISPOSE: obj=0x7f91dc115bd0
Caught abort signal dumping internal log buffer:
```

Actual results:

Expected results:

Additional info: