Bug 924756

Summary: libvirtd SIGABRT when shutting down a guest
Product: Red Hat Enterprise Linux 6
Reporter: Julio Entrena Perez <jentrena>
Component: libvirt
Assignee: Eric Blake <eblake>
Status: CLOSED DUPLICATE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Priority: high
Version: 6.4
CC: acathrow, ajia, dallan, dyasny, dyuan, eblake, jentrena, lyarwood, mzhan, pzhukov, rwu, whuang, ydu, zhwang
Target Milestone: rc
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2013-05-15 15:00:39 UTC
Type: Bug
Bug Blocks: 896690, 835616, 960054

Description Julio Entrena Perez 2013-03-22 12:57:22 UTC
Description of problem:
libvirtd crashed at around the same time a guest was shut down from RHEV-M.

Core was generated by `/usr/sbin/libvirtd --listen'.
Program terminated with signal 6, Aborted.
#0  0x00007f92202408a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007f92202408a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f9220242085 in abort () at abort.c:92
#2  0x00007f922027e7b7 in __libc_message (do_abort=2, 
    fmt=0x7f9220365f80 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:198
#3  0x00007f92202840e6 in malloc_printerr (action=3, str=0x7f9220363f3e "corrupted double-linked list", 
    ptr=<value optimized out>) at malloc.c:6311
#4  0x00007f92202844f0 in malloc_consolidate (av=0x7f91dc000020) at malloc.c:5181
#5  0x00007f9220286ba8 in _int_free (av=0x7f91dc000020, p=0x7f91dc126b30, have_lock=0) at malloc.c:5054
#6  0x00007f92227114b9 in virFree (ptrptr=0x7f91dc126850) at util/memory.c:309
#7  0x00007f92227295dd in virHashFree (table=0x7f91dc126850) at util/virhash.c:265
#8  0x00007f922275fd86 in virDomainSnapshotObjListFree (snapshots=0x7f91dc10f830) at conf/snapshot_conf.c:724
#9  0x00007f9222723dbb in virObjectUnref (anyobj=<value optimized out>) at util/virobject.c:139
#10 0x0000000000489ad2 in qemuMonitorDispose (obj=<value optimized out>) at qemu/qemu_monitor.c:248
#11 0x00007f9222723dbb in virObjectUnref (anyobj=<value optimized out>) at util/virobject.c:139
#12 0x00007f92227095a8 in virEventPollCleanupHandles () at util/event_poll.c:567
#13 0x00007f9222709c90 in virEventPollRunOnce () at util/event_poll.c:636
#14 0x00007f9222708b67 in virEventRunDefaultImpl () at util/event.c:247
#15 0x00007f92227f863d in virNetServerRun (srv=0x1d7cf60) at rpc/virnetserver.c:748
#16 0x00000000004235b7 in main (argc=<value optimized out>, argv=<value optimized out>) at libvirtd.c:1228
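
Read bottom-up, frames #11 to #4 show an unref/dispose cascade: the last reference to the monitor is dropped, qemuMonitorDispose (frame #10) in turn drops a reference to another object (frame #9), and that object's disposal frees a snapshot list and its hash table (frames #8 to #6), at which point glibc reports "corrupted double-linked list". The sketch below models only that cascade, using hypothetical names (obj_t, obj_unref, domain_t, monitor_t); it is not libvirt's virObject code, and the assumption that the monitor holds a reference on the domain object is inferred from frame #8 freeing a domain snapshot list.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted base object (not libvirt's virObject). */
typedef struct obj obj_t;
struct obj {
    int refs;
    void (*dispose)(obj_t *self);  /* called once, when refs drops to 0 */
};

/* Records the order in which objects are disposed. */
enum { DISPOSED_MONITOR = 1, DISPOSED_DOMAIN = 2 };
static int dispose_log[8];
static int dispose_count = 0;

static void obj_unref(obj_t *o)
{
    if (!o)
        return;
    assert(o->refs > 0);           /* catches some, not all, extra unrefs */
    if (--o->refs == 0) {
        if (o->dispose)
            o->dispose(o);         /* may drop references it holds */
        free(o);                   /* o is the first member: same address */
    }
}

/* Hypothetical domain object owning a snapshot table (a heap buffer here). */
typedef struct {
    obj_t parent;                  /* must be first */
    char *snapshot_table;
} domain_t;

/* Hypothetical monitor, ASSUMED to hold one reference on its domain. */
typedef struct {
    obj_t parent;                  /* must be first */
    domain_t *vm;
} monitor_t;

static void domain_dispose(obj_t *self)
{
    domain_t *dom = (domain_t *)self;
    dispose_log[dispose_count++] = DISPOSED_DOMAIN;
    free(dom->snapshot_table);     /* a corrupted heap would be noticed here */
}

static void monitor_dispose(obj_t *self)
{
    monitor_t *mon = (monitor_t *)self;
    dispose_log[dispose_count++] = DISPOSED_MONITOR;
    obj_unref(&mon->vm->parent);   /* may be the domain's last reference */
}

static domain_t *domain_new(void)
{
    domain_t *dom = calloc(1, sizeof(*dom));
    if (!dom)
        return NULL;
    dom->parent.refs = 1;
    dom->parent.dispose = domain_dispose;
    dom->snapshot_table = malloc(64);
    return dom;
}

static monitor_t *monitor_new(domain_t *vm)
{
    monitor_t *mon = calloc(1, sizeof(*mon));
    if (!mon)
        return NULL;
    mon->parent.refs = 1;
    mon->parent.dispose = monitor_dispose;
    vm->parent.refs++;             /* the monitor's own domain reference */
    mon->vm = vm;
    return mon;
}
```

If the domain's owner drops its reference first (2 to 1), the domain is only disposed when the monitor's dispose drops the last one, matching the ordering in frames #10 to #8. Because glibc checks heap consistency only when it touches a damaged chunk, the free in frame #5 is where the damage was noticed, not necessarily where it was caused.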

Version-Release number of selected component (if applicable):
libvirtd 0.10.2-18.el6
vdsm 4.10.2-1.6.el6
qemu-kvm 0.12.1.2-2.355.el6_4.1
qemu-img 0.12.1.2-2.355.el6_4.1
spice-server 0.12.0-12.el6

How reproducible:
Unknown.

Steps to Reproduce:
1. A guest was shut down from RHEV-M:

$ cat var/log/libvirt/qemu/i-web001.log | grep -B4 08\:34
inputs_connect: inputs channel client create
qemu: terminating on signal 15 from pid 12005
red_channel_client_disconnect: 0x7fdf7c298f80 (channel 0x7fdf7c21d670 type 4 id 0)
red_channel_client_disconnect: 0x7fdf7c2aa200 (channel 0x7fdf7c21d0b0 type 2 id 0)
2013-03-20 08:34:34.849+0000: shutting down

2. libvirtd crashed with SIGABRT:

$ xzgrep error var/log/libvirtd.log.3.xz  | grep -v debug
2013-03-20 08:34:34.848+0000: 12005: error : qemuMonitorIORead:513 : Unable to read from monitor: Connection reset by peer
2013-03-20 08:34:34.848+0000: 12005: error : qemuAgentIO:642 : internal error End of file from monitor
2013-03-20 08:34:34.850+0000: 12005: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet140" not in key map
2013-03-20 08:34:34.852+0000: 12005: error : virNetDevGetIndex:653 : Unable to get index for interface vnet140: No such device
2013-03-20 08:34:34.982+0000: 12005: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet141" not in key map
2013-03-20 08:34:34.984+0000: 12005: error : virNetDevGetIndex:653 : Unable to get index for interface vnet141: No such device
2013-03-20 08:34:34.848+000012005: error : qemuMonitorIORead:513 : Unable to read from monitor: Connection reset by peer
2013-03-20 08:34:34.848+000012005: error : qemuAgentIO:642 : internal error End of file from monitor
2013-03-20 08:34:34.850+000012005: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet140" not in key map
2013-03-20 08:34:34.852+000012005: error : virNetDevGetIndex:653 : Unable to get index for interface vnet140: No such device
2013-03-20 08:34:34.982+000012005: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet141" not in key map
2013-03-20 08:34:34.984+000012005: error : virNetDevGetIndex:653 : Unable to get index for interface vnet141: No such device

/var/log/libvirtd.log.3.xz:
2013-03-20 08:34:35.362+0000: 12005: debug : virCgroupRemoveRecursively:727 : Removing cgroup /cgroup/freezer/libvirt/qemu/i-web001/
2013-03-20 08:34:35.378+0000: 12005: debug : virCgroupRemove:772 : Removing cgroup /cgroup/blkio/libvirt/qemu/i-web001/ and all child cgroups
2013-03-20 08:34:35.378+0000: 12005: debug : virCgroupRemoveRecursively:727 : Removing cgroup /cgroup/blkio/libvirt/qemu/i-web001/
2013-03-20 08:34:35.394+0000: 12005: debug : virObjectUnref:135 : OBJECT_UNREF: obj=0x7f91dc111420
2013-03-20 08:34:35.394+0000: 12005: debug : virObjectUnref:137 : OBJECT_DISPOSE: obj=0x7f91dc111420
2013-03-20 08:34:35.394+0000: 12005: debug : virObjectUnref:135 : OBJECT_UNREF: obj=0x7f91dc115bd0
2013-03-20 08:34:35.395+0000: 12005: debug : virObjectUnref:135 : OBJECT_UNREF: obj=0x7f91dc112c00
2013-03-20 08:34:35.395+0000: 12005: debug : virObjectUnref:137 : OBJECT_DISPOSE: obj=0x7f91dc112c00
2013-03-20 08:34:35.395+0000: 12005: debug : qemuMonitorDispose:246 : mon=0x7f91dc112c00
2013-03-20 08:34:35.395+0000: 12005: debug : virObjectUnref:135 : OBJECT_UNREF: obj=0x7f91dc115bd0
2013-03-20 08:34:35.395+0000: 12005: debug : virObjectUnref:137 : OBJECT_DISPOSE: obj=0x7f91dc115bd0
Caught abort signal dumping internal log buffer:
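
The debug excerpt records each reference drop (OBJECT_UNREF) and each disposal (OBJECT_DISPOSE) by object address. A small scanner like the hypothetical scan_unref_log below can tally these per address and flag any event that follows a disposal, which would suggest a reference being used after its object was freed. It assumes the exact message format shown above; because freed addresses can be reused by later allocations, a flagged address is a lead, not proof.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Per-address tally built from debug lines such as
 *   "... virObjectUnref:135 : OBJECT_UNREF: obj=0x7f91dc115bd0"
 *   "... virObjectUnref:137 : OBJECT_DISPOSE: obj=0x7f91dc115bd0"
 * Format taken from the excerpt above; other versions may differ. */
struct obj_stat {
    unsigned long long addr;
    int unrefs;
    int disposed;              /* 1 once OBJECT_DISPOSE was seen */
    int events_after_dispose;  /* any event after disposal: suspicious */
};

static struct obj_stat *find_or_add(struct obj_stat *tab, size_t *len,
                                    size_t cap, unsigned long long addr)
{
    for (size_t i = 0; i < *len; i++)
        if (tab[i].addr == addr)
            return &tab[i];
    if (*len == cap)
        return NULL;           /* table full: ignore further addresses */
    tab[*len] = (struct obj_stat){ .addr = addr };
    return &tab[(*len)++];
}

/* Returns the number of distinct addresses recorded in tab. */
size_t scan_unref_log(const char *const *lines, size_t n,
                      struct obj_stat *tab, size_t cap)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        const char *p;
        int is_dispose;
        if ((p = strstr(lines[i], "OBJECT_UNREF: obj=")) != NULL)
            is_dispose = 0;
        else if ((p = strstr(lines[i], "OBJECT_DISPOSE: obj=")) != NULL)
            is_dispose = 1;
        else
            continue;          /* e.g. the qemuMonitorDispose line */
        p = strstr(p, "obj=") + 4;
        /* base 16 also accepts the optional "0x" prefix */
        unsigned long long addr = strtoull(p, NULL, 16);
        struct obj_stat *s = find_or_add(tab, &len, cap, addr);
        if (!s)
            continue;
        if (s->disposed)
            s->events_after_dispose++;
        if (is_dispose)
            s->disposed = 1;
        else
            s->unrefs++;
    }
    return len;
}
```

Fed the eight debug lines above, it records three addresses: 0x7f91dc111420 and 0x7f91dc112c00 are each unreferenced once and then disposed, 0x7f91dc115bd0 is unreferenced twice before being disposed, and no event follows any disposal. So this short window does not by itself show a double unref; the heap may have been damaged earlier.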
  

Comment 4 Huang Wenlong 2013-03-25 06:36:01 UTC
Hi Julio,

Could you provide some clear steps to reproduce this issue?
It is hard for me to reproduce it without them.
Thanks very much,

Wenlong

Comment 5 Julio Entrena Perez 2013-03-25 11:58:13 UTC
(In reply to comment #4)
> Hi Julio,
> 
> Could you provide some clear steps to reproduce this issue?
> It is hard for me to reproduce it without them.
> Thanks very much,
> 
> Wenlong

Not really: we're not sure how to trigger the condition.

Comment 6 Pavel Zhukov 2013-03-25 14:05:35 UTC
Could this be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=918959 ?

Comment 7 Eric Blake 2013-03-26 20:19:15 UTC
I wonder if this upstream patch has any relation:
https://www.redhat.com/archives/libvir-list/2013-March/msg01489.html

Comment 11 Huang Wenlong 2013-04-02 03:04:48 UTC
libvirt-0.10.2-18.el6_4.3.x86_64
vdsm-4.10.2-13.0.el6ev.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6_4.2.x86_64

I used one host with 118 VMs; the host load average was 100.33, 92.45, 48.45.
This issue can be reproduced: I started and shut off VMs via RHEV-M, and libvirtd is still running.


Wenlong

Comment 12 Huang Wenlong 2013-04-02 03:26:38 UTC
(In reply to comment #11)
> libvirt-0.10.2-18.el6_4.3.x86_64
> vdsm-4.10.2-13.0.el6ev.x86_64
> qemu-kvm-rhev-0.12.1.2-2.355.el6_4.2.x86_64
> 
> I used one host with 118 VMs; the host load average was 100.33, 92.45, 48.45.
> This issue can be reproduced: I started and shut off VMs via RHEV-M,
> and libvirtd is still running.
> 
> 
> Wenlong

Sorry! I missed a NOT.

I cannot reproduce this issue.

Comment 13 Eric Blake 2013-04-05 23:01:03 UTC
Do you have MALLOC_PERTURB_ set in the environment?  If not, can you set it to a non-zero value, which will help glibc detect heap smashing bugs closer to the point at which they happen?
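
The mallopt(3) manual page documents what M_PERTURB (the setting MALLOC_PERTURB_ controls) does: bytes of a new allocation (other than via calloc) are set to the complement of the low byte of the value, and freed bytes are set to the low byte itself. The toy pool below imitates only that documented fill behaviour so its effect can be observed safely; it is not glibc's allocator, and pool_alloc, pool_free and perturb_setting are hypothetical names. With perturbation on, code that keeps reading a freed block sees a fixed pattern such as 0xA5 instead of plausible stale data, so the bug tends to fail sooner and closer to its cause.

```c
#include <stddef.h>
#include <string.h>

/* Toy fixed-size pool imitating the documented M_PERTURB fill rules.
 * NOT glibc's allocator; names are hypothetical. */
#define POOL_BLOCK 32
#define POOL_BLOCKS 4

static unsigned char pool[POOL_BLOCKS][POOL_BLOCK];
static int pool_used[POOL_BLOCKS];
static int perturb_setting = 0;  /* stand-in for MALLOC_PERTURB_ */

void *pool_alloc(void)
{
    for (int i = 0; i < POOL_BLOCKS; i++) {
        if (!pool_used[i]) {
            pool_used[i] = 1;
            /* new block: complement of the low byte */
            if (perturb_setting)
                memset(pool[i], ~perturb_setting & 0xff, POOL_BLOCK);
            return pool[i];
        }
    }
    return NULL;                 /* pool exhausted */
}

void pool_free(void *p)
{
    for (int i = 0; i < POOL_BLOCKS; i++) {
        if (p == pool[i]) {
            pool_used[i] = 0;
            /* freed block: the low byte itself */
            if (perturb_setting)
                memset(pool[i], perturb_setting & 0xff, POOL_BLOCK);
            return;
        }
    }
}
```

For a real daemon the variable would be set in the environment libvirtd is started from, for example MALLOC_PERTURB_=165 (0xA5); where to set it depends on how the service is launched on the host.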

Comment 14 Eric Blake 2013-04-05 23:03:41 UTC
bug 919057 describes what sounds like a similar case of heap corruption triggered by a domain shutdown

Comment 15 Eric Blake 2013-04-05 23:05:40 UTC
was the domain being shut down transient or persistent?

Comment 16 Eric Blake 2013-04-05 23:10:07 UTC
This commit mentions a crash possible for transient domains, but seems to focus on auto-destroy guests (those that go away when the virConnectPtr is closed) and might not be related to the setup you were using

commit 7ccad0b16d12d7616c7c21b1359f6a55a9677521
Author: Daniel P. Berrange <berrange>
Date:   Thu Feb 28 12:18:48 2013 +0000

    Fix crash in QEMU auto-destroy with transient guests
    
    When the auto-destroy callback runs it is supposed to return
    NULL if the virDomainObjPtr is no longer valid. It was not
    doing this for transient guests, so we tried to virObjectUnlock
    a mutex which had been freed. This often led to a crash.
    
    Signed-off-by: Daniel P. Berrange <berrange>
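
The commit message describes a calling contract: the auto-destroy callback must return NULL when the domain object is no longer valid, and the caller must then not unlock it. The sketch below illustrates that contract with hypothetical names (dom_t, autodestroy_cb, run_autodestroy); it is not the code from commit 7ccad0b16d12, and it replaces the real mutex with a counter so the behaviour can be checked.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical domain object; a counter stands in for the real mutex. */
typedef struct {
    bool persistent;
    bool locked;
    int *unlock_count;        /* owned by the caller, outlives the object */
} dom_t;

dom_t *dom_new(bool persistent, int *unlock_count)
{
    dom_t *d = malloc(sizeof(*d));
    if (!d)
        return NULL;
    d->persistent = persistent;
    d->locked = false;
    d->unlock_count = unlock_count;
    return d;
}

static void dom_unlock(dom_t *d)
{
    d->locked = false;
    (*d->unlock_count)++;
}

/* Kills the guest (elided). A transient guest's object is removed and
 * freed, so the callback returns NULL to signal "no longer valid". */
static dom_t *autodestroy_cb(dom_t *d)
{
    if (!d->persistent) {
        free(d);
        return NULL;
    }
    return d;                 /* persistent: object stays defined */
}

/* Caller: unlocks only if the callback handed back a valid object.
 * Per the commit message, the bug was that the callback did not return
 * NULL for transient guests, so the unlock ran on freed memory. */
void run_autodestroy(dom_t *d)
{
    d->locked = true;
    d = autodestroy_cb(d);
    if (d)
        dom_unlock(d);
}
```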

Comment 17 Lee Yarwood 2013-04-08 16:56:24 UTC
(In reply to comment #15)
> was the domain being shut down transient or persistent?

VDSM (calling libvirtd here) only creates transient domains.

(In reply to comment #16)
> This commit mentions a crash possible for transient domains, but seems to
> focus on auto-destroy guests (those that go away when the virConnectPtr is
> closed) and might not be related to the setup you were using
> 
> commit 7ccad0b16d12d7616c7c21b1359f6a55a9677521
> Author: Daniel P. Berrange <berrange>
> Date:   Thu Feb 28 12:18:48 2013 +0000
> 
>     Fix crash in QEMU auto-destroy with transient guests
>     
>     When the auto-destroy callback runs it is supposed to return
>     NULL if the virDomainObjPtr is no longer valid. It was not
>     doing this for transient guests, so we tried to virObjectUnlock
>     a mutex which had been freed. This often led to a crash.
>     
>     Signed-off-by: Daniel P. Berrange <berrange>

I'm not entirely sure how to configure auto destroy. Would the domain need to start with the VIR_DOMAIN_START_AUTODESTROY flag? AFAICT VDSM doesn't set this.

Lee

Comment 18 Eric Blake 2013-04-08 17:59:31 UTC
Hmm - another upstream message about a race still present (and THIS one sounds more like what we are hitting with guest shutdown):

https://www.redhat.com/archives/libvir-list/2013-April/msg00625.html

Comment 20 Eric Blake 2013-04-10 01:57:05 UTC
bug 915353 describes a crash on shutdown; it was fixed in libvirt-0.10.2-18.el6_4.1. I'm starting to think that this particular fix is the one that solves the problem at hand.

Comment 23 Eric Blake 2013-05-15 15:00:39 UTC

*** This bug has been marked as a duplicate of bug 915353 ***