Bug 924756 - libvirtd SIGABRT when shutting down a guest
Summary: libvirtd SIGABRT when shutting down a guest
Keywords:
Status: CLOSED DUPLICATE of bug 915353
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.4
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Eric Blake
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 896690 835616 960054
 
Reported: 2013-03-22 12:57 UTC by Julio Entrena Perez
Modified: 2018-12-01 16:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-05-15 15:00:39 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 336833 0 None None None Never

Description Julio Entrena Perez 2013-03-22 12:57:22 UTC
Description of problem:
libvirtd crashed around the same time a guest was shut down from RHEV-M.

Core was generated by `/usr/sbin/libvirtd --listen'.
Program terminated with signal 6, Aborted.
#0  0x00007f92202408a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007f92202408a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f9220242085 in abort () at abort.c:92
#2  0x00007f922027e7b7 in __libc_message (do_abort=2, 
    fmt=0x7f9220365f80 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:198
#3  0x00007f92202840e6 in malloc_printerr (action=3, str=0x7f9220363f3e "corrupted double-linked list", 
    ptr=<value optimized out>) at malloc.c:6311
#4  0x00007f92202844f0 in malloc_consolidate (av=0x7f91dc000020) at malloc.c:5181
#5  0x00007f9220286ba8 in _int_free (av=0x7f91dc000020, p=0x7f91dc126b30, have_lock=0) at malloc.c:5054
#6  0x00007f92227114b9 in virFree (ptrptr=0x7f91dc126850) at util/memory.c:309
#7  0x00007f92227295dd in virHashFree (table=0x7f91dc126850) at util/virhash.c:265
#8  0x00007f922275fd86 in virDomainSnapshotObjListFree (snapshots=0x7f91dc10f830) at conf/snapshot_conf.c:724
#9  0x00007f9222723dbb in virObjectUnref (anyobj=<value optimized out>) at util/virobject.c:139
#10 0x0000000000489ad2 in qemuMonitorDispose (obj=<value optimized out>) at qemu/qemu_monitor.c:248
#11 0x00007f9222723dbb in virObjectUnref (anyobj=<value optimized out>) at util/virobject.c:139
#12 0x00007f92227095a8 in virEventPollCleanupHandles () at util/event_poll.c:567
#13 0x00007f9222709c90 in virEventPollRunOnce () at util/event_poll.c:636
#14 0x00007f9222708b67 in virEventRunDefaultImpl () at util/event.c:247
#15 0x00007f92227f863d in virNetServerRun (srv=0x1d7cf60) at rpc/virnetserver.c:748
#16 0x00000000004235b7 in main (argc=<value optimized out>, argv=<value optimized out>) at libvirtd.c:1228
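
When triaging a backtrace like the one above, it can help to skip the glibc frames and find the innermost frame that belongs to libvirt itself. The following is a minimal sketch, not part of the original report: the function names `parse_frames` and `innermost_libvirt_frame` are hypothetical, and treating any function whose name starts with "vir" or "qemu" as a libvirt frame is only a heuristic. The sample text is a shortened excerpt of the frames quoted above.

```python
import re

# One gdb frame on a single line: "#N  0xADDR in func (args) at file:line".
FRAME = re.compile(
    r"^#(?P<num>\d+)\s+(?:0x[0-9a-f]+ in )?(?P<func>\S+) \(.*\)"
    r" at (?P<file>\S+):(?P<line>\d+)$"
)


def parse_frames(bt_text):
    """Return (number, function, file, line) tuples from gdb 'bt' output."""
    # gdb wraps long frames; re-join continuation lines onto their frame.
    joined = []
    for raw in bt_text.splitlines():
        if raw.startswith("#"):
            joined.append(raw.strip())
        elif joined and raw.strip():
            joined[-1] += " " + raw.strip()

    frames = []
    for entry in joined:
        m = FRAME.match(entry)
        if m:
            frames.append((int(m["num"]), m["func"], m["file"], int(m["line"])))
    return frames


def innermost_libvirt_frame(frames):
    """Heuristic (an assumption): libvirt functions start with 'vir' or 'qemu'."""
    for frame in frames:
        if frame[1].startswith(("vir", "qemu")):
            return frame
    return None


# Shortened excerpt of the backtrace in this report.
SAMPLE = """\
#3  0x00007f92202840e6 in malloc_printerr (action=3, str=0x7f9220363f3e "corrupted double-linked list",
    ptr=<value optimized out>) at malloc.c:6311
#5  0x00007f9220286ba8 in _int_free (av=0x7f91dc000020, p=0x7f91dc126b30, have_lock=0) at malloc.c:5054
#6  0x00007f92227114b9 in virFree (ptrptr=0x7f91dc126850) at util/memory.c:309
#7  0x00007f92227295dd in virHashFree (table=0x7f91dc126850) at util/virhash.c:265
"""

frames = parse_frames(SAMPLE)
print(innermost_libvirt_frame(frames))
```

For this excerpt the sketch points at `virFree` in util/memory.c, i.e. the free of the snapshot hash table is where glibc noticed the corruption. That is not necessarily where the heap was actually damaged.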

Version-Release number of selected component (if applicable):
libvirtd 0.10.2-18.el6
vdsm 4.10.2-1.6.el6
qemu-kvm 0.12.1.2-2.355.el6_4.1
qemu-img 0.12.1.2-2.355.el6_4.1
spice-server 0.12.0-12.el6

How reproducible:
Unknown.

Steps to Reproduce:
1. A guest was shut down from RHEV-M:

$ cat var/log/libvirt/qemu/i-web001.log | grep -B4 08\:34
inputs_connect: inputs channel client create
qemu: terminating on signal 15 from pid 12005
red_channel_client_disconnect: 0x7fdf7c298f80 (channel 0x7fdf7c21d670 type 4 id 0)
red_channel_client_disconnect: 0x7fdf7c2aa200 (channel 0x7fdf7c21d0b0 type 2 id 0)
2013-03-20 08:34:34.849+0000: shutting down

2. libvirtd crashed by SIGABRT:

$ xzgrep error var/log/libvirtd.log.3.xz  | grep -v debug
2013-03-20 08:34:34.848+0000: 12005: error : qemuMonitorIORead:513 : Unable to read from monitor: Connection reset by peer
2013-03-20 08:34:34.848+0000: 12005: error : qemuAgentIO:642 : internal error End of file from monitor
2013-03-20 08:34:34.850+0000: 12005: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet140" not in key map
2013-03-20 08:34:34.852+0000: 12005: error : virNetDevGetIndex:653 : Unable to get index for interface vnet140: No such device
2013-03-20 08:34:34.982+0000: 12005: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet141" not in key map
2013-03-20 08:34:34.984+0000: 12005: error : virNetDevGetIndex:653 : Unable to get index for interface vnet141: No such device

/var/log/libvirtd.log.3.xz :
2013-03-20 08:34:35.362+0000: 12005: debug : virCgroupRemoveRecursively:727 : Removing cgroup /cgroup/freezer/libvirt/qemu/i-web001/
2013-03-20 08:34:35.378+0000: 12005: debug : virCgroupRemove:772 : Removing cgroup /cgroup/blkio/libvirt/qemu/i-web001/ and all child cgroups
2013-03-20 08:34:35.378+0000: 12005: debug : virCgroupRemoveRecursively:727 : Removing cgroup /cgroup/blkio/libvirt/qemu/i-web001/
2013-03-20 08:34:35.394+0000: 12005: debug : virObjectUnref:135 : OBJECT_UNREF: obj=0x7f91dc111420
2013-03-20 08:34:35.394+0000: 12005: debug : virObjectUnref:137 : OBJECT_DISPOSE: obj=0x7f91dc111420
2013-03-20 08:34:35.394+0000: 12005: debug : virObjectUnref:135 : OBJECT_UNREF: obj=0x7f91dc115bd0
2013-03-20 08:34:35.395+0000: 12005: debug : virObjectUnref:135 : OBJECT_UNREF: obj=0x7f91dc112c00
2013-03-20 08:34:35.395+0000: 12005: debug : virObjectUnref:137 : OBJECT_DISPOSE: obj=0x7f91dc112c00
2013-03-20 08:34:35.395+0000: 12005: debug : qemuMonitorDispose:246 : mon=0x7f91dc112c00
2013-03-20 08:34:35.395+0000: 12005: debug : virObjectUnref:135 : OBJECT_UNREF: obj=0x7f91dc115bd0
2013-03-20 08:34:35.395+0000: 12005: debug : virObjectUnref:137 : OBJECT_DISPOSE: obj=0x7f91dc115bd0
Caught abort signal dumping internal log buffer:
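
One way to read the OBJECT_UNREF / OBJECT_DISPOSE debug lines above is to group them per object address and check whether any object is touched again after it was disposed. The following is a rough sketch under stated assumptions: the helper names `object_events` and `touched_after_dispose` are hypothetical, and the regular expression assumes the log format shown in this excerpt.

```python
import re
from collections import defaultdict

# Matches the virObjectUnref debug messages quoted in this report.
EVENT = re.compile(r"OBJECT_(UNREF|DISPOSE): obj=(0x[0-9a-f]+)")


def object_events(log_lines):
    """Map each object address to its ordered list of UNREF/DISPOSE events."""
    events = defaultdict(list)
    for line in log_lines:
        m = EVENT.search(line)
        if m:
            events[m.group(2)].append(m.group(1))
    return dict(events)


def touched_after_dispose(events):
    """Addresses with any event logged after their first DISPOSE.

    Caveat: an address can be reused by a later allocation, so a hit is
    only a hint to look closer, not proof of a use-after-free.
    """
    flagged = []
    for obj, kinds in events.items():
        if "DISPOSE" in kinds and kinds.index("DISPOSE") < len(kinds) - 1:
            flagged.append(obj)
    return flagged


# The OBJECT_* lines from the excerpt above, in their original order.
SAMPLE_LOG = """\
2013-03-20 08:34:35.394+0000: 12005: debug : virObjectUnref:135 : OBJECT_UNREF: obj=0x7f91dc111420
2013-03-20 08:34:35.394+0000: 12005: debug : virObjectUnref:137 : OBJECT_DISPOSE: obj=0x7f91dc111420
2013-03-20 08:34:35.394+0000: 12005: debug : virObjectUnref:135 : OBJECT_UNREF: obj=0x7f91dc115bd0
2013-03-20 08:34:35.395+0000: 12005: debug : virObjectUnref:135 : OBJECT_UNREF: obj=0x7f91dc112c00
2013-03-20 08:34:35.395+0000: 12005: debug : virObjectUnref:137 : OBJECT_DISPOSE: obj=0x7f91dc112c00
2013-03-20 08:34:35.395+0000: 12005: debug : qemuMonitorDispose:246 : mon=0x7f91dc112c00
2013-03-20 08:34:35.395+0000: 12005: debug : virObjectUnref:135 : OBJECT_UNREF: obj=0x7f91dc115bd0
2013-03-20 08:34:35.395+0000: 12005: debug : virObjectUnref:137 : OBJECT_DISPOSE: obj=0x7f91dc115bd0
"""

events = object_events(SAMPLE_LOG.splitlines())
print(events)
print(touched_after_dispose(events))
```

On the lines quoted here the check flags nothing: each of the three objects is disposed once, on its last logged event. The excerpt alone therefore does not show an unref after dispose.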
  
Actual results:


Expected results:


Additional info:

Comment 4 Huang Wenlong 2013-03-25 06:36:01 UTC
Hi Julio,

Could you provide clear steps to reproduce this issue?
It is hard for me to reproduce it without steps.
Thanks very much.

Wenlong

Comment 5 Julio Entrena Perez 2013-03-25 11:58:13 UTC
(In reply to comment #4)
> Hi,Julio
> 
> Could you provide some clear steps to reproduce this issue ? 
> It is hard for me to reproduce it without steps 
> Thanks very much 
> 
> Wenlong

Not really: we're not sure how to trigger the condition.

Comment 6 Pavel Zhukov 2013-03-25 14:05:35 UTC
Duplicate https://bugzilla.redhat.com/show_bug.cgi?id=918959 ?

Comment 7 Eric Blake 2013-03-26 20:19:15 UTC
I wonder if this upstream patch has any relation:
https://www.redhat.com/archives/libvir-list/2013-March/msg01489.html

Comment 11 Huang Wenlong 2013-04-02 03:04:48 UTC
libvirt-0.10.2-18.el6_4.3.x86_64
vdsm-4.10.2-13.0.el6ev.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6_4.2.x86_64

I used one host with 118 VMs; the host load average was 100.33, 92.45, 48.45.
I can reproduce this issue: I start and shut off VMs via RHEV-M, and libvirtd is still running.


Wenlong

Comment 12 Huang Wenlong 2013-04-02 03:26:38 UTC
(In reply to comment #11)
> libvirt-0.10.2-18.el6_4.3.x86_64
> vdsm-4.10.2-13.0.el6ev.x86_64
> qemu-kvm-rhev-0.12.1.2-2.355.el6_4.2.x86_64
> 
> I use one host with 118 vms the host load average: 100.33, 92.45, 48.45 
> it can be reproduced this issue , start and shutoff vms  via rhevm ,
> libvirtd is still running .
> 
> 
> Wenlong

Sorry! I missed a NOT.

I cannot reproduce this issue.

Comment 13 Eric Blake 2013-04-05 23:01:03 UTC
Do you have MALLOC_PERTURB_ set in the environment?  If not, can you set it to a non-zero value, which will help glibc detect heap smashing bugs closer to the point at which they happen?
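
As a rough illustration of the suggestion above, the sketch below starts a process with MALLOC_PERTURB_ set to a non-zero byte value. The helper names `perturbed_env` and `run_with_perturb` and the default value 165 are hypothetical choices, not something from this report. For a daemon started by the init system, the variable would instead have to be set in whatever environment the service gets, which this sketch does not cover.

```python
import os
import subprocess
import sys


def perturbed_env(value=165, base=None):
    """Copy an environment and set MALLOC_PERTURB_ to a non-zero byte value.

    glibc only uses the low byte of the value, so 1..255 is accepted here.
    """
    if not 1 <= value <= 255:
        raise ValueError("MALLOC_PERTURB_ value must be in 1..255")
    env = dict(os.environ if base is None else base)
    env["MALLOC_PERTURB_"] = str(value)
    return env


def run_with_perturb(argv, value=165):
    """Run argv as a child process with MALLOC_PERTURB_ set."""
    return subprocess.run(argv, env=perturbed_env(value), check=False)


# Harmless demonstration: the child only reports the variable it received.
# For the daemon in this report, argv would be something like
# ["/usr/sbin/libvirtd", "--listen"], run by hand as root.
result = run_with_perturb(
    [sys.executable, "-c", "import os; print(os.environ['MALLOC_PERTURB_'])"]
)
```

With the variable set, glibc fills freed memory with a fixed byte pattern, so code that reads freed memory tends to fail nearer to the faulty access.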

Comment 14 Eric Blake 2013-04-05 23:03:41 UTC
Bug 919057 describes what sounds like a similar case of heap corruption triggered by a domain shutdown.

Comment 15 Eric Blake 2013-04-05 23:05:40 UTC
Was the domain being shut down transient or persistent?

Comment 16 Eric Blake 2013-04-05 23:10:07 UTC
This commit mentions a crash possible for transient domains, but seems to focus on auto-destroy guests (those that go away when the virConnectPtr is closed) and might not be related to the setup you were using

commit 7ccad0b16d12d7616c7c21b1359f6a55a9677521
Author: Daniel P. Berrange <berrange>
Date:   Thu Feb 28 12:18:48 2013 +0000

    Fix crash in QEMU auto-destroy with transient guests
    
    When the auto-destroy callback runs it is supposed to return
    NULL if the virDomainObjPtr is no longer valid. It was not
    doing this for transient guests, so we tried to virObjectUnlock
    a mutex which had been freed. This often led to a crash.
    
    Signed-off-by: Daniel P. Berrange <berrange>

Comment 17 Lee Yarwood 2013-04-08 16:56:24 UTC
(In reply to comment #15)
> was the domain being shut down transient or persistent?

VDSM (calling libvirtd here) only creates transient domains.

(In reply to comment #16)
> This commit mentions a crash possible for transient domains, but seems to
> focus on auto-destroy guests (those that go away when the virConnectPtr is
> closed) and might not be related to the setup you were using
> 
> commit 7ccad0b16d12d7616c7c21b1359f6a55a9677521
> Author: Daniel P. Berrange <berrange>
> Date:   Thu Feb 28 12:18:48 2013 +0000
> 
>     Fix crash in QEMU auto-destroy with transient guests
>     
>     When the auto-destroy callback runs it is supposed to return
>     NULL if the virDomainObjPtr is no longer valid. It was not
>     doing this for transient guests, so we tried to virObjectUnlock
>     a mutex which had been freed. This often led to a crash.
>     
>     Signed-off-by: Daniel P. Berrange <berrange>

I'm not entirely sure how to configure auto destroy. Would the domain need to start with the VIR_DOMAIN_START_AUTODESTROY flag? AFAICT VDSM doesn't set this.

Lee

Comment 18 Eric Blake 2013-04-08 17:59:31 UTC
Hmm - another upstream message about a race still present (and THIS one sounds more like what we are hitting with guest shutdown):

https://www.redhat.com/archives/libvir-list/2013-April/msg00625.html

Comment 20 Eric Blake 2013-04-10 01:57:05 UTC
bug 915353 describes a crash on shutdown; it was fixed for libvirt-0.10.2-18.el6_4.1 - I'm starting to think that this particular fix is the one that solves the problem at hand.
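
To check whether a given host already carries that build, the installed version-release can be compared against 0.10.2-18.el6_4.1. The sketch below is a simplified approximation of RPM's segment-wise comparison, not the real algorithm: it ignores epochs and the special tilde and caret characters, and the names `rpm_segment_cmp` and `has_fix` are hypothetical. On a real system the rpm tooling itself is the authority.

```python
import re

# Maximal runs of ASCII digits or letters; everything else is a separator.
_SEGMENT = re.compile(r"[0-9]+|[A-Za-z]+")


def rpm_segment_cmp(a, b):
    """Return 1, 0 or -1 if a is newer than, equal to, or older than b."""
    sa, sb = _SEGMENT.findall(a), _SEGMENT.findall(b)
    for x, y in zip(sa, sb):
        if x.isdigit() and y.isdigit():
            xi, yi = int(x), int(y)
            if xi != yi:
                return 1 if xi > yi else -1
        elif x.isdigit() != y.isdigit():
            # A numeric segment is treated as newer than an alphabetic one.
            return 1 if x.isdigit() else -1
        elif x != y:
            return 1 if x > y else -1
    # All shared segments equal: the string with segments left over is newer.
    return (len(sa) > len(sb)) - (len(sa) < len(sb))


def has_fix(installed, fixed):
    """Compare (version, release) pairs; True if installed >= fixed."""
    for a, b in zip(installed, fixed):
        result = rpm_segment_cmp(a, b)
        if result:
            return result > 0
    return True


FIXED_IN = ("0.10.2", "18.el6_4.1")           # build named in the comment above
reporter = ("0.10.2", "18.el6")               # from the original description
comment_11 = ("0.10.2", "18.el6_4.3")         # from comment 11

print(has_fix(reporter, FIXED_IN))
print(has_fix(comment_11, FIXED_IN))
```

Under this comparison the reporter's build predates the fix while the build used in comment 11 already includes it, which would be consistent with the failed reproduction attempt.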

Comment 23 Eric Blake 2013-05-15 15:00:39 UTC

*** This bug has been marked as a duplicate of bug 915353 ***

