Bug 722748 - Segfault during peer2peer migration
Summary: Segfault during peer2peer migration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Assignee: Daniel Berrangé
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-07-17 09:52 UTC by Rami Vaknin
Modified: 2014-01-12 23:53 UTC
CC: 11 users

Fixed In Version: libvirt-0.9.3-6.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-06 11:16:46 UTC
Target Upstream Version:


Attachments
core file, libvirtd and vdsm logs (4.88 MB, application/x-compressed-tar)
2011-07-17 09:52 UTC, Rami Vaknin
libvirtd.log (64.48 KB, text/plain)
2011-07-19 07:12 UTC, weizhang
part of libvirtd.log (3.48 KB, text/plain)
2011-07-19 09:51 UTC, weizhang


Links
Red Hat Product Errata RHBA-2011:1513 (normal, SHIPPED_LIVE): libvirt bug fix and enhancement update. Last updated: 2011-12-06 01:23:30 UTC

Description Rami Vaknin 2011-07-17 09:52:51 UTC
Created attachment 513521 [details]
core file, libvirtd and vdsm logs

Environment:
RHEVM 3.0 on dev env, last commit 12b1f476f4f1a01bb86cb5687c19f4ef0f0784c6
libvirt-0.9.3-5.el6.x86_64, vdsm-4.9-81.el6.x86_64


libvirtd[15034]: segfault at 7f5700000012 ip 00000030276787de sp 00007f5803df1330 error 6 in libc-2.12.so[3027600000+187000]

Loaded symbols for /lib64/libnss_dns-2.12.so
Core was generated by `libvirtd --daemon --listen'.
Program terminated with signal 11, Segmentation fault.
#0  _int_malloc (av=0x7f57f8000020, bytes=<value optimized out>) at malloc.c:4439
4439	      bck->fd = unsorted_chunks(av);
Missing separate debuginfos, use: debuginfo-install krb5-libs-1.9-9.el6.x86_64 libcurl-7.19.7-26.el6.x86_64 libgcrypt-1.4.5-5.el6.x86_64 openssl-1.0.0-10.el6.x86_64
(gdb) bt
#0  _int_malloc (av=0x7f57f8000020, bytes=<value optimized out>) at malloc.c:4439
#1  0x0000003027679add in __libc_malloc (bytes=100) at malloc.c:3660
#2  0x00000030276fff7b in __vasprintf_chk (result_ptr=0x7f5803df1628, flags=1, format=0x7f5809dd6dcf "Remove timer %d", args=0x7f5803df15e0) at vasprintf_chk.c:50
#3  0x00007f5809ceef64 in vasprintf (strp=<value optimized out>, fmt=<value optimized out>, list=<value optimized out>) at /usr/include/bits/stdio2.h:199
#4  virVasprintf (strp=<value optimized out>, fmt=<value optimized out>, list=<value optimized out>) at util/util.c:1623
#5  0x00007f5809cded75 in virLogMessage (category=0x7f5809dd6dab "file.util/event_poll.c", priority=1, funcname=0x7f5809dd7350 "virEventPollRemoveTimeout", linenr=276, flags=0, fmt=<value optimized out>) at util/logging.c:721
#6  0x00007f5809cd77a3 in virEventPollRemoveTimeout (timer=20) at util/event_poll.c:276
#7  0x00007f5809d0fd06 in virDomainEventStateFree (state=0x7f57f80d0420) at conf/domain_event.c:556
#8  0x00007f5809d637e3 in doRemoteClose (conn=<value optimized out>, priv=0x7f57f80f35e0) at remote/remote_driver.c:848
#9  0x00007f5809d6394b in remoteClose (conn=0x7f57f8013790) at remote/remote_driver.c:863
#10 0x00007f5809d2fadb in virReleaseConnect (conn=0x7f57f8013790) at datatypes.c:114
#11 0x00007f5809d30fe8 in virUnrefConnect (conn=0x7f57f8013790) at datatypes.c:149
#12 0x0000000000481488 in doPeer2PeerMigrate (driver=0x7f57f8020220, conn=0x7f57ec000a70, vm=0x7f57f00d7b20, xmlin=<value optimized out>, dconnuri=0x7f57f8117900 "qemu+tls://nott-vdsa.qa.lab.tlv.redhat.com/system", 
    uri=<value optimized out>, cookiein=0x0, cookieinlen=0, cookieout=0x7f5803df1b80, cookieoutlen=0x7f5803df1b8c, flags=3, dname=0x7f57f0046320 "stress_new_pool-23", resource=0, v3proto=true) at qemu/qemu_migration.c:2253
#13 qemuMigrationPerform (driver=0x7f57f8020220, conn=0x7f57ec000a70, vm=0x7f57f00d7b20, xmlin=<value optimized out>, dconnuri=0x7f57f8117900 "qemu+tls://nott-vdsa.qa.lab.tlv.redhat.com/system", uri=<value optimized out>, cookiein=0x0, 
    cookieinlen=0, cookieout=0x7f5803df1b80, cookieoutlen=0x7f5803df1b8c, flags=3, dname=0x7f57f0046320 "stress_new_pool-23", resource=0, v3proto=true) at qemu/qemu_migration.c:2315
#14 0x0000000000448a83 in qemuDomainMigratePerform3 (dom=0x7f57f8196300, xmlin=0x0, cookiein=<value optimized out>, cookieinlen=0, cookieout=0x7f5803df1b80, cookieoutlen=0x7f5803df1b8c, 
    dconnuri=0x7f57f8117900 "qemu+tls://nott-vdsa.qa.lab.tlv.redhat.com/system", uri=0x0, flags=3, dname=0x0, resource=0) at qemu/qemu_driver.c:6999
#15 0x00007f5809d4c9c4 in virDomainMigratePerform3 (domain=0x7f57f8196300, xmlin=0x0, cookiein=0x0, cookieinlen=<value optimized out>, cookieout=<value optimized out>, cookieoutlen=0x7f5803df1b8c, 
    dconnuri=0x7f57f8117900 "qemu+tls://nott-vdsa.qa.lab.tlv.redhat.com/system", uri=0x0, flags=3, dname=0x0, bandwidth=0) at libvirt.c:5162
#16 0x000000000041fbc2 in remoteDispatchDomainMigratePerform3 (server=<value optimized out>, client=<value optimized out>, hdr=<value optimized out>, rerr=0x7f5803df1c10, args=<value optimized out>, ret=0x7f57f810de40) at remote.c:2789
#17 remoteDispatchDomainMigratePerform3Helper (server=<value optimized out>, client=<value optimized out>, hdr=<value optimized out>, rerr=0x7f5803df1c10, args=<value optimized out>, ret=0x7f57f810de40) at remote_dispatch.h:2700
#18 0x000000000043ac2e in virNetServerProgramDispatchCall (prog=0xccc420, server=0xccbae0, client=0xcf2a00, msg=0xfc5f60) at rpc/virnetserverprogram.c:375
#19 virNetServerProgramDispatch (prog=0xccc420, server=0xccbae0, client=0xcf2a00, msg=0xfc5f60) at rpc/virnetserverprogram.c:252
#20 0x000000000043d401 in virNetServerHandleJob (jobOpaque=<value optimized out>, opaque=0xccbae0) at rpc/virnetserver.c:150
#21 0x00007f5809cecdaa in virThreadPoolWorker (opaque=0xccbbd0) at util/threadpool.c:98
#22 0x00007f5809cec812 in virThreadHelper (data=<value optimized out>) at util/threads-pthread.c:157
#23 0x0000003027e077e1 in start_thread (arg=0x7f5803df2700) at pthread_create.c:301
#24 0x00000030276e68ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Comment 2 Daniel Veillard 2011-07-18 11:23:29 UTC
Okay, the last operations logged in the libvirtd.log dump are:

12:17:35.880: 15033: debug : virEventPollDispatchHandles:454 : i=22 w=103
12:17:35.880: 15033: debug : virEventPollDispatchHandles:467 : Dispatch n=22 f=24 w=103 e=1 0x7f57f80f2ee0
12:17:35.880: 15034: debug : virEventPollRemoveHandle:184 : mark delete 22 24
12:17:35.880: 15034: debug : virEventPollInterruptLocked:677 : Interrupting
12:17:35.880: 15034: debug : virNetSocketFree:627 : sock=0x7f57f80f2ee0 fd=24

virNetSocketFree does some frees, and the malloc done a bit later by the
virEventPollRemoveTimeout() debug logging crashes. I would guess that the
frees in virNetSocketFree either corrupted the heap, or the associated data
were still in use, leading to corruption of the heap's data structures.

Daniel
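
To make the suspected failure mode concrete, here is a minimal, self-contained
C sketch (not libvirt code; every name in it is invented for illustration) of
the pattern described above: a write through a freed pointer corrupts glibc's
heap metadata, and an unrelated allocation later dies inside _int_malloc(),
just as in the backtrace:

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *sock_priv = malloc(64);  /* stands in for the socket's private data */
    free(sock_priv);               /* freed during virNetSocketFree()-style teardown */

    memset(sock_priv, 0xff, 64);   /* use-after-free: clobbers allocator metadata */

    /* stands in for the vasprintf() behind the "Remove timer %d" debug log;
     * with the heap corrupted, this can crash inside _int_malloc() */
    char *msg = NULL;
    if (asprintf(&msg, "Remove timer %d", 20) < 0)
        return 1;
    puts(msg);
    free(msg);
    return 0;
}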

Comment 3 weizhang 2011-07-19 07:12:17 UTC
Created attachment 513720 [details]
libvirtd.log

Tried to reproduce the bug with the following steps, on:
kernel-2.6.32-166.el6.x86_64
libvirt-0.9.3-5.el6.x86_64
qemu-kvm-0.12.1.2-2.169.el6.x86_64

1. Build a TLS environment from the source host to the target host (see the certtool outline at the end of this comment):
server: target
client: source

2. Build a TLS environment from the source host back to the source host itself, using the same cacert.pem:
server: source
client: source

3. On the source host, test the TLS environment:
# virsh -c qemu+tls://{target ip}/system
Welcome to virsh, the virtualization interactive terminal.

Type:  'help' for help with commands
       'quit' to quit

virsh # exit

# virsh -c qemu+tls://{source ip}/system
Welcome to virsh, the virtualization interactive terminal.

Type:  'help' for help with commands
       'quit' to quit

virsh # exit

4. Start a guest on the target host whose image is on shared NFS storage mounted on both hosts:
# setsebool virt_use_nfs 1
# iptables -F

5. Do the migration from the source host with the command:
# virsh -c qemu+tls://{target ip}/system migrate --p2p guest_name qemu+tls://{source ip}/system
error: Cannot recv data: Input/output error

On the target host:
# service libvirtd status
libvirtd dead but pid file exists
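
For steps 1 and 2, "build tls environment" means generating a CA certificate
plus server and client certificates, and installing them in libvirt's default
locations. A rough outline, assuming GnuTLS certtool and libvirt's documented
default paths (the certtool template files and their contents are elided):

# certtool --generate-privkey > cakey.pem
# certtool --generate-self-signed --load-privkey cakey.pem --template ca.info --outfile cacert.pem
# certtool --generate-privkey > serverkey.pem
# certtool --generate-certificate --load-privkey serverkey.pem --load-ca-certificate cacert.pem --load-ca-privkey cakey.pem --template server.info --outfile servercert.pem
(repeat the last two commands with a client template for clientkey.pem/clientcert.pem)
# cp cacert.pem /etc/pki/CA/cacert.pem
# cp servercert.pem /etc/pki/libvirt/servercert.pem
# cp serverkey.pem /etc/pki/libvirt/private/serverkey.pem
# cp clientcert.pem /etc/pki/libvirt/clientcert.pem
# cp clientkey.pem /etc/pki/libvirt/private/clientkey.pem

The server side also needs libvirtd started with --listen (LIBVIRTD_ARGS="--listen" in /etc/sysconfig/libvirtd) and listen_tls enabled in /etc/libvirt/libvirtd.conf.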

Comment 4 Daniel Veillard 2011-07-19 09:11:45 UTC
It seems that attaching gdb to libvirtd on the target before running the
virsh migration would let us catch the crash and confirm it's the same bug.
Could you try that and show the stack trace?
Also keep the current libvirtd.log from the target, as it should contain a
lot of debug information.

  thanks,

Daniel
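
Attaching to the running daemon on the target would look roughly like this
(the pid file path is the RHEL 6 default; install the libvirt debuginfo first
so the trace has symbols):

# debuginfo-install libvirt
# gdb -p $(cat /var/run/libvirtd.pid)
(gdb) continue
  ... start the migration from the source host and wait for the SIGSEGV ...
(gdb) thread apply all bt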

Comment 5 weizhang 2011-07-19 09:51:37 UTC
Created attachment 513750 [details]
part of libvirtd.log

stack trace info:

#0  0x0000000000000000 in ?? ()
#1  0x0000003a36434150 in _gnutls_string_resize (dest=0x7f11440c3ca8, new_size=<value optimized out>) at gnutls_str.c:192
#2  0x0000003a3641a614 in _gnutls_io_read_buffered (session=0x7f11440c3000, iptr=0x7fffdb42e0a8, sizeOfPtr=5, recv_type=<value optimized out>) at gnutls_buffers.c:515
#3  0x0000003a36416031 in _gnutls_recv_int (session=0x7f11440c3000, type=GNUTLS_APPLICATION_DATA, htype=4294967295, data=0x7f11440c8498 "", sizeofdata=4) at gnutls_record.c:904
#4  0x00007f115f48e45d in virNetTLSSessionRead (sess=<value optimized out>, buf=<value optimized out>, len=<value optimized out>) at rpc/virnettlscontext.c:812
#5  0x00007f115f48a05d in virNetSocketReadWire (sock=0x7f11440c82b0, buf=0x7f11440c8498 "", len=4) at rpc/virnetsocket.c:801
#6  0x00007f115f48a2d0 in virNetSocketRead (sock=0x7f11440c82b0, buf=0x7f11440c8498 "", len=4) at rpc/virnetsocket.c:981
#7  0x00007f115f4865ed in virNetClientIOReadMessage (client=0x7f11440c8440) at rpc/virnetclient.c:717
#8  virNetClientIOHandleInput (client=0x7f11440c8440) at rpc/virnetclient.c:736
#9  0x00007f115f487dd0 in virNetClientIncomingEvent (sock=0x7f11440c82b0, events=<value optimized out>, opaque=0x7f11440c8440) at rpc/virnetclient.c:1127
#10 0x00007f115f3e56b2 in virEventPollDispatchHandles () at util/event_poll.c:469
#11 virEventPollRunOnce () at util/event_poll.c:610
#12 0x00007f115f3e4567 in virEventRunDefaultImpl () at util/event.c:247
#13 0x000000000043c9dd in virNetServerRun (srv=0x1c8e550) at rpc/virnetserver.c:662
#14 0x000000000041d828 in main (argc=<value optimized out>, argv=<value optimized out>) at libvirtd.c:1552

Comment 6 Daniel Veillard 2011-07-19 10:07:02 UTC
This seems more like the stack trace of bug 722738:

https://bugzilla.redhat.com/show_bug.cgi?id=722738

The two may be related, but a priori you're hitting something different.

Daniel

Comment 7 Daniel Berrangé 2011-07-19 13:26:16 UTC
This series should fix the problem:

https://www.redhat.com/archives/libvir-list/2011-July/msg01179.html
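
As a purely hypothetical sketch of the general shape of a fix for this class
of crash (not the actual patch in that series): teardown must deregister the
event-loop timer before freeing the state that its callback and debug logging
touch, and the free itself should be made safe against a second call:

#include <stdlib.h>

/* stub standing in for the event-loop API (e.g. virEventRemoveTimeout) */
static void removeTimeout(int timer) { (void)timer; }

struct EventState {
    int timer;     /* event-loop timer id, -1 once unregistered */
    void *queue;   /* queued domain events */
};

static void eventStateFree(struct EventState **statep)
{
    struct EventState *state = *statep;
    if (!state)
        return;
    if (state->timer != -1) {
        removeTimeout(state->timer);  /* deregister before freeing */
        state->timer = -1;
    }
    free(state->queue);
    free(state);
    *statep = NULL;  /* a second call becomes a harmless no-op */
}

int main(void)
{
    struct EventState *st = calloc(1, sizeof(*st));
    if (st)
        st->timer = -1;
    eventStateFree(&st);
    eventStateFree(&st);  /* safe: the pointer was cleared above */
    return 0;
}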

Comment 8 Daniel Berrangé 2011-07-19 14:53:29 UTC
We should also add this, to improve error reporting for migration:

https://www.redhat.com/archives/libvir-list/2011-July/msg01201.html

Comment 12 Rami Vaknin 2011-07-21 13:40:17 UTC
Verified on libvirt-0.9.3-7.el6.x86_64 using the automation test that had previously reproduced the crash several times.

Comment 13 errata-xmlrpc 2011-12-06 11:16:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1513.html

