Bug 1161024 - libvirtd crashes after device hot-unplug crashes qemu
Summary: libvirtd crashes after device hot-unplug crashes qemu
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assignee: Ján Tomko
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 1186765
 
Reported: 2014-11-06 08:13 UTC by Luyao Huang
Modified: 2015-03-05 07:47 UTC (History)
7 users

Fixed In Version: libvirt-1.2.8-14.el7
Doc Type: Bug Fix
Doc Text:
Cause: Libvirt was not checking whether the domain was still alive after exiting the monitor. Consequence: If the domain crashed, freed data from its definition could be accessed. Fix: Check whether the domain is alive before accessing its definition. Result: No crash.
Clone Of:
: 1186765 (view as bug list)
Environment:
Last Closed: 2015-03-05 07:47:03 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
invalid free and read/write (170.00 KB, text/plain)
2014-11-06 13:01 UTC, Luyao Huang
no flags Details
memory leak (110.50 KB, text/plain)
2014-11-06 13:05 UTC, Luyao Huang
no flags Details
libvirtd debug log (228.46 KB, text/plain)
2014-11-08 13:55 UTC, Luyao Huang
no flags Details
another issue valgrind log (16.65 KB, text/plain)
2014-11-18 05:36 UTC, Luyao Huang
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:0323 0 normal SHIPPED_LIVE Low: libvirt security, bug fix, and enhancement update 2015-03-05 12:10:54 UTC

Description Luyao Huang 2014-11-06 08:13:56 UTC
Description of problem:
libvirtd crashed after hot-unplugging the iSCSI disk from a guest that had just been migrated

version:
libvirt-1.2.8-6.el7.x86_64

How reproducible:
50% (it does not always crash libvirtd, but it always crashes qemu)

Steps to reproduce:
1. Prepare a guest with two disks:
# virsh dumpxml r6
 <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/var/lib/libvirt/images/r6.img'/>
      <target dev='hda' bus='ide'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='network' device='lun'>
      <driver name='qemu' type='raw' cache='none'/>
      <source protocol='iscsi' name='iqn.2003-01.org.linux-iscsi.test1.x8664:sn.05011d8e73cb/0'>
        <host name='10.66.6.12' port='3260'/>
      </source>
      <target dev='sda' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='1' unit='0'/>
    </disk>

2. Migrate the guest to the iSCSI server with --copy-storage-all:
# virsh migrate r6 --live --copy-storage-all qemu+ssh://10.66.6.12/system
root@10.66.6.12's password:

3. After the migration, hot-unplug the iSCSI disk on the target:

# virsh detach-disk r6 sda
error: Failed to detach disk
error: End of file while reading data: Input/output error
error: Failed to reconnect to the hypervisor
 

Expected result:
The crash should be fixed.

Some useful messages:

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7f2edfd8d700 (LWP 15752)]
0x00007f2eeca7d5e9 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install sssd-client-1.12.1-2.el7.x86_64
(gdb) t a a bt

Thread 9 (Thread 0x7f2edfd8d700 (LWP 15752)):
#0  0x00007f2eeca7d5e9 in raise () from /lib64/libc.so.6
#1  0x00007f2eeca7ecf8 in abort () from /lib64/libc.so.6
#2  0x00007f2eecabddf7 in __libc_message () from /lib64/libc.so.6
#3  0x00007f2eecac51cd in _int_free () from /lib64/libc.so.6
#4  0x00007f2eef97586a in virFree (ptrptr=ptrptr@entry=0x7f2ec80169f8) at util/viralloc.c:582
#5  0x00007f2eef9c738a in virStorageSourceClear (def=0x7f2ec80169f0) at util/virstoragefile.c:2016
#6  0x00007f2eef9c6983 in virStorageSourceFree (def=0x7f2ec80169f0) at util/virstoragefile.c:2040
#7  0x00007f2eef9e7dba in virDomainDiskDefFree (def=def@entry=0x7f2ec80168c0) at conf/domain_conf.c:1232
#8  0x00007f2ed8cf0875 in qemuDomainRemoveDiskDevice (driver=driver@entry=0x7f2ed014dd40, vm=vm@entry=0x7f2ed01d1fc0, disk=disk@entry=0x7f2ec80168c0) at qemu/qemu_hotplug.c:2569
#9  0x00007f2ed8cf78ac in qemuDomainDetachDiskDevice (detach=<optimized out>, vm=<optimized out>, driver=<optimized out>) at qemu/qemu_hotplug.c:3082
#10 qemuDomainDetachDeviceDiskLive (driver=driver@entry=0x7f2ed014dd40, vm=vm@entry=0x7f2ed01d1fc0, dev=dev@entry=0x7f2ec8010b90) at qemu/qemu_hotplug.c:3128
#11 0x00007f2ed8d56d47 in qemuDomainDetachDeviceLive (dom=0x7f2ec8004cf0, dev=0x7f2ec8010b90, vm=0x7f2ed01d1fc0) at qemu/qemu_driver.c:6994
#12 qemuDomainDetachDeviceFlags (dom=0x7f2ec8004cf0, xml=<optimized out>, flags=<optimized out>) at qemu/qemu_driver.c:7782
#13 0x00007f2eefa6b0a6 in virDomainDetachDevice (domain=domain@entry=0x7f2ec8004cf0,
    xml=0x7f2ec8004fe0 "<disk type=\"network\" device=\"lun\">\n      <driver name=\"qemu\" type=\"raw\" cache=\"none\"/>\n      <source protocol=\"iscsi\" name=\"iqn.2003-01.org.linux-iscsi.test1.x8664:sn.05011d8e73cb/0\">\n        <host na"...) at libvirt.c:10489
#14 0x00007f2ef050a290 in remoteDispatchDomainDetachDevice (server=<optimized out>, msg=<optimized out>, args=0x7f2ec8015170, rerr=0x7f2edfd8cc80, client=<optimized out>)
    at remote_dispatch.h:3488
#15 remoteDispatchDomainDetachDeviceHelper (server=<optimized out>, client=<optimized out>, msg=<optimized out>, rerr=0x7f2edfd8cc80, args=0x7f2ec8015170, ret=<optimized out>)
    at remote_dispatch.h:3466
#16 0x00007f2eefac8ff2 in virNetServerProgramDispatchCall (msg=0x7f2ef1fbcf20, client=0x7f2ef1fbab20, server=0x7f2ef1fad400, prog=0x7f2ef1fb80d0) at rpc/virnetserverprogram.c:437
#17 virNetServerProgramDispatch (prog=0x7f2ef1fb80d0, server=server@entry=0x7f2ef1fad400, client=0x7f2ef1fbab20, msg=0x7f2ef1fbcf20) at rpc/virnetserverprogram.c:307
#18 0x00007f2ef05181fd in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x7f2ef1fad400) at rpc/virnetserver.c:172
#19 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x7f2ef1fad400) at rpc/virnetserver.c:193
#20 0x00007f2eef9cd6c5 in virThreadPoolWorker (opaque=opaque@entry=0x7f2ef1f8f0e0) at util/virthreadpool.c:145
#21 0x00007f2eef9cd05e in virThreadHelper (data=<optimized out>) at util/virthread.c:197
#22 0x00007f2eed227df3 in start_thread () from /lib64/libpthread.so.0
#23 0x00007f2eecb3e05d in clone () from /lib64/libc.so.6


Using gdb, I found the crash is caused by backingStoreRaw being out of bounds:

Breakpoint 1, virDomainDiskDefFree (def=0x7f0fe0016690) at conf/domain_conf.c:1228
1228	{
Missing separate debuginfos, use: debuginfo-install sssd-client-1.12.1-2.el7.x86_64
(gdb) c
Continuing.

Breakpoint 1, virDomainDiskDefFree (def=0x7f0fe0016890) at conf/domain_conf.c:1228
1228	{
(gdb) c
Continuing.
[Switching to Thread 0x7f100652e700 (LWP 17841)]

Breakpoint 1, virDomainDiskDefFree (def=def@entry=0x7f0fe0016890) at conf/domain_conf.c:1228
1228	{
(gdb) n
1229	    if (!def)
(gdb) 
1228	{
(gdb) 
1229	    if (!def)
(gdb) 
1232	    virStorageSourceFree(def->src);
(gdb) s
virStorageSourceFree (def=0x7f0fe00169c0) at util/virstoragefile.c:2036
2036	{
(gdb) n
2037	    if (!def)
(gdb) 
2036	{
(gdb) 
2037	    if (!def)
(gdb) p def
$1 = (virStorageSource *) 0x7f0fe00169c0
(gdb) p *def
$2 = {type = -536775920, path = 0x7f0fe0000078 "Py\001\340\017\177", protocol = 5, volume = 0x0, nhosts = 1, hosts = 0x7f0fe0016020, srcpool = 0x0, auth = 0x0, encryption = 0x0, 
  driverName = 0x0, format = 1, features = 0x0, compat = 0x0, nocow = false, sparse = false, perms = 0x0, timestamps = 0x0, allocation = 0, capacity = 0, nseclabels = 0, 
  seclabels = 0x0, readonly = false, shared = false, backingStore = 0x0, drv = 0x0, relPath = 0x0, backingStoreRaw = 0x400 <Address 0x400 out of bounds>}
(gdb) n
2040	    virStorageSourceClear(def);

Comment 1 Luyao Huang 2014-11-06 08:22:24 UTC
Full qemu crash backtrace:



Thread 5 (Thread 0x7fbd64ba7700 (LWP 18690)):
#0  0x00007fbd72c05f7d in __lll_lock_wait () from /usr/lib64/libpthread.so.0
#1  0x00007fbd72c01d41 in _L_lock_790 () from /usr/lib64/libpthread.so.0
#2  0x00007fbd72c01c47 in pthread_mutex_lock () from /usr/lib64/libpthread.so.0
#3  0x00007fbd7437ecd9 in qemu_mutex_lock (mutex=mutex@entry=0x7fbd7480b340 <qemu_global_mutex>) at util/qemu-thread-posix.c:76
#4  0x00007fbd741408b0 in qemu_mutex_lock_iothread () at /usr/src/debug/qemu-2.1.2/cpus.c:1053
#5  0x00007fbd741505e4 in kvm_cpu_exec (cpu=cpu@entry=0x7fbd76c71880) at /usr/src/debug/qemu-2.1.2/kvm-all.c:1724
#6  0x00007fbd7413f932 in qemu_kvm_cpu_thread_fn (arg=0x7fbd76c71880) at /usr/src/debug/qemu-2.1.2/cpus.c:883
#7  0x00007fbd72bffdf3 in start_thread () from /usr/lib64/libpthread.so.0
#8  0x00007fbd6da7e05d in clone () from /usr/lib64/libc.so.6

Thread 4 (Thread 0x7fbd5ffff700 (LWP 18691)):
#0  0x00007fbd6da75147 in ioctl () from /usr/lib64/libc.so.6
#1  0x00007fbd74150525 in kvm_vcpu_ioctl (cpu=cpu@entry=0x7fbd76cacc60, type=type@entry=44672) at /usr/src/debug/qemu-2.1.2/kvm-all.c:1853
#2  0x00007fbd741505dc in kvm_cpu_exec (cpu=cpu@entry=0x7fbd76cacc60) at /usr/src/debug/qemu-2.1.2/kvm-all.c:1722
#3  0x00007fbd7413f932 in qemu_kvm_cpu_thread_fn (arg=0x7fbd76cacc60) at /usr/src/debug/qemu-2.1.2/cpus.c:883
#4  0x00007fbd72bffdf3 in start_thread () from /usr/lib64/libpthread.so.0
#5  0x00007fbd6da7e05d in clone () from /usr/lib64/libc.so.6

Thread 3 (Thread 0x7fbd5f5ff700 (LWP 18693)):
#0  0x00007fbd72c058a0 in sem_timedwait () from /usr/lib64/libpthread.so.0
#1  0x00007fbd7437ef07 in qemu_sem_timedwait (sem=sem@entry=0x7fbd76d0fa78, ms=ms@entry=10000) at util/qemu-thread-posix.c:257
#2  0x00007fbd743294bc in worker_thread (opaque=0x7fbd76d0f9e0) at thread-pool.c:96
#3  0x00007fbd72bffdf3 in start_thread () from /usr/lib64/libpthread.so.0
#4  0x00007fbd6da7e05d in clone () from /usr/lib64/libc.so.6

Thread 2 (Thread 0x7fbd5e7ff700 (LWP 18694)):
#0  0x00007fbd6da73a8d in poll () from /usr/lib64/libc.so.6
#1  0x00007fbd6ec8dc17 in red_worker_main () from /usr/lib64/libspice-server.so.1
#2  0x00007fbd72bffdf3 in start_thread () from /usr/lib64/libpthread.so.0
#3  0x00007fbd6da7e05d in clone () from /usr/lib64/libc.so.6

Thread 1 (Thread 0x7fbd74015a40 (LWP 18685)):
#0  drive_del (dinfo=0x0) at blockdev.c:218
#1  0x00007fbd74209b46 in do_drive_del (mon=<optimized out>, qdict=<optimized out>, ret_data=<optimized out>) at blockdev.c:1801
#2  0x00007fbd74143ea7 in qmp_call_cmd (cmd=<optimized out>, params=0x7fbd76e31350, mon=0x7fbd76c388b0) at /usr/src/debug/qemu-2.1.2/monitor.c:5038
#3  handle_qmp_command (parser=<optimized out>, tokens=<optimized out>) at /usr/src/debug/qemu-2.1.2/monitor.c:5104
#4  0x00007fbd7437b4a2 in json_message_process_token (lexer=0x7fbd76c382c0, token=0x7fbd76c19330, type=JSON_OPERATOR, x=93, y=96) at qobject/json-streamer.c:87
#5  0x00007fbd7438d25f in json_lexer_feed_char (lexer=lexer@entry=0x7fbd76c382c0, ch=<optimized out>, flush=flush@entry=false) at qobject/json-lexer.c:303
#6  0x00007fbd7438d32e in json_lexer_feed (lexer=0x7fbd76c382c0, buffer=<optimized out>, size=<optimized out>) at qobject/json-lexer.c:356
#7  0x00007fbd7437b639 in json_message_parser_feed (parser=<optimized out>, buffer=<optimized out>, size=<optimized out>) at qobject/json-streamer.c:110
#8  0x00007fbd74141e3f in monitor_control_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-2.1.2/monitor.c:5125
#9  0x00007fbd74217d40 in qemu_chr_be_write (len=<optimized out>, buf=0x7fff55482190 "}\220\302v\275\177", s=0x7fbd76c28d00) at qemu-char.c:213
#10 tcp_chr_read (chan=<optimized out>, cond=<optimized out>, opaque=0x7fbd76c28d00) at qemu-char.c:2729
#11 0x00007fbd724f99ba in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#12 0x00007fbd74335e58 in glib_pollfds_poll () at main-loop.c:190
#13 os_host_main_loop_wait (timeout=<optimized out>) at main-loop.c:235
---Type <return> to continue, or q <return> to quit---
#14 main_loop_wait (nonblocking=<optimized out>) at main-loop.c:484
#15 0x00007fbd7411977e in main_loop () at vl.c:2010
#16 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4552

Comment 2 Luyao Huang 2014-11-06 08:37:52 UTC
Additional information:
This guest won't crash libvirtd when I first do it. After migrating another guest which also uses the iSCSI disk, this guest will then crash libvirtd; I don't
know why. Maybe this is the key point for reproducing this crash.

1.
# virsh dumpxml test3
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/var/lib/libvirt/images/test3.img'/>
      <target dev='hda' bus='ide'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='network' device='lun'>
      <driver name='qemu' type='raw' cache='none'/>
      <source protocol='iscsi' name='iqn.2003-01.org.linux-iscsi.test1.x8664:sn.05011d8e73cb/0'>
        <host name='10.66.6.12' port='3260'/>
      </source>
      <target dev='sda' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='1' unit='0'/>
    </disk>

# virsh migrate test3 --live --copy-storage-all qemu+ssh://10.66.6.12/system
root@10.66.6.12's password: 

2. On the target:
# virsh detach-disk test3 sda
Disk detached successfully

# virsh detach-disk test3 sda
error: failed to get domain 'test3'
error: Domain not found: no domain with matching name 'test3'

# virsh list --all
 Id    Name                           State
----------------------------------------------------

3. Use another guest just like in comment 0, and do the same steps as in comment 0.

Comment 3 Ján Tomko 2014-11-06 11:18:14 UTC
I cannot reproduce the libvirtd crash. Could you try running it under valgrind? It looks like the memory was overwritten by something else:
valgrind --leak-check=full libvirtd

qemu-kvm-rhev-2.1.2-6.el7.x86_64 does crash for me (guessing from the line numbers in the backtrace in comment 1, you used a different version).
However, it no longer crashes with the fix for stopping the NBD server after successful migration (see bug 1160212).

Comment 6 Luyao Huang 2014-11-06 13:00:40 UTC
(In reply to Jan Tomko from comment #3)
> I cannot reproduce the libvirtd crash. Could you try running it under
> valgrind? It looks like the memory was overwritten by something else:
> valgrind --leak-check=full libvirtd
> 
> qemu-kvm-rhev-2.1.2-6.el7.x86_64 does crash for me (guessing from the line
> numbers in the backtrace in comment 1, you used a different version).
> However, it no longer crashes with the fix for stopping the NBD server after
> successful migration (see bug 1160212).

I found a lot of memory leaks and invalid frees.

I will attach the logs.

Comment 7 Luyao Huang 2014-11-06 13:01:57 UTC
Created attachment 954431 [details]
invalid free and read/write

Comment 8 Luyao Huang 2014-11-06 13:05:09 UTC
Created attachment 954435 [details]
memory leak

Comment 9 Luyao Huang 2014-11-08 13:55:31 UTC
Created attachment 955283 [details]
libvirtd debug log

Comment 10 Luyao Huang 2014-11-18 03:37:40 UTC
Hi Jan,

I found another libvirtd crash, and after checking the backtrace I think
the issue is just like this bug. These are the steps:

1.# virsh list --all
 Id    Name                           State
----------------------------------------------------
 2     r7                             running

2.# cat lxcconsole.xml
  <console type='pty'>
      <target type='virtio' port='1'/>
    </console>

3. # virsh attach-device r7 lxcconsole.xml
error: Failed to attach device from lxcconsole.xml
error: Cannot recv data: Connection reset by peer
error: Failed to reconnect to the hypervisor


And the cause is that qemu crashes and then libvirt crashes:

==31298== Invalid read of size 8
==31298==    at 0x1C479B67: qemuDomainChrRemove (qemu_hotplug.c:1444)
==31298==    by 0x1C479F68: qemuDomainAttachChrDevice (qemu_hotplug.c:1509)
==31298==    by 0x1C4DD78D: qemuDomainAttachDeviceLive (qemu_driver.c:6948)
==31298==    by 0x1C4DD78D: qemuDomainAttachDeviceFlags (qemu_driver.c:7500)
==31298==    by 0x538FD05: virDomainAttachDevice (libvirt.c:10385)
==31298==    by 0x1428FF: remoteDispatchDomainAttachDevice (remote_dispatch.h:2485)
==31298==    by 0x1428FF: remoteDispatchDomainAttachDeviceHelper (remote_dispatch.h:2463)
==31298==    by 0x53EE1A1: virNetServerProgramDispatchCall (virnetserverprogram.c:437)
==31298==    by 0x53EE1A1: virNetServerProgramDispatch (virnetserverprogram.c:307)
==31298==    by 0x1501FC: virNetServerProcessMsg (virnetserver.c:172)
==31298==    by 0x1501FC: virNetServerHandleJob (virnetserver.c:193)
==31298==    by 0x52F27F4: virThreadPoolWorker (virthreadpool.c:145)
==31298==    by 0x52F218D: virThreadHelper (virthread.c:197)
==31298==    by 0x7C61DF2: start_thread (in /usr/lib64/libpthread-2.17.so)
==31298==    by 0x837305C: clone (in /usr/lib64/libc-2.17.so)
==31298==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==31298== 
==31298== 
==31298== Process terminating with default action of signal 11 (SIGSEGV)
==31298==  Access not within mapped region at address 0x0
==31298==    at 0x1C479B67: qemuDomainChrRemove (qemu_hotplug.c:1444)
==31298==    by 0x1C479F68: qemuDomainAttachChrDevice (qemu_hotplug.c:1509)
==31298==    by 0x1C4DD78D: qemuDomainAttachDeviceLive (qemu_driver.c:6948)
==31298==    by 0x1C4DD78D: qemuDomainAttachDeviceFlags (qemu_driver.c:7500)
==31298==    by 0x538FD05: virDomainAttachDevice (libvirt.c:10385)
==31298==    by 0x1428FF: remoteDispatchDomainAttachDevice (remote_dispatch.h:2485)
==31298==    by 0x1428FF: remoteDispatchDomainAttachDeviceHelper (remote_dispatch.h:2463)
==31298==    by 0x53EE1A1: virNetServerProgramDispatchCall (virnetserverprogram.c:437)
==31298==    by 0x53EE1A1: virNetServerProgramDispatch (virnetserverprogram.c:307)
==31298==    by 0x1501FC: virNetServerProcessMsg (virnetserver.c:172)
==31298==    by 0x1501FC: virNetServerHandleJob (virnetserver.c:193)
==31298==    by 0x52F27F4: virThreadPoolWorker (virthreadpool.c:145)
==31298==    by 0x52F218D: virThreadHelper (virthread.c:197)
==31298==    by 0x7C61DF2: start_thread (in /usr/lib64/libpthread-2.17.so)
==31298==    by 0x837305C: clone (in /usr/lib64/libc-2.17.so)

When libvirt is still in qemuDomainAttachChrDevice and qemu crashes, libvirt calls
qemuProcessStop to free the vm; after the free finishes, it continues with qemuDomainAttachChrDevice and then libvirtd crashes.

Do you think this issue is the same as this bug, or do I need to open a new bug for it? Thanks in advance for your answer.

Thanks,
Luyao Huang

Comment 11 Luyao Huang 2014-11-18 05:36:12 UTC
Created attachment 958418 [details]
another issue valgrind log

Comment 12 Ján Tomko 2014-12-16 16:42:21 UTC
I think it's the same issue; we haven't been properly checking whether the domain is still alive in a few functions:
https://www.redhat.com/archives/libvir-list/2014-December/msg00831.html
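
For illustration, a minimal sketch of the alive-check pattern being added. The function shape below is pieced together from the code lines stepped through in comments 17 and 23 of this bug; it is an assumption for illustration, not the verbatim upstream patch:

    /* Sketch only -- not the verbatim upstream patch. */
    static int
    removeDiskAfterMonitorCall(virQEMUDriverPtr driver,
                               virDomainObjPtr vm,
                               virDomainDiskDefPtr disk,
                               qemuDomainObjPrivatePtr priv,
                               char *drivestr)
    {
        qemuDomainObjEnterMonitor(driver, vm);
        qemuMonitorDriveDel(priv->mon, drivestr);   /* qemu may crash here */

        /* If qemu died while we were in the monitor, qemuProcessStop() has
         * already torn down the domain and freed data hanging off vm->def,
         * so bail out instead of dereferencing it. */
        if (qemuDomainObjExitMonitor(driver, vm) < 0)
            return -1;

        virDomainDiskDefFree(disk);                 /* safe: domain still alive */
        return 0;
    }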

Comment 17 Luyao Huang 2015-01-27 10:36:19 UTC
Hi Jan,

when I tried to verify this bug, I found that libvirtd still crashes in this case (for a different reason):

1.# virsh list --all
 Id    Name                           State
----------------------------------------------------
 2     r7                             running

2.# cat lxcconsole.xml
  <console type='pty'>
      <target type='virtio' port='1'/>
    </console>

3. Use gdb to attach to libvirtd and set a breakpoint at qemuDomainAttachChrDevice.

4. # virsh attach-device r7 lxcconsole.xml

5. Open another terminal and kill qemu:

# kill -11 26914


6. libvirtd fails to exit the monitor, goes to cleanup and returns -1, but libvirtd then crashes in qemuDomainAttachDeviceFlags:

1492	    if (qemuDomainChrInsert(vmdef, chr) < 0)
(gdb) 
1496	    qemuDomainObjEnterMonitor(driver, vm);
(gdb) 
1497	    if (qemuMonitorAttachCharDev(priv->mon, charAlias, &chr->source) < 0) {
(gdb) n
1509	        if (qemuDomainObjExitMonitor(driver, vm) < 0) {
(gdb) n
1530	    VIR_FREE(charAlias);
(gdb) 
1531	    VIR_FREE(devstr);
(gdb) 
1533	}
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00007fe09158856c in free () from /lib64/libc.so.6
(gdb) bt
#0  0x00007fe09158856c in free () from /lib64/libc.so.6
#1  0x00007fe09466effa in virFree (ptrptr=ptrptr@entry=0x7fe075d3a068) at util/viralloc.c:582
#2  0x00007fe0946e2739 in virDomainDeviceInfoClear (info=0x7fe075d3a068) at conf/domain_conf.c:2709
#3  0x00007fe0946e28fb in virDomainChrDefFree (def=0x7fe075d3a020) at conf/domain_conf.c:1651
#4  0x00007fe0946f0de9 in virDomainDeviceDefFree (def=def@entry=0x7fe0740040c0) at conf/domain_conf.c:1942
#5  0x00007fe07daa98ab in qemuDomainAttachDeviceFlags (dom=<optimized out>, xml=<optimized out>, flags=<optimized out>) at qemu/qemu_driver.c:7646
#6  0x00007fe094765dd6 in virDomainAttachDevice (domain=domain@entry=0x7fe075d36300, xml=0x7fe075aa3fe0 "  <console type='pty'>\n      <target type='virtio'/>\n    </console>\n") at libvirt.c:10385
#7  0x00007fe09520aaa0 in remoteDispatchDomainAttachDevice (server=<optimized out>, msg=<optimized out>, args=0x7fe074da12a0, rerr=0x7fe085af6c80, client=<optimized out>) at remote_dispatch.h:2485
#8  remoteDispatchDomainAttachDeviceHelper (server=<optimized out>, client=<optimized out>, msg=<optimized out>, rerr=0x7fe085af6c80, args=0x7fe074da12a0, ret=<optimized out>) at remote_dispatch.h:2463
#9  0x00007fe0947c4382 in virNetServerProgramDispatchCall (msg=0x7fe095aa1ae0, client=0x7fe095bbbab0, server=0x7fe095a91f10, prog=0x7fe095a9f000) at rpc/virnetserverprogram.c:437
#10 virNetServerProgramDispatch (prog=0x7fe095a9f000, server=server@entry=0x7fe095a91f10, client=0x7fe095bbbab0, msg=0x7fe095aa1ae0) at rpc/virnetserverprogram.c:307
#11 0x00007fe0952183fd in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x7fe095a91f10) at rpc/virnetserver.c:172
#12 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x7fe095a91f10) at rpc/virnetserver.c:193
#13 0x00007fe0946c7fa5 in virThreadPoolWorker (opaque=opaque@entry=0x7fe095a71b90) at util/virthreadpool.c:145
#14 0x00007fe0946c793e in virThreadHelper (data=<optimized out>) at util/virthread.c:197
#15 0x00007fe091ce7df5 in start_thread (arg=0x7fe085af7700) at pthread_create.c:308
#16 0x00007fe0915fe1ad in clone () from /lib64/libc.so.6
(gdb) 

So would you please help me check whether it is the same issue, or do I need to open a new bug?

Comment 18 Ján Tomko 2015-01-27 14:21:38 UTC
This is the same issue; for attaching chardevs, the series didn't completely remove the use of freed data. Upstream patch posted:
https://www.redhat.com/archives/libvir-list/2015-January/msg00973.html

Comment 19 Luyao Huang 2015-01-27 14:24:19 UTC
(In reply to Jan Tomko from comment #18)
> This is the same issue; for attaching chardevs, the series didn't completely
> remove the use of freed data. Upstream patch posted:
> https://www.redhat.com/archives/libvir-list/2015-January/msg00973.html

Thanks a lot for your reply!

Comment 20 Ján Tomko 2015-01-28 11:21:32 UTC
v2 of the patch:
https://www.redhat.com/archives/libvir-list/2015-January/msg00993.html

Pushed upstream as:
commit daf51be5f1b0f7b41c0813d43d6b66edfbe4f6d9
    Split qemuDomainChrInsert into two parts
commit 21e0e8866e341da74e296ca3cf2d97812e847a66
    hotplug: only add a chardev to vmdef after monitor call
git describe: v1.2.12-29-g21e0e88
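
Roughly, the reordering those commit messages describe looks like the snippet below. It uses the function names visible in the gdb session from comment 17; the exact helper split in the upstream patch may differ, so treat this as an illustration only:

    /* Before: the chardev was added to vmdef before talking to qemu,
     * so when qemu crashed during the monitor call both vmdef and the
     * caller ended up owning (and freeing) the same chr. */
    if (qemuDomainChrInsert(vmdef, chr) < 0)
        goto cleanup;
    qemuDomainObjEnterMonitor(driver, vm);
    ret = qemuMonitorAttachCharDev(priv->mon, charAlias, &chr->source);

    /* After: talk to the monitor first; only add the chardev to vmdef
     * once both the attach and ExitMonitor have succeeded. */
    qemuDomainObjEnterMonitor(driver, vm);
    ret = qemuMonitorAttachCharDev(priv->mon, charAlias, &chr->source);
    if (qemuDomainObjExitMonitor(driver, vm) < 0 || ret < 0)
        goto cleanup;             /* domain gone or attach failed: leave vmdef
                                   * alone; the caller frees chr exactly once */
    qemuDomainChrInsert(vmdef, chr);  /* the real patch splits this helper so
                                       * that this final step cannot fail */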

Comment 22 Ján Tomko 2015-01-28 14:07:12 UTC
Actually, let's track the crash on device detach in this bug and use bug 1186765 for the crash on chardev attach.

Comment 23 Luyao Huang 2015-01-29 03:48:36 UTC
Verify this bug with libvirt-1.2.8-15.el7.x86_64:

1. Prepare a running guest with a gluster disk:
# virsh domblklist test3
Target     Source
------------------------------------------------
hda        /var/lib/libvirt/images/test3.img
vda        /dev/vg2/test
vdb        /loopback/test3.img
vdd        gluster-vol1/rh6.img

2. Use gdb to attach to libvirtd and set a breakpoint in qemuDomainRemoveDiskDevice:

# gdb libvirtd `pidof libvirtd`

3. In one terminal, run virsh:
# virsh -k0 -K0 detach-disk test3 vdd

4. From another terminal, crash qemu while libvirtd is in the monitor:
2562	    qemuDomainObjEnterMonitor(driver, vm);
(gdb) 
2563	    qemuMonitorDriveDel(priv->mon, drivestr);

In another terminal:
# kill -11 32069

5. Check that qemuDomainObjExitMonitor really fails and that there is no crash after it returns -1:
2567	    if (qemuDomainObjExitMonitor(driver, vm) < 0)
(gdb) 
2560	        return -1;
(gdb) 
2601	}

6. Check the client error:

# virsh -k0 -K0 detach-disk test3 vdd
error: Failed to detach disk
error: operation failed: domain is no longer running


I will test the other scenarios (fixed by the patch) in the next few days; if I find any other crash I will open a new bug to track it.

Comment 25 errata-xmlrpc 2015-03-05 07:47:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0323.html

