Bug 1161024
Summary: libvirtd crashes after device hot-unplug crashes qemu

Product: Red Hat Enterprise Linux 7
Reporter: Luyao Huang <lhuang>
Component: libvirt
Assignee: Ján Tomko <jtomko>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Priority: high
Version: 7.1
CC: dyuan, jtomko, mzhan, ovasik, rbalakri, zhwang, zpeng
Target Milestone: rc
Hardware: x86_64
OS: Linux
Fixed In Version: libvirt-1.2.8-14.el7
Doc Type: Bug Fix
Doc Text:
Cause: libvirt was not checking whether the domain was still alive after exiting the monitor.
Consequence: If the domain crashed, freed data from its definition could be accessed.
Fix: Check that the domain is still alive before accessing its definition.
Result: No crash.
Cloned As: 1186765 (view as bug list)
Last Closed: 2015-03-05 07:47:03 UTC
Type: Bug
Bug Blocks: 1186765
Description (Luyao Huang, 2014-11-06 08:13:56 UTC)
Full qemu crash backtrace:

```
Thread 5 (Thread 0x7fbd64ba7700 (LWP 18690)):
#0  0x00007fbd72c05f7d in __lll_lock_wait () from /usr/lib64/libpthread.so.0
#1  0x00007fbd72c01d41 in _L_lock_790 () from /usr/lib64/libpthread.so.0
#2  0x00007fbd72c01c47 in pthread_mutex_lock () from /usr/lib64/libpthread.so.0
#3  0x00007fbd7437ecd9 in qemu_mutex_lock (mutex=mutex@entry=0x7fbd7480b340 <qemu_global_mutex>) at util/qemu-thread-posix.c:76
#4  0x00007fbd741408b0 in qemu_mutex_lock_iothread () at /usr/src/debug/qemu-2.1.2/cpus.c:1053
#5  0x00007fbd741505e4 in kvm_cpu_exec (cpu=cpu@entry=0x7fbd76c71880) at /usr/src/debug/qemu-2.1.2/kvm-all.c:1724
#6  0x00007fbd7413f932 in qemu_kvm_cpu_thread_fn (arg=0x7fbd76c71880) at /usr/src/debug/qemu-2.1.2/cpus.c:883
#7  0x00007fbd72bffdf3 in start_thread () from /usr/lib64/libpthread.so.0
#8  0x00007fbd6da7e05d in clone () from /usr/lib64/libc.so.6

Thread 4 (Thread 0x7fbd5ffff700 (LWP 18691)):
#0  0x00007fbd6da75147 in ioctl () from /usr/lib64/libc.so.6
#1  0x00007fbd74150525 in kvm_vcpu_ioctl (cpu=cpu@entry=0x7fbd76cacc60, type=type@entry=44672) at /usr/src/debug/qemu-2.1.2/kvm-all.c:1853
#2  0x00007fbd741505dc in kvm_cpu_exec (cpu=cpu@entry=0x7fbd76cacc60) at /usr/src/debug/qemu-2.1.2/kvm-all.c:1722
#3  0x00007fbd7413f932 in qemu_kvm_cpu_thread_fn (arg=0x7fbd76cacc60) at /usr/src/debug/qemu-2.1.2/cpus.c:883
#4  0x00007fbd72bffdf3 in start_thread () from /usr/lib64/libpthread.so.0
#5  0x00007fbd6da7e05d in clone () from /usr/lib64/libc.so.6

Thread 3 (Thread 0x7fbd5f5ff700 (LWP 18693)):
#0  0x00007fbd72c058a0 in sem_timedwait () from /usr/lib64/libpthread.so.0
#1  0x00007fbd7437ef07 in qemu_sem_timedwait (sem=sem@entry=0x7fbd76d0fa78, ms=ms@entry=10000) at util/qemu-thread-posix.c:257
#2  0x00007fbd743294bc in worker_thread (opaque=0x7fbd76d0f9e0) at thread-pool.c:96
#3  0x00007fbd72bffdf3 in start_thread () from /usr/lib64/libpthread.so.0
#4  0x00007fbd6da7e05d in clone () from /usr/lib64/libc.so.6

Thread 2 (Thread 0x7fbd5e7ff700 (LWP 18694)):
#0  0x00007fbd6da73a8d in poll () from /usr/lib64/libc.so.6
#1  0x00007fbd6ec8dc17 in red_worker_main () from /usr/lib64/libspice-server.so.1
#2  0x00007fbd72bffdf3 in start_thread () from /usr/lib64/libpthread.so.0
#3  0x00007fbd6da7e05d in clone () from /usr/lib64/libc.so.6

Thread 1 (Thread 0x7fbd74015a40 (LWP 18685)):
#0  drive_del (dinfo=0x0) at blockdev.c:218
#1  0x00007fbd74209b46 in do_drive_del (mon=<optimized out>, qdict=<optimized out>, ret_data=<optimized out>) at blockdev.c:1801
#2  0x00007fbd74143ea7 in qmp_call_cmd (cmd=<optimized out>, params=0x7fbd76e31350, mon=0x7fbd76c388b0) at /usr/src/debug/qemu-2.1.2/monitor.c:5038
#3  handle_qmp_command (parser=<optimized out>, tokens=<optimized out>) at /usr/src/debug/qemu-2.1.2/monitor.c:5104
#4  0x00007fbd7437b4a2 in json_message_process_token (lexer=0x7fbd76c382c0, token=0x7fbd76c19330, type=JSON_OPERATOR, x=93, y=96) at qobject/json-streamer.c:87
#5  0x00007fbd7438d25f in json_lexer_feed_char (lexer=lexer@entry=0x7fbd76c382c0, ch=<optimized out>, flush=flush@entry=false) at qobject/json-lexer.c:303
#6  0x00007fbd7438d32e in json_lexer_feed (lexer=0x7fbd76c382c0, buffer=<optimized out>, size=<optimized out>) at qobject/json-lexer.c:356
#7  0x00007fbd7437b639 in json_message_parser_feed (parser=<optimized out>, buffer=<optimized out>, size=<optimized out>) at qobject/json-streamer.c:110
#8  0x00007fbd74141e3f in monitor_control_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-2.1.2/monitor.c:5125
#9  0x00007fbd74217d40 in qemu_chr_be_write (len=<optimized out>, buf=0x7fff55482190 "}\220\302v\275\177", s=0x7fbd76c28d00) at qemu-char.c:213
#10 tcp_chr_read (chan=<optimized out>, cond=<optimized out>, opaque=0x7fbd76c28d00) at qemu-char.c:2729
#11 0x00007fbd724f99ba in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#12 0x00007fbd74335e58 in glib_pollfds_poll () at main-loop.c:190
#13 os_host_main_loop_wait (timeout=<optimized out>) at main-loop.c:235
#14 main_loop_wait (nonblocking=<optimized out>) at main-loop.c:484
#15 0x00007fbd7411977e in main_loop () at vl.c:2010
#16 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4552
```

Additional information: this guest did not crash libvirtd the first time I did this. After migrating another guest that also uses the iSCSI disk, this guest crashed libvirtd; I don't know why. This may be the key point for reproducing the crash.

1. Dump the guest XML, then migrate it:

```
# virsh dumpxml test3
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/var/lib/libvirt/images/test3.img'/>
      <target dev='hda' bus='ide'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='network' device='lun'>
      <driver name='qemu' type='raw' cache='none'/>
      <source protocol='iscsi' name='iqn.2003-01.org.linux-iscsi.test1.x8664:sn.05011d8e73cb/0'>
        <host name='10.66.6.12' port='3260'/>
      </source>
      <target dev='sda' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='1' unit='0'/>
    </disk>

# virsh migrate test3 --live --copy-storage-all qemu+ssh://10.66.6.12/system
root@10.66.6.12's password:
```

2. On the target:

```
# virsh detach-disk test3 sda
Disk detached successfully

# virsh detach-disk test3 sda
error: failed to get domain 'test3'
error: Domain not found: no domain with matching name 'test3'

# virsh list --all
 Id    Name                           State
----------------------------------------------------
```

3. Use another guest just like in comment 0, and do the same steps as in comment 0.

Jan Tomko (comment #3):

I cannot reproduce the libvirtd crash. Could you try running it under valgrind? It looks like the memory was overwritten by something else:

```
valgrind --leak-check=full libvirtd
```

qemu-kvm-rhev-2.1.2-6.el7.x86_64 does crash for me (guessing from the line numbers in the backtrace in comment 1, you used a different version). However, it no longer crashes with the fix for stopping the NBD server after successful migration (see bug 1160212).

Luyao Huang:

(In reply to Jan Tomko from comment #3)
> I cannot reproduce the libvirtd crash. Could you try running it under
> valgrind, it looks like the memory was overwritten by something else:
> valgrind --leak-check=full libvirtd
>
> qemu-kvm-rhev-2.1.2-6.el7.x86_64 does crash for me (guessing from the line
> numbers in the backtrace in comment 1, you used a different version).
> However it no longer crashes with the fix for stopping the NBD server after
> successful migration (see bug 1160212).

Found a lot of memory leaks and invalid frees; I will attach them.

Created attachment 954431 [details]: invalid free and read/write
Created attachment 954435 [details]: memory leak
Created attachment 955283 [details]: libvirtd debug log
Hi Jan,

I found another libvirtd crash, and after checking the backtrace I think the issue is just like this bug. These are the steps:

1. List the running guests:

```
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 2     r7                             running
```

2. Prepare the device XML:

```
# cat lxcconsole.xml
<console type='pty'>
  <target type='virtio' port='1'/>
</console>
```

3. Attach it:

```
# virsh attach-device r7 lxcconsole.xml
error: Failed to attach device from lxcconsole.xml
error: Cannot recv data: Connection reset by peer
error: Failed to reconnect to the hypervisor
```

The cause is that qemu crashes and then libvirt crashes:

```
==31298== Invalid read of size 8
==31298==    at 0x1C479B67: qemuDomainChrRemove (qemu_hotplug.c:1444)
==31298==    by 0x1C479F68: qemuDomainAttachChrDevice (qemu_hotplug.c:1509)
==31298==    by 0x1C4DD78D: qemuDomainAttachDeviceLive (qemu_driver.c:6948)
==31298==    by 0x1C4DD78D: qemuDomainAttachDeviceFlags (qemu_driver.c:7500)
==31298==    by 0x538FD05: virDomainAttachDevice (libvirt.c:10385)
==31298==    by 0x1428FF: remoteDispatchDomainAttachDevice (remote_dispatch.h:2485)
==31298==    by 0x1428FF: remoteDispatchDomainAttachDeviceHelper (remote_dispatch.h:2463)
==31298==    by 0x53EE1A1: virNetServerProgramDispatchCall (virnetserverprogram.c:437)
==31298==    by 0x53EE1A1: virNetServerProgramDispatch (virnetserverprogram.c:307)
==31298==    by 0x1501FC: virNetServerProcessMsg (virnetserver.c:172)
==31298==    by 0x1501FC: virNetServerHandleJob (virnetserver.c:193)
==31298==    by 0x52F27F4: virThreadPoolWorker (virthreadpool.c:145)
==31298==    by 0x52F218D: virThreadHelper (virthread.c:197)
==31298==    by 0x7C61DF2: start_thread (in /usr/lib64/libpthread-2.17.so)
==31298==    by 0x837305C: clone (in /usr/lib64/libc-2.17.so)
==31298==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==31298==
==31298== Process terminating with default action of signal 11 (SIGSEGV)
==31298==  Access not within mapped region at address 0x0
==31298==    at 0x1C479B67: qemuDomainChrRemove (qemu_hotplug.c:1444)
==31298==    [... same stack as above ...]
```

While libvirt is still in qemuDomainAttachChrDevice and qemu crashes, libvirt calls qemuProcessStop to free the vm; once the free finishes, qemuDomainAttachChrDevice continues and libvirtd crashes.

Do you think this issue is the same as this bug, or do I need to open a new bug for it? Thanks in advance for your answer.

Thanks,
Luyao Huang

Created attachment 958418 [details]: another issue valgrind log
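To make the race described above concrete, here is a standalone C model of it. Every name in it (domain, domain_def, process_stop, monitor_attach_chardev) is an invented stand-in for the corresponding libvirt internals, not real libvirt API; only the shape of the bug matches. Run under valgrind it produces the same kind of invalid read at address 0x0 as the attached log.

```c
/* bug_model.c - standalone model of the crash described above.
 * Build and observe: cc -g bug_model.c -o bug_model && valgrind ./bug_model */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct domain_def { char *chr_alias; };               /* stand-in for vm->def */
struct domain { int alive; struct domain_def *def; }; /* stand-in for virDomainObj */

/* Stand-in for qemuProcessStop(): when qemu dies, libvirt tears down the
 * live definition, even if a hotplug job still holds pointers into it. */
static void process_stop(struct domain *vm)
{
    vm->alive = 0;
    free(vm->def->chr_alias);
    free(vm->def);
    vm->def = NULL;
}

/* Stand-in for a monitor call during which qemu crashes: the EOF handling
 * runs process_stop() before control returns to the hotplug code. */
static int monitor_attach_chardev(struct domain *vm)
{
    process_stop(vm);
    return -1;
}

int main(void)
{
    struct domain vm = { 1, NULL };
    vm.def = calloc(1, sizeof(*vm.def));
    vm.def->chr_alias = strdup("charserial1");

    if (monitor_attach_chardev(&vm) < 0) {
        /* BUG: the rollback path still dereferences vm.def, which
         * process_stop() already freed and NULLed - the analogue of the
         * qemuDomainChrRemove() crash at address 0x0 in the valgrind log. */
        printf("rolling back chardev %s\n", vm.def->chr_alias);
    }
    return 0;
}
```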
Jan Tomko:

I think it's the same issue; we haven't been properly checking if the domain is still alive in a few functions: https://www.redhat.com/archives/libvir-list/2014-December/msg00831.html

Luyao Huang:

Hi Jan, when I tried to verify this bug, I found that libvirtd still crashes in this case (for a different reason):

1. List the running guests:

```
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 2     r7                             running
```

2. Prepare the device XML:

```
# cat lxcconsole.xml
<console type='pty'>
  <target type='virtio' port='1'/>
</console>
```

3. Use gdb to attach to libvirtd and set a breakpoint at qemuDomainAttachChrDevice.

4. Attach the device:

```
# virsh attach-device r7 lxcconsole.xml
```

5. In another terminal, kill qemu:

```
# kill -11 26914
```

6. libvirtd fails to exit the monitor, goes to cleanup, and returns -1, but libvirtd then crashes in qemuDomainAttachDeviceFlags:

```
1492            if (qemuDomainChrInsert(vmdef, chr) < 0)
(gdb)
1496            qemuDomainObjEnterMonitor(driver, vm);
(gdb)
1497            if (qemuMonitorAttachCharDev(priv->mon, charAlias, &chr->source) < 0) {
(gdb) n
1509            if (qemuDomainObjExitMonitor(driver, vm) < 0) {
(gdb) n
1530            VIR_FREE(charAlias);
(gdb)
1531            VIR_FREE(devstr);
(gdb)
1533        }
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00007fe09158856c in free () from /lib64/libc.so.6
(gdb) bt
#0  0x00007fe09158856c in free () from /lib64/libc.so.6
#1  0x00007fe09466effa in virFree (ptrptr=ptrptr@entry=0x7fe075d3a068) at util/viralloc.c:582
#2  0x00007fe0946e2739 in virDomainDeviceInfoClear (info=0x7fe075d3a068) at conf/domain_conf.c:2709
#3  0x00007fe0946e28fb in virDomainChrDefFree (def=0x7fe075d3a020) at conf/domain_conf.c:1651
#4  0x00007fe0946f0de9 in virDomainDeviceDefFree (def=def@entry=0x7fe0740040c0) at conf/domain_conf.c:1942
#5  0x00007fe07daa98ab in qemuDomainAttachDeviceFlags (dom=<optimized out>, xml=<optimized out>, flags=<optimized out>) at qemu/qemu_driver.c:7646
#6  0x00007fe094765dd6 in virDomainAttachDevice (domain=domain@entry=0x7fe075d36300, xml=0x7fe075aa3fe0 "    <console type='pty'>\n      <target type='virtio'/>\n    </console>\n") at libvirt.c:10385
#7  0x00007fe09520aaa0 in remoteDispatchDomainAttachDevice (server=<optimized out>, msg=<optimized out>, args=0x7fe074da12a0, rerr=0x7fe085af6c80, client=<optimized out>) at remote_dispatch.h:2485
#8  remoteDispatchDomainAttachDeviceHelper (server=<optimized out>, client=<optimized out>, msg=<optimized out>, rerr=0x7fe085af6c80, args=0x7fe074da12a0, ret=<optimized out>) at remote_dispatch.h:2463
#9  0x00007fe0947c4382 in virNetServerProgramDispatchCall (msg=0x7fe095aa1ae0, client=0x7fe095bbbab0, server=0x7fe095a91f10, prog=0x7fe095a9f000) at rpc/virnetserverprogram.c:437
#10 virNetServerProgramDispatch (prog=0x7fe095a9f000, server=server@entry=0x7fe095a91f10, client=0x7fe095bbbab0, msg=0x7fe095aa1ae0) at rpc/virnetserverprogram.c:307
#11 0x00007fe0952183fd in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x7fe095a91f10) at rpc/virnetserver.c:172
#12 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x7fe095a91f10) at rpc/virnetserver.c:193
#13 0x00007fe0946c7fa5 in virThreadPoolWorker (opaque=opaque@entry=0x7fe095a71b90) at util/virthreadpool.c:145
#14 0x00007fe0946c793e in virThreadHelper (data=<optimized out>) at util/virthread.c:197
#15 0x00007fe091ce7df5 in start_thread (arg=0x7fe085af7700) at pthread_create.c:308
#16 0x00007fe0915fe1ad in clone () from /lib64/libc.so.6
(gdb)
```

So would you please help me check whether it is the same issue, or do I need to open a new bug?

Jan Tomko (comment #18):

This is the same issue; for attaching chardevs, the series didn't remove the usage of freed data completely.

Upstream patch posted: https://www.redhat.com/archives/libvir-list/2015-January/msg00973.html
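The aliveness check Jan describes can be sketched on the same toy model as the earlier standalone example (again a simplified assumption of the fix's shape, not the actual patch): leaving the monitor reports whether the domain survived, and the caller bails out before touching the definition again.

```c
/* Fixed call site, reusing struct domain and monitor_attach_chardev()
 * from the bug model above. */

/* Stand-in for the fixed qemuDomainObjExitMonitor(): report whether the
 * domain died while the monitor was held. */
static int exit_monitor(struct domain *vm)
{
    return vm->alive ? 0 : -1;
}

static int attach_chardev_fixed(struct domain *vm)
{
    int rc = monitor_attach_chardev(vm);

    if (exit_monitor(vm) < 0)
        return -1;               /* domain gone: vm->def is off limits */

    if (rc < 0) {
        /* Rollback path, reached only while the domain is still alive,
         * so dereferencing vm->def here is safe. */
        free(vm->def->chr_alias);
        vm->def->chr_alias = NULL;
    }
    return rc;
}
```

This matches the behavior in the verification transcript below: qemuDomainObjExitMonitor() returns -1 and the function returns immediately, so the client just sees "domain is no longer running" instead of a daemon crash.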
Luyao Huang:

(In reply to Jan Tomko from comment #18)
> This is the same issue, for attaching chardevs, the series didn't remove
> usage of freed data completely.
>
> Upstream patch posted:
> https://www.redhat.com/archives/libvir-list/2015-January/msg00973.html

Thanks a lot for your reply!

Jan Tomko:

v2 of the patch: https://www.redhat.com/archives/libvir-list/2015-January/msg00993.html

Pushed upstream as:

```
commit daf51be5f1b0f7b41c0813d43d6b66edfbe4f6d9
    Split qemuDomainChrInsert into two parts
commit 21e0e8866e341da74e296ca3cf2d97812e847a66
    hotplug: only add a chardev to vmdef after monitor call
```

git describe: v1.2.12-29-g21e0e88

Actually, let's track the crash on device detach in this bug and use bug 1186765 for the crash on chardev attach.

Verified this bug with libvirt-1.2.8-15.el7.x86_64:

1. Prepare a running guest with a gluster disk:

```
# virsh domblklist test3
Target     Source
------------------------------------------------
hda        /var/lib/libvirt/images/test3.img
vda        /dev/vg2/test
vdb        /loopback/test3.img
vdd        gluster-vol1/rh6.img
```

2. Use gdb to attach to libvirtd and set a breakpoint in qemuDomainRemoveDiskDevice:

```
# gdb libvirtd `pidof libvirtd`
```

3. In one terminal, run virsh:

```
# virsh -k0 -K0 detach-disk test3 vdd
```

4. In another terminal, crash qemu while libvirtd is inside the monitor:

```
2562            qemuDomainObjEnterMonitor(driver, vm);
(gdb)
2563            qemuMonitorDriveDel(priv->mon, drivestr);
```

In the other terminal:

```
# kill -11 32069
```

5. Check that qemuDomainObjExitMonitor really fails and that there is no crash after it returns -1:

```
2567            if (qemuDomainObjExitMonitor(driver, vm) < 0)
(gdb)
2560            return -1;
(gdb)
2601        }
```

6. Check the client error:

```
# virsh -k0 -K0 detach-disk test3 vdd
error: Failed to detach disk
error: operation failed: domain is no longer running
```

I will test the other scenarios fixed by the patch in the coming days; if I find any other crash, I will open a new bug to track it.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0323.html