Bug 886080
| Summary: | Qemu segmentation fault when resume VM from stop at rebooting process after do some hot-plug/unplug and S3 | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Sibiao Luo <sluo> | ||||
| Component: | qemu-kvm | Assignee: | Stefan Hajnoczi <stefanha> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 6.4 | CC: | acathrow, areis, asias, bsarathy, chayang, flang, juzhang, lnovich, michen, minovotn, mkenneth, pbonzini, qzhang, sluo, stefanha, virt-bugs, virt-maint, xfu | ||||
| Target Milestone: | rc | Keywords: | TestOnly | ||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | qemu-kvm-0.12.1.2-2.370.el6 | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2013-11-21 05:58:41 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 761491, 912287 | ||||||
| Attachments: |
Created attachment 661433
The guest kernel logs after doing system_reset and stop/cont.
(In reply to comment #0)
> {"timestamp": {"seconds": 1355230564, "microseconds": 629887}, "event": "STOP"}
> {"timestamp": {"seconds": 1355230566, "microseconds": 316200}, "event": "RESUME"}
I pasted these here by mistake; the two log entries were generated at step 4.
> 4.stop VM at rebooting process and resume it.
> (qemu) stop
> (qemu) cont

FYI https://bugzilla.redhat.com/show_bug.cgi?id=822386
The scenario seems the same, but the backtrace is different.

I suspect that when system_reset is done, qemu doesn't forget older virtio-ring data, causing it to dereference stale guest memory. This bug should be triggerable even without S3/S4.
system_reset is a corner case, since it is not a way to cleanly shut down a guest: it is like pressing the reset switch on a physical host. I'd say this is NOTABUG because system_reset is involved, but maybe Stefan wants to look at the backtrace.

Reproduced this bug with the following versions:
Host:
# uname -r
2.6.32-395.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.356.el6.x86_64
Guest:
2.6.32-358.el6.x86_64
Steps:
1.Boot guest
2.Do S3 and hot-remove virtio_blk data disk.
{"timestamp": {"seconds": 1372831546, "microseconds": 147564}, "event": "SUSPEND"}
{"timestamp": {"seconds": 1372831546, "microseconds": 446599}, "event": "WAKEUP"}
{"execute":"device_del","arguments":{"id":"sluo_disk"}}
{"return": {}}
3.Do S3 and hot-add data disk.
{"timestamp": {"seconds": 1372831620, "microseconds": 538826}, "event": "SUSPEND"}
{"timestamp": {"seconds": 1372831625, "microseconds": 772250}, "event": "WAKEUP"}
{"execute":"__com.redhat_drive_add", "arguments": {"file":"/home/test.qcow2","format":"qcow2","id":"data-disk"}}
{"return": {}}
{"execute":"device_add","arguments":{"driver":"virtio-blk-pci","drive":"data-disk","id":"sluo_disk"}}
{"return": {}}
4.Do S3 and system_reset.
{"timestamp": {"seconds": 1372831700, "microseconds": 361727}, "event": "SUSPEND"}
{"timestamp": {"seconds": 1372831700, "microseconds": 524309}, "event": "WAKEUP"}
{ "execute": "system_reset" }
{"return": {}}
{"timestamp": {"seconds": 1372831725, "microseconds": 759034}, "event": "RESET"}
{"timestamp": {"seconds": 1372831730, "microseconds": 155271}, "event": "STOP"}
{"timestamp": {"seconds": 1372831731, "microseconds": 131282}, "event": "RESUME"}
5.stop VM at rebooting process and resume it.
(qemu) stop
(qemu) cont
Results:
(qemu) c
(qemu)
Program received signal SIGSEGV, Segmentation fault.
virtio_blk_handle_request (req=0xaa, mrb=0x7fffffffc2c0) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:379
379 if (req->elem.out_num < 1 || req->elem.in_num < 1) {
(gdb) bt
#0 virtio_blk_handle_request (req=0xaa, mrb=0x7fffffffc2c0)
at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:379
#1 0x00007ffff7df3ddb in virtio_blk_dma_restart_bh (opaque=0x7ffff8a6f810)
at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:471
#2 0x00007ffff7e15ff1 in qemu_bh_poll () at /usr/src/debug/qemu-kvm-0.12.1.2/async.c:70
#3 0x00007ffff7ddf419 in main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4017
#4 0x00007ffff7e0197a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2244
#5 0x00007ffff7de2008 in main_loop (argc=65, argv=<value optimized out>, envp=<value optimized out>)
at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4187
#6 main (argc=65, argv=<value optimized out>, envp=<value optimized out>)
at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6525
(gdb)
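To illustrate the suspicion raised earlier in this report (that state recorded before system_reset is still replayed on cont), here is a toy Python model. It is only a sketch and not QEMU code: `ToyBlockDevice`, `ring_epoch`, and `clear_on_reset` are hypothetical names made up for this example, and the report itself does not state the actual root cause.

```python
class ToyBlockDevice:
    """Hypothetical stand-in for a block device model; not QEMU code."""

    def __init__(self, clear_on_reset):
        self.clear_on_reset = clear_on_reset
        self.pending = []    # requests queued for replay after a VM stop
        self.ring_epoch = 0  # bumped on every reset; older entries become invalid

    def queue_for_retry(self, request_id):
        # Remember which ring generation the request belongs to.
        self.pending.append((self.ring_epoch, request_id))

    def reset(self):
        # A reset invalidates everything the guest set up before it.
        self.ring_epoch += 1
        if self.clear_on_reset:
            self.pending.clear()

    def resume(self):
        """Replay queued requests, as a restart handler would on 'cont'."""
        replayed = []
        for epoch, request_id in self.pending:
            if epoch != self.ring_epoch:
                # In C this would be a dereference of stale memory.
                raise RuntimeError(f"stale request {request_id} from before reset")
            replayed.append(request_id)
        self.pending.clear()
        return replayed
```

With `clear_on_reset=False`, a request queued before `reset()` makes `resume()` raise; with `clear_on_reset=True`, `resume()` returns an empty list. The model only shows why leftover pre-reset state is dangerous to replay; it says nothing about which state qemu-kvm actually kept.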
Verified this bug with the following versions:
Host:
# uname -r
2.6.32-395.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.377.el6.x86_64
Guest:
2.6.32-358.el6.x86_64
Steps: same as in the reproduction above.
Results:
Tried more than 5 times; did not hit the call trace, and the guest works well.
Additional info:
1) Per comment #5, also tried it without S3/S4. The issue no longer occurs and the guest works well.
According to the above tests, this bug is fixed.
Setting to VERIFIED according to comment 17.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHSA-2013-1553.html
Description of problem:
After some hot-plug/unplug and S3 operations, rebooting the guest, stopping it during the reboot, and then resuming it makes qemu dump core.
By the way, without the hot-plug/unplug and S3 operations (only system_reset, then stopping the VM during the reboot and resuming it), qemu works fine.

Version-Release number of selected component (if applicable):
host info:
# uname -r && rpm -q qemu-kvm
2.6.32-347.el6.x86_64
qemu-kvm-0.12.1.2-2.340.el6.x86_64
spice-gtk-0.14-5.el6.x86_64
spice-server-0.12.0-7.el6.x86_64
spice-gtk-tools-0.14-5.el6.x86_64
guest info:
# uname -r
2.6.32-347.el6.x86_64

How reproducible:
always (4/7)

Steps to Reproduce:
1.do S3 and hot-remove virtio_blk data disk.
{"timestamp": {"seconds": 1355230505, "microseconds": 179036}, "event": "SUSPEND"}
{"timestamp": {"seconds": 1355230508, "microseconds": 67717}, "event": "WAKEUP"}
{"execute":"device_del","arguments":{"id":"sluo_disk"}}
{"return": {}}
2.hot-add data disk and do S3.
{"timestamp": {"seconds": 1355230526, "microseconds": 469966}, "event": "SUSPEND"}
{"timestamp": {"seconds": 1355230526, "microseconds": 595620}, "event": "WAKEUP"}
{"execute":"__com.redhat_drive_add", "arguments": {"file":"/dev/mapper/mpathc","format":"qcow2","id":"data-disk"}}
{"return": {}}
{"execute":"device_add","arguments":{"driver":"virtio-blk-pci","drive":"data-disk","id":"sluo_disk"}}
{"return": {}}
3.do S3 and system_reset.
{"timestamp": {"seconds": 1355230543, "microseconds": 260324}, "event": "SUSPEND"}
{"timestamp": {"seconds": 1355230546, "microseconds": 377548}, "event": "WAKEUP"}
{ "execute": "system_reset" }
{"return": {}}
{"timestamp": {"seconds": 1355230553, "microseconds": 930122}, "event": "RESET"}
{"timestamp": {"seconds": 1355230564, "microseconds": 629887}, "event": "STOP"}
{"timestamp": {"seconds": 1355230566, "microseconds": 316200}, "event": "RESUME"}
4.stop VM at rebooting process and resume it.
(qemu) stop
(qemu) cont

Actual results:
After step 4, qemu hits a segmentation fault.
(qemu) stop
(qemu) cont
(qemu)
Program received signal SIGSEGV, Segmentation fault.
virtio_blk_handle_request (req=0xa8000000a800, mrb=0x7fffffffbf90) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:373
373 if (req->elem.out_num < 1 || req->elem.in_num < 1) {
(gdb) bt
#0 virtio_blk_handle_request (req=0xa8000000a800, mrb=0x7fffffffbf90)
at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:373
#1 0x00007ffff7df713b in virtio_blk_dma_restart_bh (opaque=0x7ffff8874860)
at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:450
#2 0x00007ffff7e17711 in qemu_bh_poll () at async.c:70
#3 0x00007ffff7de2bd9 in main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4017
#4 0x00007ffff7e04c2a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2244
#5 0x00007ffff7de57c8 in main_loop (argc=69, argv=<value optimized out>, envp=<value optimized out>)
at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4187
#6 main (argc=69, argv=<value optimized out>, envp=<value optimized out>)
at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6525
(gdb)

Expected results:
The guest resumes successfully without any core dump.
Additional info:
/usr/libexec/qemu-kvm -M rhel6.4.0 -cpu Nehalem -enable-kvm -m 4096 -smp 4,sockets=2,cores=2,threads=1 \
  -usb -device usb-tablet,id=input0 \
  -name sluo -uuid 990ea161-6b67-47b2-b803-19fb01d30d30 \
  -rtc base=localtime,clock=host,driftfix=slew \
  -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 \
  -drive file=/dev/mapper/mpathb,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop \
  -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
  -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup \
  -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=9C:4A:92:E0:D1:26,bus=pci.0,addr=0x5 \
  -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \
  -chardev spicevmc,id=charchannel0,name=vdagent \
  -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 \
  -spice port=5931,disable-ticketing,seamless-migration=on \
  -vga qxl -global qxl-vga.vram_size=67108864 \
  -device intel-hda,id=sound0,bus=pci.0,addr=0x6 -device hda-duplex \
  -device usb-ehci,id=ehci,addr=0x7 \
  -chardev spicevmc,name=usbredir,id=usbredirchardev1 \
  -device usb-redir,chardev=usbredirchardev1,id=usbredirdev1,bus=ehci.0,debug=4 \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 \
  -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 \
  -serial unix:/tmp/ttyS0,server,nowait -qmp tcp:0:4444,server,nowait -monitor stdio \
  -drive file=/dev/mapper/mpathc,if=none,id=data-disk,format=qcow2,cache=none,werror=stop,rerror=stop \
  -device virtio-blk-pci,bus=pci.0,drive=data-disk,id=sluo_disk
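
Since the command line above exposes QMP with `-qmp tcp:0:4444,server,nowait`, the QMP part of the steps to reproduce could be scripted roughly as follows. This is a minimal sketch, not a tested reproducer: the helper names (`qmp_command`, `reproduction_commands`, `read_reply`, `run`) are made up for this example, the host address `127.0.0.1` is an assumption, and the S3 suspend/wakeup cycles and the HMP `stop`/`cont` of step 4 still have to be triggered separately between the commands.

```python
import json
import socket


def qmp_command(name, arguments=None):
    """Serialize one QMP command as newline-terminated JSON bytes."""
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    return (json.dumps(msg) + "\n").encode("utf-8")


def reproduction_commands(image="/dev/mapper/mpathc"):
    """QMP commands taken from steps 1-3 of this report, in order."""
    return [
        qmp_command("device_del", {"id": "sluo_disk"}),
        qmp_command("__com.redhat_drive_add",
                    {"file": image, "format": "qcow2", "id": "data-disk"}),
        qmp_command("device_add",
                    {"driver": "virtio-blk-pci", "drive": "data-disk",
                     "id": "sluo_disk"}),
        qmp_command("system_reset"),
    ]


def read_reply(stream):
    """Read lines until a command reply arrives, skipping async events."""
    while True:
        line = stream.readline()
        if not line:
            raise ConnectionError("QMP socket closed")
        msg = json.loads(line)
        if "return" in msg or "error" in msg:
            return msg


def run(host="127.0.0.1", port=4444):
    """Send the commands back to back (no S3 cycles or pauses in between)."""
    with socket.create_connection((host, port)) as sock:
        stream = sock.makefile("rwb")
        json.loads(stream.readline())  # QMP greeting banner
        stream.write(qmp_command("qmp_capabilities"))  # leave negotiation mode
        stream.flush()
        read_reply(stream)
        for cmd in reproduction_commands():
            stream.write(cmd)
            stream.flush()
            print(read_reply(stream))
```

Calling `run()` against a guest started with the command line above sends the four commands in sequence. A real reproduction would likely need pauses between them, because `device_del` only completes once the guest has released the device.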