Bug 886080

Summary: Qemu segmentation fault when resume VM from stop at rebooting process after do some hot-plug/unplug and S3
Product: Red Hat Enterprise Linux 6
Reporter: Sibiao Luo <sluo>
Component: qemu-kvm
Assignee: Stefan Hajnoczi <stefanha>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 6.4
CC: acathrow, areis, asias, bsarathy, chayang, flang, juzhang, lnovich, michen, minovotn, mkenneth, pbonzini, qzhang, sluo, stefanha, virt-bugs, virt-maint, xfu
Target Milestone: rc
Keywords: TestOnly
Target Release: ---
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-0.12.1.2-2.370.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-21 05:58:41 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 761491, 912287
Attachments:
Description: the guest kernel logs after doing system_reset and stop/cont.
Flags: none

Description Sibiao Luo 2012-12-11 13:28:27 UTC
Description of problem:
After some hot-plug/unplug operations and S3 cycles, reboot the guest, stop it while it is rebooting, and then resume it: qemu dumps core.
By the way, without the hot-plug/unplug and S3 steps (just system_reset, stop the VM while it is rebooting, then resume it), qemu works fine.

Version-Release number of selected component (if applicable):
host info:
# uname -r && rpm -q qemu-kvm
2.6.32-347.el6.x86_64
qemu-kvm-0.12.1.2-2.340.el6.x86_64
spice-gtk-0.14-5.el6.x86_64
spice-server-0.12.0-7.el6.x86_64
spice-gtk-tools-0.14-5.el6.x86_64
guest info:
# uname -r
2.6.32-347.el6.x86_64

How reproducible:
always (4/7)

Steps to Reproduce:
1. Do S3 and hot-remove the virtio_blk data disk.
{"timestamp": {"seconds": 1355230505, "microseconds": 179036}, "event": "SUSPEND"}
{"timestamp": {"seconds": 1355230508, "microseconds": 67717}, "event": "WAKEUP"}

{"execute":"device_del","arguments":{"id":"sluo_disk"}}
{"return": {}}
2. Hot-add the data disk and do S3.
{"timestamp": {"seconds": 1355230526, "microseconds": 469966}, "event": "SUSPEND"}
{"timestamp": {"seconds": 1355230526, "microseconds": 595620}, "event": "WAKEUP"}

{"execute":"__com.redhat_drive_add", "arguments": {"file":"/dev/mapper/mpathc","format":"qcow2","id":"data-disk"}}
{"return": {}}
{"execute":"device_add","arguments":{"driver":"virtio-blk-pci","drive":"data-disk","id":"sluo_disk"}}
{"return": {}}
3. Do S3 and system_reset.
{"timestamp": {"seconds": 1355230543, "microseconds": 260324}, "event": "SUSPEND"}
{"timestamp": {"seconds": 1355230546, "microseconds": 377548}, "event": "WAKEUP"}

{ "execute": "system_reset" }
{"return": {}}
{"timestamp": {"seconds": 1355230553, "microseconds": 930122}, "event": "RESET"}
{"timestamp": {"seconds": 1355230564, "microseconds": 629887}, "event": "STOP"}
{"timestamp": {"seconds": 1355230566, "microseconds": 316200}, "event": "RESUME"}
4. Stop the VM while it is rebooting, then resume it.
(qemu) stop
(qemu) cont
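
The QMP side of steps 1-3 can be sketched as a small script. This is a minimal sketch and not part of the original report: the helper names `build_repro_commands` and `to_wire` are hypothetical, and the S3 suspend/wakeup cycles are left out because the report does not show how they were triggered. The command names and arguments are copied from the log above; `qmp_capabilities` is added because QMP requires capability negotiation before it accepts other commands.

```python
import json


def build_repro_commands(disk_file="/dev/mapper/mpathc"):
    """Return the QMP commands from steps 1-3, in order.

    Hypothetical helper: the S3 suspend/wakeup between commands is not
    included, since the report does not show how it was triggered.
    """
    return [
        # QMP needs capability negotiation before any other command.
        {"execute": "qmp_capabilities"},
        # Step 1: hot-remove the virtio-blk data disk.
        {"execute": "device_del", "arguments": {"id": "sluo_disk"}},
        # Step 2: hot-add the data disk again (RHEL-specific drive_add).
        {"execute": "__com.redhat_drive_add",
         "arguments": {"file": disk_file, "format": "qcow2",
                       "id": "data-disk"}},
        {"execute": "device_add",
         "arguments": {"driver": "virtio-blk-pci", "drive": "data-disk",
                       "id": "sluo_disk"}},
        # Step 3: reset the guest.
        {"execute": "system_reset"},
    ]


def to_wire(commands):
    """Serialize each command as one JSON line, as sent on the QMP socket."""
    return [json.dumps(cmd) for cmd in commands]


if __name__ == "__main__":
    for line in to_wire(build_repro_commands()):
        print(line)
```

Step 4 (`stop` / `cont`) was issued on the human monitor in the report, so it is not part of this list. The printed lines could be written to the QMP socket opened by `-qmp tcp:0:4444,server,nowait` in the command line under "Additional info".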

Actual results:
After step 4, qemu hits a segmentation fault.
(qemu) stop
(qemu) cont
(qemu) 
Program received signal SIGSEGV, Segmentation fault.
virtio_blk_handle_request (req=0xa8000000a800, mrb=0x7fffffffbf90) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:373
373        if (req->elem.out_num < 1 || req->elem.in_num < 1) {

(gdb) bt
#0  virtio_blk_handle_request (req=0xa8000000a800, mrb=0x7fffffffbf90) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:373
#1  0x00007ffff7df713b in virtio_blk_dma_restart_bh (opaque=0x7ffff8874860) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:450
#2  0x00007ffff7e17711 in qemu_bh_poll () at async.c:70
#3  0x00007ffff7de2bd9 in main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4017
#4  0x00007ffff7e04c2a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2244
#5  0x00007ffff7de57c8 in main_loop (argc=69, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4187
#6  main (argc=69, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6525
(gdb)  
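
A quick check of the pointer values in frames #0 and #1 suggests that `req` is not a valid heap pointer. The sketch below is an illustration only: it assumes 48-bit x86_64 virtual addressing (bits 63..47 must all be equal for an address to be canonical), and the helper name is made up for this example.

```python
def is_canonical_x86_64(addr):
    """True if addr is canonical under 48-bit x86_64 virtual addressing.

    Assumption: 4-level paging, so bits 63..47 must be all 0 or all 1.
    """
    top = (addr >> 47) & 0x1FFFF  # the 17 bits 63..47
    return top == 0 or top == 0x1FFFF


if __name__ == "__main__":
    # Values taken from the gdb output above.
    for name, addr in [("req", 0xA8000000A800),
                       ("mrb", 0x7FFFFFFFBF90),
                       ("opaque", 0x7FFFF8874860)]:
        print(name, hex(addr), is_canonical_x86_64(addr))
```

Under that assumption `req=0xa8000000a800` is non-canonical, while `mrb` and `opaque` look like ordinary user-space addresses. That is consistent with `req` holding garbage rather than pointing at a real request object.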

Expected results:
The guest resumes successfully without any core dump.

Additional info:
/usr/libexec/qemu-kvm -M rhel6.4.0 -cpu Nehalem -enable-kvm -m 4096 -smp 4,sockets=2,cores=2,threads=1 -usb -device usb-tablet,id=input0 -name sluo -uuid 990ea161-6b67-47b2-b803-19fb01d30d30 -rtc base=localtime,clock=host,driftfix=slew -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/dev/mapper/mpathb,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=9C:4A:92:E0:D1:26,bus=pci.0,addr=0x5 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5931,disable-ticketing,seamless-migration=on -vga qxl -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x6 -device hda-duplex -device usb-ehci,id=ehci,addr=0x7 -chardev spicevmc,name=usbredir,id=usbredirchardev1 -device usb-redir,chardev=usbredirchardev1,id=usbredirdev1,bus=ehci.0,debug=4 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -serial unix:/tmp/ttyS0,server,nowait -qmp tcp:0:4444,server,nowait -monitor stdio -drive file=/dev/mapper/mpathc,if=none,id=data-disk,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,bus=pci.0,drive=data-disk,id=sluo_disk

Comment 1 Sibiao Luo 2012-12-11 13:29:49 UTC
Created attachment 661433 [details]
the guest kernel logs after doing system_reset and stop/cont.

Comment 3 Sibiao Luo 2012-12-11 13:34:31 UTC
(In reply to comment #0)
> {"timestamp": {"seconds": 1355230564, "microseconds": 629887}, "event":
> "STOP"}
> {"timestamp": {"seconds": 1355230566, "microseconds": 316200}, "event":
> "RESUME"}
Pasted in the wrong place by mistake: these two log lines were generated during step 4.
> 4.stop VM at rebooting process and resume it.
> (qemu) stop
> (qemu) cont
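
The timing of step 4 relative to the reset can be worked out from the QMP event timestamps in the description. This is a small worked example, with hypothetical helper names, that uses integer microseconds to avoid floating-point rounding.

```python
def event_us(event):
    """Convert a QMP event timestamp to integer microseconds since the epoch."""
    ts = event["timestamp"]
    return ts["seconds"] * 1_000_000 + ts["microseconds"]


def gap_us(earlier, later):
    """Microseconds elapsed between two QMP events."""
    return event_us(later) - event_us(earlier)


# Events copied from the description (comment 0).
RESET = {"timestamp": {"seconds": 1355230553, "microseconds": 930122},
         "event": "RESET"}
STOP = {"timestamp": {"seconds": 1355230564, "microseconds": 629887},
        "event": "STOP"}
RESUME = {"timestamp": {"seconds": 1355230566, "microseconds": 316200},
          "event": "RESUME"}

if __name__ == "__main__":
    print("reset -> stop  :", gap_us(RESET, STOP) / 1_000_000, "s")
    print("stop  -> resume:", gap_us(STOP, RESUME) / 1_000_000, "s")
```

In this run the guest was stopped about 10.7 seconds after the reset and stayed stopped for about 1.7 seconds before `cont`.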

Comment 4 juzhang 2012-12-12 02:18:09 UTC
FYI
https://bugzilla.redhat.com/show_bug.cgi?id=822386

The scenario seems the same, but the backtrace is different.

Comment 5 Amit Shah 2013-02-21 13:51:49 UTC
I suspect what's happening is when system_reset is done, qemu doesn't forget older virtio-ring data, causing it to dereference stale guest memory.

This bug should be triggered even without s3/s4.

system_reset is a corner case, since it's not a way to cleanly shut down a guest: it's like pressing the reset switch on a physical host.
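
The suspicion in this comment can be illustrated with a toy model. This is purely a sketch of the hypothesis, not QEMU code and not the actual fix: `ToyBlkDevice`, its methods, and the `forget_queue` switch are all invented for illustration. A device keeps a list of requests to retry on resume, the guest's memory is wiped by a reset, and the retry list is either dropped or kept.

```python
class ToyBlkDevice:
    """Toy stand-in for a block device that queues requests for a later retry."""

    def __init__(self):
        self.guest_memory = {}  # address -> request data, owned by the "guest"
        self.retry_queue = []   # addresses the device will revisit on resume

    def queue_for_retry(self, addr, data):
        self.guest_memory[addr] = data
        self.retry_queue.append(addr)

    def system_reset(self, forget_queue):
        # The guest reboots: whatever it had in memory is gone.
        self.guest_memory.clear()
        if forget_queue:
            self.retry_queue.clear()

    def resume(self):
        """Replay queued requests, as a restart handler would after 'cont'."""
        handled = []
        while self.retry_queue:
            addr = self.retry_queue.pop(0)
            # A KeyError here plays the role of the SIGSEGV in the backtrace.
            handled.append(self.guest_memory[addr])
        return handled


def demo(forget_queue):
    dev = ToyBlkDevice()
    dev.queue_for_retry(0x1000, "write sector 8")
    dev.system_reset(forget_queue)
    try:
        dev.resume()
        return "ok"
    except KeyError:
        return "stale request dereferenced"


if __name__ == "__main__":
    print("reset forgets queue:", demo(True))
    print("reset keeps queue  :", demo(False))
```

In the model, only the variant that keeps the queue across the reset fails on resume, which mirrors the reported sequence of system_reset followed by stop and cont.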

Comment 7 Amit Shah 2013-05-21 12:59:32 UTC
I'd say this is NOTABUG due to system_reset being done, but maybe Stefan wants to look at the backtrace.

Comment 17 langfang 2013-07-03 06:11:27 UTC
Reproduced this bug with the following versions:
Host:
# uname -r 
2.6.32-395.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.356.el6.x86_64

Guest:
2.6.32-358.el6.x86_64 

Steps:
1. Boot the guest.
2. Do S3 and hot-remove the virtio_blk data disk.
{"timestamp": {"seconds": 1372831546, "microseconds": 147564}, "event": "SUSPEND"}
{"timestamp": {"seconds": 1372831546, "microseconds": 446599}, "event": "WAKEUP"}
{"execute":"device_del","arguments":{"id":"sluo_disk"}}
{"return": {}}

3. Do S3 and hot-add the data disk.

{"timestamp": {"seconds": 1372831620, "microseconds": 538826}, "event": "SUSPEND"}
{"timestamp": {"seconds": 1372831625, "microseconds": 772250}, "event": "WAKEUP"}
{"execute":"__com.redhat_drive_add", "arguments": {"file":"/home/test.qcow2","format":"qcow2","id":"data-disk"}}
{"return": {}}
{"execute":"device_add","arguments":{"driver":"virtio-blk-pci","drive":"data-disk","id":"sluo_disk"}}
{"return": {}}
4. Do S3 and system_reset.
{"timestamp": {"seconds": 1372831700, "microseconds": 361727}, "event": "SUSPEND"}
{"timestamp": {"seconds": 1372831700, "microseconds": 524309}, "event": "WAKEUP"}
{ "execute": "system_reset" }
{"return": {}}
{"timestamp": {"seconds": 1372831725, "microseconds": 759034}, "event": "RESET"}
{"timestamp": {"seconds": 1372831730, "microseconds": 155271}, "event": "STOP"}
{"timestamp": {"seconds": 1372831731, "microseconds": 131282}, "event": "RESUME"}
5. Stop the VM while it is rebooting, then resume it.
(qemu) stop
(qemu) cont

Results:
(qemu) c
(qemu) 
Program received signal SIGSEGV, Segmentation fault.
virtio_blk_handle_request (req=0xaa, mrb=0x7fffffffc2c0) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:379
379	    if (req->elem.out_num < 1 || req->elem.in_num < 1) {
(gdb) bt
#0  virtio_blk_handle_request (req=0xaa, mrb=0x7fffffffc2c0)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:379
#1  0x00007ffff7df3ddb in virtio_blk_dma_restart_bh (opaque=0x7ffff8a6f810)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:471
#2  0x00007ffff7e15ff1 in qemu_bh_poll () at /usr/src/debug/qemu-kvm-0.12.1.2/async.c:70
#3  0x00007ffff7ddf419 in main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4017
#4  0x00007ffff7e0197a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2244
#5  0x00007ffff7de2008 in main_loop (argc=65, argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4187
#6  main (argc=65, argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6525
(gdb) 

Verified this bug with the following versions:

Host:
# uname -r
2.6.32-395.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.377.el6.x86_64

Guest:
2.6.32-358.el6.x86_64 

Steps: the same as for the reproduction above.

Results:
Tried more than 5 times; did not hit the call trace, and the guest works well.

Additional info:

1) As suggested in comment #5, also tried it without S3/S4. The issue no longer occurs there either, and the guest works well.


According to the above tests, this bug is fixed.
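
The builds mentioned in this bug can be compared against the "Fixed In Version" field with a small check. This is a simplified sketch, not full RPM version comparison: it only understands package strings of the shape used in this report, and the helper names are hypothetical.

```python
import re

# From the header field "Fixed In Version: qemu-kvm-0.12.1.2-2.370.el6".
FIXED_IN = (2, 370)

# Assumption: only the release part after "0.12.1.2-" differs between builds.
_PATTERN = re.compile(r"^qemu-kvm-0\.12\.1\.2-(\d+)\.(\d+)\.el6")


def release_of(package):
    """Extract the numeric release, e.g. (2, 377), from a package string."""
    match = _PATTERN.match(package)
    if match is None:
        raise ValueError("unexpected package string: %r" % package)
    return int(match.group(1)), int(match.group(2))


def has_fix(package):
    """True if the build is at or after the Fixed In Version release."""
    return release_of(package) >= FIXED_IN


if __name__ == "__main__":
    for pkg in ["qemu-kvm-0.12.1.2-2.340.el6.x86_64",
                "qemu-kvm-0.12.1.2-2.356.el6.x86_64",
                "qemu-kvm-0.12.1.2-2.377.el6.x86_64"]:
        print(pkg, "has fix:", has_fix(pkg))
```

The two builds on which the crash was reproduced (2.340 and 2.356) are older than the fixed release, and the build used for verification (2.377) is newer.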

Comment 18 Qunfang Zhang 2013-10-12 03:50:49 UTC
Setting to VERIFIED according to comment 17.

Comment 19 errata-xmlrpc 2013-11-21 05:58:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-1553.html