Bug 886080 - QEMU segmentation fault when resuming a VM that was stopped during reboot, after hot-plug/unplug and S3
Summary: QEMU segmentation fault when resuming a VM that was stopped during reboot, after hot-plug/unplug and S3
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.4
Hardware: x86_64
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Stefan Hajnoczi
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 761491 912287
 
Reported: 2012-12-11 13:28 UTC by Sibiao Luo
Modified: 2013-11-21 05:58 UTC
CC List: 18 users

Fixed In Version: qemu-kvm-0.12.1.2-2.370.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-11-21 05:58:41 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
the guest kernel logs after doing system_reset and stop/cont (8.65 KB, text/plain)
2012-12-11 13:29 UTC, Sibiao Luo
no flags


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2013:1553 0 normal SHIPPED_LIVE Important: qemu-kvm security, bug fix, and enhancement update 2013-11-20 21:40:29 UTC

Description Sibiao Luo 2012-12-11 13:28:27 UTC
Description of problem:
Do some hot-plug/unplug and S3, then reboot the guest, stop it during the reboot process, and resume it; QEMU will core dump.
By the way, if I skip the hot-plug/unplug and S3 steps and just do system_reset, stop the VM during reboot, and resume it, QEMU works fine.

Version-Release number of selected component (if applicable):
host info:
# uname -r && rpm -q qemu-kvm
2.6.32-347.el6.x86_64
qemu-kvm-0.12.1.2-2.340.el6.x86_64
spice-gtk-0.14-5.el6.x86_64
spice-server-0.12.0-7.el6.x86_64
spice-gtk-tools-0.14-5.el6.x86_64
guest info:
# uname -r
2.6.32-347.el6.x86_64

How reproducible:
frequently (4 out of 7 tries)

Steps to Reproduce:
1. Do S3 and hot-remove the virtio_blk data disk.
{"timestamp": {"seconds": 1355230505, "microseconds": 179036}, "event": "SUSPEND"}
{"timestamp": {"seconds": 1355230508, "microseconds": 67717}, "event": "WAKEUP"}

{"execute":"device_del","arguments":{"id":"sluo_disk"}}
{"return": {}}
2. Hot-add the data disk and do S3.
{"timestamp": {"seconds": 1355230526, "microseconds": 469966}, "event": "SUSPEND"}
{"timestamp": {"seconds": 1355230526, "microseconds": 595620}, "event": "WAKEUP"}

{"execute":"__com.redhat_drive_add", "arguments": {"file":"/dev/mapper/mpathc","format":"qcow2","id":"data-disk"}}
{"return": {}}
{"execute":"device_add","arguments":{"driver":"virtio-blk-pci","drive":"data-disk","id":"sluo_disk"}}
{"return": {}}
3. Do S3 and system_reset.
{"timestamp": {"seconds": 1355230543, "microseconds": 260324}, "event": "SUSPEND"}
{"timestamp": {"seconds": 1355230546, "microseconds": 377548}, "event": "WAKEUP"}

{ "execute": "system_reset" }
{"return": {}}
{"timestamp": {"seconds": 1355230553, "microseconds": 930122}, "event": "RESET"}
{"timestamp": {"seconds": 1355230564, "microseconds": 629887}, "event": "STOP"}
{"timestamp": {"seconds": 1355230566, "microseconds": 316200}, "event": "RESUME"}
4. Stop the VM during the reboot process and resume it.
(qemu) stop
(qemu) cont
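For reference, the QMP sequence in steps 1-3 can be scripted instead of typed by hand. A minimal Python sketch (the helper names are hypothetical; it assumes the QMP server at tcp:0:4444 from the command line in Additional info):

```python
import json
import socket

def qmp_cmd(execute, **arguments):
    """Build one QMP command as a JSON line, as shown in the steps above."""
    cmd = {"execute": execute}
    if arguments:
        cmd["arguments"] = arguments
    return json.dumps(cmd)

def send_all(host, port, commands):
    """Connect to the QMP server, negotiate capabilities, send each command."""
    with socket.create_connection((host, port)) as s:
        f = s.makefile("rw")
        f.readline()                                  # greeting banner
        f.write(qmp_cmd("qmp_capabilities") + "\n")
        f.flush()
        f.readline()                                  # {"return": {}}
        for c in commands:
            f.write(c + "\n")
            f.flush()
            print(f.readline().strip())               # reply or event

# The hot-unplug / hot-add / reset sequence from steps 1-3:
sequence = [
    qmp_cmd("device_del", id="sluo_disk"),
    qmp_cmd("__com.redhat_drive_add",
            file="/dev/mapper/mpathc", format="qcow2", id="data-disk"),
    qmp_cmd("device_add",
            driver="virtio-blk-pci", drive="data-disk", id="sluo_disk"),
    qmp_cmd("system_reset"),
]
# send_all("127.0.0.1", 4444, sequence)  # uncomment against a live guest
```

The S3 suspend/wakeup between the commands still has to be triggered from inside the guest (e.g. pm-suspend), so this only automates the QMP side.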

Actual results:
After step 4, QEMU hits a segmentation fault.
(qemu) stop
(qemu) cont
(qemu) 
Program received signal SIGSEGV, Segmentation fault.
virtio_blk_handle_request (req=0xa8000000a800, mrb=0x7fffffffbf90) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:373
373        if (req->elem.out_num < 1 || req->elem.in_num < 1) {

(gdb) bt
#0  virtio_blk_handle_request (req=0xa8000000a800, mrb=0x7fffffffbf90) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:373
#1  0x00007ffff7df713b in virtio_blk_dma_restart_bh (opaque=0x7ffff8874860) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:450
#2  0x00007ffff7e17711 in qemu_bh_poll () at async.c:70
#3  0x00007ffff7de2bd9 in main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4017
#4  0x00007ffff7e04c2a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2244
#5  0x00007ffff7de57c8 in main_loop (argc=69, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4187
#6  main (argc=69, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6525
(gdb)  

Expected results:
The guest resumes successfully without any core dump.

Additional info:
/usr/libexec/qemu-kvm -M rhel6.4.0 -cpu Nehalem -enable-kvm -m 4096 -smp 4,sockets=2,cores=2,threads=1 -usb -device usb-tablet,id=input0 -name sluo -uuid 990ea161-6b67-47b2-b803-19fb01d30d30 -rtc base=localtime,clock=host,driftfix=slew -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/dev/mapper/mpathb,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=9C:4A:92:E0:D1:26,bus=pci.0,addr=0x5 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5931,disable-ticketing,seamless-migration=on -vga qxl -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x6 -device hda-duplex -device usb-ehci,id=ehci,addr=0x7 -chardev spicevmc,name=usbredir,id=usbredirchardev1 -device usb-redir,chardev=usbredirchardev1,id=usbredirdev1,bus=ehci.0,debug=4 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -serial unix:/tmp/ttyS0,server,nowait -qmp tcp:0:4444,server,nowait -monitor stdio -drive file=/dev/mapper/mpathc,if=none,id=data-disk,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,bus=pci.0,drive=data-disk,id=sluo_disk

Comment 1 Sibiao Luo 2012-12-11 13:29:49 UTC
Created attachment 661433 [details]
the guest kernel logs after doing system_reset and stop/cont

Comment 3 Sibiao Luo 2012-12-11 13:34:31 UTC
(In reply to comment #0)
> {"timestamp": {"seconds": 1355230564, "microseconds": 629887}, "event":
> "STOP"}
> {"timestamp": {"seconds": 1355230566, "microseconds": 316200}, "event":
> "RESUME"}
Pasted by mistake; these two log lines were generated during step 4.
> 4.stop VM at rebooting process and resume it.
> (qemu) stop
> (qemu) cont

Comment 4 juzhang 2012-12-12 02:18:09 UTC
FYI
https://bugzilla.redhat.com/show_bug.cgi?id=822386

The scenario seems the same, but the backtrace is different.

Comment 5 Amit Shah 2013-02-21 13:51:49 UTC
I suspect what's happening is that when system_reset is done, QEMU doesn't forget the older virtio-ring data, causing it to dereference stale guest memory.

This bug should be triggerable even without S3/S4.

system_reset is a corner case, since it's not a way to cleanly shut down a guest: it's like pressing the reset switch on a physical host.
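The theory above can be illustrated with a toy model (plain Python, not actual QEMU code): virtio-blk parks in-flight requests on a restart list and replays them from a bottom-half on resume, and if a reset does not drop that list, the replay follows pointers into guest memory the rebooted guest has already reused.

```python
# Toy model of the suspected failure mode (hypothetical names, not QEMU's):
# requests parked while the VM is stopped are replayed by a bottom-half on
# resume; replaying one from before a reset touches stale guest state.

class VirtioBlkModel:
    def __init__(self):
        self.dma_restart_list = []   # requests parked while stopped

    def park_request(self, req):
        self.dma_restart_list.append(req)

    def system_reset(self, clear_on_reset):
        # The fix amounts to forgetting in-flight requests on guest reset.
        if clear_on_reset:
            self.dma_restart_list = []

    def dma_restart_bh(self):
        # Replaying a pre-reset request is a stale-memory access; in real
        # QEMU this is the SIGSEGV in virtio_blk_handle_request above.
        replayed = list(self.dma_restart_list)
        self.dma_restart_list = []
        return replayed

buggy, fixed = VirtioBlkModel(), VirtioBlkModel()
for dev in (buggy, fixed):
    dev.park_request("req@stale-guest-addr")
buggy.system_reset(clear_on_reset=False)
fixed.system_reset(clear_on_reset=True)
print(buggy.dma_restart_bh())   # stale request replayed -> crash in real QEMU
print(fixed.dma_restart_bh())   # nothing to replay
```

This also matches the bogus `req` values in the backtraces (0xa8000000a800, 0xaa): the bottom-half walks a list whose entries no longer point at valid requests.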

Comment 7 Amit Shah 2013-05-21 12:59:32 UTC
I'd say this is NOTABUG due to system_reset being done, but maybe Stefan wants to look at the backtrace.

Comment 17 langfang 2013-07-03 06:11:27 UTC
Reproduced this bug with the following versions:
Host:
# uname -r 
2.6.32-395.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.356.el6.x86_64

Guest:
2.6.32-358.el6.x86_64 

Steps:
1.Boot guest
2. Do S3 and hot-remove the virtio_blk data disk.
{"timestamp": {"seconds": 1372831546, "microseconds": 147564}, "event": "SUSPEND"}
{"timestamp": {"seconds": 1372831546, "microseconds": 446599}, "event": "WAKEUP"}
{"execute":"device_del","arguments":{"id":"sluo_disk"}}
{"return": {}}

3. Do S3 and hot-add the data disk.

{"timestamp": {"seconds": 1372831620, "microseconds": 538826}, "event": "SUSPEND"}
{"timestamp": {"seconds": 1372831625, "microseconds": 772250}, "event": "WAKEUP"}
{"execute":"__com.redhat_drive_add", "arguments": {"file":"/home/test.qcow2","format":"qcow2","id":"data-disk"}}
{"return": {}}
{"execute":"device_add","arguments":{"driver":"virtio-blk-pci","drive":"data-disk","id":"sluo_disk"}}
{"return": {}}
4. Do S3 and system_reset.
{"timestamp": {"seconds": 1372831700, "microseconds": 361727}, "event": "SUSPEND"}
{"timestamp": {"seconds": 1372831700, "microseconds": 524309}, "event": "WAKEUP"}
{ "execute": "system_reset" }
{"return": {}}
{"timestamp": {"seconds": 1372831725, "microseconds": 759034}, "event": "RESET"}
{"timestamp": {"seconds": 1372831730, "microseconds": 155271}, "event": "STOP"}
{"timestamp": {"seconds": 1372831731, "microseconds": 131282}, "event": "RESUME"}
5. Stop the VM during the reboot process and resume it.
(qemu) stop
(qemu) cont

Results:
(qemu) c
(qemu) 
Program received signal SIGSEGV, Segmentation fault.
virtio_blk_handle_request (req=0xaa, mrb=0x7fffffffc2c0) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:379
379	    if (req->elem.out_num < 1 || req->elem.in_num < 1) {
(gdb) bt
#0  virtio_blk_handle_request (req=0xaa, mrb=0x7fffffffc2c0)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:379
#1  0x00007ffff7df3ddb in virtio_blk_dma_restart_bh (opaque=0x7ffff8a6f810)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/virtio-blk.c:471
#2  0x00007ffff7e15ff1 in qemu_bh_poll () at /usr/src/debug/qemu-kvm-0.12.1.2/async.c:70
#3  0x00007ffff7ddf419 in main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4017
#4  0x00007ffff7e0197a in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2244
#5  0x00007ffff7de2008 in main_loop (argc=65, argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4187
#6  main (argc=65, argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6525
(gdb) 

Verified the fix with the following versions:

Host:
# uname -r
2.6.32-395.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.377.el6.x86_64

Guest:
2.6.32-358.el6.x86_64 

Steps are the same as in the reproduction above.

Results:
Tried more than 5 times; did not hit the call trace, and the guest works well.

Additional info:

1) As suggested in comment #5, also tried it without S3/S4; no such issue any more, and the guest works well.


According to the above tests, this bug is fixed.

Comment 18 Qunfang Zhang 2013-10-12 03:50:49 UTC
Setting to VERIFIED according to comment 17.

Comment 19 errata-xmlrpc 2013-11-21 05:58:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-1553.html

