Bug 1299876

Summary: system_reset should clear pending request for error (IDE)
Product: Red Hat Enterprise Linux 7
Reporter: Ademar Reis <areis>
Component: qemu-kvm-rhev
Assignee: John Snow <jsnow>
Status: CLOSED ERRATA
QA Contact: aihua liang <aliang>
Severity: medium
Docs Contact:
Priority: high
Version: 7.2
CC: ailan, aliang, armbru, coli, huding, jsnow, juzhang, michen, mkenneth, qzhang, rbalakri, virt-bugs, virt-maint, xuwei, zhguo
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Windows
Whiteboard:
Fixed In Version: QEMU 2.9
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1299875
: 1393043
Environment:
Last Closed: 2017-08-01 23:29:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1281713, 1299875
Bug Blocks: 1393043

Description Ademar Reis 2016-01-19 13:07:34 UTC
+++ This bug was initially created as a clone of Bug #1299875 +++

+++ This bug was initially created as a clone of Bug #1281713 +++

Description of problem:
qemu-kvm quits with a segmentation fault after system_reset is executed when no space is left on the host.

Version-Release number of selected component (if applicable):
qemu-img-0.12.1.2-2.481.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.481.el6.x86_64
qemu-kvm-0.12.1.2-2.481.el6.x86_64
qemu-guest-agent-0.12.1.2-2.481.el6.x86_64
qemu-kvm-debuginfo-0.12.1.2-2.481.el6.x86_64
2.6.32-583.el6.x86_64

How reproducible:
70%

Steps to Reproduce:
1. Create a 25G win2012.qcow2 image and install a Windows 2012 R2 guest.
2. On the filesystem where the guest image is located, use up all the space by copying the guest image several times until a "No space left on device" message appears. Launch the guest with the following qemu-kvm command:
/usr/libexec/qemu-kvm -name win2012 -m 2048 \
	-cpu Opteron_G4 \
	-smp 1,cores=1,threads=2,sockets=2,maxcpus=4 \
	-vga qxl \
	-serial unix:/tmp/m,server,nowait \
	-drive file=win2012-64r2-virtio-scsi.qcow2,if=none,id=drive-scsi-disk0,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-scsi-pci,id=scsi0 -device scsi-hd,drive=drive-scsi-disk0,bus=scsi0.0,scsi-id=0,lun=0,id=scsi-disk0,bootindex=1 \
	-monitor stdio \
	-usb -device usb-kbd,id=input0 \
	-vnc :1

3. Interact with the guest (browsing the internet, for example) until the qemu-kvm monitor prints "block I/O error in device 'ide0-hd0': No space left on device (28)" (this usually happens within 5 minutes), then enter system_reset in the QEMU monitor. The segmentation fault follows.

Actual results:
qemu-kvm quits with a segmentation fault after system_reset is executed.

Expected results:
The qemu-kvm process should stay alive and the guest system should reset without error.

Additional info:
Stack info:
Core was generated by `/usr/libexec/qemu-kvm -name win2012 -m 2048 -cpu SandyBridge -smp 2,cores=1,thr'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f85d1b04a90 in ?? ()
(gdb) bt
#0  0x00007f85d1b04a90 in ?? ()
#1  0x00007f85d03f5aee in bdrv_aio_cancel (acb=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/block.c:3842
#2  0x00007f85d052d46a in ide_dma_cancel (bm=0x7f85d26e1160)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/core.c:2395
#3  0x00007f85d052d499 in ide_dma_reset (bm=0x7f85d26e1160)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/core.c:2408
#4  0x00007f85d05335ad in piix3_reset (opaque=0x7f85d26e0010)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/piix.c:124
#5  0x00007f85d03b71d2 in qemu_system_reset (report=true)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:3417
#6  0x00007f85d03dd050 in qemu_kvm_system_reset (report=true)
    at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1992
#7  0x00007f85d03dd253 in kvm_main_loop ()
    at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2272
#8  0x00007f85d03be317 in main_loop (argc=<value optimized out>, 
    argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4273
#9  main (argc=<value optimized out>, argv=<value optimized out>, 
    envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6731

On an Opteron_G5 host qemu-kvm does not quit with a segmentation fault, but the Windows guest cannot be reset after system_reset.


--- Additional comment from Markus Armbruster on 2015-11-24 14:37:15 BRST ---

Can you reproduce this with a qemu-kvm built with --enable-debug?

--- Additional comment from Guo, Zhiyi on 2015-11-27 00:00:55 BRST ---

Hi,
I guess you want to see the function call that shows as ?? and the argument values that were optimized out.
The stack trace is still the same as reported in the description.
I enabled the --enable-debug option and rebuilt qemu-kvm. The -g option has been added to the compile procedure by the configure file:
+ ../configure --target-list=x86_64-softmmu '--extra-ldflags=-Wl,--build-id -pie -Wl,-z,relro -Wl,-z,now' '--extra-cflags=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -fPIE -DPIE' --with-pkgversion=qemu-kvm-0.12.1.2-2.479.el6.2 --prefix=/usr --localstatedir=/var --sysconfdir=/etc --disable-strip --disable-xen --block-drv-rw-whitelist=qcow2,raw,file,host_device,host_cdrom,qed,gluster,rbd --block-drv-ro-whitelist=vmdk,vpc --disable-debug-tcg --disable-sparse --disable-sdl --disable-curses --disable-curl --disable-check-utests --disable-bluez --enable-docs --disable-vde --disable-spice --trace-backend=nop --enable-smartcard --disable-smartcard-nss --enable-mixemu
Install prefix    /usr
BIOS directory    /usr/share/qemu
binary directory  /usr/bin
local state directory   /var
Manual directory  /usr/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path       /root/rpmbuild/BUILD/qemu-kvm-0.12.1.2
C compiler        gcc
Host C compiler   gcc
CFLAGS            -O2 -g

BR/
Zhiyi

--- Additional comment from Markus Armbruster on 2015-11-27 05:17:53 BRST ---

I can't see --enable-debug in your configure line.  I can see -O2.  You need to get one roughly like this:

../configure --target-list=x86_64-softmmu '--extra-ldflags=-Wl,--build-id -pie -Wl,-z,relro -Wl,-z,now' '--extra-cflags=-g -pipe -Wall -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -fPIE -DPIE' --with-pkgversion=qemu-kvm-0.12.1.2-2.479.el6.2 --prefix=/usr --localstatedir=/var --sysconfdir=/etc --disable-strip --disable-xen --block-drv-rw-whitelist=qcow2,raw,file,host_device,host_cdrom,qed,gluster,rbd --block-drv-ro-whitelist=vmdk,vpc --disable-debug-tcg --disable-sparse --disable-sdl --disable-curses --disable-curl --disable-check-utests --disable-bluez --enable-docs --disable-vde --disable-spice --trace-backend=nop --enable-smartcard --disable-smartcard-nss --enable-mixemu

Please try again :)

--- Additional comment from Guo, Zhiyi on 2015-11-27 09:40:59 BRST ---

(In reply to Markus Armbruster from comment #4)
> I can't see --enable-debug in your configure line.  I can see -O2.  You need
> to get one roughly like this:
> 
> ../configure --target-list=x86_64-softmmu '--extra-ldflags=-Wl,--build-id
> -pie -Wl,-z,relro -Wl,-z,now' '--extra-cflags=-g -pipe -Wall -fexceptions
> -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -fPIE -DPIE'
> --with-pkgversion=qemu-kvm-0.12.1.2-2.479.el6.2 --prefix=/usr
> --localstatedir=/var --sysconfdir=/etc --disable-strip --disable-xen
> --block-drv-rw-whitelist=qcow2,raw,file,host_device,host_cdrom,qed,gluster,
> rbd --block-drv-ro-whitelist=vmdk,vpc --disable-debug-tcg --disable-sparse
> --disable-sdl --disable-curses --disable-curl --disable-check-utests
> --disable-bluez --enable-docs --disable-vde --disable-spice
> --trace-backend=nop --enable-smartcard --disable-smartcard-nss
> --enable-mixemu
> 
> Please try again :)

Stack trace with non-optimized code:
(gdb) bt
#0  0x00007f1e571cfb10 in ?? ()
#1  0x00007f1e55ebd5ed in bdrv_aio_cancel_async (acb=0x7f1e571cfc10) at /usr/src/debug/qemu-kvm-0.12.1.2/block.c:3876
#2  0x00007f1e55ebd499 in bdrv_aio_cancel (acb=0x7f1e571cfc10) at /usr/src/debug/qemu-kvm-0.12.1.2/block.c:3842
#3  0x00007f1e56008f37 in ide_dma_cancel (bm=0x7f1e57dac160) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/core.c:2395
#4  0x00007f1e56008f5d in ide_dma_reset (bm=0x7f1e57dac160) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/core.c:2408
#5  0x00007f1e5600c755 in piix3_reset (opaque=0x7f1e57dab010) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/piix.c:124
#6  0x00007f1e55e6765b in qemu_system_reset (report=true) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:3417
#7  0x00007f1e55e990ad in qemu_kvm_system_reset (report=true) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1992
#8  0x00007f1e55e9997d in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2272
#9  0x00007f1e55e683ba in main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4273
#10 0x00007f1e55e6d451 in main (argc=24, argv=0x7fff7f33ac18, envp=0x7fff7f33ace0) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6731

--- Additional comment from Markus Armbruster on 2015-11-27 11:17:42 BRST ---

Aha!  acb->aiocb_info->cancel_async seems to be garbage.  Hunch: use after free?  Chimes with your report that Opteron_G5 fails differently...  Please reproduce with your debug build of qemu-kvm under valgrind, and capture valgrind's report.
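
To illustrate the hunch above (this is not QEMU code; every name below is a hypothetical stand-in): if a device keeps a pointer to an in-flight request and the request is freed on completion without that pointer being cleared, a later reset would call the cancel callback through freed memory, which would be consistent with a garbage cancel_async in frame #1. A minimal sketch of the defensive pattern, assuming a single pending request per device:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only; none of these are QEMU names. */
typedef struct Request Request;

typedef struct RequestInfo {
    void (*cancel_async)(Request *req);   /* callback reached via the request */
} RequestInfo;

struct Request {
    const RequestInfo *info;
    int *cancel_count;                     /* lets the toy callback record calls */
};

typedef struct Device {
    Request *pending;                      /* in-flight request, or NULL */
    int cancel_count;
} Device;

static void toy_cancel_async(Request *req)
{
    (*req->cancel_count)++;
}

static const RequestInfo toy_info = { .cancel_async = toy_cancel_async };

/* Start a request and remember it on the device. */
Request *device_submit(Device *dev)
{
    Request *req = malloc(sizeof(*req));
    if (req == NULL) {
        return NULL;
    }
    req->info = &toy_info;
    req->cancel_count = &dev->cancel_count;
    dev->pending = req;
    return req;
}

/* Completion path, taken for success and for errors such as ENOSPC. */
void device_complete(Device *dev, int ret)
{
    (void)ret;
    free(dev->pending);
    dev->pending = NULL;   /* without this line, reset would see a dangling pointer */
}

/* Reset path: cancel only a request that is still in flight. */
void device_reset(Device *dev)
{
    if (dev->pending != NULL) {
        dev->pending->info->cancel_async(dev->pending);
        free(dev->pending);
        dev->pending = NULL;
    }
}
```

With the pointer cleared in device_complete(), a reset after a failed request finds nothing to cancel; a reset while a request is genuinely in flight still cancels it exactly once.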

--- Additional comment from Guo, Zhiyi on 2015-12-01 06:32 BRST ---

Log generated with Valgrind 3.11.0; Valgrind 3.8.1 core dumps under the same steps.

--- Additional comment from Guo, Zhiyi on 2015-12-01 07:18 BRST ---



--- Additional comment from Markus Armbruster on 2015-12-01 08:01:26 BRST ---

valgrind is reporting a huge number of unrelated issues, probably in part because we lack upstream patches to suppress false positives.  It hits a cutoff and stops reporting some time before the crash.  Please try again with --error-limit=no.

Additional question: is qemu-kvm-rhev affected as well?

--- Additional comment from Guo, Zhiyi on 2015-12-02 06:09 BRST ---

The issue can also be reproduced on a RHEL 7.2 Intel Skylake host with rhev:
kernel:3.10.0-334.el7.x86_64
qemu-kvm-rhev-2.3.0-31.el7.x86_64
qemu-kvm-rhev-debuginfo-2.3.0-31.el7.x86_64
qemu-img-rhev-2.3.0-31.el7.x86_64
qemu-kvm-tools-rhev-2.3.0-31.el7.x86_64
qemu-kvm-common-rhev-2.3.0-31.el7.x86_64

The attachment includes valgrind logs reproduced on rhel6.7 and rhel7.2. The rhev packages were compiled with -g and without -O2 optimization. The valgrind logs were generated with the option --error-limit=no.

--- Additional comment from Guo, Zhiyi on 2015-12-02 06:12:38 BRST ---

Command used to reproduce the issue and capture valgrind log:
valgrind --log-file=valgrind.txt --error-limit=no /usr/libexec/qemu-kvm -name win2012 -m 2048 -smp 4 -cpu host -vga qxl -vnc :1 -monitor stdio -hda win2012.qcow2

--- Additional comment from Guo, Zhiyi on 2015-12-02 06:27 BRST ---

The valgrind log for rhel6.7 was a mistake; please ignore the attachment in comment 10 and use the log in this comment.

Comment 2 Markus Armbruster 2016-07-25 16:38:57 UTC
The backtrace is similar to the one in bug 1346237 (thanks to Stefan Hajnoczi for bringing it to my attention).  Can you still reproduce it with qemu-kvm-rhev-2.6.0-11.el7?

Comment 3 Guo, Zhiyi 2016-07-26 06:39:17 UTC
Thanks to Laszlo's simple reproduction steps, I can reproduce this issue against qemu-kvm-rhev-2.6.0-15.el7.x86_64.

Steps:
1. #qemu-img create -f qcow2 test.qcow2 8G
2. #ulimit -f 256
3. #gdb /usr/libexec/qemu-kvm
(gdb) run -m 2048 -smp 2 -drive file=test.qcow2,werror=stop,rerror=stop,cache=writeback,id=hd0,if=none -device ide-hd,drive=hd0 -drive id=cd0,readonly,media=cdrom,cache=writeback,if=none,file=Fedora-Server-dvd-x86_64-24-1.2.iso -device ide-cd,drive=cd0 -vnc :0 -monitor stdio

4. Install Fedora over VNC; the guest hangs very soon during installation.
5. Run system_reset from qmp and QEMU crashes.
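
Step 2 is presumably what makes the guest's disk writes fail: ulimit -f caps the size of any file the process may write. As a hedged illustration of that mechanism on a POSIX system (the helper name and the 4096-byte limit are made up for this example, and whether QEMU ignores SIGXFSZ in this way is an assumption, not something stated in this report), a process with a file-size limit that ignores SIGXFSZ sees write() fail with EFBIG once the limit is reached:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical helper: lower the soft RLIMIT_FSIZE to `limit` bytes, fill a
 * temporary file up to that limit, then try to write one more byte.
 * Returns the errno of the failing write, 0 if the write unexpectedly
 * succeeded, or -1 if the setup itself failed.
 */
int errno_when_writing_past_limit(size_t limit)
{
    struct rlimit old_lim, new_lim;
    void (*old_handler)(int);
    char buf[512];
    size_t written = 0;
    int result = 0;
    FILE *tmp;

    if (getrlimit(RLIMIT_FSIZE, &old_lim) != 0) {
        return -1;
    }
    tmp = tmpfile();
    if (tmp == NULL) {
        return -1;
    }

    /* Without this, the default action for SIGXFSZ terminates the process. */
    old_handler = signal(SIGXFSZ, SIG_IGN);

    new_lim = old_lim;
    new_lim.rlim_cur = (rlim_t)limit;
    if (setrlimit(RLIMIT_FSIZE, &new_lim) != 0) {
        signal(SIGXFSZ, old_handler);
        fclose(tmp);
        return -1;
    }

    /* Fill the file exactly up to the limit. */
    memset(buf, 'x', sizeof(buf));
    while (written < limit) {
        size_t left = limit - written;
        size_t chunk = left < sizeof(buf) ? left : sizeof(buf);
        ssize_t n = write(fileno(tmp), buf, chunk);
        if (n <= 0) {
            result = -1;
            break;
        }
        written += (size_t)n;
    }

    /* One more byte would exceed the limit, so this write should fail. */
    if (result == 0 && write(fileno(tmp), buf, 1) < 0) {
        result = errno;
    }

    /* Restore the original limit and signal disposition. */
    setrlimit(RLIMIT_FSIZE, &old_lim);
    signal(SIGXFSZ, old_handler);
    fclose(tmp);
    return result;
}
```

In the reproducer the growing qcow2 image plays the role of the temporary file: once it reaches the limit, further writes fail, and with werror=stop the guest is paused on the I/O error.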

Backtrace:
(qemu) 
Program received signal SIGSEGV, Segmentation fault.
0x0000555556b750f0 in ?? ()
(gdb) bt
#0  0x0000555556b750f0 in ?? ()
#1  0x000055555593cc4a in bdrv_aio_cancel_async (acb=0x555556b75570)
    at block/io.c:2060
#2  bdrv_aio_cancel (acb=0x555556b75570) at block/io.c:2041
#3  0x0000555555931ce5 in blk_aio_cancel (acb=<optimized out>)
    at block/block-backend.c:1044
#4  0x000055555584133a in ide_bus_reset (bus=bus@entry=0x5555597bf0d8)
    at hw/ide/core.c:2326
#5  0x0000555555844674 in piix3_reset (opaque=0x5555597be000)
    at hw/ide/piix.c:115
#6  0x00005555557d1abd in qemu_devices_reset () at vl.c:1738
#7  0x000055555574d166 in pc_machine_reset ()
    at /usr/src/debug/qemu-2.6.0/hw/i386/pc.c:1936
#8  0x00005555557d1b26 in qemu_system_reset (report=report@entry=true)
    at vl.c:1751
#9  0x00005555556c795b in main_loop_should_exit () at vl.c:1898
#10 main_loop () at vl.c:1938
#11 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at vl.c:4664

Comment 5 John Snow 2016-09-19 23:09:46 UTC
Moving back from POST to ASSIGNED as the 7.4 tree isn't open for submissions yet, and we decided not to include this for 7.3 as of 2016-09-19.

There is another bug proposed for Z-stream (BZ#1375520) which MAY require the same fix as this bug, so it is possible we may change our minds again in the near future based on analysis of that bug.

(While you're here reading bugzilla comments: The two bugs currently trigger the same exact stack trace, but the triggering mechanism appears to be different between the two BZs, hence the separate entries.)

--js

Comment 7 Ademar Reis 2016-09-28 01:48:54 UTC
For reference, this is the cluster of BZ related to this issue: bug 1281713, bug 1299876, bug 1299875, bug 1361487, bug 1361490, bug 1361488, bug 1375520

Comment 10 aihua liang 2017-03-28 03:16:48 UTC
Hi, John

 Tested on RHEV7.4 + qemu-kvm-rhev 2.9: the problem has been resolved, so please help move the bug to the correct status, thanks.

*******Test Detail**************

Test Version:
 kernel version:3.10.0-623.el7.x86_64
 qemu-kvm-rhev version:qemu-kvm-rhev-2.9.0-0.el7.mrezanin201703210848.x86_64

Test Steps:
 1. Install a guest, e.g. win2012.
 2. Fill the host disk completely.
 3. Start the guest with the QEMU command below, then start some applications in the guest until the guest hangs:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1'  \
-sandbox off  \
-machine pc  \
-nodefaults  \
-vga std  \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20161227-001116-PD2k1uXB,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control  \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20161227-001116-PD2k1uXB,server,nowait \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pvpanic,ioport=0x505,id=id95e1vw  \
-chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20161227-001116-PD2k1uXB,server,nowait \
-device isa-serial,chardev=serial_id_serial0  \
-chardev socket,id=seabioslog_id_20161227-001116-PD2k1uXB,path=/var/tmp/seabios-20161227-001116-PD2k1uXB,server,nowait \
-device isa-debugcon,chardev=seabioslog_id_20161227-001116-PD2k1uXB,iobase=0x402 \
-device ich9-usb-ehci1,id=usb1,addr=1d.7,multifunction=on,bus=pci.0 \
-device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=1d.0,firstport=0,bus=pci.0 \
-device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=1d.2,firstport=2,bus=pci.0 \
-device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=1d.4,firstport=4,bus=pci.0 \
-drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/win2012-64-virtio.qcow2 \
-device ide-hd,id=image1,drive=drive_image1,bus=ide.0,unit=0 \
-device virtio-net-pci,mac=9a:d7:d8:d9:da:db,id=idoYMY7R,vectors=4,netdev=iddvjhTd,bus=pci.0,addr=03  \
-netdev tap,id=iddvjhTd,vhost=on \
-m 4096  \
-smp 8,maxcpus=8,cores=4,threads=1,sockets=2  \
-cpu host \
-vnc :0  \
-rtc base=localtime,clock=host,driftfix=slew  \
-boot order=cdn,menu=off,strict=off \
-enable-kvm \
-monitor stdio \
-spice port=3000,ipv4,disable-ticketing

  4.Check vm status
    (qemu)info status     -------> vm status:paused(io-error)

  5. Reset vm
    (qemu)system_reset
    (qemu)c               ------->vm restart

  6. Release some host space, then reset vm
    (qemu)system_reset
    (qemu)c               -------> vm restart and work normally

Comment 11 aihua liang 2017-03-29 11:07:34 UTC
Hi, John

 The problem has been resolved on RHEV7.4 + qemu-kvm-rhev 2.9; please help move it to the correct status, thanks.

Comment 12 John Snow 2017-03-29 19:26:28 UTC
OK, I think it's up to QE to mark it as ON_QA or VERIFIED, so from the Dev perspective I'll mark it as MODIFIED to signify that the fix is in the tree. Hopefully this moves the BZ back into the normal flow of things.

Comment 13 aihua liang 2017-04-01 06:34:14 UTC
As the fix is in the tree and has been verified as "pass", we are changing the bug's status to "Verified", thanks.

Comment 15 errata-xmlrpc 2017-08-01 23:29:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
