Bug 1299875

Summary: system_reset should clear pending request for error (IDE)
Product: Red Hat Enterprise Linux 7
Reporter: Ademar Reis <areis>
Component: qemu-kvm
Assignee: John Snow <jsnow>
Status: CLOSED ERRATA
QA Contact: aihua liang <aliang>
Severity: medium
Docs Contact:
Priority: high
Version: 7.2
CC: ailan, armbru, coli, huding, jsnow, juzhang, michen, mkenneth, qzhang, rbalakri, virt-bugs, virt-maint, xuwei, zhguo
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Windows
Whiteboard:
Fixed In Version: qemu-kvm-1.5.3-137.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1281713
Clones: 1299876 1393042
Environment:
Last Closed: 2017-08-01 17:46:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1281713
Bug Blocks: 1299876, 1393042

Description Ademar Reis 2016-01-19 13:05:40 UTC
+++ This bug was initially created as a clone of Bug #1281713 +++

Description of problem:
qemu-kvm quits with a segmentation fault when system_reset is executed while there is no space left on the host.

Version-Release number of selected component (if applicable):
qemu-img-0.12.1.2-2.481.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.481.el6.x86_64
qemu-kvm-0.12.1.2-2.481.el6.x86_64
qemu-guest-agent-0.12.1.2-2.481.el6.x86_64
qemu-kvm-debuginfo-0.12.1.2-2.481.el6.x86_64
2.6.32-583.el6.x86_64

How reproducible:
70%

Steps to Reproduce:
1. Create a 25G win2012.qcow2 image and install a Windows 2012 R2 guest.
2. On the host filesystem where the guest image is located, use up the free space by copying the guest image several times until a "No space left on device" message appears. Launch the guest with this qemu-kvm command:
/usr/libexec/qemu-kvm -name win2012 -m 2048 \
	-cpu Opteron_G4 \
	-smp 1,cores=1,threads=2,sockets=2,maxcpus=4 \
	-vga qxl \
	-serial unix:/tmp/m,server,nowait \
	-drive file=win2012-64r2-virtio-scsi.qcow2,if=none,id=drive-scsi-disk0,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-scsi-pci,id=scsi0 -device scsi-hd,drive=drive-scsi-disk0,bus=scsi0.0,scsi-id=0,lun=0,id=scsi-disk0,bootindex=1 \
	-monitor stdio \
	-usb -device usb-kbd,id=input0 \
	-vnc :1

3. Interact with the guest (browse the internet, for example) until the qemu-kvm monitor prints "block I/O error in device 'ide0-hd0': No space left on device (28)". This usually happens within 5 minutes. Then enter system_reset in the qemu monitor. The segmentation fault follows.

Actual results:
qemu-kvm quits with a segmentation fault after system_reset is executed.

Expected results:
The qemu-kvm process should stay alive and the guest system should reset without error.

Additional info:
Stack info:
Core was generated by `/usr/libexec/qemu-kvm -name win2012 -m 2048 -cpu SandyBridge -smp 2,cores=1,thr'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f85d1b04a90 in ?? ()
(gdb) bt
#0  0x00007f85d1b04a90 in ?? ()
#1  0x00007f85d03f5aee in bdrv_aio_cancel (acb=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/block.c:3842
#2  0x00007f85d052d46a in ide_dma_cancel (bm=0x7f85d26e1160)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/core.c:2395
#3  0x00007f85d052d499 in ide_dma_reset (bm=0x7f85d26e1160)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/core.c:2408
#4  0x00007f85d05335ad in piix3_reset (opaque=0x7f85d26e0010)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/piix.c:124
#5  0x00007f85d03b71d2 in qemu_system_reset (report=true)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:3417
#6  0x00007f85d03dd050 in qemu_kvm_system_reset (report=true)
    at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1992
#7  0x00007f85d03dd253 in kvm_main_loop ()
    at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2272
#8  0x00007f85d03be317 in main_loop (argc=<value optimized out>, 
    argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4273
#9  main (argc=<value optimized out>, argv=<value optimized out>, 
    envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6731

On an Opteron_G5 host, qemu-kvm does not quit with a segmentation fault, but the Windows guest cannot be reset after system_reset.


--- Additional comment from Markus Armbruster on 2015-11-24 14:37:15 BRST ---

Can you reproduce this with a qemu-kvm built with --enable-debug?

--- Additional comment from Guo, Zhiyi on 2015-11-27 00:00:55 BRST ---

Hi,
I guess you want to see the function call shown as ?? and the argument values that were optimized out.
The stack trace is still the same as reported in the description.
I enabled the --enable-debug option and rebuilt qemu-kvm. The -g option was added to the compile flags by configure:
+ ../configure --target-list=x86_64-softmmu '--extra-ldflags=-Wl,--build-id -pie -Wl,-z,relro -Wl,-z,now' '--extra-cflags=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -fPIE -DPIE' --with-pkgversion=qemu-kvm-0.12.1.2-2.479.el6.2 --prefix=/usr --localstatedir=/var --sysconfdir=/etc --disable-strip --disable-xen --block-drv-rw-whitelist=qcow2,raw,file,host_device,host_cdrom,qed,gluster,rbd --block-drv-ro-whitelist=vmdk,vpc --disable-debug-tcg --disable-sparse --disable-sdl --disable-curses --disable-curl --disable-check-utests --disable-bluez --enable-docs --disable-vde --disable-spice --trace-backend=nop --enable-smartcard --disable-smartcard-nss --enable-mixemu
Install prefix    /usr
BIOS directory    /usr/share/qemu
binary directory  /usr/bin
local state directory   /var
Manual directory  /usr/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path       /root/rpmbuild/BUILD/qemu-kvm-0.12.1.2
C compiler        gcc
Host C compiler   gcc
CFLAGS            -O2 -g

BR/
Zhiyi

--- Additional comment from Markus Armbruster on 2015-11-27 05:17:53 BRST ---

I can't see --enable-debug in your configure line.  I can see -O2.  You need to get one roughly like this:

../configure --target-list=x86_64-softmmu '--extra-ldflags=-Wl,--build-id -pie -Wl,-z,relro -Wl,-z,now' '--extra-cflags=-g -pipe -Wall -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -fPIE -DPIE' --with-pkgversion=qemu-kvm-0.12.1.2-2.479.el6.2 --prefix=/usr --localstatedir=/var --sysconfdir=/etc --disable-strip --disable-xen --block-drv-rw-whitelist=qcow2,raw,file,host_device,host_cdrom,qed,gluster,rbd --block-drv-ro-whitelist=vmdk,vpc --disable-debug-tcg --disable-sparse --disable-sdl --disable-curses --disable-curl --disable-check-utests --disable-bluez --enable-docs --disable-vde --disable-spice --trace-backend=nop --enable-smartcard --disable-smartcard-nss --enable-mixemu

Please try again :)

--- Additional comment from Guo, Zhiyi on 2015-11-27 09:40:59 BRST ---

(In reply to Markus Armbruster from comment #4)

Stack trace with non-optimized code:
(gdb) bt
#0  0x00007f1e571cfb10 in ?? ()
#1  0x00007f1e55ebd5ed in bdrv_aio_cancel_async (acb=0x7f1e571cfc10) at /usr/src/debug/qemu-kvm-0.12.1.2/block.c:3876
#2  0x00007f1e55ebd499 in bdrv_aio_cancel (acb=0x7f1e571cfc10) at /usr/src/debug/qemu-kvm-0.12.1.2/block.c:3842
#3  0x00007f1e56008f37 in ide_dma_cancel (bm=0x7f1e57dac160) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/core.c:2395
#4  0x00007f1e56008f5d in ide_dma_reset (bm=0x7f1e57dac160) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/core.c:2408
#5  0x00007f1e5600c755 in piix3_reset (opaque=0x7f1e57dab010) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/piix.c:124
#6  0x00007f1e55e6765b in qemu_system_reset (report=true) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:3417
#7  0x00007f1e55e990ad in qemu_kvm_system_reset (report=true) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1992
#8  0x00007f1e55e9997d in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2272
#9  0x00007f1e55e683ba in main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4273
#10 0x00007f1e55e6d451 in main (argc=24, argv=0x7fff7f33ac18, envp=0x7fff7f33ace0) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6731

--- Additional comment from Markus Armbruster on 2015-11-27 11:17:42 BRST ---

Aha!  acb->aiocb_info->cancel_async seems to be garbage.  Hunch: use after free?  Chimes with your report that Opteron_G5 fails differently...  Please reproduce with your debug build of qemu-kvm under valgrind, and capture valgrind's report.
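
The hunch above can be illustrated with a toy model. This is a sketch in Python, not QEMU code: the class names `Request` and `IdeBus`, the `clear_on_error` switch, and the `freed` flag are all hypothetical and exist only to show the suspected sequence (a request ends with an I/O error, the bus keeps a stale reference, and reset then tries to cancel it). It does not claim to reproduce the actual patch.

```python
class Request:
    """Hypothetical stand-in for an in-flight block request (an AIOCB in QEMU terms)."""

    def __init__(self):
        self.freed = False

    def complete(self):
        # The block layer is done with the request; touching it later is a bug.
        self.freed = True

    def cancel(self):
        if self.freed:
            # In C this would be a call through a dangling function pointer.
            raise RuntimeError("cancel() on an already freed request")
        self.freed = True


class IdeBus:
    """Toy model of a DMA controller that remembers one pending request."""

    def __init__(self, clear_on_error):
        # Assumption for illustration: True models "stale reference dropped".
        self.clear_on_error = clear_on_error
        self.pending = None

    def start_dma(self):
        self.pending = Request()

    def io_error(self):
        # werror=stop: the request finishes with ENOSPC and the VM is paused.
        self.pending.complete()
        if self.clear_on_error:
            self.pending = None

    def reset(self):
        # system_reset cancels whatever the bus still believes is in flight.
        if self.pending is not None:
            self.pending.cancel()
            self.pending = None
```

With `clear_on_error=False`, the sequence `start_dma()`, `io_error()`, `reset()` raises, which mirrors the crash path ide_dma_reset -> ide_dma_cancel -> bdrv_aio_cancel in the backtrace. With `clear_on_error=True`, `reset()` finds nothing pending and returns cleanly.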

--- Additional comment from Guo, Zhiyi on 2015-12-01 06:32 BRST ---

Log generated with Valgrind 3.11.0; Valgrind 3.8.1 core dumps under the same steps.

--- Additional comment from Guo, Zhiyi on 2015-12-01 07:18 BRST ---



--- Additional comment from Markus Armbruster on 2015-12-01 08:01:26 BRST ---

valgrind is reporting a huge number of unrelated issues, probably in part because we lack upstream patches to suppress false positives.  It hits a cutoff and stops reporting some time before the crash.  Please try again with --error-limit=no.

Additional question: is qemu-kvm-rhev affected as well?

--- Additional comment from Guo, Zhiyi on 2015-12-02 06:09 BRST ---

The issue can also be reproduced on a RHEL 7.2 Intel Skylake host with qemu-kvm-rhev:
kernel:3.10.0-334.el7.x86_64
qemu-kvm-rhev-2.3.0-31.el7.x86_64
qemu-kvm-rhev-debuginfo-2.3.0-31.el7.x86_64
qemu-img-rhev-2.3.0-31.el7.x86_64
qemu-kvm-tools-rhev-2.3.0-31.el7.x86_64
qemu-kvm-common-rhev-2.3.0-31.el7.x86_64

The attachment includes valgrind logs reproduced on RHEL 6.7 and RHEL 7.2. The rhev packages were compiled with -g and without -O2 optimization. The valgrind logs were generated with the option --error-limit=no.

--- Additional comment from Guo, Zhiyi on 2015-12-02 06:12:38 BRST ---

Command used to reproduce the issue and capture valgrind log:
valgrind --log-file=valgrind.txt --error-limit=no /usr/libexec/qemu-kvm -name win2012 -m 2048 -smp 4 -cpu host -vga qxl -vnc :1 -monitor stdio -hda win2012.qcow2

--- Additional comment from Guo, Zhiyi on 2015-12-02 06:27 BRST ---

The valgrind log for RHEL 6.7 was wrong. Please ignore the attachment in comment 10 and use the log in this comment.

Comment 2 John Snow 2016-09-20 18:52:25 UTC
Moving back to ASSIGNED as we decided to delay this to 7.4, at least for now. See comment 5 on #1299876

--js

Comment 4 Ademar Reis 2016-09-28 01:49:02 UTC
For reference, this is the cluster of BZs related to this issue: bug 1281713, bug 1299876, bug 1299875, bug 1361487, bug 1361490, bug 1361488, bug 1375520

Comment 6 aihua liang 2017-03-30 02:56:08 UTC
The issue still exists in RHEL 7.4 (3.10.0-618 + qemu-kvm-1.5.3-134).

Comment 8 John Snow 2017-04-07 19:19:43 UTC
I'm having trouble with our build root at the moment, so I cannot re-post the patch currently.

Moving back to ASSIGNED so I can re-post the patch once the build root problem is addressed.

Thanks.

Comment 9 John Snow 2017-04-26 23:49:39 UTC
There.

Comment 10 Miroslav Rezanina 2017-04-28 04:17:37 UTC
Fix included in qemu-kvm-1.5.3-137.el7

Comment 12 aihua liang 2017-05-04 03:17:50 UTC
Verified: the problem has been resolved, so changing the status to "Verified".

Test Version:
 kernel version:3.10.0-657.el7.x86_64
 qemu-kvm version:qemu-kvm-1.5.3-137.el7.x86_64

Test Steps:
 1. Fill the host disk completely.

 2. Start the guest with the qemu command below:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1'  \
-sandbox off  \
-machine pc  \
-nodefaults  \
-vga std  \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20161219-042734-6fVMWCMz,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control  \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20161219-042734-6fVMWCMz,server,nowait \
-mon chardev=qmp_id_catch_monitor,mode=control \
-drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/rhel74-64-virtio.qcow2 \
-device ide-hd,id=image1,drive=drive_image1,bootindex=0,bus=ide.0 \
-device virtio-net-pci,mac=9a:f2:f3:f4:f5:f6,id=id30uvBS,vectors=4,netdev=idADyVP5,bus=pci.0,addr=04  \
-netdev tap,id=idADyVP5,vhost=on \
-m 2048  \
-smp 16,maxcpus=16,cores=8,threads=1,sockets=2  \
-cpu host \
-vnc :0  \
-rtc base=localtime,clock=host,driftfix=slew  \
-boot order=cdn,once=d,menu=off,strict=off \
-enable-kvm \
-spice port=3000,ipv4,disable-ticketing \
-monitor stdio

 3. Copy files to the guest until qemu reports an error:
  (qemu) block I/O error in device 'drive_image1': No space left on device (28)

 4. Check the VM status:
  (qemu) info status  --> VM status: paused (io-error)

 5. Reset the VM and continue it:
   (qemu) system_reset
   (qemu) c            --> VM restarts successfully.
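
Steps 4 and 5 above were run by hand on the human monitor. Because the command line also opens QMP sockets, the same check could be scripted. The following is a minimal sketch under stated assumptions, not the procedure used for verification: the helper names `qmp_execute`, `reset_after_io_error`, and `open_qmp` are made up for illustration, and it relies on the QMP commands `qmp_capabilities`, `query-status`, `system_reset`, and `cont`, with one JSON message per line.

```python
import json
import socket


def qmp_execute(chan, command):
    """Send one QMP command and return its 'return' payload, skipping async events."""
    chan.write(json.dumps({"execute": command}) + "\n")
    chan.flush()
    while True:
        line = chan.readline()
        if not line:
            raise ConnectionError("QMP channel closed")
        msg = json.loads(line)
        if "event" in msg:
            continue  # asynchronous event such as RESET or RESUME
        if "error" in msg:
            raise RuntimeError(msg["error"].get("desc", "QMP error"))
        return msg["return"]


def reset_after_io_error(chan):
    """Mirror steps 4 and 5: read the status, reset, continue, read the status again."""
    chan.readline()  # discard the QMP greeting banner
    qmp_execute(chan, "qmp_capabilities")
    before = qmp_execute(chan, "query-status")["status"]
    qmp_execute(chan, "system_reset")
    qmp_execute(chan, "cont")
    after = qmp_execute(chan, "query-status")["status"]
    return before, after


def open_qmp(path):
    """Open a QMP UNIX socket (path as given to -chardev socket,path=...)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    return sock.makefile("rw", encoding="utf-8")
```

A caller would pass the socket path from the -chardev option to `open_qmp` and hand the result to `reset_after_io_error`. If the fix behaves as described in this comment, the first status should be the I/O error state and the qemu-kvm process should still answer the final query.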

Comment 13 errata-xmlrpc 2017-08-01 17:46:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:1856