Bug 1281713

Summary: system_reset should clear pending request for error (IDE)
Product: Red Hat Enterprise Linux 6 Reporter: Guo, Zhiyi <zhguo>
Component: qemu-kvm Assignee: John Snow <jsnow>
Status: CLOSED ERRATA QA Contact: aihua liang <aliang>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 6.7 CC: ailan, coli, jsnow, juzhang, michen, mkenneth, qzhang, rbalakri, virt-maint, zhguo
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Windows   
Whiteboard:
Fixed In Version: qemu-kvm-0.12.1.2-2.494.el6 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1299875 (view as bug list) Environment:
Last Closed: 2017-03-21 09:35:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1299875, 1299876    
Attachments:
Description: Flags
valgrind log on intel broadwell host: none
valgrind log on amd Opteron 6376: none
valgrind log with option --error-limit=no: none
correct valgrind log including rhel6.7 and rhel7.2: none

Description Guo, Zhiyi 2015-11-13 09:03:11 UTC
Description of problem:
qemu-kvm quits with a segmentation fault when system_reset is executed while the host has no space left.

Version-Release number of selected component (if applicable):
qemu-img-0.12.1.2-2.481.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.481.el6.x86_64
qemu-kvm-0.12.1.2-2.481.el6.x86_64
qemu-guest-agent-0.12.1.2-2.481.el6.x86_64
qemu-kvm-debuginfo-0.12.1.2-2.481.el6.x86_64
2.6.32-583.el6.x86_64

How reproducible:
70%

Steps to Reproduce:
1. Create a 25G win2012.qcow2 image and install a Windows 2012 R2 guest.
2. On the filesystem where the guest image is located, use up the space by copying the guest image several times until "No space left on device" is reported. Launch the guest with this qemu-kvm command:
/usr/libexec/qemu-kvm -name win2012 -m 2048 \
	-cpu Opteron_G4 \
	-smp 1,cores=1,threads=2,sockets=2,maxcpus=4 \
	-vga qxl \
	-serial unix:/tmp/m,server,nowait \
	-drive file=win2012-64r2-virtio-scsi.qcow2,if=none,id=drive-scsi-disk0,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-scsi-pci,id=scsi0 -device scsi-hd,drive=drive-scsi-disk0,bus=scsi0.0,scsi-id=0,lun=0,id=scsi-disk0,bootindex=1 \
	-monitor stdio \
	-usb -device usb-kbd,id=input0 \
	-vnc :1

3. Interact with the guest (browse the internet or do other things) until the qemu-kvm monitor prints "block I/O error in device 'ide0-hd0': No space left on device (28)" (this usually happens within 5 minutes), then enter system_reset in the qemu monitor. A segmentation fault follows.

Actual results:
qemu-kvm quits with a segmentation fault after system_reset is executed.

Expected results:
The qemu-kvm process should stay alive and the guest should reset without error.

Additional info:
Stack info:
Core was generated by `/usr/libexec/qemu-kvm -name win2012 -m 2048 -cpu SandyBridge -smp 2,cores=1,thr'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f85d1b04a90 in ?? ()
(gdb) bt
#0  0x00007f85d1b04a90 in ?? ()
#1  0x00007f85d03f5aee in bdrv_aio_cancel (acb=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/block.c:3842
#2  0x00007f85d052d46a in ide_dma_cancel (bm=0x7f85d26e1160)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/core.c:2395
#3  0x00007f85d052d499 in ide_dma_reset (bm=0x7f85d26e1160)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/core.c:2408
#4  0x00007f85d05335ad in piix3_reset (opaque=0x7f85d26e0010)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/piix.c:124
#5  0x00007f85d03b71d2 in qemu_system_reset (report=true)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:3417
#6  0x00007f85d03dd050 in qemu_kvm_system_reset (report=true)
    at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1992
#7  0x00007f85d03dd253 in kvm_main_loop ()
    at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2272
#8  0x00007f85d03be317 in main_loop (argc=<value optimized out>, 
    argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4273
#9  main (argc=<value optimized out>, argv=<value optimized out>, 
    envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6731

On an Opteron_G5 host, qemu-kvm does not quit with a segmentation fault, but the Windows guest cannot be reset after system_reset.

Comment 2 Markus Armbruster 2015-11-24 16:37:15 UTC
Can you reproduce this with a qemu-kvm built with --enable-debug?

Comment 3 Guo, Zhiyi 2015-11-27 02:00:55 UTC
Hi,
I guess you want to see the function call shown as ?? and the argument values that were optimized out.
The stack trace is still the same as reported in the description.
I enabled the --enable-debug option and rebuilt qemu-kvm. The -g option has been added to the compile procedure; from the configure run:
+ ../configure --target-list=x86_64-softmmu '--extra-ldflags=-Wl,--build-id -pie -Wl,-z,relro -Wl,-z,now' '--extra-cflags=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -fPIE -DPIE' --with-pkgversion=qemu-kvm-0.12.1.2-2.479.el6.2 --prefix=/usr --localstatedir=/var --sysconfdir=/etc --disable-strip --disable-xen --block-drv-rw-whitelist=qcow2,raw,file,host_device,host_cdrom,qed,gluster,rbd --block-drv-ro-whitelist=vmdk,vpc --disable-debug-tcg --disable-sparse --disable-sdl --disable-curses --disable-curl --disable-check-utests --disable-bluez --enable-docs --disable-vde --disable-spice --trace-backend=nop --enable-smartcard --disable-smartcard-nss --enable-mixemu
Install prefix    /usr
BIOS directory    /usr/share/qemu
binary directory  /usr/bin
local state directory   /var
Manual directory  /usr/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path       /root/rpmbuild/BUILD/qemu-kvm-0.12.1.2
C compiler        gcc
Host C compiler   gcc
CFLAGS            -O2 -g

BR/
Zhiyi

Comment 4 Markus Armbruster 2015-11-27 07:17:53 UTC
I can't see --enable-debug in your configure line.  I can see -O2.  You need to get one roughly like this:

../configure --target-list=x86_64-softmmu '--extra-ldflags=-Wl,--build-id -pie -Wl,-z,relro -Wl,-z,now' '--extra-cflags=-g -pipe -Wall -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -fPIE -DPIE' --with-pkgversion=qemu-kvm-0.12.1.2-2.479.el6.2 --prefix=/usr --localstatedir=/var --sysconfdir=/etc --disable-strip --disable-xen --block-drv-rw-whitelist=qcow2,raw,file,host_device,host_cdrom,qed,gluster,rbd --block-drv-ro-whitelist=vmdk,vpc --disable-debug-tcg --disable-sparse --disable-sdl --disable-curses --disable-curl --disable-check-utests --disable-bluez --enable-docs --disable-vde --disable-spice --trace-backend=nop --enable-smartcard --disable-smartcard-nss --enable-mixemu

Please try again :)

Comment 5 Guo, Zhiyi 2015-11-27 11:40:59 UTC
(In reply to Markus Armbruster from comment #4)
> I can't see --enable-debug in your configure line.  I can see -O2.  You need
> to get one roughly like this:
> 
> ../configure --target-list=x86_64-softmmu '--extra-ldflags=-Wl,--build-id
> -pie -Wl,-z,relro -Wl,-z,now' '--extra-cflags=-g -pipe -Wall -fexceptions
> -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -fPIE -DPIE'
> --with-pkgversion=qemu-kvm-0.12.1.2-2.479.el6.2 --prefix=/usr
> --localstatedir=/var --sysconfdir=/etc --disable-strip --disable-xen
> --block-drv-rw-whitelist=qcow2,raw,file,host_device,host_cdrom,qed,gluster,
> rbd --block-drv-ro-whitelist=vmdk,vpc --disable-debug-tcg --disable-sparse
> --disable-sdl --disable-curses --disable-curl --disable-check-utests
> --disable-bluez --enable-docs --disable-vde --disable-spice
> --trace-backend=nop --enable-smartcard --disable-smartcard-nss
> --enable-mixemu
> 
> Please try again :)

Stack trace with non-optimized code:
(gdb) bt
#0  0x00007f1e571cfb10 in ?? ()
#1  0x00007f1e55ebd5ed in bdrv_aio_cancel_async (acb=0x7f1e571cfc10) at /usr/src/debug/qemu-kvm-0.12.1.2/block.c:3876
#2  0x00007f1e55ebd499 in bdrv_aio_cancel (acb=0x7f1e571cfc10) at /usr/src/debug/qemu-kvm-0.12.1.2/block.c:3842
#3  0x00007f1e56008f37 in ide_dma_cancel (bm=0x7f1e57dac160) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/core.c:2395
#4  0x00007f1e56008f5d in ide_dma_reset (bm=0x7f1e57dac160) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/core.c:2408
#5  0x00007f1e5600c755 in piix3_reset (opaque=0x7f1e57dab010) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/piix.c:124
#6  0x00007f1e55e6765b in qemu_system_reset (report=true) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:3417
#7  0x00007f1e55e990ad in qemu_kvm_system_reset (report=true) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1992
#8  0x00007f1e55e9997d in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2272
#9  0x00007f1e55e683ba in main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4273
#10 0x00007f1e55e6d451 in main (argc=24, argv=0x7fff7f33ac18, envp=0x7fff7f33ace0) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6731

Comment 6 Markus Armbruster 2015-11-27 13:17:42 UTC
Aha!  acb->aiocb_info->cancel_async seems to be garbage.  Hunch: use after free?  Chimes with your report that Opteron_G5 fails differently...  Please reproduce with your debug build of qemu-kvm under valgrind, and capture valgrind's report.
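
The hunch in this comment can be illustrated with a toy model. It is sketched in Python rather than QEMU's C, and every class and method name below is hypothetical; none of it is taken from block.c or hw/ide/core.c, and it does not claim to show what the eventual patch does. It only shows how a handle that outlives its request turns a later cancel into a call on stale state, which in C would appear as a garbage function pointer.

```python
class Request:
    """Hypothetical stand-in for an asynchronous I/O control block."""

    def __init__(self):
        self.released = False

    def release(self):
        # In C this would be the point where the block is freed.
        self.released = True

    def cancel(self):
        if self.released:
            # In C this would be undefined behaviour; here it is made visible.
            raise RuntimeError("cancel() called on a released request")
        self.release()


class DmaEngine:
    """Toy DMA engine that tracks at most one in-flight request."""

    def __init__(self, clear_on_completion):
        self.clear_on_completion = clear_on_completion
        self.aiocb = None

    def start(self):
        self.aiocb = Request()

    def complete_with_error(self):
        # The request finished with an error and its block is released.
        self.aiocb.release()
        if self.clear_on_completion:
            self.aiocb = None  # forget the handle so reset cannot reuse it

    def reset(self):
        # Same shape as the backtrace: reset cancels whatever looks pending.
        if self.aiocb is not None:
            self.aiocb.cancel()
            self.aiocb = None
```

With clear_on_completion=False, calling start(), complete_with_error() and then reset() raises, which is the toy equivalent of the crash in bdrv_aio_cancel. With clear_on_completion=True the same sequence is harmless.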

Comment 7 Guo, Zhiyi 2015-12-01 08:32:23 UTC
Created attachment 1100766 [details]
valgrind log on intel broadwell host

Log generated with Valgrind 3.11.0; Valgrind 3.8.1 core dumps when following the same steps.

Comment 8 Guo, Zhiyi 2015-12-01 09:18:13 UTC
Created attachment 1100779 [details]
valgrind log on amd Opteron 6376

Comment 9 Markus Armbruster 2015-12-01 10:01:26 UTC
valgrind is reporting a huge number of unrelated issues, probably in part because we lack upstream patches to suppress false positives.  It hits a cutoff and stops reporting some time before the crash.  Please try again with --error-limit=no.

Additional question: is qemu-kvm-rhev affected as well?

Comment 10 Guo, Zhiyi 2015-12-02 08:09:12 UTC
Created attachment 1101353 [details]
valgrind log with option --error-limit=no

The issue can also be reproduced on a rhel7.2 Intel Skylake host with rhev:
kernel:3.10.0-334.el7.x86_64
qemu-kvm-rhev-2.3.0-31.el7.x86_64
qemu-kvm-rhev-debuginfo-2.3.0-31.el7.x86_64
qemu-img-rhev-2.3.0-31.el7.x86_64
qemu-kvm-tools-rhev-2.3.0-31.el7.x86_64
qemu-kvm-common-rhev-2.3.0-31.el7.x86_64

The attachment includes valgrind logs reproduced on rhel6.7 and rhel7.2. The rhev packages were compiled with -g and without -O2 optimization. The valgrind logs were generated with the option --error-limit=no.

Comment 11 Guo, Zhiyi 2015-12-02 08:12:38 UTC
Command used to reproduce the issue and capture valgrind log:
valgrind --log-file=valgrind.txt --error-limit=no /usr/libexec/qemu-kvm -name win2012 -m 2048 -smp 4 -cpu host -vga qxl -vnc :1 -monitor stdio -hda win2012.qcow2
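
Since a log captured with --error-limit=no contains many unrelated reports, a small filter can help locate the use-after-free candidates. The sketch below assumes the usual memcheck text layout ("==PID== Invalid read of size N" followed later by an "Address ... inside a block of size ... free'd" line); the function name is hypothetical and the attached logs were not checked against it.

```python
import re

# Memcheck prefixes every log line with "==<pid>==".
_PREFIX = re.compile(r"^==\d+==\s?")
# Address note that memcheck prints for an access into an already freed block.
_FREED = re.compile(
    r"Address 0x[0-9a-fA-F]+ is \d+ bytes inside a block of size [\d,]+ free'd"
)


def freed_block_accesses(lines):
    """Return (error header, address note) pairs for accesses to freed blocks."""
    hits = []
    current = None  # header of the error record being scanned, if any
    for raw in lines:
        text = _PREFIX.sub("", raw.rstrip("\n"))
        if text.startswith(("Invalid read", "Invalid write")):
            current = text
        elif current and _FREED.search(text):
            hits.append((current, text.strip()))
            current = None
        elif not text.strip():
            current = None  # a blank line ends the current error record
    return hits
```

For example, `freed_block_accesses(open("valgrind.txt"))` would scan the log file written by the command above.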

Comment 12 Guo, Zhiyi 2015-12-02 08:27:53 UTC
Created attachment 1101356 [details]
correct valgrind log including rhel6.7 and rhel7.2

The valgrind log for rhel6.7 in comment 10 was a mistake; please ignore the attachment in comment 10 and use the log in this comment.

Comment 13 Jeff Nelson 2016-09-22 13:49:08 UTC
Fix included in qemu-kvm-0.12.1.2-2.494.el6

Comment 14 Ademar Reis 2016-09-28 01:48:28 UTC
For reference, this is the cluster of BZ related to this issue: bug 1281713, bug 1299876, bug 1299875, bug 1361487, bug 1361490, bug 1361488, bug 1375520

Comment 16 aihua liang 2016-12-28 03:16:33 UTC
I have tested the fix; the problem still exists.

Test Version:
 kernel version: 2.6.32-680.el6.x86_64
 qemu-kvm-rhev version: qemu-kvm-rhev-0.12.1.2-2.499.el6.x86_64

Test Step:
1. Create a 25G qcow2 image and install win2012r2 on it.

2. Fill the disk by copying the image until the "No space left on device" prompt appears.

3. Start the guest with this qemu command:
MALLOC_PERTURB_=1  /usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-machine rhel6.6.0  \
-nodefaults  \
-vga std \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20161227-002429-zWDOQysC,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control  \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20161227-002429-zWDOQysC,server,nowait \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pvpanic,ioport=0x505,id=idBpoMGZ  \
-chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20161227-002429-zWDOQysC,server,nowait \
-device isa-serial,chardev=serial_id_serial0  \
-chardev socket,id=seabioslog_id_20161227-002429-zWDOQysC,path=/var/tmp/seabios-20161227-002429-zWDOQysC,server,nowait \
-device isa-debugcon,chardev=seabioslog_id_20161227-002429-zWDOQysC,iobase=0x402 \
-device ich9-usb-ehci1,id=usb1,addr=1d.7,multifunction=on,bus=pci.0 \
-device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=1d.0,firstport=0,bus=pci.0 \
-device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=1d.2,firstport=2,bus=pci.0 \
-device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=1d.4,firstport=4,bus=pci.0 \
-drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/usr/share/avocado/data/avocado-vt/images/win2012-64r2-virtio.qcow2 \
-device ide-drive,id=image1,drive=drive_image1,bus=ide.0,unit=0 \
-device virtio-net-pci,mac=9a:b1:b2:b3:b4:b5,id=idFb0lIj,vectors=4,netdev=idNlTvHu,bus=pci.0,addr=04  \
-netdev tap,id=idNlTvHu,vhost=on \
-m 4096  \
-smp 2,maxcpus=2,cores=1,threads=1,sockets=2  \
-cpu 'SandyBridge' \
-drive id=drive_cd1,if=none,snapshot=off,aio=native,cache=none,media=cdrom,file=/usr/share/avocado/data/avocado-vt/isos/ISO/Win2012R2/en_windows_server_2012_r2_with_update_x64_dvd_6052708.iso \
-device ide-drive,id=cd1,drive=drive_cd1,bus=ide.0,unit=1 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=05 \
-drive id=drive_winutils,if=none,snapshot=off,aio=native,cache=none,media=cdrom,file=/usr/share/avocado/data/avocado-vt/isos/windows/winutils.iso \
-device scsi-cd,id=winutils,drive=drive_winutils \
-vnc :0  \
-rtc base=localtime,clock=host,driftfix=slew  \
-boot order=cdn,menu=off,strict=off \
-enable-kvm \
-monitor stdio

4. Wait until the error "block I/O error in device 'drive_image1': No space left on device (28)" appears in the hmp monitor, then check the vm status:
  (qemu)info status                   -----> VM status:paused(io-error)

5. Reset the vm
  (qemu)system_reset                  -----> VM does not respond

6. Check the vm status
  (qemu)info status                   -----> VM status:paused(io-error)


From steps 4, 5 and 6 we can see that the problem hasn't been resolved, so I am changing the bug status to Assigned.

Comment 18 aihua liang 2017-01-10 08:55:36 UTC
According to fam's suggestion in bz1361490, my test was missing the step "cont" after "system_reset", so I retested. Details are below.

Test Version:
 kernel version: 2.6.32-681.el6.x86_64
 qemu-kvm-rhev version: qemu-kvm-rhev-0.12.1.2-2.499.el6.x86_64

Test Step:
1. Create a 25G qcow2 image and install win2012r2 on it.

2. Fill the disk by copying the image until the "No space left on device" prompt appears.

3. Start the guest with this qemu command:
MALLOC_PERTURB_=1  /usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-machine rhel6.6.0  \
-nodefaults  \
-vga std \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20161227-002429-zWDOQysC,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control  \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20161227-002429-zWDOQysC,server,nowait \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pvpanic,ioport=0x505,id=idBpoMGZ  \
-chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20161227-002429-zWDOQysC,server,nowait \
-device isa-serial,chardev=serial_id_serial0  \
-chardev socket,id=seabioslog_id_20161227-002429-zWDOQysC,path=/var/tmp/seabios-20161227-002429-zWDOQysC,server,nowait \
-device isa-debugcon,chardev=seabioslog_id_20161227-002429-zWDOQysC,iobase=0x402 \
-device ich9-usb-ehci1,id=usb1,addr=1d.7,multifunction=on,bus=pci.0 \
-device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=1d.0,firstport=0,bus=pci.0 \
-device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=1d.2,firstport=2,bus=pci.0 \
-device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=1d.4,firstport=4,bus=pci.0 \
-drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/usr/share/avocado/data/avocado-vt/images/win2012-64r2-virtio.qcow2 \
-device ide-drive,id=image1,drive=drive_image1,bus=ide.0,unit=0 \
-device virtio-net-pci,mac=9a:b1:b2:b3:b4:b5,id=idFb0lIj,vectors=4,netdev=idNlTvHu,bus=pci.0,addr=04  \
-netdev tap,id=idNlTvHu,vhost=on \
-m 4096  \
-smp 2,maxcpus=2,cores=1,threads=1,sockets=2  \
-cpu 'SandyBridge' \
-drive id=drive_cd1,if=none,snapshot=off,aio=native,cache=none,media=cdrom,file=/usr/share/avocado/data/avocado-vt/isos/ISO/Win2012R2/en_windows_server_2012_r2_with_update_x64_dvd_6052708.iso \
-device ide-drive,id=cd1,drive=drive_cd1,bus=ide.0,unit=1 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=05 \
-drive id=drive_winutils,if=none,snapshot=off,aio=native,cache=none,media=cdrom,file=/usr/share/avocado/data/avocado-vt/isos/windows/winutils.iso \
-device scsi-cd,id=winutils,drive=drive_winutils \
-vnc :0  \
-rtc base=localtime,clock=host,driftfix=slew  \
-boot order=cdn,menu=off,strict=off \
-enable-kvm \
-monitor stdio

4. Wait until the error "block I/O error in device 'drive_image1': No space left on device (28)" appears in the hmp monitor, then check the vm status:
  (qemu)info status                   -----> VM status:paused(io-error)

5. Reset the vm
  (qemu)system_reset                  -----> VM does not respond

6. Resume the vm
  (qemu)cont                          -----> VM restarts and loads win2012r2, then the error msg "block I/O error in device 'drive_image1': No space left on device (28)" appears

7. Check the vm status
  (qemu)info status                   -----> VM status:paused(io-error)


8. Release some disk space, then repeat steps 5~6.

Test Result:
After step 8, the vm works normally with status "running".


So the problem has been resolved; changing its status to "Verified".
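
The reset, cont and status check above could also be driven through one of the QMP sockets opened by the -chardev/-mon options in step 3. The following is only a sketch: the function names are hypothetical, and it assumes this build accepts the standard QMP commands system_reset, cont and query-status, which was not checked here.

```python
import json
import socket


def verification_commands():
    """QMP commands mirroring the HMP steps: reset, resume, then check status."""
    return [
        {"execute": "qmp_capabilities"},  # required before any other command
        {"execute": "system_reset"},
        {"execute": "cont"},
        {"execute": "query-status"},
    ]


def run_qmp(sock, commands):
    """Send each command on a connected QMP socket and collect the replies.

    Asynchronous event messages that arrive in between are skipped.
    """
    reader = sock.makefile("r", encoding="utf-8")
    json.loads(reader.readline())  # consume the QMP greeting
    replies = []
    for cmd in commands:
        sock.sendall((json.dumps(cmd) + "\n").encode("utf-8"))
        while True:
            line = reader.readline()
            if not line:
                raise ConnectionError("QMP socket closed")
            msg = json.loads(line)
            if "return" in msg or "error" in msg:
                replies.append(msg)
                break
    return replies
```

With a running guest, the socket would come from `socket.socket(socket.AF_UNIX)` connected to the path= value of the qmp_id_qmpmonitor1 chardev. After disk space has been released, the last reply would be expected to report a running status, matching the test result above.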

Comment 21 errata-xmlrpc 2017-03-21 09:35:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0621.html