Red Hat Bugzilla – Bug 1299876
system_reset should clear pending request for error (IDE)
Last modified: 2017-08-01 23:24:58 EDT
+++ This bug was initially created as a clone of Bug #1299875 +++
+++ This bug was initially created as a clone of Bug #1281713 +++

Description of problem:
qemu-kvm quits with a segmentation fault when system_reset is executed after the host has run out of disk space.

Version-Release number of selected component (if applicable):
qemu-img-0.12.1.2-2.481.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.481.el6.x86_64
qemu-kvm-0.12.1.2-2.481.el6.x86_64
qemu-guest-agent-0.12.1.2-2.481.el6.x86_64
qemu-kvm-debuginfo-0.12.1.2-2.481.el6.x86_64
2.6.32-583.el6.x86_64

How reproducible:
70%

Steps to Reproduce:
1. Create a 25G win2012.qcow2 image and install a Windows 2012 R2 guest.
2. On the host filesystem that holds the guest image, use up all free space by copying the guest image several times until a "No space left on device" error appears. Then launch the guest with this qemu-kvm command:
/usr/libexec/qemu-kvm -name win2012 -m 2048 \
 -cpu Opteron_G4 \
 -smp 1,cores=1,threads=2,sockets=2,maxcpus=4 \
 -vga qxl \
 -serial unix:/tmp/m,server,nowait \
 -drive file=win2012-64r2-virtio-scsi.qcow2,if=none,id=drive-scsi-disk0,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-scsi-pci,id=scsi0 -device scsi-hd,drive=drive-scsi-disk0,bus=scsi0.0,scsi-id=0,lun=0,id=scsi-disk0,bootindex=1 \
 -monitor stdio \
 -usb -device usb-kbd,id=input0 \
 -vnc :1
3. Interact with the guest (browse the internet or similar) until the qemu-kvm monitor prints "block I/O error in device 'ide0-hd0': No space left on device (28)" (this usually happens within 5 minutes), then enter system_reset in the QEMU monitor. A segmentation fault occurs.

Actual results:
qemu-kvm quits with a segmentation fault after system_reset is executed.

Expected results:
The qemu-kvm process should stay alive and the guest should be reset without errors.

Additional info:
Stack info:
Core was generated by `/usr/libexec/qemu-kvm -name win2012 -m 2048 -cpu SandyBridge -smp 2,cores=1,thr'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f85d1b04a90 in ?? ()
(gdb) bt
#0  0x00007f85d1b04a90 in ?? ()
#1  0x00007f85d03f5aee in bdrv_aio_cancel (acb=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/block.c:3842
#2  0x00007f85d052d46a in ide_dma_cancel (bm=0x7f85d26e1160) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/core.c:2395
#3  0x00007f85d052d499 in ide_dma_reset (bm=0x7f85d26e1160) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/core.c:2408
#4  0x00007f85d05335ad in piix3_reset (opaque=0x7f85d26e0010) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/piix.c:124
#5  0x00007f85d03b71d2 in qemu_system_reset (report=true) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:3417
#6  0x00007f85d03dd050 in qemu_kvm_system_reset (report=true) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1992
#7  0x00007f85d03dd253 in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2272
#8  0x00007f85d03be317 in main_loop (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4273
#9  main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6731

On an Opteron_G5 host, qemu-kvm does not crash with a segmentation fault, but the Windows guest cannot be reset by system_reset.

--- Additional comment from Markus Armbruster on 2015-11-24 14:37:15 BRST ---
Can you reproduce this with a qemu-kvm built with --enable-debug?

--- Additional comment from Guo, Zhiyi on 2015-11-27 00:00:55 BRST ---
Hi, I guess you want to see the function shown as ?? and the argument values that were optimized out. The stack trace is still the same as reported in the description. I enabled the --enable-debug option and rebuilt qemu-kvm.
The -g option was added to the compile step by the configure script:
+ ../configure --target-list=x86_64-softmmu '--extra-ldflags=-Wl,--build-id -pie -Wl,-z,relro -Wl,-z,now' '--extra-cflags=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -fPIE -DPIE' --with-pkgversion=qemu-kvm-0.12.1.2-2.479.el6.2 --prefix=/usr --localstatedir=/var --sysconfdir=/etc --disable-strip --disable-xen --block-drv-rw-whitelist=qcow2,raw,file,host_device,host_cdrom,qed,gluster,rbd --block-drv-ro-whitelist=vmdk,vpc --disable-debug-tcg --disable-sparse --disable-sdl --disable-curses --disable-curl --disable-check-utests --disable-bluez --enable-docs --disable-vde --disable-spice --trace-backend=nop --enable-smartcard --disable-smartcard-nss --enable-mixemu
Install prefix    /usr
BIOS directory    /usr/share/qemu
binary directory  /usr/bin
local state directory   /var
Manual directory  /usr/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path       /root/rpmbuild/BUILD/qemu-kvm-0.12.1.2
C compiler        gcc
Host C compiler   gcc
CFLAGS            -O2 -g

BR/ Zhiyi

--- Additional comment from Markus Armbruster on 2015-11-27 05:17:53 BRST ---
I can't see --enable-debug in your configure line. I can see -O2.
You need to get one roughly like this:

../configure --target-list=x86_64-softmmu '--extra-ldflags=-Wl,--build-id -pie -Wl,-z,relro -Wl,-z,now' '--extra-cflags=-g -pipe -Wall -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -fPIE -DPIE' --with-pkgversion=qemu-kvm-0.12.1.2-2.479.el6.2 --prefix=/usr --localstatedir=/var --sysconfdir=/etc --disable-strip --disable-xen --block-drv-rw-whitelist=qcow2,raw,file,host_device,host_cdrom,qed,gluster,rbd --block-drv-ro-whitelist=vmdk,vpc --disable-debug-tcg --disable-sparse --disable-sdl --disable-curses --disable-curl --disable-check-utests --disable-bluez --enable-docs --disable-vde --disable-spice --trace-backend=nop --enable-smartcard --disable-smartcard-nss --enable-mixemu

Please try again :)

--- Additional comment from Guo, Zhiyi on 2015-11-27 09:40:59 BRST ---
(In reply to Markus Armbruster from comment #4)
> I can't see --enable-debug in your configure line. I can see -O2.

Stack trace with non-optimized code:
(gdb) bt
#0  0x00007f1e571cfb10 in ?? ()
#1  0x00007f1e55ebd5ed in bdrv_aio_cancel_async (acb=0x7f1e571cfc10) at /usr/src/debug/qemu-kvm-0.12.1.2/block.c:3876
#2  0x00007f1e55ebd499 in bdrv_aio_cancel (acb=0x7f1e571cfc10) at /usr/src/debug/qemu-kvm-0.12.1.2/block.c:3842
#3  0x00007f1e56008f37 in ide_dma_cancel (bm=0x7f1e57dac160) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/core.c:2395
#4  0x00007f1e56008f5d in ide_dma_reset (bm=0x7f1e57dac160) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/core.c:2408
#5  0x00007f1e5600c755 in piix3_reset (opaque=0x7f1e57dab010) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/piix.c:124
#6  0x00007f1e55e6765b in qemu_system_reset (report=true) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:3417
#7  0x00007f1e55e990ad in qemu_kvm_system_reset (report=true) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1992
#8  0x00007f1e55e9997d in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2272
#9  0x00007f1e55e683ba in main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4273
#10 0x00007f1e55e6d451 in main (argc=24, argv=0x7fff7f33ac18, envp=0x7fff7f33ace0) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6731

--- Additional comment from Markus Armbruster on 2015-11-27 11:17:42 BRST ---
Aha! acb->aiocb_info->cancel_async seems to be garbage. Hunch: use after free? That chimes with your report that Opteron_G5 fails differently...

Please reproduce with your debug build of qemu-kvm under valgrind, and capture valgrind's report.

--- Additional comment from Guo, Zhiyi on 2015-12-01 06:32 BRST ---
Log generated with Valgrind 3.11.0; Valgrind 3.8.1 core dumps under the same steps.

--- Additional comment from Guo, Zhiyi on 2015-12-01 07:18 BRST ---

--- Additional comment from Markus Armbruster on 2015-12-01 08:01:26 BRST ---
valgrind is reporting a huge number of unrelated issues, probably in part because we lack upstream patches to suppress false positives. It hits a cutoff and stops reporting some time before the crash. Please try again with --error-limit=no.
Additional question: is qemu-kvm-rhev affected as well?

--- Additional comment from Guo, Zhiyi on 2015-12-02 06:09 BRST ---
The issue can also be reproduced on a RHEL 7.2 Intel Skylake host with qemu-kvm-rhev:
kernel: 3.10.0-334.el7.x86_64
qemu-kvm-rhev-2.3.0-31.el7.x86_64
qemu-kvm-rhev-debuginfo-2.3.0-31.el7.x86_64
qemu-img-rhev-2.3.0-31.el7.x86_64
qemu-kvm-tools-rhev-2.3.0-31.el7.x86_64
qemu-kvm-common-rhev-2.3.0-31.el7.x86_64

The attachment includes valgrind logs reproduced on RHEL 6.7 and RHEL 7.2. The rhev packages were compiled with -g and without -O2 optimization. The valgrind logs were generated with the --error-limit=no option.

--- Additional comment from Guo, Zhiyi on 2015-12-02 06:12:38 BRST ---
Command used to reproduce the issue and capture the valgrind log:
valgrind --log-file=valgrind.txt --error-limit=no /usr/libexec/qemu-kvm -name win2012 -m 2048 -smp 4 -cpu host -vga qxl -vnc :1 -monitor stdio -hda win2012.qcow2

--- Additional comment from Guo, Zhiyi on 2015-12-02 06:27 BRST ---
The RHEL 6.7 valgrind log attached earlier was the wrong one; please ignore the attachment in comment 10 and use the log in this comment.
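Markus's use-after-free hunch, together with the bug title, can be illustrated with a simplified model. Everything below is hypothetical: `struct pending_req`, `struct fake_bus`, `submit()`, `complete_with_error()` and `bus_reset()` are invented for illustration, they are not QEMU's data structures, and this is not the patch that resolved the bug. The model shows how a reset that cancels a request the bus still remembers, after that request has already completed with an error and been freed, ends up calling through a function pointer read from freed memory (consistent with the `?? ()` frame in the backtraces), and how forgetting the request on the error path avoids it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an in-flight AIO request (not QEMU's BlockAIOCB). */
struct pending_req {
    void (*cancel)(struct pending_req *req);
};

/* Hypothetical stand-in for the IDE bus state that remembers the request. */
struct fake_bus {
    struct pending_req *aiocb;  /* request to cancel on reset, or NULL */
    bool paused_on_error;       /* models werror=stop pausing the VM */
};

static int cancel_calls;        /* counts cancellations, for the demo */

/* Cancelling a genuinely in-flight request finishes and frees it. */
static void cancel_req(struct pending_req *req)
{
    cancel_calls++;
    free(req);
}

/* Start a request and remember it on the bus. */
static struct pending_req *submit(struct fake_bus *bus)
{
    struct pending_req *req = malloc(sizeof *req);
    if (!req)
        abort();
    req->cancel = cancel_req;
    bus->aiocb = req;
    return req;
}

/* Completion callback for a failed write: the request is finished and freed
 * here. With forget == false the bus keeps a dangling pointer to it (the
 * suspected bug); with forget == true the pointer is cleared first. */
static void complete_with_error(struct fake_bus *bus, struct pending_req *req,
                                bool forget)
{
    assert(bus->aiocb == req);  /* the completing request is the tracked one */
    if (forget)
        bus->aiocb = NULL;
    free(req);
    bus->paused_on_error = true;
}

/* Reset: cancel whatever the bus still believes is in flight. */
static void bus_reset(struct fake_bus *bus)
{
    if (bus->aiocb) {
        /* Use-after-free if aiocb is stale: cancel is read from freed
         * memory and may point anywhere. */
        bus->aiocb->cancel(bus->aiocb);
        bus->aiocb = NULL;
    }
    bus->paused_on_error = false;
}
```

In this model, calling `bus_reset()` after `complete_with_error(bus, req, false)` is undefined behaviour, so only the `forget == true` path should ever be exercised; a request that is still genuinely in flight at reset time is cancelled normally.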
The backtrace is similar to the one in bug 1346237 (thanks to Stefan Hajnoczi for bringing it to my attention). Can you still reproduce it with qemu-kvm-rhev-2.6.0-11.el7?
Thanks to Laszlo's simple reproducer steps, I can reproduce this issue with qemu-kvm-rhev-2.6.0-15.el7.x86_64.

Steps:
1. # qemu-img create -f qcow2 test.qcow2 8G
2. # ulimit -f 256
3. # gdb /usr/libexec/qemu-kvm
   (gdb) run -m 2048 -smp 2 -drive file=test.qcow2,werror=stop,rerror=stop,cache=writeback,id=hd0,if=none -device ide-hd,drive=hd0 -drive id=cd0,readonly,media=cdrom,cache=writeback,if=none,file=Fedora-Server-dvd-x86_64-24-1.2.iso -device ide-cd,drive=cd0 -vnc :0 -monitor stdio
4. Install Fedora over VNC; the guest will hang very soon during installation.
5. Run system_reset from QMP and QEMU will crash.

Backtrace:
(qemu) Program received signal SIGSEGV, Segmentation fault.
0x0000555556b750f0 in ?? ()
(gdb) bt
#0  0x0000555556b750f0 in ?? ()
#1  0x000055555593cc4a in bdrv_aio_cancel_async (acb=0x555556b75570) at block/io.c:2060
#2  bdrv_aio_cancel (acb=0x555556b75570) at block/io.c:2041
#3  0x0000555555931ce5 in blk_aio_cancel (acb=<optimized out>) at block/block-backend.c:1044
#4  0x000055555584133a in ide_bus_reset (bus=bus@entry=0x5555597bf0d8) at hw/ide/core.c:2326
#5  0x0000555555844674 in piix3_reset (opaque=0x5555597be000) at hw/ide/piix.c:115
#6  0x00005555557d1abd in qemu_devices_reset () at vl.c:1738
#7  0x000055555574d166 in pc_machine_reset () at /usr/src/debug/qemu-2.6.0/hw/i386/pc.c:1936
#8  0x00005555557d1b26 in qemu_system_reset (report=report@entry=true) at vl.c:1751
#9  0x00005555556c795b in main_loop_should_exit () at vl.c:1898
#10 main_loop () at vl.c:1938
#11 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4664
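Step 2 is what lets this reproducer hit a write error without filling a real disk: `ulimit -f` caps the size of files the shell's children may write (bash counts this limit in 1024-byte units, so 256 means 256 KiB), and the qcow2 file, which starts out small, grows past that cap once the installer starts writing. That presumably surfaces in QEMU as a failed write, so werror=stop pauses the guest (the "hang" in step 4). The C sketch below shows the underlying POSIX mechanism, RLIMIT_FSIZE, in isolation. It assumes a Linux/POSIX host and explicitly ignores SIGXFSZ so that the over-limit write fails with EFBIG instead of terminating the process; how QEMU itself handles SIGXFSZ is not established by this report. `write_past_fsize_limit()` is a hypothetical helper, not part of QEMU.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: lower the soft RLIMIT_FSIZE to `limit` bytes (what
 * `ulimit -f` does for the shell's children), then try to write one byte at
 * offset `limit` of a temporary file. Returns the errno of the failed write
 * (EFBIG expected), 0 if the write succeeded, or -1 if setup failed.
 * The previous limit and SIGXFSZ disposition are restored before returning. */
static int write_past_fsize_limit(rlim_t limit)
{
    struct rlimit old, capped;
    struct sigaction ignore, old_action;
    int result = -1;

    assert(limit > 0);
    if (getrlimit(RLIMIT_FSIZE, &old) != 0)
        return -1;
    capped.rlim_cur = limit;
    capped.rlim_max = old.rlim_max;  /* only the soft limit is lowered */

    /* SIGXFSZ terminates the process by default; ignoring it makes the
     * over-limit write fail with EFBIG instead. */
    memset(&ignore, 0, sizeof ignore);
    sigemptyset(&ignore.sa_mask);
    ignore.sa_handler = SIG_IGN;
    if (sigaction(SIGXFSZ, &ignore, &old_action) != 0)
        return -1;

    FILE *f = tmpfile();
    if (f && setrlimit(RLIMIT_FSIZE, &capped) == 0) {
        char byte = 'x';
        /* Writing at offset == limit would grow the file past the cap. */
        if (pwrite(fileno(f), &byte, 1, (off_t)limit) < 0)
            result = errno;
        else
            result = 0;
        setrlimit(RLIMIT_FSIZE, &old);  /* restore the original soft limit */
    }
    if (f)
        fclose(f);
    sigaction(SIGXFSZ, &old_action, NULL);
    return result;
}
```

For example, `write_past_fsize_limit(256 * 1024)` mirrors `ulimit -f 256` under bash's default units. Only the soft limit is touched, so an unprivileged process can raise it back afterwards.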
Moving back from POST to ASSIGNED, as the 7.4 tree isn't open for submissions yet, and as of 2016-09-19 we decided not to include this fix in 7.3. Another bug proposed for Z-stream (BZ#1375520) MAY require the same fix as this bug, so we may change our minds again in the near future based on analysis of that bug. (While you're here reading bugzilla comments: the two bugs currently trigger the exact same stack trace, but the triggering mechanism appears to differ between the two BZs, hence the separate entries.) --js
For reference, this is the cluster of BZ related to this issue: bug 1281713, bug 1299876, bug 1299875, bug 1361487, bug 1361490, bug 1361488, bug 1375520
Hi John,
Tested on RHEV 7.4 + qemu-kvm-rhev 2.9: the problem has been resolved. Please help handle the bug and change its status to the correct one, thanks.

******* Test Detail *******
Test Version:
kernel version: 3.10.0-623.el7.x86_64
qemu-kvm-rhev version: qemu-kvm-rhev-2.9.0-0.el7.mrezanin201703210848.x86_64

Test Steps:
1. Install a guest, e.g. win2012.
2. Fill up the host disk.
3. Start the guest with the following QEMU command, then run some applications in the guest until it hangs:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox off \
    -machine pc \
    -nodefaults \
    -vga std \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20161227-001116-PD2k1uXB,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20161227-001116-PD2k1uXB,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=id95e1vw \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20161227-001116-PD2k1uXB,server,nowait \
    -device isa-serial,chardev=serial_id_serial0 \
    -chardev socket,id=seabioslog_id_20161227-001116-PD2k1uXB,path=/var/tmp/seabios-20161227-001116-PD2k1uXB,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20161227-001116-PD2k1uXB,iobase=0x402 \
    -device ich9-usb-ehci1,id=usb1,addr=1d.7,multifunction=on,bus=pci.0 \
    -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=1d.0,firstport=0,bus=pci.0 \
    -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=1d.2,firstport=2,bus=pci.0 \
    -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=1d.4,firstport=4,bus=pci.0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/win2012-64-virtio.qcow2 \
    -device ide-hd,id=image1,drive=drive_image1,bus=ide.0,unit=0 \
    -device virtio-net-pci,mac=9a:d7:d8:d9:da:db,id=idoYMY7R,vectors=4,netdev=iddvjhTd,bus=pci.0,addr=03 \
    -netdev tap,id=iddvjhTd,vhost=on \
    -m 4096 \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2 \
    -cpu host \
    -vnc :0 \
    -rtc base=localtime,clock=host,driftfix=slew \
    -boot order=cdn,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio \
    -spice port=3000,ipv4,disable-ticketing
4. Check the VM status:
   (qemu) info status  -------> VM status: paused (io-error)
5. Reset the VM:
   (qemu) system_reset
   (qemu) c  -------> VM restarts
6. Free some host space, then reset the VM again:
   (qemu) system_reset
   (qemu) c  -------> VM restarts and works normally
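Steps 4 to 6 use the human monitor on stdio. The same status check, reset and continue could presumably be scripted over the QMP socket that this command line already configures (`-chardev socket,id=qmp_id_qmpmonitor1,...,server,nowait` with `-mon chardev=qmp_id_qmpmonitor1,mode=control`). The C sketch below is a minimal client under stated assumptions: QEMU's QMP server sends each JSON message on its own line, the commands `qmp_capabilities`, `query-status`, `system_reset` and `cont` are sent without arguments, and asynchronous events are told apart from replies by a naive substring match rather than a JSON parser. The helper names `qmp_is_event()`, `qmp_execute()` and `reset_and_continue()` are hypothetical, not part of QEMU or any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Asynchronous events ({"event": "RESET", ...}) may arrive between a command
 * and its reply. Naive check assuming one JSON message per line. */
static bool qmp_is_event(const char *line)
{
    return strstr(line, "\"event\"") != NULL;
}

/* Send one argument-less QMP command and store its reply line in `reply`,
 * skipping any events. Returns 0 on success, -1 on I/O error. */
static int qmp_execute(FILE *in, FILE *out, const char *cmd,
                       char *reply, size_t reply_len)
{
    assert(reply_len > 1);
    if (fprintf(out, "{\"execute\": \"%s\"}\n", cmd) < 0 || fflush(out) != 0)
        return -1;
    while (fgets(reply, (int)reply_len, in)) {
        if (!qmp_is_event(reply))
            return 0;
    }
    return -1;
}

/* Replays the monitor steps over QMP: query-status (HMP "info status"),
 * system_reset, then cont (HMP "c"). Returns 0 if cont was acknowledged. */
static int reset_and_continue(const char *socket_path)
{
    struct sockaddr_un addr;
    char line[4096];
    int rc = -1;

    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, socket_path, sizeof addr.sun_path - 1);

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) != 0) {
        close(fd);
        return -1;
    }
    /* Separate stdio streams for reading and writing the same socket. */
    int wfd = dup(fd);
    FILE *in = fdopen(fd, "r");
    FILE *out = wfd >= 0 ? fdopen(wfd, "w") : NULL;
    if (!in || !out) {
        if (in) fclose(in); else close(fd);
        if (out) fclose(out); else if (wfd >= 0) close(wfd);
        return -1;
    }

    /* The server greets first; qmp_capabilities must precede other commands. */
    if (fgets(line, sizeof line, in) &&
        qmp_execute(in, out, "qmp_capabilities", line, sizeof line) == 0 &&
        qmp_execute(in, out, "query-status", line, sizeof line) == 0) {
        printf("status reply: %s", line);  /* "io-error" expected while paused */
        if (qmp_execute(in, out, "system_reset", line, sizeof line) == 0 &&
            qmp_execute(in, out, "cont", line, sizeof line) == 0 &&
            strstr(line, "\"return\"") != NULL)
            rc = 0;
    }
    fclose(in);
    fclose(out);
    return rc;
}
```

For example, `reset_and_continue("/var/tmp/monitor-qmpmonitor1-20161227-001116-PD2k1uXB")` would target the QMP socket from step 3. A real test harness would parse the replies as JSON; plain string matching only keeps the sketch free of dependencies.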
Hi John,
The problem has been resolved on RHEV 7.4 + qemu-kvm-rhev 2.9. Please help move the bug to the correct status, thanks.
OK, I think it's up to QE to mark it as ON_QA or VERIFIED, so from the Dev perspective I'll mark it as MODIFIED to signify that the fix is in the tree. Hopefully this moves the BZ back into the normal flow of things.
As the fix is in the tree and has been verified as "pass", we are changing the bug's status to "Verified", thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:2392