Bug 1281713
Summary: | system_reset should clear pending request for error (IDE) | |
---|---|---|---
Product: | Red Hat Enterprise Linux 6 | Reporter: | Guo, Zhiyi <zhguo>
Component: | qemu-kvm | Assignee: | John Snow <jsnow>
Status: | CLOSED ERRATA | QA Contact: | aihua liang <aliang>
Severity: | medium | Docs Contact: |
Priority: | unspecified | |
Version: | 6.7 | CC: | ailan, coli, jsnow, juzhang, michen, mkenneth, qzhang, rbalakri, virt-maint, zhguo
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Windows | |
Whiteboard: | | |
Fixed In Version: | qemu-kvm-0.12.1.2-2.494.el6 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 1299875 (view as bug list) | Environment: |
Last Closed: | 2017-03-21 09:35:12 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1299875, 1299876 | |
Attachments: | | |
Description
Guo, Zhiyi
2015-11-13 09:03:11 UTC
Can you reproduce this with a qemu-kvm built with --enable-debug?

Hi,

I guess you want to see the function calls that show up as "??" and the argument values that have been optimized out. The stack trace is still the same as reported in the description. I enabled the --enable-debug option and rebuilt qemu-kvm. The -g option has been added to the compile procedure by the configure file:

+ ../configure --target-list=x86_64-softmmu '--extra-ldflags=-Wl,--build-id -pie -Wl,-z,relro -Wl,-z,now' '--extra-cflags=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -fPIE -DPIE' --with-pkgversion=qemu-kvm-0.12.1.2-2.479.el6.2 --prefix=/usr --localstatedir=/var --sysconfdir=/etc --disable-strip --disable-xen --block-drv-rw-whitelist=qcow2,raw,file,host_device,host_cdrom,qed,gluster,rbd --block-drv-ro-whitelist=vmdk,vpc --disable-debug-tcg --disable-sparse --disable-sdl --disable-curses --disable-curl --disable-check-utests --disable-bluez --enable-docs --disable-vde --disable-spice --trace-backend=nop --enable-smartcard --disable-smartcard-nss --enable-mixemu
Install prefix /usr
BIOS directory /usr/share/qemu
binary directory /usr/bin
local state directory /var
Manual directory /usr/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path /root/rpmbuild/BUILD/qemu-kvm-0.12.1.2
C compiler gcc
Host C compiler gcc
CFLAGS -O2 -g

BR/
Zhiyi

I can't see --enable-debug in your configure line. I can see -O2. You need to get one roughly like this:

../configure --target-list=x86_64-softmmu '--extra-ldflags=-Wl,--build-id -pie -Wl,-z,relro -Wl,-z,now' '--extra-cflags=-g -pipe -Wall -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -fPIE -DPIE' --with-pkgversion=qemu-kvm-0.12.1.2-2.479.el6.2 --prefix=/usr --localstatedir=/var --sysconfdir=/etc --disable-strip --disable-xen --block-drv-rw-whitelist=qcow2,raw,file,host_device,host_cdrom,qed,gluster,rbd --block-drv-ro-whitelist=vmdk,vpc --disable-debug-tcg --disable-sparse --disable-sdl --disable-curses --disable-curl --disable-check-utests --disable-bluez --enable-docs --disable-vde --disable-spice --trace-backend=nop --enable-smartcard --disable-smartcard-nss --enable-mixemu

Please try again :)

(In reply to Markus Armbruster from comment #4)
> I can't see --enable-debug in your configure line. I can see -O2. You need
> to get one roughly like this:
>
> ../configure --target-list=x86_64-softmmu '--extra-ldflags=-Wl,--build-id
> -pie -Wl,-z,relro -Wl,-z,now' '--extra-cflags=-g -pipe -Wall -fexceptions
> -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -fPIE -DPIE'
> --with-pkgversion=qemu-kvm-0.12.1.2-2.479.el6.2 --prefix=/usr
> --localstatedir=/var --sysconfdir=/etc --disable-strip --disable-xen
> --block-drv-rw-whitelist=qcow2,raw,file,host_device,host_cdrom,qed,gluster,
> rbd --block-drv-ro-whitelist=vmdk,vpc --disable-debug-tcg --disable-sparse
> --disable-sdl --disable-curses --disable-curl --disable-check-utests
> --disable-bluez --enable-docs --disable-vde --disable-spice
> --trace-backend=nop --enable-smartcard --disable-smartcard-nss
> --enable-mixemu
>
> Please try again :)

Stack trace with non-optimized code:

(gdb) bt
#0 0x00007f1e571cfb10 in ?? ()
#1 0x00007f1e55ebd5ed in bdrv_aio_cancel_async (acb=0x7f1e571cfc10) at /usr/src/debug/qemu-kvm-0.12.1.2/block.c:3876
#2 0x00007f1e55ebd499 in bdrv_aio_cancel (acb=0x7f1e571cfc10) at /usr/src/debug/qemu-kvm-0.12.1.2/block.c:3842
#3 0x00007f1e56008f37 in ide_dma_cancel (bm=0x7f1e57dac160) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/core.c:2395
#4 0x00007f1e56008f5d in ide_dma_reset (bm=0x7f1e57dac160) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/core.c:2408
#5 0x00007f1e5600c755 in piix3_reset (opaque=0x7f1e57dab010) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/piix.c:124
#6 0x00007f1e55e6765b in qemu_system_reset (report=true) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:3417
#7 0x00007f1e55e990ad in qemu_kvm_system_reset (report=true) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1992
#8 0x00007f1e55e9997d in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2272
#9 0x00007f1e55e683ba in main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4273
#10 0x00007f1e55e6d451 in main (argc=24, argv=0x7fff7f33ac18, envp=0x7fff7f33ace0) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6731

Aha! acb->aiocb_info->cancel_async seems to be garbage. Hunch: use after free? Chimes with your report that Opteron_G5 fails differently...

Please reproduce with your debug build of qemu-kvm under valgrind, and capture valgrind's report.

Created attachment 1100766 [details]
valgrind log on Intel Broadwell host
Log generated with Valgrind 3.11.0; Valgrind 3.8.1 dumps core under the same steps.
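
The build-flag exchange earlier in this thread comes down to two checks on the configure command line: --enable-debug should be present and -O2 should be absent. A minimal sketch of that check follows; the function name check_configure_line is hypothetical and only inspects the text of the command, it does not run configure.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not part of qemu-kvm):
# report whether a configure command line looks like the debug build
# requested in this thread, i.e. --enable-debug present and -O2 absent.
check_configure_line() {
    line="$1"
    status=0

    # Pad with spaces so that --disable-debug-tcg does not count as a match.
    case " $line " in
        *" --enable-debug "*) ;;
        *) echo "missing --enable-debug"; status=1 ;;
    esac

    # -O2 may sit inside a quoted --extra-cflags value, so match it anywhere.
    case "$line" in
        *-O2*) echo "still contains -O2"; status=1 ;;
    esac

    [ "$status" -eq 0 ] && echo "looks like a debug build"
    return "$status"
}
```

Called with the first configure line quoted in this thread, the helper would report both problems, because that line has -O2 in --extra-cflags and no --enable-debug.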
Created attachment 1100779 [details]
valgrind log on AMD Opteron 6376
Valgrind is reporting a huge number of unrelated issues, probably in part because we lack upstream patches to suppress false positives. It hits a cutoff and stops reporting some time before the crash. Please try again with --error-limit=no.

Additional question: is qemu-kvm-rhev affected as well?

Created attachment 1101353 [details]
valgrind log with option --error-limit=no
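
Because the logs contain many unrelated reports, one way to look for the suspected use after free is to filter for memcheck's invalid-access messages and the accompanying "free'd" block note. This is only a sketch: the function names are hypothetical, and the default file name valgrind.txt is an assumption about where the log was written.

```shell
#!/bin/sh
# Hypothetical helpers (names are assumptions): pick out memcheck reports
# that typically accompany a use after free from a valgrind log file.
UAF_PATTERN="Invalid (read|write) of size [0-9]+|inside a block of size [0-9,]+ free'd"

# Print matching lines with their line numbers in the log.
uaf_candidates() {
    log="${1:-valgrind.txt}"
    grep -n -E "$UAF_PATTERN" "$log"
}

# Print only the number of matching lines; grep -c exits non-zero when the
# count is 0, so the exit status is deliberately ignored here.
uaf_count() {
    log="${1:-valgrind.txt}"
    grep -c -E "$UAF_PATTERN" "$log" || true
}
```

For example, `uaf_candidates valgrind.txt | head` would show the first few candidate reports, which can then be looked up in the full log together with their stack traces.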
The issue can also be reproduced on a RHEL 7.2 Intel Skylake host with qemu-kvm-rhev:
kernel: 3.10.0-334.el7.x86_64
qemu-kvm-rhev-2.3.0-31.el7.x86_64
qemu-kvm-rhev-debuginfo-2.3.0-31.el7.x86_64
qemu-img-rhev-2.3.0-31.el7.x86_64
qemu-kvm-tools-rhev-2.3.0-31.el7.x86_64
qemu-kvm-common-rhev-2.3.0-31.el7.x86_64
The attachment includes valgrind logs reproduced on RHEL 6.7 and RHEL 7.2. The rhev packages were compiled with -g and without -O2 optimization. The valgrind logs were generated with the option --error-limit=no.
Command used to reproduce the issue and capture the valgrind log:

valgrind --log-file=valgrind.txt --error-limit=no /usr/libexec/qemu-kvm -name win2012 -m 2048 -smp 4 -cpu host -vga qxl -vnc :1 -monitor stdio -hda win2012.qcow2

Created attachment 1101356 [details]
correct valgrind log including rhel6.7 and rhel7.2

The valgrind log for rhel6.7 in comment 10 was a mistake; please ignore that attachment and use the log in this comment.

Fix included in qemu-kvm-0.12.1.2-2.494.el6

For reference, this is the cluster of BZs related to this issue:
bug 1281713, bug 1299876, bug 1299875, bug 1361487, bug 1361490, bug 1361488, bug 1375520

Verified; the problem still exists.

Test Version:
kernel version: 2.6.32-680.el6.x86_64
qemu-kvm-rhev version: qemu-kvm-rhev-0.12.1.2-2.499.el6.x86_64

Test Steps:
1. Create a 25G qcow2 image and install win2012r2 on it.
2. Fill the disk by copying the image until the prompt "No space left on device" appears.
3. Start the guest with the qemu command:
    MALLOC_PERTURB_=1 /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine rhel6.6.0 \
    -nodefaults \
    -vga std \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20161227-002429-zWDOQysC,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20161227-002429-zWDOQysC,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idBpoMGZ \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20161227-002429-zWDOQysC,server,nowait \
    -device isa-serial,chardev=serial_id_serial0 \
    -chardev socket,id=seabioslog_id_20161227-002429-zWDOQysC,path=/var/tmp/seabios-20161227-002429-zWDOQysC,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20161227-002429-zWDOQysC,iobase=0x402 \
    -device ich9-usb-ehci1,id=usb1,addr=1d.7,multifunction=on,bus=pci.0 \
    -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=1d.0,firstport=0,bus=pci.0 \
    -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=1d.2,firstport=2,bus=pci.0 \
    -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=1d.4,firstport=4,bus=pci.0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/usr/share/avocado/data/avocado-vt/images/win2012-64r2-virtio.qcow2 \
    -device ide-drive,id=image1,drive=drive_image1,bus=ide.0,unit=0 \
    -device virtio-net-pci,mac=9a:b1:b2:b3:b4:b5,id=idFb0lIj,vectors=4,netdev=idNlTvHu,bus=pci.0,addr=04 \
    -netdev tap,id=idNlTvHu,vhost=on \
    -m 4096 \
    -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \
    -cpu 'SandyBridge' \
    -drive id=drive_cd1,if=none,snapshot=off,aio=native,cache=none,media=cdrom,file=/usr/share/avocado/data/avocado-vt/isos/ISO/Win2012R2/en_windows_server_2012_r2_with_update_x64_dvd_6052708.iso \
    -device ide-drive,id=cd1,drive=drive_cd1,bus=ide.0,unit=1 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=05 \
    -drive id=drive_winutils,if=none,snapshot=off,aio=native,cache=none,media=cdrom,file=/usr/share/avocado/data/avocado-vt/isos/windows/winutils.iso \
    -device scsi-cd,id=winutils,drive=drive_winutils \
    -vnc :0 \
    -rtc base=localtime,clock=host,driftfix=slew \
    -boot order=cdn,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio
4. Wait until the error "block I/O error in device 'drive_image1': No space left on device(28)" appears in the HMP monitor, then check the VM status:
    (qemu)info status -----> VM status:paused(io-error)
5. Reset the VM:
    (qemu)system_reset -----> VM no response
6. Check the VM status:
    (qemu)info status -----> VM status:paused(io-error)

From steps 4, 5, and 6 we can see that the problem has not been resolved, so I am changing the bug status to Assigned.

According to fam's suggestion in bz1361490, I was missing the test step "cont" after "system_reset", so I retested. Details are below.

Test Version:
kernel version: 2.6.32-681.el6.x86_64
qemu-kvm-rhev version: qemu-kvm-rhev-0.12.1.2-2.499.el6.x86_64

Test Steps:
1. Create a 25G qcow2 image and install win2012r2 on it.
2. Fill the disk by copying the image until the prompt "No space left on device" appears.
3. Start the guest with the qemu command:
    MALLOC_PERTURB_=1 /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine rhel6.6.0 \
    -nodefaults \
    -vga std \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20161227-002429-zWDOQysC,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20161227-002429-zWDOQysC,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idBpoMGZ \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20161227-002429-zWDOQysC,server,nowait \
    -device isa-serial,chardev=serial_id_serial0 \
    -chardev socket,id=seabioslog_id_20161227-002429-zWDOQysC,path=/var/tmp/seabios-20161227-002429-zWDOQysC,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20161227-002429-zWDOQysC,iobase=0x402 \
    -device ich9-usb-ehci1,id=usb1,addr=1d.7,multifunction=on,bus=pci.0 \
    -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=1d.0,firstport=0,bus=pci.0 \
    -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=1d.2,firstport=2,bus=pci.0 \
    -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=1d.4,firstport=4,bus=pci.0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/usr/share/avocado/data/avocado-vt/images/win2012-64r2-virtio.qcow2 \
    -device ide-drive,id=image1,drive=drive_image1,bus=ide.0,unit=0 \
    -device virtio-net-pci,mac=9a:b1:b2:b3:b4:b5,id=idFb0lIj,vectors=4,netdev=idNlTvHu,bus=pci.0,addr=04 \
    -netdev tap,id=idNlTvHu,vhost=on \
    -m 4096 \
    -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \
    -cpu 'SandyBridge' \
    -drive id=drive_cd1,if=none,snapshot=off,aio=native,cache=none,media=cdrom,file=/usr/share/avocado/data/avocado-vt/isos/ISO/Win2012R2/en_windows_server_2012_r2_with_update_x64_dvd_6052708.iso \
    -device ide-drive,id=cd1,drive=drive_cd1,bus=ide.0,unit=1 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=05 \
    -drive id=drive_winutils,if=none,snapshot=off,aio=native,cache=none,media=cdrom,file=/usr/share/avocado/data/avocado-vt/isos/windows/winutils.iso \
    -device scsi-cd,id=winutils,drive=drive_winutils \
    -vnc :0 \
    -rtc base=localtime,clock=host,driftfix=slew \
    -boot order=cdn,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio
4. Wait until the error "block I/O error in device 'drive_image1': No space left on device(28)" appears in the HMP monitor, then check the VM status:
    (qemu)info status -----> VM status:paused(io-error)
5. Reset the VM:
    (qemu)system_reset -----> VM no response
6. Continue the VM:
    (qemu)cont -----> VM restarts and loads win2012r2, then the error message "block I/O error in device 'drive_image1': No space left on device (28)" appears
7. Check the VM status:
    (qemu)info status -----> VM status:paused(io-error)
8. Release some disk space, then repeat steps 5 and 6.

Test Result:
After step 8, the VM works normally with status "running". So the problem has been resolved; changing the status to "Verified".

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0621.html