Bug 1346237
Summary: | win 10.x86_64 guest coredump when execute avocado test case: win_virtio_update.install_driver | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Yanan Fu <yfu> |
Component: | qemu-kvm-rhev | Assignee: | Stefan Hajnoczi <stefanha> |
Status: | CLOSED ERRATA | QA Contact: | FuXiangChun <xfu> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | | |
Version: | 7.3 | CC: | chayang, jsnow, juzhang, knoel, mrezanin, stefanha, virt-maint, yfu |
Target Milestone: | rc | | |
Target Release: | --- | | |
Hardware: | x86_64 | | |
OS: | Windows | | |
Whiteboard: | | | |
Fixed In Version: | qemu-kvm-rhev-2.6.0-11.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2016-11-07 21:17:00 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Attachments: |
|
1. I'm unable to download the core file ("Forbidden. You don't have permission to access /pub/section2/coredump/var/crash/yfu/bug-1346237/core.10236 on this server.").
2. Is the Windows guest a new installation or a prepared one? Using the same qemu command line I'm unable to start a Windows installation on my host.

Please try this build:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11223356

It includes a fix for the IDE/DMA helpers which should solve the bdrv_aio_cancel() abort(3) you experienced. This bug appears to occur when Windows issues an IDE TRIM request. I'm not sure how often Windows does this, so it may be hard to reproduce again.

Fix included in qemu-kvm-rhev-2.6.0-11.el7

I reran the avocado case "win_virtio_update.install_driver" many times, but it is always blocked by one autotest bug:
Bug 1233526 - [KVM-AUTOTEST] win_virtio_update.install_driver:Unhandled ShellTimeoutError: Timeout expired while waiting for shell command to complete: 'cmd /c E:\\install_driver.bat F:\\NetKVM\\w7\\amd64' (WIn7, Win8)

So I have not reproduced this bug yet. The autotest bug is being handled in the avocado framework. I will update the test result later.

Hi Stefan,

Just as you said in comment 5, this bug is very hard to reproduce again. The autotest case "win_virtio_update.install_driver" has been replaced by "single_driver_install". I have rerun this case 50 times and cannot hit this issue.

With the fixed version:
kernel-3.10.0-501.el7.x86_64
qemu-kvm-rhev-2.6.0-23.el7.x86_64

I reran "single_driver_install" 10*5 = 50 times and cannot hit this issue either.
job link: http://10.66.4.244/kvm_autotest_job_log/?jobid=1495497

Do you have a reproducer that can easily trigger this bug? Or can we verify it with the result of my job?

Here is a deterministic reproducer using a Linux guest:

$ qemu-img create -f qcow2 -o preallocation=full test.qcow2 128M

This is a fully preallocated image so all clusters have been created and are filled with zeroes.
We will send an IDE TRIM request to discard a cluster and then reboot the guest to trigger the same code path as Windows.

$ qemu-system-x86_64 -enable-kvm -m 1024 -drive if=ide,id=ide-drive,discard=unmap,file=blkdebug::test.qcow2,format=qcow2 -drive if=none,id=virtio-drive,file=rhel72.img,format=raw -device virtio-blk-pci,drive=virtio-drive,bootindex=0

The IDE drive is for the test and the virtio-blk device is just there to boot a Linux guest. Notice that the IDE drive has discard=unmap so the TRIM request will be handled instead of ignored. The blkdebug protocol in the filename enables debugging support that lets us suspend the TRIM request, which makes this reproducer reliable and not based on timing.

(qemu) qemu-io ide-drive "break cluster_free A"

This adds an I/O request breakpoint. When qcow2 processes a discard request it will free a cluster and the breakpoint suspends the request at that time. It's as if we have an infinitely slow disk. This way we avoid race conditions in the test steps.

guest# blkdiscard -l 65536 /dev/sda

Submit an IDE TRIM request for a full 64 KB qcow2 cluster. This causes qcow2 to free a cluster and triggers our I/O request breakpoint. blkdiscard(8) should hang inside the guest because it is waiting for the suspended IDE TRIM request to complete.

(qemu) system_reset

Now reboot the guest to trigger the ide_bus_reset()/bdrv_aio_cancel() code path.

Expected behavior (qemu-kvm-rhev-2.6.0-11.el7):
QEMU hangs because we didn't provide a way to resume the suspended IDE TRIM request:

blkdebug: Suspended request 'A'

(In the Windows scenario QEMU will not hang because the IDE TRIM request isn't suspended. As soon as the request completes the guest will reboot and be responsive.)

Actual behavior (qemu-kvm-rhev-2.6.0-10.el7):
QEMU calls abort(3):

blkdebug: Suspended request 'A'
Aborted (core dumped)

Thanks for stefanha's reproducer.
------------------------reproduce-------------------
Test version:
qemu: qemu-kvm-rhev-2.6.0-10.el7.x86_64
guest: rhel7.3

Test steps:
1. Create one preallocated image.
# qemu-img create -f qcow2 -o preallocation=full test.qcow2 128M

2. Boot one guest with the following options:
-drive id=ide-drive,if=ide,discard=unmap,file=blkdebug::/home/test.qcow2,format=qcow2 \
-drive id=virtio-drive,if=none,file=/home/RHEL-Server-7.3-64-virtio.qcow2,format=qcow2 \
-device virtio-blk-pci,drive=virtio-drive,id=virtio-blk-disk,bootindex=0 \

3. In the host qemu monitor, execute:
(qemu) qemu-io ide-drive "break cluster_free A"

4. In the guest, execute:
# blkdiscard -l 65536 /dev/sda
The command blocks in the guest, and qemu prints "blkdebug: Suspended request 'A'".

5. In the qemu monitor:
(qemu) system_reset

QEMU dumps core:
(qemu) ***** Aborted (core dumped)

Reproduced this bug successfully.

------------------------verification-------------------
Test version:
qemu: qemu-kvm-rhev-2.6.0-25.el7.x86_64
guest: rhel7.3

Test steps:
Same test steps as above. After system_reset in step 5, the guest hangs and the qemu monitor no longer accepts input; only "kill -9 $QEMU_PID" can quit.

According to comment 14 and the test result above, moving this bug to VERIFIED.

CLI:
/usr/libexec/qemu-kvm \
-enable-kvm \
-m 2048 \
-drive id=ide-drive,if=ide,discard=unmap,file=blkdebug::/home/test.qcow2,format=qcow2 \
-drive id=virtio-drive,if=none,file=/home/win2012r2-virtio-blk.qcow2,format=qcow2 \
-device virtio-blk-pci,drive=virtio-drive,id=virtio-blk-disk,bootindex=0 \
-usb \
-device usb-tablet \
-vnc :0 \
-monitor stdio

Be sure not to add other command-line options, because:
"there is a chance that some options could involve a call to bdrv_drain_all() inside QEMU. This function waits until all I/O requests have completed. That would hang QEMU (including the monitor)". ---->analysis from stefanha.
In that case you cannot enter "system_reset" in step 5, and this bug needs system_reset to trigger.
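
For anyone repeating the verification, the host-side part of the steps above could be collected in a small script. This is only a sketch: the function names create_cmd and qemu_cmd and the TEST_IMG/BOOT_IMG variables are made up for illustration, the image paths default to the ones quoted in this comment, and the script only prints the commands instead of running them. The monitor and guest steps (qemu-io break, blkdiscard, system_reset) still have to be entered by hand.

```shell
#!/bin/sh
# Sketch: print the host-side commands of the reproducer.
# TEST_IMG / BOOT_IMG are illustrative variables; defaults are the paths
# quoted in the comment above.
TEST_IMG="${TEST_IMG:-/home/test.qcow2}"
BOOT_IMG="${BOOT_IMG:-/home/RHEL-Server-7.3-64-virtio.qcow2}"

# Step 1: fully preallocated qcow2 image, so every cluster already exists.
create_cmd() {
    echo "qemu-img create -f qcow2 -o preallocation=full ${TEST_IMG} 128M"
}

# Step 2: the same option set as the CLI above, and nothing else, so that
# the monitor stays usable while the TRIM request is suspended.
qemu_cmd() {
    echo "/usr/libexec/qemu-kvm -enable-kvm -m 2048" \
        "-drive id=ide-drive,if=ide,discard=unmap,file=blkdebug::${TEST_IMG},format=qcow2" \
        "-drive id=virtio-drive,if=none,file=${BOOT_IMG},format=qcow2" \
        "-device virtio-blk-pci,drive=virtio-drive,id=virtio-blk-disk,bootindex=0" \
        "-usb -device usb-tablet -vnc :0 -monitor stdio"
}

create_cmd
qemu_cmd
```

Printing rather than executing keeps the sketch safe to run on a host without the images in place; the output can be reviewed and pasted into a shell.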
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-2673.html |
Created attachment 1167829 [details]
avocado test log for this bug

Description of problem:
This issue was hit with the avocado test case win_virtio_update.install_driver. After the "Install drivers balloon" step, the guest is rebooted and a coredump occurs. Both Intel and AMD hosts hit this issue.

Version-Release number of selected component (if applicable):
kernel: 3.10.0-422.el7.x86_64
qemu: qemu-kvm-rhev-2.6.0-5.el7.x86_64
virtio-win: virtio-win-1.8.0-4.iso

How reproducible:
100%

Steps to Reproduce:
1. Boot one win10.x86_64 guest
2. Install driver balloon from virtio-win

Actual results:
guest coredump

Expected results:
win_virtio_update.install_driver should finish successfully.

Additional info:
# gdb /usr/libexec/qemu-kvm core.10236
(gdb) bt
#0 0x00007ff893b935f7 in raise () from /lib64/libc.so.6
#1 0x00007ff893b94ce8 in abort () from /lib64/libc.so.6
#2 0x00007ff89c13b3d6 in bdrv_aio_cancel (acb=0x7ff8a441c0a0) at block/io.c:2048
#3 0x00007ff89c130535 in blk_aio_cancel (acb=<optimized out>) at block/block-backend.c:1044
#4 0x00007ff89c040d5a in ide_bus_reset (bus=bus@entry=0x7ff8a28e7480) at hw/ide/core.c:2326
#5 0x00007ff89c044088 in piix3_reset (opaque=0x7ff8a28e6c00) at hw/ide/piix.c:115
#6 0x00007ff89bfd18dd in qemu_devices_reset () at vl.c:1738
#7 0x00007ff89bf4d216 in pc_machine_reset () at /usr/src/debug/qemu-2.6.0/hw/i386/pc.c:1936
#8 0x00007ff89bfd1946 in qemu_system_reset (report=report@entry=true) at vl.c:1751
#9 0x00007ff89bec879b in main_loop_should_exit () at vl.c:1898
#10 main_loop () at vl.c:1938
#11 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4667
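
When triaging another core, it may help to check whether its backtrace shows the same three frames as the one above (abort called from bdrv_aio_cancel during ide_bus_reset). The following is a rough sketch only: the function name matches_ide_reset_abort is made up for illustration, and the frame strings are copied from the backtrace in this report, so a build with different symbol formatting might not match.

```shell
#!/bin/sh
# Sketch: read a gdb backtrace on stdin and succeed only if it contains
# all three frames that characterise the abort in this report.
# The function name is hypothetical.
matches_ide_reset_abort() {
    bt=$(cat)
    for frame in 'in abort ()' 'in bdrv_aio_cancel (' 'in ide_bus_reset ('; do
        # -F: treat the frame text as a fixed string, not a regex.
        printf '%s\n' "$bt" | grep -qF "$frame" || return 1
    done
    return 0
}
```

A possible use, assuming the core file is available locally, is `gdb -batch -ex bt /usr/libexec/qemu-kvm core.10236 | matches_ide_reset_abort && echo "same signature"`.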