Bug 1346237 - win 10.x86_64 guest coredump when execute avocado test case: win_virtio_update.install_driver
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.3
Hardware: x86_64
OS: Windows
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Stefan Hajnoczi
QA Contact: FuXiangChun
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-06-14 10:47 UTC by Yanan Fu
Modified: 2016-11-07 21:17 UTC (History)

Fixed In Version: qemu-kvm-rhev-2.6.0-11.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-07 21:17:00 UTC
Target Upstream Version:
Embargoed:


Attachments
avocado test log for this bug (1.30 MB, application/x-tar)
2016-06-14 10:47 UTC, Yanan Fu


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2016:2673 | 0 | normal | SHIPPED_LIVE | qemu-kvm-rhev bug fix and enhancement update | 2016-11-08 01:06:13 UTC

Description Yanan Fu 2016-06-14 10:47:13 UTC
Created attachment 1167829 [details]
avocado test log for this bug

Description of problem:
This issue was hit with the avocado test case win_virtio_update.install_driver.
After the "Install drivers balloon" step, rebooting the guest produces a core dump.
Both Intel and AMD hosts hit this issue.

Version-Release number of selected component (if applicable):
kernel:3.10.0-422.el7.x86_64
qemu:qemu-kvm-rhev-2.6.0-5.el7.x86_64
virtio-win: virtio-win-1.8.0-4.iso

How reproducible:
100%

Steps to Reproduce:
1. Boot one win10.x86_64 guest
2. Install the balloon driver from virtio-win

Actual results:
guest coredump

Expected results:
win_virtio_update.install_driver should finish successfully.

Additional info:
# gdb /usr/libexec/qemu-kvm core.10236

(gdb) bt
#0  0x00007ff893b935f7 in raise () from /lib64/libc.so.6
#1  0x00007ff893b94ce8 in abort () from /lib64/libc.so.6
#2  0x00007ff89c13b3d6 in bdrv_aio_cancel (acb=0x7ff8a441c0a0) at block/io.c:2048
#3  0x00007ff89c130535 in blk_aio_cancel (acb=<optimized out>) at block/block-backend.c:1044
#4  0x00007ff89c040d5a in ide_bus_reset (bus=bus@entry=0x7ff8a28e7480) at hw/ide/core.c:2326
#5  0x00007ff89c044088 in piix3_reset (opaque=0x7ff8a28e6c00) at hw/ide/piix.c:115
#6  0x00007ff89bfd18dd in qemu_devices_reset () at vl.c:1738
#7  0x00007ff89bf4d216 in pc_machine_reset () at /usr/src/debug/qemu-2.6.0/hw/i386/pc.c:1936
#8  0x00007ff89bfd1946 in qemu_system_reset (report=report@entry=true) at vl.c:1751
#9  0x00007ff89bec879b in main_loop_should_exit () at vl.c:1898
#10 main_loop () at vl.c:1938
#11 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4667
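
To make the call chain in the backtrace easier to read, here is a small Python sketch that extracts the function name and source location from gdb-style frame lines. The names BACKTRACE, FRAME_RE, parse_frames and call_chain are hypothetical helpers, not part of gdb or QEMU, and only frames #2 to #6 from above are included.

```python
import re

# Frames #2-#6 copied from the backtrace above.
BACKTRACE = """\
#2  0x00007ff89c13b3d6 in bdrv_aio_cancel (acb=0x7ff8a441c0a0) at block/io.c:2048
#3  0x00007ff89c130535 in blk_aio_cancel (acb=<optimized out>) at block/block-backend.c:1044
#4  0x00007ff89c040d5a in ide_bus_reset (bus=bus@entry=0x7ff8a28e7480) at hw/ide/core.c:2326
#5  0x00007ff89c044088 in piix3_reset (opaque=0x7ff8a28e6c00) at hw/ide/piix.c:115
#6  0x00007ff89bfd18dd in qemu_devices_reset () at vl.c:1738
"""

# Frame number, optional address, function name, arguments, then location.
FRAME_RE = re.compile(
    r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\w+) \(.*\) (?:at|from) (\S+)$"
)


def parse_frames(text):
    """Return a list of (frame_number, function, location) tuples."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append((int(match.group(1)), match.group(2), match.group(3)))
    return frames


def call_chain(text):
    """Return the function names joined with ' -> ', outermost caller first."""
    return " -> ".join(name for _, name, _ in reversed(parse_frames(text)))
```

With these frames, call_chain(BACKTRACE) gives "qemu_devices_reset -> piix3_reset -> ide_bus_reset -> blk_aio_cancel -> bdrv_aio_cancel", i.e. the device reset path ends in the cancel call that aborts.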

Comment 3 Gal Hammer 2016-06-16 11:49:19 UTC
1. I'm unable to download the core file ("Forbidden. You don't have permission to access /pub/section2/coredump/var/crash/yfu/bug-1346237/core.10236 on this server.").

2. Is the Windows guest a new installation or a prepared one? Using the same qemu command line, I'm unable to start a Windows installation on my host.

Comment 5 Stefan Hajnoczi 2016-06-20 16:26:48 UTC
Please try this build:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11223356

It includes a fix for IDE/DMA helpers which should solve the bdrv_aio_cancel() abort(3) you experienced.

This bug appears to occur when Windows issues an IDE TRIM request.  I'm not sure how often Windows does this so it may be hard to reproduce again.

Comment 10 Miroslav Rezanina 2016-07-01 08:24:10 UTC
Fix included in qemu-kvm-rhev-2.6.0-11.el7
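
A quick way to check whether a given build already carries the fix is to compare its release number against the fixed-in build. This is only a rough sketch: it assumes package names of the form seen in this bug (qemu-kvm-rhev-<version>-<release>.el7...), it is not a full RPM version comparison, and FIXED_IN, NVR_RE, version_key and has_fix are hypothetical names.

```python
import re

# Build named in this bug as the first one containing the fix.
FIXED_IN = "qemu-kvm-rhev-2.6.0-11.el7"

# Matches e.g. "qemu-kvm-rhev-2.6.0-25.el7.x86_64".
NVR_RE = re.compile(r"^qemu-kvm-rhev-(\d+(?:\.\d+)*)-(\d+)\.el7")


def version_key(nvr):
    """Turn a package name into a comparable ((version parts), release) key."""
    match = NVR_RE.match(nvr)
    if match is None:
        raise ValueError(f"unexpected package name: {nvr!r}")
    version = tuple(int(part) for part in match.group(1).split("."))
    return version, int(match.group(2))


def has_fix(nvr):
    """Return True if nvr is at or after the fixed-in build."""
    return version_key(nvr) >= version_key(FIXED_IN)
```

For the builds mentioned in this bug, has_fix() is False for 2.6.0-5 and 2.6.0-10 and True for 2.6.0-11, 2.6.0-23 and 2.6.0-25.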

Comment 12 Yanan Fu 2016-07-06 10:56:11 UTC
I reran the avocado case "win_virtio_update.install_driver" many times, but it is always blocked by an autotest bug:
Bug 1233526 - [KVM-AUTOTEST] win_virtio_update.install_driver:Unhandled ShellTimeoutError: Timeout expired while waiting for shell command to complete: 'cmd /c E:\\install_driver.bat F:\\NetKVM\\w7\\amd64' (WIn7, Win8) 

So I have not reproduced this bug yet.
The autotest bug is being handled in the avocado framework. I will update the test result later.

Comment 13 Yanan Fu 2016-09-09 10:50:24 UTC
Hi Stefan,
Just as you said in comment 5, this bug is hard to reproduce.
The autotest case "win_virtio_update.install_driver" has been replaced by "single_driver_install"; I have rerun this case 50 times and could not hit this issue.

With the fixed version:
kernel-3.10.0-501.el7.x86_64
qemu-kvm-rhev-2.6.0-23.el7.x86_64

Reran "single_driver_install" 10*5 = 50 times and could not hit this issue either.
Job link:
http://10.66.4.244/kvm_autotest_job_log/?jobid=1495497

Do you have a reproducer that can easily trigger this bug? Or can we verify it with the result of my job?

Comment 14 Stefan Hajnoczi 2016-09-14 11:47:11 UTC
Here is a deterministic reproducer using a Linux guest:

$ qemu-img create -f qcow2 -o preallocation=full test.qcow2 128M

This is a fully preallocated image so all clusters have been created and are filled with zeroes.  We will send an IDE TRIM request to discard a cluster and then reboot the guest to trigger the same code path as Windows.

$ qemu-system-x86_64 -enable-kvm -m 1024 -drive if=ide,id=ide-drive,discard=unmap,file=blkdebug::test.qcow2,format=qcow2 -drive if=none,id=virtio-drive,file=rhel72.img,format=raw -device virtio-blk-pci,drive=virtio-drive,bootindex=0

The IDE drive is for the test and the virtio-blk device is just there to boot a Linux guest.  Notice that the IDE drive has discard=unmap so the TRIM request will be handled instead of ignored.  The blkdebug protocol in the filename enables debugging support that lets us suspend the TRIM request to make this reproducer reliable and not based on timing.
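
For scripted runs, the same command line can be assembled option by option. The following Python sketch only builds the argument list and does not start QEMU; drive_option and build_qemu_argv are hypothetical helper names, and the image file names default to the ones used in the command above.

```python
def drive_option(pairs):
    """Join (key, value) pairs into QEMU's comma-separated option syntax."""
    return ",".join(f"{key}={value}" for key, value in pairs)


def build_qemu_argv(test_image="test.qcow2", boot_image="rhel72.img"):
    """Build the reproducer command line as an argv list."""
    # IDE test drive: discard=unmap so TRIM is handled, blkdebug:: prefix so
    # the request can be suspended with a breakpoint.
    ide_drive = drive_option([
        ("if", "ide"),
        ("id", "ide-drive"),
        ("discard", "unmap"),
        ("file", f"blkdebug::{test_image}"),
        ("format", "qcow2"),
    ])
    # Boot drive: only there to boot the Linux guest via virtio-blk.
    boot_drive = drive_option([
        ("if", "none"),
        ("id", "virtio-drive"),
        ("file", boot_image),
        ("format", "raw"),
    ])
    return [
        "qemu-system-x86_64", "-enable-kvm", "-m", "1024",
        "-drive", ide_drive,
        "-drive", boot_drive,
        "-device", "virtio-blk-pci,drive=virtio-drive,bootindex=0",
    ]
```

Joining the returned list with spaces yields the command shown above; the list form can be passed directly to subprocess.Popen.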

(qemu) qemu-io ide-drive "break cluster_free A"

This adds an I/O request breakpoint.  When qcow2 processes a discard request it will free a cluster and the breakpoint suspends the request at that time.  It's as if we have an infinitely slow disk.  This way we avoid race conditions in the test steps.

guest# blkdiscard -l 65536 /dev/sda

Submit an IDE TRIM request for a full 64 KB qcow2 cluster.  This causes qcow2 to free a cluster and triggers our I/O request breakpoint.  blkdiscard(8) should hang inside the guest because it is waiting for the suspended IDE TRIM request to complete.
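
The length of 65536 is chosen so that the discard covers exactly one 64 KB cluster starting at offset 0. A small Python sketch of that arithmetic follows; it assumes that only clusters fully covered by the discard range are freed, and covered_clusters is a hypothetical helper, not a QEMU function.

```python
CLUSTER_SIZE = 64 * 1024        # 64 KB qcow2 cluster, as stated above
IMAGE_SIZE = 128 * 1024 * 1024  # 128M image from the qemu-img command


def covered_clusters(offset, length, cluster_size=CLUSTER_SIZE):
    """Return indices of clusters fully inside [offset, offset + length)."""
    first = -(-offset // cluster_size)        # round the start up
    end = (offset + length) // cluster_size   # round the end down (exclusive)
    return list(range(first, max(first, end)))
```

covered_clusters(0, 65536) returns [0], so "blkdiscard -l 65536 /dev/sda" covers one whole cluster, while a shorter or misaligned range such as covered_clusters(4096, 65536) covers none under this assumption. The image holds IMAGE_SIZE // CLUSTER_SIZE = 2048 clusters in total.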

(qemu) system_reset

Now reboot the guest to trigger the ide_bus_reset()/bdrv_aio_cancel() code path.

Expected behavior (qemu-kvm-rhev-2.6.0-11.el7):

QEMU hangs because we didn't provide a way to resume the suspended IDE TRIM request:
blkdebug: Suspended request 'A'

(In the Windows scenario QEMU will not hang because the IDE TRIM request isn't suspended.  As soon as the request completes the guest will reboot and be responsive.)

Actual behavior (qemu-kvm-rhev-2.6.0-10.el7):

QEMU calls abort(3):
blkdebug: Suspended request 'A'
Aborted (core dumped)
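
When automating this reproducer, the two outcomes can be told apart from the state of the QEMU process. A minimal sketch, assuming QEMU was started with Python's subprocess module and its return code is polled after a timeout; classify_outcome and its labels are hypothetical.

```python
import signal


def classify_outcome(returncode):
    """Map a subprocess return code to the outcome described above.

    returncode is what Popen.poll() reports: None while the process is still
    running, or a negative signal number if it was killed by a signal.
    """
    if returncode is None:
        # Still running after system_reset: the hang expected on a fixed build.
        return "hung"
    if returncode == -signal.SIGABRT:
        # Killed by SIGABRT: QEMU called abort(3), so the bug is reproduced.
        return "aborted"
    return "other"
```

A harness would issue system_reset, wait a few seconds, call poll() on the QEMU process and pass the result to classify_outcome(); on "hung" the process still has to be killed explicitly.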

Comment 15 Yanan Fu 2016-09-15 03:42:00 UTC
Thanks for stefanha's reproducer.

------------------------reproduce-------------------
Test version:
qemu: qemu-kvm-rhev-2.6.0-10.el7.x86_64
guest: rhel7.3

Test steps:
1. Create one preallocated image.
    # qemu-img create -f qcow2 -o preallocation=full test.qcow2 128M

2. Boot one guest with the following options:
    -drive id=ide-drive,if=ide,discard=unmap,file=blkdebug::/home/test.qcow2,format=qcow2 \
    -drive id=virtio-drive,if=none,file=/home/RHEL-Server-7.3-64-virtio.qcow2,format=qcow2 \
    -device virtio-blk-pci,drive=virtio-drive,id=virtio-blk-disk,bootindex=0

3. In the host qemu monitor, execute:
   (qemu) qemu-io ide-drive "break cluster_free A"

4. In the guest, execute:
   # blkdiscard -l 65536 /dev/sda
   It blocks in the guest, and qemu prints "blkdebug: Suspended request 'A'"

5. In the qemu monitor, execute:
   (qemu) system_reset
   QEMU dumps core: (qemu) *****  Aborted       (core dumped)

The bug is reproduced successfully.


------------------------verification-------------------
Test version:
qemu: qemu-kvm-rhev-2.6.0-25.el7.x86_64
guest: rhel7.3

Test steps:
Same test steps as above.
After system_reset in step 5, the guest hangs and the qemu monitor accepts no input; only "kill -9 $QEMU_PID" can stop QEMU.

According to comment 14 and the test result above, moving this bug to VERIFIED.


CLI:
/usr/libexec/qemu-kvm \
    -enable-kvm \
    -m 2048 \
    -drive id=ide-drive,if=ide,discard=unmap,file=blkdebug::/home/test.qcow2,format=qcow2 \
    -drive id=virtio-drive,if=none,file=/home/win2012r2-virtio-blk.qcow2,format=qcow2 \
    -device virtio-blk-pci,drive=virtio-drive,id=virtio-blk-disk,bootindex=0 \
    -usb \
    -device usb-tablet  \
    -vnc :0 \
    -monitor stdio

Be sure not to add other command-line options, because:
"there is a chance that some options could involve a call to bdrv_drain_all() inside QEMU.  This function waits until all I/O requests have completed. That would hang QEMU (including the monitor)" (analysis from stefanha).
In that case you cannot enter "system_reset" in step 5, and this bug needs system_reset to trigger.

Comment 17 errata-xmlrpc 2016-11-07 21:17:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html

