Bug 1125037 - Windows 7 x86 occasionally hangs at boot with AHCI
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.2
Hardware: Unspecified
OS: Linux
Target Milestone: rc
Assignee: Vadim Rozenfeld
QA Contact: Virtualization Bugs
Depends On:
Reported: 2014-07-30 23:12 UTC by John Snow
Modified: 2015-08-11 12:26 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2015-08-10 17:47:48 UTC
Target Upstream Version:


Description John Snow 2014-07-30 23:12:13 UTC
Description of problem:
Occasionally, a fresh installation of Windows 7 x86 hangs at boot when using the AHCI device. This is known to happen with the en_windows_7_ultimate_x86_dvd_x15-65921.iso version of the operating system; I haven't been able to reproduce it with the RC1 version.

Version-Release number of selected component (if applicable):
Happens on qemu upstream at least as of v2.1.0-rc5.

How reproducible:
seldom - roughly 1/15
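
As a rough guide to how many boot attempts a test run needs at this rate, the sketch below counts the attempts required to see at least one hang with a given confidence. It assumes each boot is an independent trial and that the rate is exactly 1 in N; both are simplifications, and the function name boots_needed is made up for illustration.

```shell
# Hypothetical helper: smallest number of boots k such that
# 1 - ((n-1)/n)^k >= conf, assuming independent attempts at a 1-in-n rate.
boots_needed() {
    awk -v n="$1" -v conf="$2" 'BEGIN {
        miss = 1.0   # probability that no hang has been seen so far
        k = 0        # boots attempted
        while (miss > 1 - conf) {
            miss *= (n - 1) / n
            k++
        }
        print k
    }'
}

# Boots needed for a 95% chance of seeing the hang at roughly 1/15
boots_needed 15 0.95
```

Under those assumptions, a few dozen clean boots are needed before concluding that a candidate fix has any effect.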

Steps to Reproduce:
1. Create a qcow2 image.

2. Install Windows 7 using a command line such as this:
    x86_64-softmmu/qemu-system-x86_64 \
        -m 512 -enable-kvm -drive \
        id=disk,file=win32.qcow2,if=none \
        -device ahci,id=ahci -device ide-drive,drive=disk,bus=ahci.0 \
        -cdrom en_windows_7_ultimate_x86_dvd_x15-65921.iso

You can use all default settings.

3. Launch Windows with a command line like:
    x86_64-softmmu/qemu-system-x86_64 \
        -m 512 -enable-kvm -drive \
        id=disk,file=${HOME}/img/win32.qcow2,if=none,cache=unsafe \
        -device ahci,id=ahci -device ide-drive,drive=disk,bus=ahci.0 -cdrom \
        ${HOME}/iso/en_windows_7_ultimate_x86_dvd_x15-65921.iso -snapshot

where -snapshot appears to make the bug easier to trigger.
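
For step 1, a minimal sketch of creating the image with qemu-img follows. The 20G size is an assumption (the report does not state one), and the file name simply matches the install command above.

```shell
# Assumed values: the report does not give an image size.
IMG=win32.qcow2
SIZE=20G

# Create an empty qcow2 image if the qemu-img tool is available.
if command -v qemu-img >/dev/null 2>&1; then
    qemu-img create -f qcow2 "$IMG" "$SIZE"
else
    echo "qemu-img not found; install the QEMU tools first" >&2
fi
```

A qcow2 image allocates space on demand, so the nominal size only sets an upper bound on the guest disk.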

Actual results:
In about 1 in 15 tries, Windows 7 hangs on the pulsing Windows 7 flag logo before the login screen loads. The hang lasts 10-20 seconds and is quite noticeable. QEMU is largely idle during this time.

Expected results:
The boot sequence up to the login screen should not exceed 2-3 seconds.

Additional info:
In this bug, Windows 7 will occasionally decide to reset the AHCI port (but not the AHCI HBA or the AHCI PCI device) that the IDE device is connected to. There is no noticeable or discernible error from the point of view of the AHCI register set, and the Windows 7 system log produces no errors of interest. Comparing traces of AHCI calls of successful boots to unsuccessful ones does not highlight any useful control flow differences. Several hundred command trace entries prior to the hang are identical.
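
The trace comparison described above can be sketched as follows. This is only an illustration: it assumes each trace has been saved as a non-empty text file with one event per line and any timestamps already stripped, and the function and file names are hypothetical.

```shell
# Hypothetical helper: print the line number where two trace files first
# differ, or "identical" if they match line for line.
# Assumes both files are non-empty.
first_divergence() {
    awk 'NR == FNR { a[FNR] = $0; n = FNR; next }   # load the first trace
         {
             m = FNR                                 # lines seen in second trace
             if (FNR > n || a[FNR] != $0) { print FNR; found = 1; exit }
         }
         END { if (!found) print (m < n ? m + 1 : "identical") }' "$1" "$2"
}

# Example call (file names are placeholders):
#   first_divergence good_boot.trace hung_boot.trace
```

If the second trace is merely a shorter prefix of the first, the reported line is the first one missing from it.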

In verbose Windows boot mode, the hang occurs after all the drivers have been loaded, but before the login screen appears.
The hang occurs after Windows has switched from native commands to NCQ commands.

It is not immediately evident why Windows 7 decides it must reset the state of the AHCI port, but something is clearly timing out. Perhaps an interrupt of interest is not getting posted, and a race condition occasionally leaves us unlucky.

During this hang period, there are no AHCI calls being made and the device is completely idle until the reset request is intercepted.

Comment 2 Vadim Rozenfeld 2014-08-04 01:11:45 UTC
XPERF can be used to capture a slow boot.

Comment 3 John Snow 2014-10-28 17:26:37 UTC
/Potentially/ fixed, see http://lists.gnu.org/archive/html/qemu-devel/2014-10/msg03093.html and the corresponding patch series.

Comment 5 Vadim Rozenfeld 2015-08-10 09:34:13 UTC
Is this still the case, or can we safely close this bug?


Comment 6 John Snow 2015-08-10 17:47:48 UTC
I believe the bug has been solved alongside a slew of AHCI fixes upstream since this was reported. At least, I haven't seen it in a while and neither has Michael Tsirkin.

The issue might still exist downstream, but since Q35 tech preview was removed from qemu-kvm, we can just change this to CLOSED WONTFIX.

The issue is either already fixed (2.3) or will be fixed (2.4) in a future release of qemu-kvm-rhev. It was likely fixed for upstream's 2.3 release, but there are possible contributing factors patched for the 2.4 release, too.
