Description of problem:
Occasionally, A fresh installation of Windows 7 x86 will hang at boot if using the AHCI device. This is known to happen from the en_windows_7_ultimate_x86_dvd_x15-65921.iso version of the operating system, though I haven't been able to reproduce via the RC1 version.
Version-Release number of selected component (if applicable):
Happens on qemu upstream at least as of v2.1.0-rc5.
seldom - roughly 1/15
Steps to Reproduce:
1. Create a qcow2 image,
2. Install Windows 7 using a command line such this:
-m 512 -enable-kvm -drive \
-device ahci,id=ahci -device ide-drive,drive=disk,bus=ahci.0 \
You can use all default settings.
3. Launch windows with a command line like:
-m 512 -enable-kvm -drive \
-device ahci,id=ahci -device ide-drive,drive=disk,bus=ahci.0 -cdrom \
where -snapshot appears to make the bug easier to trigger.
In about 1/15 tries, Windows 7 will hang on the pulsing Windows 7 flag logo prior to loading the login screen. The hang will last from 10-20 seconds and will be quite noticeable. QEMU will be largely idle during this time.
The boot sequence up to the login screen should not exceed 2-3 seconds.
In this bug, Windows 7 will occasionally decide to reset the AHCI port (but not the AHCI HBA or the AHCI PCI device) that the IDE device is connected to. There is no noticeable or discernible error from the point of view of the AHCI register set, and the Windows 7 system log produces no errors of interest. Comparing traces of AHCI calls of successful boots to unsuccessful ones does not highlight any useful control flow differences. Several hundred command trace entries prior to the hang are identical.
In verbose windows boot mode, the hang occurs after all the drivers have been loaded, but before the login screen appears.
The hang occurs after windows has switched from native commands to NCQ commands.
It is not immediately evident for what reason W7 decides it must reset the state of the AHCI port, but something is clearly timing out. Perhaps an interrupt of interest is not getting posted and race conditions leave us to be unlucky sometimes.
During this hang period, there are no AHCI calls being made and the device is completely idle until the reset request is intercepted.
XPERF can be used to capture a slow boot.
/Potentially/ fixed, see http://lists.gnu.org/archive/html/qemu-devel/2014-10/msg03093.html and the corresponding patch series.
Is it still the case? or can we close this bug safely?
I believe the bug has been solved alongside a slew of AHCI fixes upstream since this was reported. At least, I haven't seen it in a while and neither has Michael Tsirkin.
The issue might still exist downstream, but since Q35 tech preview was removed from qemu-kvm, we can just change this to CLOSED WONTFIX.
The issue is either already fixed (2.3) or will be fixed (2.4) in a future release of qemu-kvm-rhev. It was likely fixed for upstream's 2.3 release, but there are possible contributing factors patched for the 2.4 release, too.