Description:
On some dom0 hosts, booting a Windows 2008 R2 64-bit domU results in a nearly immediate Windows blue-screen crash (STOP 0x0000007E). This occurs with disk images that work normally on many other hosts. It also occurs when installing Windows 2008 R2 64-bit inside a domU from a virtual CD-ROM using basic QEMU emulation: the initial installation goes well, but on the first reboot the system crashes in the same manner.

Thus far, the only difference found between the systems that work and those that don't is that the systems which *don't* work have a specific revision of Intel Xeon processor (Xeon Family 6, Model 23, Stepping 10) and a similar BIOS. This may be unrelated, though.

How reproducible:
100% on certain hosts.

Steps to Reproduce:
Install or boot a Windows 2008 R2 domU.

Additional info:
Here are some details from an affected test host:

Kernel:
2.6.18-92.1.13.el5.a3finerscheduler_msi_backport.38025xen #1 SMP x86_64 x86_64 x86_64 GNU/Linux

xm info:
release : 2.6.18-92.1.13.el5.a3finerscheduler_msi_backport.38025xen
version : #1 SMP Wed Sep 2 06:42:46 SAST 2009
machine : x86_64
nr_cpus : 4
nr_nodes : 1
sockets_per_node : 1
cores_per_socket : 4
threads_per_core : 1
cpu_mhz : 2666
hw_caps : bfebfbff:20100800:00000000:00000140:040ce3bd:00000000:00000001
total_memory : 16382
free_memory : 12535
node_to_cpu : node0:0-3
xen_major : 3
xen_minor : 1
xen_extra : .2-92.1.13.el5.
xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_pagesize : 4096
platform_params : virt_start=0xffff800000000000
xen_changeset : unavailable
cc_compiler : gcc version 4.1.1 20070105 (Red Hat 4.1.1-52)
cc_compile_by : builder
cc_compile_domain : ec2.internal
cc_compile_date : Wed Sep 2 06:41:30 SAST 2009
xend_config_format : 2

cpuinfo:
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
stepping : 10
cpu MHz : 2666.760
cache size : 6144 KB
physical id : 3
siblings : 1
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips : 6671.26
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:

# lspci | sort
00:00.0 Host bridge: Intel Corporation 5100 Chipset Memory Controller Hub (rev 90)
00:04.0 PCI bridge: Intel Corporation 5100 Chipset PCI Express x16 Port 4-7 (rev 90)
00:05.0 PCI bridge: Intel Corporation 5100 Chipset PCI Express x4 Port 5 (rev 90)
00:06.0 PCI bridge: Intel Corporation 5100 Chipset PCI Express x4 Port 6 (rev 90)
00:07.0 PCI bridge: Intel Corporation 5100 Chipset PCI Express x4 Port 7 (rev 90)
00:10.0 Host bridge: Intel Corporation 5100 Chipset FSB Registers (rev 90)
00:10.1 Host bridge: Intel Corporation 5100 Chipset FSB Registers (rev 90)
00:10.2 Host bridge: Intel Corporation 5100 Chipset FSB Registers (rev 90)
00:11.0 Host bridge: Intel Corporation 5100 Chipset Reserved Registers (rev 90)
00:13.0 Host bridge: Intel Corporation 5100 Chipset Reserved Registers (rev 90)
00:15.0 Host bridge: Intel Corporation 5100 Chipset DDR Channel 0 Registers (rev 90)
00:16.0 Host bridge: Intel Corporation 5100 Chipset DDR Channel 1 Registers (rev 90)
00:19.0 Ethernet controller: Intel Corporation 82566DM-2 Gigabit Network Connection (rev 02)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IR (ICH9R) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
06:00.0 Ethernet controller: Intel Corporation 82573V Gigabit Ethernet Controller (Copper) (rev 03)
07:03.0 VGA compatible controller: ASPEED Technology, Inc. AST2000

BIOS Information
  Vendor: Dell Computer Corporation
  Version: S45_3A20
System Information
  Manufacturer: Dell
  Product Name: DCS CS24-SC
Base Board Information
  Manufacturer: Dell
  Product Name: S45

From http://aumha.org/a/stop.htm:

0x0000007E: SYSTEM_THREAD_EXCEPTION_NOT_HANDLED
A system thread generated an exception which the error handler did not catch. There are numerous individual causes for this problem, including hardware incompatibility, a faulty device driver or system service, or some software issues.

and then: http://msdn.microsoft.com/en-gb/library/ms795746.aspx

Cause
The SYSTEM_THREAD_EXCEPTION_NOT_HANDLED bug check is a very common bug check. To interpret it, you must identify which exception was generated. Common exception codes include the following:
* 0x80000002: STATUS_DATATYPE_MISALIGNMENT indicates an unaligned data reference was encountered.
* 0x80000003: STATUS_BREAKPOINT indicates a breakpoint or ASSERT was encountered when no kernel debugger was attached to the system.
* 0xC0000005: STATUS_ACCESS_VIOLATION indicates a memory access violation occurred.

RHEL 5.4 has been tried on this hardware with no success.
0xC0000096 is a privileged instruction exception, so there is some hope of fixing the issue. Can you get a crash dump of the system after it hits the BSOD? Testing the hypervisor should not be necessary; I was informed that the bug indeed affects only the 32-bit PAE kernel.
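To make the exception-code lookup concrete, here is a minimal sketch that maps the first parameter of a STOP 0x0000007E to a status name. The table only contains the codes mentioned in this report, and the function name is made up for illustration. It assumes the parameter may be printed sign-extended to 64 bits (for example ffffffffc0000096), so the upper 32 bits are dropped before the lookup.

```python
# Illustrative table: only the exception codes mentioned in this report.
NTSTATUS_NAMES = {
    0x80000002: "STATUS_DATATYPE_MISALIGNMENT",
    0x80000003: "STATUS_BREAKPOINT",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000096: "STATUS_PRIVILEGED_INSTRUCTION",
}


def decode_exception_code(param1):
    """Return the status name for bug check parameter 1, if it is in the table.

    Hypothetical helper: the name and the table are assumptions made for
    this example, not part of any Windows or Xen tool.
    """
    # Keep the low 32 bits; 64-bit Windows shows the code sign-extended.
    code = param1 & 0xFFFFFFFF
    return NTSTATUS_NAMES.get(code, "unknown (0x%08X)" % code)


if __name__ == "__main__":
    print(decode_exception_code(0xFFFFFFFFC0000096))
```

A code that is not in the table is reported as unknown rather than guessed at, which is the safer behaviour when triaging a dump.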
There is a Microsoft KB article about this for Windows Server 2008 R2:
http://support.microsoft.com/kb/974598

-- Copy&Paste below for completeness: --

Assume that you enable the Hyper-V role on a computer that is running Windows Server 2008 R2. You restart the computer after you enable the Hyper-V role. However, you receive the following Stop error message during the restart operation:

  Stop 0x0000007E (ffffffffc0000096, parameter2, parameter3, parameter4)
  SYSTEM_THREAD_EXCEPTION_NOT_HANDLED

Notes:
* The parameters in these Stop error messages may vary, depending on the actual configuration.
* The symptoms of a Stop error may vary, depending on your computer's system failure options. For example, the computer may restart when a Stop error occurs.

Cause:
This problem occurs because the system uses a C-state that is supported by the processor. However, the C-state is not supported by Hyper-V.

Resolution:
To resolve this problem, follow these steps:
1. Disable Processor Virtualization in the BIOS.
2. Start the computer normally.
3. Apply this hotfix and then restart the computer.

Status:
Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section.

Workaround:
Important: This section, method, or task contains steps that tell you how to modify the registry. However, serious problems might occur if you modify the registry incorrectly. Therefore, make sure that you follow these steps carefully. For added protection, back up the registry before you modify it. Then, you can restore the registry if a problem occurs. For more information about how to back up and restore the registry, click the following article number to view the article in the Microsoft Knowledge Base:
322756 (http://support.microsoft.com/kb/322756/ ) How to back up and restore the registry in Windows

To work around this problem, follow these steps:
1. Disable Processor Virtualization in the BIOS.
2. Start the computer normally.
3. Open an elevated command prompt, and then run the following command:
   reg add HKLM\System\CurrentControlSet\Control\Processor /v Capabilities /t REG_DWORD /d 0x0007E066
4. Restart the computer.

This workaround adds a registry entry that disables the C2 state and the C3 state.
Event posted on 10-22-2009 04:46am EDT by kentf

Here we go:

The XSAVE/XRSTOR feature is not supported in this Xen. The BIOS on some of these boxes exposes the feature and on others does not; it appears that all of the boxes with 5430 parts have it, as do some of the 5410s.

This patch fixes HVM and PV, I think. It successfully boots Win 2008 Server R2 on a host that did not work before:

diff -Naur xen/arch/x86/hvm/hvm.c xen.new/arch/x86/hvm/hvm.c
--- xen/arch/x86/hvm/hvm.c 2009-10-22 01:08:55.000000000 -0700
+++ xen.new/arch/x86/hvm/hvm.c 2009-10-22 01:12:57.000000000 -0700
@@ -675,6 +675,7 @@
     struct vcpu *v = current;

     clear_bit(X86_FEATURE_MWAIT & 31, ecx);
+    clear_bit(X86_FEATURE_XSAVE & 31, ecx);

     if ( vlapic_hw_disabled(vcpu_vlapic(v)) )
         clear_bit(X86_FEATURE_APIC & 31, edx);
diff -Naur xen/arch/x86/hvm/vmx/vmx.c xen.new/arch/x86/hvm/vmx/vmx.c
--- xen/arch/x86/hvm/vmx/vmx.c 2009-10-22 01:08:55.000000000 -0700
+++ xen.new/arch/x86/hvm/vmx/vmx.c 2009-10-22 01:16:48.000000000 -0700
@@ -1249,6 +1249,8 @@
      */
     boot_cpu_data.x86_capability[4] = cpuid_ecx(1);

+    clear_bit(X86_FEATURE_XSAVE, &boot_cpu_data.x86_capability);
+
     if ( !test_bit(X86_FEATURE_VMXE, &boot_cpu_data.x86_capability) )
         return 0;
diff -Naur xen/arch/x86/traps.c xen.new/arch/x86/traps.c
--- xen/arch/x86/traps.c 2009-10-22 01:08:55.000000000 -0700
+++ xen.new/arch/x86/traps.c 2009-10-22 01:14:52.000000000 -0700
@@ -615,6 +615,7 @@
         clear_bit(X86_FEATURE_SEP, &d);
         if ( !IS_PRIV(current->domain) )
             clear_bit(X86_FEATURE_MTRR, &d);
+        clear_bit(X86_FEATURE_XSAVE % 32, &c);
     }
     else if ( regs->eax == 0x80000001 )
     {
diff -Naur xen/include/asm-x86/cpufeature.h xen.new/include/asm-x86/cpufeature.h
--- xen/include/asm-x86/cpufeature.h 2007-12-06 09:48:39.000000000 -0800
+++ xen.new/include/asm-x86/cpufeature.h 2009-10-21 23:24:14.000000000 -0700
@@ -82,6 +82,7 @@
 #define X86_FEATURE_CID (4*32+10) /* Context ID */
 #define X86_FEATURE_CX16 (4*32+13) /* CMPXCHG16B */
 #define X86_FEATURE_XTPR (4*32+14) /* Send Task Priority Messages */
+#define X86_FEATURE_XSAVE (4*32+26) /* XSAVE/XRESTOR feature set */

 /* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001, word 5 */
 #define X86_FEATURE_XSTORE (5*32+ 2) /* on-CPU RNG present (xstore insn) */

This event sent from IssueTracker by jabrown
issue 354327
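As a rough illustration of what the patch's bit arithmetic does, the sketch below splits a feature number such as (4*32+26) into a capability word and a bit, checks that bit in an `xm info` hw_caps string, and clears it. The helper names are made up for this example. It assumes the colon-separated hw_caps fields are the capability words in index order, so that the fifth field is word 4 (CPUID leaf 1 ECX, as `boot_cpu_data.x86_capability[4] = cpuid_ecx(1)` in the patch suggests).

```python
# Feature number taken from the cpufeature.h hunk of the patch.
X86_FEATURE_XSAVE = 4 * 32 + 26

# hw_caps line from the affected host's "xm info" output in this report.
HW_CAPS = "bfebfbff:20100800:00000000:00000140:040ce3bd:00000000:00000001"


def feature_word_and_bit(feature):
    """Split a feature number into (capability word index, bit index)."""
    return divmod(feature, 32)


def parse_hw_caps(hw_caps):
    """Parse hw_caps into a list of 32-bit words.

    Assumption: fields appear in capability-word order (field 0 = word 0).
    """
    return [int(field, 16) for field in hw_caps.split(":")]


def has_feature(words, feature):
    """Return True if the feature's bit is set in its capability word."""
    word, bit = feature_word_and_bit(feature)
    return bool((words[word] >> bit) & 1)


def mask_feature(words, feature):
    """Return a copy of the words with the feature's bit cleared."""
    word, bit = feature_word_and_bit(feature)
    masked = list(words)
    masked[word] &= ~(1 << bit) & 0xFFFFFFFF
    return masked


if __name__ == "__main__":
    caps = parse_hw_caps(HW_CAPS)
    print("XSAVE exposed:", has_feature(caps, X86_FEATURE_XSAVE))
    hidden = mask_feature(caps, X86_FEATURE_XSAVE)
    print("word 4 after masking: %08x" % hidden[4])
```

Under that ordering assumption, bit 26 of the fifth field (040ce3bd) is set on the affected host, which matches the diagnosis that the BIOS exposes XSAVE there. Masking returns a new list instead of modifying the parsed words, so the original value stays available for comparison.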
We've just put a similar patch into the RHEL-5.5 kernel, which should also fix the issue. Would it be possible to try out the kernel here:

http://people.redhat.com/dzickus/el5/170.el5/

Or at least try out the patch entitled "xen-mask-out-xsave-for-hvm-guest"?

Thanks,
Chris Lalancette