Bug 808346
| Summary: | windows2k8-64 BSOD on boot with -cpu SandyBridge or Westmere & vPMU enabled | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Miya Chen <michen> |
| Component: | kernel | Assignee: | Red Hat Kernel Manager <kernel-mgr> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 6.2 | CC: | acathrow, areis, bsarathy, dyasny, gleb, juzhang, mkenneth, rhod, shuang, tburke, virt-maint, vrozenfe, yvugenfi |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-04-12 18:36:04 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Description
Miya Chen
2012-03-30 08:04:56 UTC
Created attachment 573910
screen shot of BSOD
Are you running on a sandyBridge Host?
Can you re-try w/ -cpu sandyBridge,enforce
Can you re-try w/ -cpu sandyBridge,-xsave

(In reply to comment #3)
> Are you running on a sandyBridge Host?
yes, it is a SandyBridge host
> Can you re-try w/ -cpu sandyBridge,enforce
Tried, the same BSOD
> Can you re-try w/ -cpu sandyBridge,-xsave
Tried, the same BSOD

So, exception code is 0xC0000096 STATUS_PRIVILEGED_INSTRUCTION.

This is where the crash happens:

    0xfffff8000a8450d0: push %rsp
    0xfffff8000a8450d1: and $0x24,%al
    0xfffff8000a8450d3: jae 0xfffff8000a8450d9
    0xfffff8000a8450d5: movb $0x0,-0x1(%rcx)
    0xfffff8000a8450d9: add $0x38,%rcx
    0xfffff8000a8450dd: sub $0x1,%r8
    0xfffff8000a8450e1: jne 0xfffff8000a8450bc
    0xfffff8000a8450e3: jmp 0xfffff8000a8450fd
    0xfffff8000a8450e5: xor %ecx,%ecx
    0xfffff8000a8450e7: callq *-0x15bf5(%rip) # 0xfffff8000a82f4f8
    0xfffff8000a8450ed: xor %edx,%edx
    0xfffff8000a8450ef: lea 0x20(%rsp),%r8
    0xfffff8000a8450f4: lea 0xa(%rdx),%ecx
    0xfffff8000a8450f7: callq *-0x15c05(%rip) # 0xfffff8000a82f4f8
    0xfffff8000a8450fd: mov -0xbe8c(%rip),%r9d # 0xfffff8000a839278
    0xfffff8000a845104: xor %r8d,%r8d
    0xfffff8000a845107: test %r9d,%r9d
    0xfffff8000a84510a: je 0xfffff8000a84512e

This looks like the CPUID checking code. 0xA (set on %ecx) is probably the CPUID leaf being checked. I will assume that -0xbe8c(%rip) is where the CPUID EAX result is written. This is the content of the memory at that address:

    fffff8000a839278: 0x00000004 0x00000003 0x00000000 0x00000000
    fffff8000a839288: 0x00000000 0x00000000 0x00000000 0x00000000

I don't know if this is the value seen by that code, because I am looking at the memory _after_ Windows already crashed.

    0xfffff8000a84510c: xor %edx,%edx
    0xfffff8000a84510e: lea 0x186(%r8),%ecx
    0xfffff8000a845115: shr $0x20,%rdx
    0xfffff8000a845119: xor %eax,%eax
    0xfffff8000a84511b: wrmsr

Here it's trying to write to MSR 0x186 (PerfEvtSel0). It is available only if CPUID.0AH:EAX[15:8] > 0, but leaf 0xA _is_ available on the rhel6.3.0 machine-type.

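To make the CPUID.0AH:EAX[15:8] condition concrete, here is a minimal Python sketch that decodes the EAX value of leaf 0xA and lists the PerfEvtSel MSR addresses a guest would expect to exist. The field layout follows the architectural performance monitoring leaf as documented by Intel. The function names and the sample EAX value 0x07300403 are illustrative assumptions and are not taken from this bug.

```python
from typing import List, NamedTuple

# Base MSR address of PerfEvtSel0, as named in the analysis above.
IA32_PERFEVTSEL0 = 0x186


class PmuLeafEax(NamedTuple):
    version_id: int         # EAX[7:0]   architectural PMU version
    num_gp_counters: int    # EAX[15:8]  general-purpose counters per logical CPU
    counter_bit_width: int  # EAX[23:16] width of each counter in bits
    ebx_vector_length: int  # EAX[31:24] number of valid bits in EBX


def decode_cpuid_0a_eax(eax: int) -> PmuLeafEax:
    """Split the EAX result of CPUID leaf 0xA into its four byte-wide fields."""
    return PmuLeafEax(
        version_id=eax & 0xFF,
        num_gp_counters=(eax >> 8) & 0xFF,
        counter_bit_width=(eax >> 16) & 0xFF,
        ebx_vector_length=(eax >> 24) & 0xFF,
    )


def perfevtsel_msrs(eax: int) -> List[int]:
    """MSR addresses PerfEvtSel0..N-1 that exist when EAX[15:8] == N."""
    count = decode_cpuid_0a_eax(eax).num_gp_counters
    return [IA32_PERFEVTSEL0 + i for i in range(count)]


if __name__ == "__main__":
    sample_eax = 0x07300403  # hypothetical value, for illustration only
    print(decode_cpuid_0a_eax(sample_eax))
    print([hex(msr) for msr in perfevtsel_msrs(sample_eax)])
```

If the leaf reports a non-zero counter count while the hypervisor does not actually emulate those MSRs, a guest that trusts the leaf will issue a wrmsr that faults, which is consistent with the STATUS_PRIVILEGED_INSTRUCTION seen here.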
Now we have to check why/if KVM is raising an exception when the guest tries to write to that MSR.

I just tested using -M rhel6.2.0 (that doesn't have the CPU monitoring leaf available), and it works as expected.

It also boots if using -M rhel6.3.0 -cpu SandyBridge,level=9, to disable the CPUID 0xA leaf.

We can't set level=9 on SandyBridge, though, as leaf 0xD is necessary for XSAVE.

Gleb, what do you think? Should we aim to get vPMU working smoothly on SandyBridge, or should we disable PMU on SandyBridge to avoid risk?

Note that this bug affects Westmere too (-M rhel6.3.0 -cpu Westmere), as it has level=11.

(In reply to comment #8)
> I just tested using -M rhel6.2.0 (that doesn't have the CPU monitoring leaf
> available), and it works as expected.
>
> It also boots if using -M rhel6.3.0 -cpu SandyBridge,level=9, to disable the
> CPUID 0xA leaf.
>
> We can't set level=9 on SandyBridge, though, as leaf 0xD is necessary for
> XSAVE.
>
> Gleb, what do you think? Should we aim to get vPMU working smoothly on
> SandyBridge, or should we disable PMU on SandyBridge to avoid risk?

From comment #1 the kernel is 2.6.32-221.el6.x86_64. vPMU was introduced in kernel-2.6.32-245.el6. The configuration is not valid. It is not rhel6.3. But we shouldn't return garbage in leaf 0xA regardless. In that setup leaf 0xA should return zeroes in all registers; if it does not, that is the bug that should be fixed.

(In reply to comment #10)
> From comment #1 the kernel is 2.6.32-221.el6.x86_64. vPMU was introduced in
> kernel-2.6.32-245.el6. The configuration is not valid. It is not rhel6.3. But
> we shouldn't return garbage in leaf 0xA regardless. In that setup leaf 0xA
> should return zeroes in all registers; if it does not, that is the bug that
> should be fixed.

Please retest with current packages.

By looking at the kernel-2.6.32-221.el6 source code, it looks like KVM get_supported_cpuid() incorrectly returns the host CPU CPUID bits completely unmodified on leaf 0xA.

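The behaviour asked for in the discussion (leaf 0xA returning zeroes in all registers when the hypervisor has no vPMU) can be sketched as a small filter. This is only a Python model of the idea under stated assumptions: `CpuidRegs`, `guest_leaf_0a`, and the `vpmu_supported` flag are hypothetical names, this is not the KVM implementation, and passing the host values through in the supported case is a deliberate simplification.

```python
from typing import NamedTuple


class CpuidRegs(NamedTuple):
    eax: int
    ebx: int
    ecx: int
    edx: int


ZEROED = CpuidRegs(0, 0, 0, 0)


def guest_leaf_0a(host: CpuidRegs, vpmu_supported: bool) -> CpuidRegs:
    """Model of what a guest should see for CPUID leaf 0xA.

    Without vPMU support the host values must not leak through;
    every register is reported as zero instead.
    """
    if not vpmu_supported:
        return ZEROED
    # Simplification: a real hypervisor would report only what it emulates.
    return host


def guest_sees_counters(regs: CpuidRegs) -> bool:
    """True when EAX[15:8] > 0, i.e. the guest may touch PerfEvtSel MSRs."""
    return ((regs.eax >> 8) & 0xFF) > 0


if __name__ == "__main__":
    host = CpuidRegs(0x07300403, 0, 0, 0x603)  # hypothetical host values
    print(guest_sees_counters(guest_leaf_0a(host, vpmu_supported=False)))
```

With the leaf zeroed, a guest that checks EAX[15:8] before programming the counters never reaches the wrmsr to 0x186.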
I suppose the 6.2.0 kernel (-220.el6) also passes leaf 0xA through unmodified.

So this is a bug in the 6.2 kernel that will cause issues if using the 6.3 qemu-kvm binary. Is it a bug we will want to fix on 6.2.z, or is it a use case we don't support?

(In reply to comment #11)
> (In reply to comment #10)
> > From comment #1 the kernel is 2.6.32-221.el6.x86_64. vPMU was introduced in
> > kernel-2.6.32-245.el6. The configuration is not valid. It is not rhel6.3. But
> > we shouldn't return garbage in leaf 0xA regardless. In that setup leaf 0xA
> > should return zeroes in all registers; if it does not, that is the bug that
> > should be fixed.
>
> Please retest with current packages.

Tried with 2.6.32-264.el6.x86_64; the Windows guest boots successfully.

I believe running RHEL6.3's qemu with the RHEL6.2 kernel is not supported, so I'm closing this bug. Please reopen if I'm wrong.
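The version reasoning in this thread (vPMU first appearing in kernel-2.6.32-245.el6, the failing host running -221, the passing retest running -264) can be turned into a quick triage check. This is a hedged sketch: the helper names are hypothetical, and the parser only understands RHEL 6 style 2.6.32-NNN.el6 release strings.

```python
import re

# First el6 kernel build with vPMU, per the discussion above.
VPMU_FIRST_BUILD = 245


def el6_build_number(release: str) -> int:
    """Extract NNN from a string containing '2.6.32-NNN.el6'."""
    match = re.search(r"2\.6\.32-(\d+)\.el6", release)
    if match is None:
        raise ValueError(f"not a RHEL 6 kernel release string: {release!r}")
    return int(match.group(1))


def kernel_has_vpmu(release: str) -> bool:
    """True when the kernel build is at or after the first vPMU build."""
    return el6_build_number(release) >= VPMU_FIRST_BUILD


if __name__ == "__main__":
    for rel in ("2.6.32-221.el6.x86_64", "2.6.32-264.el6.x86_64"):
        print(rel, kernel_has_vpmu(rel))
```

A check like this flags the mismatched pairing of a 6.3 qemu-kvm with a pre-vPMU kernel before a guest is started.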