Bug 922923

Summary: div64_u64 crash from intel pstate on i686-PAE Sandy Bridge
Product: Fedora
Component: kernel
Version: 19
Status: CLOSED RAWHIDE
Reporter: Josh Stone <jistone>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: dirk.brandewie, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-03-20 17:33:50 UTC

Description Josh Stone 2013-03-18 19:21:42 UTC
Description of problem:
My virtual machine crashes immediately on booting 3.9 i686-PAE kernels.

Version-Release number of selected component (if applicable):
kernel-PAE-3.9.0-0.rc2.git0.3.fc19.i686

I also have a rawhide VM with the same symptom:
kernel-PAE-3.9.0-0.rc3.git0.3.fc20.i686

How reproducible:
100%

Steps to Reproduce:
1. Set the virt-manager CPU config to SandyBridge.
2. Install kernel-PAE-3.9*, reboot. :)

Actual results:
Panic at EIP div64_u64+0x2d/0x160.

I can't see the initial details of the panic, as the stack trace pushed it off the screen.  And my serial console has no output, so I'm guessing this is happening too early in the boot.  I can grab a screencap of the console if you like.  The highest point I can see is intel_pstate_timer_func+0x1e5/0x420.

Expected results:
Normal boot.

Additional info:
My host has a Sandy Bridge processor, and I have set virt-manager to use SandyBridge features for all my VMs.  I read that this new pstate driver is just for SB, so I tried setting the VMs down to Westmere instead, and that boots just fine.

My VM is set for 2 vcpus.  In case this is some SMP race in initializing pstate, I tried with only 1 vcpu, but that still panics.

This only panics on my i686 VMs; an x86_64 VM with the same SandyBridge model boots the corresponding kernel just fine, even showing several "Intel pstate controlling: cpu N" lines in dmesg.
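
My guess at the failure mode, sketched below (this is an assumption on my part, not the actual intel_pstate.c code): if the sampling path divides the APERF delta by the MPERF delta, then a guest whose MPERF never advances hands div64_u64() a zero divisor, which would explain a divide error at div64_u64+0x2d.

#include <linux/math64.h>
#include <linux/types.h>

/*
 * Hypothetical illustration only -- the function name and scaling are
 * made up, not copied from intel_pstate.c.
 */
static u64 suspected_busy_calc(u64 aperf_now, u64 aperf_prev,
			       u64 mperf_now, u64 mperf_prev)
{
	u64 delta_aperf = aperf_now - aperf_prev;
	u64 delta_mperf = mperf_now - mperf_prev; /* stays 0 if MPERF is frozen */

	/* div64_u64() takes a divide error (#DE) when delta_mperf == 0 */
	return div64_u64(delta_aperf * 100, delta_mperf);
}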

Comment 1 Dirk Brandewie 2013-03-19 01:04:50 UTC
It looks like this configuration passes the test for the pstate registers having non-zero values, which is where the x86_64 build was falling over.  From inspection it looks like the MPERF MSR is not advancing.  Could you try the following patch?  It checks that all the MSRs the driver uses have something rational in them.



commit 866111646f2c5d4c6c25e2bb97f5c61c3992defb
Author: Dirk Brandewie <dirk.brandewie>
Date:   Mon Mar 18 16:55:02 2013 -0700

    cpufreq/intel_pstate: Add function to check that all MSR's are valid
    
    Some VMs seem to try to implement some MSRs but not all the registers
    the driver needs.  Check to make sure all the MSR that we need are
    available. If any of the required MSRs are not available refuse to
    load.
    
    Signed-off-by: Dirk Brandewie <dirk.brandewie>
---
 drivers/cpufreq/intel_pstate.c |   26 ++++++++++++++++++++++++++
 1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index f6dd1e7..cd9c5f4 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -752,6 +752,29 @@ static struct cpufreq_driver intel_pstate_driver = {
 
 static int __initdata no_load;
 
+static int intel_pstate_msrs_not_valid(void)
+{
+	/* Check that all the msr's we are using are valid. */
+	u64 aperf, mperf, tmp;
+
+	rdmsrl(MSR_IA32_APERF, aperf);
+	rdmsrl(MSR_IA32_MPERF, mperf);
+
+	if (!intel_pstate_min_pstate() ||
+		!intel_pstate_max_pstate() ||
+		!intel_pstate_turbo_pstate())
+		return -ENODEV;
+
+	rdmsrl(MSR_IA32_APERF, tmp);
+	if (!(tmp - aperf))
+		return -ENODEV;
+
+	rdmsrl(MSR_IA32_MPERF, tmp);
+	if (!(tmp - mperf))
+		return -ENODEV;
+
+	return 0;
+}
 static int __init intel_pstate_init(void)
 {
 	int cpu, rc = 0;
@@ -764,6 +787,9 @@ static int __init intel_pstate_init(void)
 	if (!id)
 		return -ENODEV;
 
+	if (intel_pstate_msrs_not_valid())
+		return -ENODEV;
+
 	pr_info("Intel P-state driver initializing.\n");
 
 	all_cpu_data = vmalloc(sizeof(void *) * num_possible_cpus());
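
If you want to double-check the MPERF observation from inside the guest, here is a small userspace sketch (mine, not part of the patch; it assumes the msr module is loaded so /dev/cpu/0/msr exists, and it must run as root).  It reads APERF and MPERF twice and prints the deltas; an MPERF delta of zero is exactly the condition the check above refuses to load on.

#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

#define MSR_IA32_MPERF 0xE7
#define MSR_IA32_APERF 0xE8

static uint64_t rdmsr_fd(int fd, uint32_t reg)
{
	uint64_t val = 0;

	/* /dev/cpu/N/msr returns the 8-byte MSR value at offset 'reg' */
	if (pread(fd, &val, sizeof(val), reg) != sizeof(val))
		perror("pread");
	return val;
}

int main(void)
{
	uint64_t aperf1, mperf1, aperf2, mperf2;
	int fd = open("/dev/cpu/0/msr", O_RDONLY);

	if (fd < 0) {
		perror("open /dev/cpu/0/msr");
		return 1;
	}

	aperf1 = rdmsr_fd(fd, MSR_IA32_APERF);
	mperf1 = rdmsr_fd(fd, MSR_IA32_MPERF);
	usleep(100000);		/* give the counters time to tick */
	aperf2 = rdmsr_fd(fd, MSR_IA32_APERF);
	mperf2 = rdmsr_fd(fd, MSR_IA32_MPERF);

	printf("APERF delta: %llu\n", (unsigned long long)(aperf2 - aperf1));
	printf("MPERF delta: %llu\n", (unsigned long long)(mperf2 - mperf1));

	close(fd);
	return 0;
}

Compile with e.g. gcc -O2 -o msr-delta msr-delta.c (the file name is just a placeholder).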

Comment 2 Josh Stone 2013-03-19 16:36:33 UTC
(In reply to comment #1)
> It looks like this configuration passes the test for the pstate registers
> having non-zero values, which is where the x86_64 build was falling over.
> From inspection it looks like the MPERF MSR is not advancing.  Could you
> try the following patch?  It checks that all the MSRs the driver uses
> have something rational in them.

Works for me - i686 SB now boots without activating the pstate driver.  Actually, x86_64 SB also skips the pstate driver now, so the fact that it activated before may have been more of a fluke.  I suppose it makes sense that this stuff isn't really needed for vcpus, as the host will deal with it.

Comment 3 Dave Jones 2013-03-20 17:33:50 UTC
applied in kernel-3.9.0-0.rc3.git0.4.fc19