Bug 607947

Summary: RHEL 5.5 x86_64 guest OS kernel panic at cpuid4_cache_lookup() for AMD (Magny-Cours and Lisbon) platforms
Product: Red Hat Enterprise Linux 5
Reporter: Lin Avator <lavator>
Component: kernel-xen
Assignee: Xen Maintainance List <xen-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Priority: low
Version: 5.5
CC: drjones, imusayev, sbarcomb, tru, xen-maint
Target Milestone: rc
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-06-25 13:12:14 UTC
Attachments: xen-4.png: serial console output on kernel panic (flags: none)

Description Lin Avator 2010-06-25 09:28:51 UTC
Description of problem:

  On our AMD (Magny-Cours and Lisbon) platforms, the kernel panics at cpuid4_cache_lookup() while installing a RHEL 5.5 x86_64 guest OS on Xen.

  However, installing a RHEL 5.3 / 5.4 x86_64 guest OS on the same Xen server on the same AMD (Magny-Cours and Lisbon) platforms works fine.

  On the xen-unstable mailing list, we found the following changelog entry at http://www.gossamer-threads.com/lists/xen/changelog/171037?do=post_view_threaded :
[xen-unstable] Remove CPUID4 emulation for AMD CPUs

Remove CPUID4 emulation for AMD CPUs 

The CPUID4 emulation code for AMD CPUs in intel_cacheinfo.c won't be 
executed. This emulation code was from upstream kernel, where CPUID4 
is used for cache information report in sysfs. But in Xen, this code 
path won't be executed on AMD CPUs. init_amd() uses 
display_cacheinfo() to find out CPU cache size instead.

  Besides, when we install a RHEL 5.5 x86_64 guest OS on another virtualization server (e.g. Windows Hyper-V), we see the same panic at cpuid4_cache_lookup(). It does not appear to be a Xen-only issue, so we are submitting this bug report. Are there any kernel parameters that could serve as a quick work-around for installing a RHEL 5.5 x86_64 guest OS on AMD (Magny-Cours and Lisbon) virtualization platforms?

Steps to Reproduce:

1. Install RHEL 5.5 x86_64 with Xen on an AMD (Magny-Cours or Lisbon) platform.
2. Click "Applications" / "System Tools" / "Virtual Machine Manager".
3. Click button "New".
4. Key in "Name".
5. Select "Fully virtualized".
6. Select "Local install media (ISO image or CDROM)" and choose the "OS Type" and "OS Variant" for RHEL 5.5.
7. Choose "CD-ROM or DVD" to locate the install media.
8. Continue through the remaining steps to create the virtual machine. The guest kernel panics at cpuid4_cache_lookup(), as shown in the attachment "xen-4.png".

Comment 1 Lin Avator 2010-06-25 09:33:21 UTC
Created attachment 426811 [details]
xen-4.png: serial console output on kernel panic.

Comment 2 Andrew Jones 2010-06-25 13:12:14 UTC
There's no way to avoid this code path for a quick work-around, but the first hunk of

5ddb83f [cpu] fix boot crash in 32-bit install on AMD cpus

which is

diff --git a/arch/i386/kernel/cpu/intel_cacheinfo.c b/arch/i386/kernel/cpu/intel
index 81c2c40..2f3328a 100644
--- a/arch/i386/kernel/cpu/intel_cacheinfo.c
+++ b/arch/i386/kernel/cpu/intel_cacheinfo.c
@@ -331,6 +331,10 @@ amd_check_l3_disable(int index, struct _cpuid4_info *this_l
             (boot_cpu_data.x86_mask  < 0x1)))
+       /* not in virtualized environments */
+       if (num_k8_northbridges == 0)
+               return;
        this_leaf->can_disable = true;
        this_leaf->l3_indices  = amd_calc_l3_indices();

should fix this (even though the patch says it's for 32-bit installs). A -197 or later kernel will have it.

I'll close this bug as currentrelease. You can reopen if this doesn't work for you.

Comment 3 ilya m. 2010-08-26 19:22:08 UTC

I would like to add that this issue is also seen outside Xen. We have an HP DL385 G7 with AMD Magny-Cours running VMware ESX / vSphere 4.0 U1. When attempting to deploy RHEL 5.5 x86_64 via kickstart, I get exactly the same crash as posted by Lin.

How do we retrofit this fix into current RHEL 5.5 x86_64 builds? I would like to avoid recompiling the kernel, since it would then be unsupported by RH.


Comment 4 Andrew Jones 2010-08-27 09:25:56 UTC
Right, it's not xen specific, but rather a virt specific problem. When running in a virt environment the northbridge isn't exposed to the guest, so bad things happen when attempting to use it. You shouldn't have to recompile the kernel to get that code; you just need to upgrade to a z-stream release >= 2.6.18-194.2.1.el5. You can go through your normal RH channels to get it.

Comment 5 ilya m. 2010-08-27 12:42:05 UTC

I don't see this bug mentioned anywhere in the changelogs for any of the previous or current z-stream kernels.

The kernel version you referenced fixes a different issue, with a 32-bit OS on AMD 12-core CPUs.

I could still try the newer kernel, but I need the stage2 files, since we want to use kickstart for deployment.


Comment 6 Andrew Jones 2010-08-27 13:51:43 UTC
5.5.z has this commit with the patch

1b8a7568 [cpu] fix boot crash in 32-bit install on AMD cpus

Just like I said in comment 2, even though it's talking about 32-bit boot problems, the code, i.e. the condition that returns in the absence of a northbridge is there, so it should work for your problem.  Please try one of these later kernels I've pointed to. If it doesn't work, then we can reopen this bug to try and figure out why, but I think it'll work.