Bug 477247

Summary: KVM initialization function vmx_check_processor_compat() needs to print more diagnostics when CPUs not compatible
Product: [Fedora] Fedora
Reporter: Nathan Watson <nfwatson>
Component: kvm
Assignee: Glauber Costa <gcosta>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 10
CC: berrange, clalance, gcosta, markmc, nfwatson, virt-maint
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-12-18 07:20:29 UTC Type: ---
Attachments:
Description Flags
output of "cat /proc/cpuinfo", demonstrates all 8 cores (2CPUx4core) have vmx and otherwise should be compatible none

Description Nathan Watson 2008-12-19 20:46:41 UTC
Description of problem:

 * was using single Xeon quad-core CPU to run KVM virtual machines
   (under qemu-kvm) ... apparently Fedora 10 ended up using
   real Intel-VT hardware virtualization in this scenario
 * installed a second Xeon quad-core CPU to add capacity
 * both CPUs are identical Intel model, all 8 cores (2CPUx4core)
   report this in "cat /proc/cpuinfo" ...
     "model name : Intel(R) Xeon(R) CPU E5310 @ 1.60GHz"
 * all 8 cores report the exact same 'flags' field in the
   "cat /proc/cpuinfo" output, and all cores contain
   the 'vmx' flag
 * noticed extreme slowdown in virtual machine performance
 * noticed in /var/log/libvirt/qemu/VIRTUAL_MACHINE_NAME.log
   the new output that wasn't present before introducing
   the second CPU ... 
     "open /dev/kvm: No such file or directory"
     "Could not initialize KVM, will disable KVM support"
     ... and apparently qemu-kvm then ends up calling
     or using qemu-system-x86_64 to do full hardware
     emulation rather than using the raw hardware to run
     these virtual machines
 * dug further in logs and noticed the following lines
   in 'dmesg' and '/var/log/messages' that were not
   present before installing the second CPU:
     "kvm: CPU 0 feature inconsistency!"
     "kvm: CPU 1 feature inconsistency!"
 * both these CPUs are Intel Xeon quad-core E5310 1.60GHz,
   perhaps not from the same lot, but they gotta be compatible

Version-Release number of selected component (if applicable):

  * Fedora 10, up-to-date as of 2008/12/19
  * kernel-2.6.27.7-134.fc10.x86_64
  * kvm-74-6.fc10.x86_64
  * ... if other version info required, please let me know

How reproducible:

  * always

Steps to Reproduce:
1. Install 2 Intel Xeon quad-core CPUs E5310
2. Install Fedora 10 + KVM (+ libvirt/qemu/etc.)
3. Set up virtual machines
4. notice that Kernel/KVM-CPU-compatibility-checking code
   complains
5. notice that qemu-kvm does not find KVM support
6. remove one Xeon CPU and problem goes away
  
Actual results:

KVM support disabled

Expected results:

should be able to use all 8 cores w/ KVM and Intel-VT/vmx support
to run virtual machines, etc.

Additional info:

 * attaching output of "/proc/cpuinfo"

Comment 1 Nathan Watson 2008-12-19 21:17:07 UTC
By the way, Intel-VT is enabled in the BIOS ... the BIOS settings
are identical now with 2 CPUs as they were when there was
only 1 CPU.

Glauber:  have a good day

Comment 2 Nathan Watson 2008-12-19 21:25:56 UTC
Created attachment 327500 [details]
output of "cat /proc/cpuinfo", demonstrates all 8 cores (2CPUx4core) have vmx and otherwise should be compatible

apparently I forgot to attach the output of "cat /proc/cpuinfo" ...
here it is

Comment 3 Nathan Watson 2008-12-19 23:50:06 UTC
I pulled the source for kvm-74-6.fc10.src.rpm (the version
I'm using).  In tracing the source code that emits
the kernel/kernel-module-level error messages (NOTE:  I am
not a kernel hacker, I know minimal stuff about this),
I see something that's confusing:

 * the error message "kvm: CPU %d feature inconsistency!"
   is generated in file .../x86/vmx.c in function
   vmx_check_processor_compat()
 * the function vmx_check_processor_compat() compares
   a LOCAL in-function structure "struct vmcs_config vmcs_conf"
   against a GLOBAL structure "static struct vmcs_config"
 * the comparison is done with a straightforward
   (0 == memcmp(..., ........))
 * the GLOBAL "static struct vmcs_config" is initialized once
   by a call to hardware_setup() in the same file,
   probably in context of one of the two physical CPUs,
   probably long before function vmx_check_processor_compat()
   is called for each CPU (QUESTION:  is it possible this
   function is called in context of each core and we only
   see the info for 2 of the 8 available cores before KVM gives up?)
 * both hardware_setup() and vmx_check_processor_compat()
   use the same function setup_vmcs_config() to initialize their
   respective GLOBAL and LOCAL copies of the structure
 * presumably the GLOBAL copy and ONE of the LOCAL copies
   of the "vmcs_config" structure will have been initialized
   in the context of the same CPU
 * ... AND YET, THE ERROR OCCURS NOT ONLY FOR THE NON-IDENTICAL
   CPUs, BUT THE ERROR SHOWS UP FOR THE SAME CPU ALSO

This leads me to suspect that the function setup_vmcs_config()
is not initializing the structure in a way that guarantees
memcmp(...) returns 0 for two copies filled in on the same CPU.
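The expectation described above can be modelled in user space. This is only
a sketch, not the kernel code: "struct model_cpu",
model_setup_vmcs_config() and model_check_processor_compat() are
hypothetical names made up for illustration, and the real
setup_vmcs_config() gets its values from the hardware rather than from
a struct passed in by the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Field layout of struct vmcs_config as it appears in vmx.c (kvm-74);
 * uint32_t stands in for the kernel's u32. */
struct vmcs_config {
    int size;
    int order;
    uint32_t revision_id;
    uint32_t pin_based_exec_ctrl;
    uint32_t cpu_based_exec_ctrl;
    uint32_t cpu_based_2nd_exec_ctrl;
    uint32_t vmexit_ctrl;
    uint32_t vmentry_ctrl;
};

/* HYPOTHETICAL: the values one CPU would report. The model simply takes
 * them as input instead of querying hardware. */
struct model_cpu {
    int size;
    uint32_t revision_id;
    uint32_t pin_based;
    uint32_t cpu_based;
    uint32_t cpu_based_2nd;
    uint32_t vmexit;
    uint32_t vmentry;
};

/* Fill a config from one CPU's reported values (model only). */
void model_setup_vmcs_config(const struct model_cpu *cpu,
                             struct vmcs_config *conf)
{
    assert(cpu != NULL && conf != NULL);
    memset(conf, 0, sizeof(*conf));
    conf->size = cpu->size;
    conf->order = 0; /* the model does not compute a page order */
    conf->revision_id = cpu->revision_id;
    conf->pin_based_exec_ctrl = cpu->pin_based;
    conf->cpu_based_exec_ctrl = cpu->cpu_based;
    conf->cpu_based_2nd_exec_ctrl = cpu->cpu_based_2nd;
    conf->vmexit_ctrl = cpu->vmexit;
    conf->vmentry_ctrl = cpu->vmentry;
}

/* Build a LOCAL config for `cpu` and compare it byte-for-byte with the
 * GLOBAL one. Returns 0 on a match, -1 on a mismatch. */
int model_check_processor_compat(const struct vmcs_config *global,
                                 const struct model_cpu *cpu)
{
    struct vmcs_config local;

    model_setup_vmcs_config(cpu, &local);
    return memcmp(global, &local, sizeof(local)) == 0 ? 0 : -1;
}
```

In this model, a GLOBAL and a LOCAL copy built from the same inputs always
compare equal, so a mismatch can only come from the inputs (the CPUs)
being different.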

Comment 4 Nathan Watson 2008-12-20 00:00:28 UTC
... to further elaborate on my last comment, the KVM module
initialization appears to do the following (with a little speculation):

 * initialize GLOBAL_vmcs_config in context of CPU #0
   in function hardware_setup()
 * ... later, for CPU #0:
   * initialize LOCAL_vmcs_config in context of CPU #0
   * compare GLOBAL_vmcs_config with LOCAL_vmcs_config
   * leads to ERROR, "kvm: CPU 0 feature inconsistency!"
 * ... and then (or else in parallel), for CPU #1
   * initialize LOCAL_vmcs_config in context of CPU #1
   * compare GLOBAL_vmcs_config with LOCAL_vmcs_config
   * leads to ERROR, "kvm: CPU 1 feature inconsistency!"

I can see a situation where the second error message would
come about, but I have a hard time seeing why the first one,
about CPU #0, is printed ... both the GLOBAL and the LOCAL
structures are filled in in the context of CPU #0 (or one of
the CPUs, not sure which).  I'd expect setup_vmcs_config() to
generate identical structures for both of them.

Comment 5 Nathan Watson 2008-12-20 00:09:22 UTC
I don't know how to build/deploy kernel modules; I guess now
is the time to learn.

i'm not sure whether all fields that make it into the
"struct vmcs_config" are initialized correctly.  even if they
are, i'm also not sure what ALIGNMENT CONSIDERATIONS are
on x86_64 platform ... since there's a (0 == memcmp(...)) going
on it could be that some gaps exist in the structure between
elements and that even though for all fields in ...

static struct vmcs_config {
        int size;
        int order;
        u32 revision_id;
        u32 pin_based_exec_ctrl;
        u32 cpu_based_exec_ctrl;
        u32 cpu_based_2nd_exec_ctrl;
        u32 vmexit_ctrl;
        u32 vmentry_ctrl;
} vmcs_config;

... a comparison would yield equality, the overall
(0 == memcmp()) MIGHT FAIL!!!
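
The padding worry can be checked directly. On x86_64 with GCC, "int" and
a 32-bit unsigned are both 4 bytes with 4-byte alignment, so eight such
members should pack without gaps. The sketch below is a user-space check
with uint32_t standing in for the kernel's u32; the two helper functions
are hypothetical names, not anything from vmx.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Same member order as the struct quoted above; uint32_t stands in
 * for the kernel's u32. */
struct vmcs_config {
    int size;
    int order;
    uint32_t revision_id;
    uint32_t pin_based_exec_ctrl;
    uint32_t cpu_based_exec_ctrl;
    uint32_t cpu_based_2nd_exec_ctrl;
    uint32_t vmexit_ctrl;
    uint32_t vmentry_ctrl;
};

/* Number of padding bytes: total struct size minus the sum of the
 * member sizes. Zero means memcmp() sees only real field bytes. */
size_t vmcs_config_padding_bytes(void)
{
    size_t members = 2 * sizeof(int) + 6 * sizeof(uint32_t);

    return sizeof(struct vmcs_config) - members;
}

/* Offset of the last member; with no gaps this is 7 * 4 = 28 when
 * int is 4 bytes wide. */
size_t vmcs_config_last_offset(void)
{
    return offsetof(struct vmcs_config, vmentry_ctrl);
}
```

If vmcs_config_padding_bytes() returns 0 on the build platform, there are
no gaps for memcmp() to trip over, and a failed comparison means at least
one field really differs.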

Just some suggestions for whoever's looking into this.

I'll keep probing.

Of course, it could be that the two separate
"Intel Xeon quad-core E5310 1.6GHz" chips are incompatible
... if that's so I should just shoot myself.

Comment 6 Nathan Watson 2008-12-22 05:00:03 UTC
Well, I guess I need to shoot myself.  I patched the in-kernel
KVM code to list values for all fields of the "global" and
"local" vmcs_config structures that failed equality within
function vmx_check_processor_compat().  The result:

  kernel: kvm: CPU 0 feature inconsistency! _MODIFIED_BY_NFW_TO_SHOW_FIELDS_
  kernel: kvm:  kvm_NFW_vmcs_config_version:  GLOBAL
  kernel: kvm:            size                       = 2048
  kernel: kvm:            order                      = 0
  kernel: kvm:            revision_id                = 11
  kernel: kvm:            pin_based_exec_ctrl        = 0x3f
  kernel: kvm:            cpu_based_exec_ctrl        = 0x96a1e1fa
  kernel: kvm:            cpu_based_2nd_exec_ctrl    = 0x1
  kernel: kvm:            vmexit_ctrl                = 0x36fff
  kernel: kvm:            vmentry_ctrl               = 0x11ff
  kernel: kvm:  kvm_NFW_vmcs_config_version:  LOCAL_
  kernel: kvm:            size                       = 1024
  kernel: kvm:            order                      = 0
  kernel: kvm:            revision_id                = 7
  kernel: kvm:            pin_based_exec_ctrl        = 0x1f
  kernel: kvm:            cpu_based_exec_ctrl        = 0x16a1e1fa
  kernel: kvm:            cpu_based_2nd_exec_ctrl    = 0x0
  kernel: kvm:            vmexit_ctrl                = 0x36fff
  kernel: kvm:            vmentry_ctrl               = 0x11ff

Looks like my two Intel Xeon Quad-Core E5310 processors
don't match up.
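
To narrow the mismatch down to individual bits, the GLOBAL and LOCAL
values of each control field can be XORed. This is a small user-space
sketch; report_differing_bits() is a hypothetical helper, and it does
not try to decode what each bit means (that is defined by Intel's VMX
documentation).

```c
#include <stdint.h>
#include <stdio.h>

/* Print every bit position at which the two values differ and return
 * the XOR mask (0 means the values are identical). */
uint32_t report_differing_bits(const char *name, uint32_t global_val,
                               uint32_t local_val)
{
    uint32_t diff = global_val ^ local_val;

    printf("%-24s diff = 0x%08x, bits:", name, (unsigned)diff);
    for (int bit = 0; bit < 32; bit++) {
        if (diff & (UINT32_C(1) << bit))
            printf(" %d", bit);
    }
    printf("\n");
    return diff;
}
```

Fed with the values from the listing above, pin_based_exec_ctrl
(0x3f vs 0x1f) differs in bit 5, cpu_based_exec_ctrl (0x96a1e1fa vs
0x16a1e1fa) differs in bit 31, and cpu_based_2nd_exec_ctrl (0x1 vs 0x0)
differs in bit 0; vmexit_ctrl and vmentry_ctrl are identical.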

... from http://www.intel.com/support/motherboards/server/sb/CS-022346.htm#5300
it looks like there are SEVERAL VERSIONS of E5310, including:

  E5310  SLACB  1.60 GHz  1066 MHz  8 MB  B3  80
  E5310  SL9XR  1.60 GHz  1066 MHz  8 MB  B3  80
  E5310  SLAEM  1.60 GHz  1066 MHz  8 MB  G0  80

Looks like I purchased an "older" processor a long time ago,
either the "SLACB" or "SL9XR", and paired it with a newer
"SLAEM" processor.  Yeah!  I put an ox and a donkey together
in the yoke, it just don't work.

Comment 7 Nathan Watson 2008-12-23 01:02:33 UTC
The complaint from the KVM in-kernel initialization looks
like it's valid.

It would be nice, when there's a "feature inconsistency" between
processors, if the KVM in-kernel initialization printed out the
full field content of the two "struct vmcs_config" structures
compared in function "vmx_check_processor_compat()".

I'm changing the summary from "2 Identical CPUs not recognized as
Intel-VT & KVM-compatible" to "KVM initialization function
vmx_check_processor_compat() needs to print more diagnostics when
CPUs not compatible" and downgrading the severity.

Something along the lines of what's in comment #6 would be nice.
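
As a rough illustration of the kind of diagnostic meant here, the sketch
below compares two configs field by field and prints only the fields that
differ. It is a user-space sketch under stated assumptions, not a proposed
kernel patch: vmcs_config_report_mismatch() and its helpers are
hypothetical names, it writes to a FILE * where kernel code would use
printk(), and uint32_t stands in for u32.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Field layout of struct vmcs_config from vmx.c (kvm-74). */
struct vmcs_config {
    int size;
    int order;
    uint32_t revision_id;
    uint32_t pin_based_exec_ctrl;
    uint32_t cpu_based_exec_ctrl;
    uint32_t cpu_based_2nd_exec_ctrl;
    uint32_t vmexit_ctrl;
    uint32_t vmentry_ctrl;
};

/* Print one signed field if it differs; returns 1 on mismatch. */
static int report_int(FILE *out, const char *name, int g, int l)
{
    if (g == l)
        return 0;
    fprintf(out, "kvm:   %-24s global=%d local=%d\n", name, g, l);
    return 1;
}

/* Print one 32-bit control field if it differs; returns 1 on mismatch. */
static int report_u32(FILE *out, const char *name, uint32_t g, uint32_t l)
{
    if (g == l)
        return 0;
    fprintf(out, "kvm:   %-24s global=0x%" PRIx32 " local=0x%" PRIx32 "\n",
            name, g, l);
    return 1;
}

/* Compare GLOBAL and LOCAL configs field by field, print each differing
 * field, and return how many fields differ (0 means compatible). */
int vmcs_config_report_mismatch(FILE *out, int cpu,
                                const struct vmcs_config *g,
                                const struct vmcs_config *l)
{
    int n = 0;

    n += report_int(out, "size", g->size, l->size);
    n += report_int(out, "order", g->order, l->order);
    n += report_u32(out, "revision_id", g->revision_id, l->revision_id);
    n += report_u32(out, "pin_based_exec_ctrl",
                    g->pin_based_exec_ctrl, l->pin_based_exec_ctrl);
    n += report_u32(out, "cpu_based_exec_ctrl",
                    g->cpu_based_exec_ctrl, l->cpu_based_exec_ctrl);
    n += report_u32(out, "cpu_based_2nd_exec_ctrl",
                    g->cpu_based_2nd_exec_ctrl, l->cpu_based_2nd_exec_ctrl);
    n += report_u32(out, "vmexit_ctrl", g->vmexit_ctrl, l->vmexit_ctrl);
    n += report_u32(out, "vmentry_ctrl", g->vmentry_ctrl, l->vmentry_ctrl);
    if (n > 0)
        fprintf(out, "kvm: CPU %d feature inconsistency! (%d field(s) differ)\n",
                cpu, n);
    return n;
}
```

Printing only the differing fields keeps the log short while still showing
exactly where the two processors disagree.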

Comment 8 Glauber Costa 2008-12-23 13:08:11 UTC
Would you mind submitting a patch for it upstream? If it's debated and accepted, we'd be more than happy to include it in upcoming versions of Fedora.

Comment 9 Bug Zapper 2009-11-18 10:31:30 UTC
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 10 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 10 Bug Zapper 2009-12-18 07:20:29 UTC
Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.