Bug 1184303 - libvirt 1.2.11 CPU detection fails for Nehalem
Summary: libvirt 1.2.11 CPU detection fails for Nehalem
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-01-21 02:21 UTC by devsk
Modified: 2016-11-18 16:18 UTC (History)
4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-04-26 05:31:36 UTC
Embargoed:


Attachments
debug log from the virt-manager --debug (11.20 KB, text/plain)
2015-01-21 15:43 UTC, devsk
no flags Details
The XML of the domain (3.79 KB, text/html)
2015-01-21 15:48 UTC, devsk
no flags Details

Description devsk 2015-01-21 02:21:58 UTC
Description of problem:

I get the following error when I select the CPU to be Nehalem manually from the drop down in the virt-manager:

libvirt.py", line 1007, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: unsupported configuration: guest and host CPU are not compatible: Host CPU does not provide required features: syscall

If I run the VM using qemu command line manually with -cpu Nehalem, it works perfectly.

/proc/cpuinfo in the host suggests that syscall is supported feature. The host CPU is in fact Nehalem i7 920.

Version-Release number of selected component (if applicable):
1.2.11

How reproducible:
Every time.

Steps to Reproduce:
1. Create a VM with a Nehalem CPU (or kvm64 or qemu64, anything other than "hypervisor default")
2. Boot
3.

Actual results:

It fails to boot with above error.

Expected results:

It should boot.

Additional info:

Note that when I choose the "hypervisor default" option, the VM boots fine and -cpu qemu64 is used, which is sort of contrary to what choosing "qemu64" directly does (it does not boot). So there is an internal libvirt mishap somewhere.

Comment 1 Ján Tomko 2015-01-21 09:11:02 UTC
Can you provide the domain XML generated by 'virt-manager --debug' and the output of 'cpuid --raw -1'?

On the qemu command line, does -cpu Nehalem,enforce work? By default qemu silently drops unsupported features.

Comment 2 devsk 2015-01-21 15:41:31 UTC
qemu runs fine with enforce:

$ qemu-system-x86_64 -cpu Nehalem,enforce -machine accel=kvm -boot d -cdrom ubuntu-13.04-desktop-i386.iso -m 1000

I don't see any messages from qemu on the terminal and the machine boots and guest shows Nehalem 9xx in /proc/cpuinfo.

CPUID:

$ cpuid --raw -1
CPU:
   0x00000000 0x00: eax=0x0000000b ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69
   0x00000001 0x00: eax=0x000106a5 ebx=0x05100800 ecx=0x0098e3bd edx=0xbfebfbff
   0x00000002 0x00: eax=0x55035a01 ebx=0x00f0b2e4 ecx=0x00000000 edx=0x09ca212c
   0x00000003 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000004 0x00: eax=0x1c004121 ebx=0x01c0003f ecx=0x0000003f edx=0x00000000
   0x00000004 0x01: eax=0x1c004122 ebx=0x00c0003f ecx=0x0000007f edx=0x00000000
   0x00000004 0x02: eax=0x1c004143 ebx=0x01c0003f ecx=0x000001ff edx=0x00000000
   0x00000004 0x03: eax=0x1c03c163 ebx=0x03c0003f ecx=0x00001fff edx=0x00000002
   0x00000005 0x00: eax=0x00000040 ebx=0x00000040 ecx=0x00000003 edx=0x00001120
   0x00000006 0x00: eax=0x00000003 ebx=0x00000002 ecx=0x00000001 edx=0x00000000
   0x00000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000008 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x00000009 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x0000000a 0x00: eax=0x07300403 ebx=0x00000044 ecx=0x00000000 edx=0x00000603
   0x0000000b 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000005
   0x0000000b 0x01: eax=0x00000004 ebx=0x00000008 ecx=0x00000201 edx=0x00000005
   0x80000000 0x00: eax=0x80000008 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000001 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000001 edx=0x28100000
   0x80000002 0x00: eax=0x65746e49 ebx=0x2952286c ecx=0x726f4320 edx=0x4d542865
   0x80000003 0x00: eax=0x37692029 ebx=0x55504320 ecx=0x20202020 edx=0x20202020
   0x80000004 0x00: eax=0x30323920 ebx=0x20402020 ecx=0x37362e32 edx=0x007a4847
   0x80000005 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000006 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x01006040 edx=0x00000000
   0x80000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000100
   0x80000008 0x00: eax=0x00003024 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80860000 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000005
   0xc0000000 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000005

> Can you provide the domain XML generated by 'virt-manager --debug'

The output generated by the above is attached.

Comment 3 devsk 2015-01-21 15:43:53 UTC
Created attachment 982381 [details]
debug log from the virt-manager --debug

This is run when I change the processor from "hypervisor default" to "Nehalem" manually.

Any idea what n270 is? That's what's captured in the qemu:///system capabilities.

Comment 4 devsk 2015-01-21 15:48:25 UTC
Created attachment 982383 [details]
The XML of the domain

I am also attaching the full XML of the domain in case it's needed.

Comment 5 devsk 2015-01-21 16:13:09 UTC
Interesting. When using the "copy host", I get n270 (same as capabilities), and as per qemu -cpu help:

x86             n270  Intel(R) Atom(TM) CPU N270   @ 1.60GHz

So, libvirt thinks I have an Intel Atom processor? (and Windows refuses to boot with n270, saying Unsupported CPU.) How did it conclude that?

As per cpuid, I have "Intel Core i7-900 (Bloomfield D0) / Xeon Processor 3500 (Bloomfield D0) / Xeon Processor 5500 (Gainestown D0), 45nm
   miscellaneous (1/ebx)"

Comment 6 devsk 2015-01-22 04:18:38 UTC
Any idea why libvirt thinks I have an Intel Atom CPU? Is there any quick change I can try?

This is kind of urgent for me because I am moving a Windows VM from VirtualBox to KVM and I am in an activation pickle and getting reminded about it. I don't want to redo that once I get the CPU sorted out and have to change it again.

Comment 7 Ján Tomko 2015-01-22 13:52:45 UTC
(In reply to devsk from comment #2)
> qemu runs fine with enforce:
> 
> $ qemu-system-x86_64 -cpu Nehalem,enforce -machine accel=kvm -boot d -cdrom
> ubuntu-13.04-desktop-i386.iso -m 1000
> 
> I don't see any messages from qemu on the terminal and the machine boots and
> guest shows Nehalem 9xx in /proc/cpuinfo.
> 
> CPUID:
> 
> $ cpuid --raw -1
> CPU:

>    0x80000001 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000001
> edx=0x28100000

0x800 in edx is the bit that should be set for the syscall feature.
The latest model libvirt knows about without the 'syscall' feature is the Atom (n270), so it chooses that one.

Though I'm not sure why qemu doesn't fail with enforce.

Maybe some setting in BIOS setup disabled this feature?
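The bit arithmetic behind Ján's explanation can be sketched as follows. This is an illustrative decode of the CPUID leaf 0x80000001 EDX value reported above (edx=0x28100000); the bit positions follow the published CPUID feature layout, and only a small subset of bit names is shown:

```python
# Decode a few known feature bits in CPUID 0x80000001 EDX.
# Bit 11 (mask 0x800) is 'syscall', which libvirt complains about.
EDX_BITS = {11: "syscall", 20: "nx", 27: "rdtscp", 29: "lm"}

def decode_edx(edx):
    """Return the names of the known feature bits set in edx, lowest bit first."""
    return [name for bit, name in sorted(EDX_BITS.items()) if edx & (1 << bit)]

host_edx = 0x28100000  # value from 'cpuid --raw -1' above
print(decode_edx(host_edx))        # nx, rdtscp and lm are set, syscall is not
print(bool(host_edx & 0x800))      # False: matches libvirt's complaint
```

This shows why libvirt, reading the raw CPUID leaf, concludes the host lacks syscall even though /proc/cpuinfo lists it: the kernel's flag list and the raw instruction output disagree on this host.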

Comment 8 devsk 2015-01-22 15:43:16 UTC
Maybe CPUID is buggy and not reliable then? I say that because syscall shows up in /proc/cpuinfo, which is what the kernel thinks the CPU's supported features are.

Looks like that's the case: the kernel thinks it's there, but the CPUID instruction returns otherwise.

# cpuid -1 --kernel|grep -i syscall
      SYSCALL and SYSRET instructions        = true

# cpuid -1 |grep -i syscall
      SYSCALL and SYSRET instructions        = false

$ grep syscall /proc/cpuinfo  | head -1
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe *syscall* nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida dtherm tpr_shadow vnmi flexpriority ept vpid

Also, I don't see VMware or VirtualBox complain about this. They happily detect this CPU as an i7 920 and work with it. If my BIOS setup were the issue, they wouldn't work either.

How can I work around this?

Comment 9 devsk 2015-01-25 08:13:10 UTC
any ideas?

Comment 10 devsk 2015-01-26 05:22:06 UTC
Working around this issue with this crude method for now (I need to move on with things; running out of time):

# cat libvirt_cpu_model.patch
--- cpu_map.xml.old     2014-12-19 09:32:18.000000000 -0800
+++ cpu_map.xml 2015-01-22 08:22:07.816451011 -0800
@@ -467,7 +467,6 @@
       <feature name='clflush'/>
       <feature name='pni'/>
       <feature name='ssse3'/>
-      <feature name='syscall'/>
       <feature name='nx'/>
       <feature name='lm'/>
       <feature name='lahf_lm'/>

I would really like for someone to help me with a proper solution to this problem. I am pretty sure I am not the first (or only) one to run Qemu using Libvirt on i7 920 CPU.

Comment 11 Jiri Denemark 2015-01-26 09:59:48 UTC
The right solution is to use the following CPU configuration in domain XML:

    <cpu mode="custom" match="exact">
        <model fallback="forbid">Nehalem</model>
        <feature policy='disable' name='syscall'/>
        <topology sockets="1" cores="2" threads="1"/>
    </cpu>

Comment 12 devsk 2015-01-26 17:37:38 UTC
That's not the "right" solution. It's a workaround which I have to apply to *every VM* where I want Nehalem, which is worse than my workaround. And yes, I want to use my native CPU in every VM!

I am looking for the right solution in libvirt, where it recognizes Nehalem as Nehalem out of the box, without any workarounds.

So, what's that solution? Do we trust cpuid --kernel more than the raw CPUID instruction? How does the kernel determine the right CPU flags? That's what we need to find out and rely on.

Comment 13 devsk 2015-01-26 19:53:42 UTC
Or maybe leave this CPU identification business to QEMU, which seems to be doing the right thing.

Comment 14 devsk 2015-02-01 04:50:58 UTC
Is there any information lacking that I can provide that will help someone fix this issue with libvirt?

I am seeing severe disk performance issues in the Windows guest which I suspect might have something to do with the missing syscall feature. I have tried all combinations of IO mode (native, POSIX threads), cache mode (none, wt, wb, ds) and disk bus (virtio, scsi, sata, IDE), but nothing seems to explain the high IO waits in the guest. The same guest had outstanding disk IO performance in VirtualBox. And I don't have any of these disk performance issues on my office servers where I use KVM regularly. So it's most likely related to the missing syscall feature on this particular host.

Comment 15 devsk 2015-02-15 06:41:30 UTC
Any updates on this bug?

Comment 16 devsk 2015-03-03 01:17:38 UTC
Is there any information lacking that I can provide that will help someone fix this issue with libvirt?

Any inputs anyone?

Comment 17 devsk 2015-03-16 07:10:54 UTC
Time for bi-monthly check...what can we do to make some progress on this issue?

The bug is fairly understood and it doesn't even look like a hard bug to fix. So, what's the hold up?

Comment 18 devsk 2015-04-05 04:28:19 UTC
I am also seeing this error during saves. I hadn't noticed that my managed saves were failing...:(

# virsh managedsave 2

error: Failed to save domain 2 state
error: Requested operation is not valid: domain has CPU feature: invtsc

Any ideas?
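[Editorial note: following the same feature-policy pattern Jiri showed in comment 11, one untested sketch for the invtsc error would be to explicitly disable that feature in the domain XML, e.g.:]

```xml
<cpu mode="custom" match="exact">
  <model fallback="forbid">Nehalem</model>
  <feature policy="disable" name="invtsc"/>
</cpu>
```

[This is an assumption based on the pattern from comment 11, not a verified fix; note QEMU treats a guest with invtsc exposed as non-migratable, which is why save/migrate operations refuse to run.]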

Comment 19 devsk 2015-04-25 06:05:34 UTC
Customary ping on this issue. Can someone please update whether this is being looked into?

Comment 20 devsk 2015-04-26 00:07:47 UTC
I really need to know the answer to the managedsave issue in comment 18. Someone please help. It's a big deterrent not being able to save my VM during shutdown, forcing a hard power-off right now.

Comment 21 devsk 2015-04-26 00:19:55 UTC
I removed invtsc from the features (why is it even there when my host CPU does not support it?? And syscall should be there but isn't..:( as we figured before), and I get this:

# virsh managedsave 2
error: Failed to save domain 2 state
error: internal error: unable to execute QEMU command 'migrate': State blocked by non-migratable device '0000:00:06.0/ich9_ahci'

What a freaky mess!!

Comment 22 devsk 2015-04-26 05:31:36 UTC
OK. I give up on KVM on my desktop after 3 months...:) Moving my Windows VM back to VirtualBox, where it was working fine. I should probably have left it alone, but KVM works so well everywhere else (no CPU detection issues, and managedsave is not used there) that I thought I would give it a try on my main desktop as well. Bad idea!!

Closing as WONTFIX because I get the feeling nobody really wants to even troubleshoot the issue, let alone fix it.

Comment 23 MicahEd 2016-11-18 16:18:02 UTC
This may not be useful, but I just encountered this bug and found a fix on this website:
http://masysma.lima-city.de/37/how_to_transition_from_virtualbox_to_kvm.xhtml

Removing the SATA controller from the VM avoided the error when saving.

