Bug 1218673 - fails to detect E5-2600v3 cpu as haswell
Summary: fails to detect E5-2600v3 cpu as haswell
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: ---
Hardware: Unspecified
OS: Unspecified
Importance: unspecified high
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Assignee: jniederm
QA Contact: Ilanit Stein
URL:
Whiteboard: virt
Depends On: 1182650 1186405 1199446 1213053 1229432
Blocks:
 
Reported: 2015-05-05 14:19 UTC by Rik Theys
Modified: 2016-03-03 09:32 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-01-13 14:39:21 UTC
oVirt Team: Virt
rule-engine: ovirt-3.6.0+
ylavi: planning_ack+
rule-engine: devel_ack+
rule-engine: testing_ack+


Attachments
capabilities (7.83 KB, text/plain)
2015-05-05 20:30 UTC, Rik Theys
dmidecode -t processor output (3.34 KB, text/plain)
2015-05-05 20:30 UTC, Rik Theys
virsh compare output (61 bytes, text/plain)
2015-05-05 20:31 UTC, Rik Theys


Links
System ID Priority Status Summary Last Updated
oVirt gerrit 41592 master MERGED core: Support for Broadwell and noTSX variants of Haswell and Broadwell Never
oVirt gerrit 41784 ovirt-engine-3.5 ABANDONED core: Support for noTSX and Broadwell processors Never

Description Rik Theys 2015-05-05 14:19:23 UTC
Description of problem:

While trying to add a PowerEdge R730 with Xeon E5-2690 v3 CPUs, the 3.5.2 engine claims the system does not have Haswell CPUs.


/proc/cpuinfo output:

processor       : 47
vendor_id       : GenuineIntel
cpu family      : 6
model           : 63
model name      : Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
stepping        : 2
microcode       : 0x29
cpu MHz         : 2495.492
cache size      : 30720 KB
physical id     : 1
siblings        : 24
core id         : 13
cpu cores       : 12
apicid          : 59
initial apicid  : 59
fpu             : yes
fpu_exception   : yes
cpuid level     : 15
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
bogomips        : 5211.43
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

# vdsClient -s 0 getVdsCaps | grep cpu
        cpuCores = '24'
        cpuFlags = 'fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,syscall,nx,pdpe1gb,rdtscp,lm,constant_tsc,arch_perfmon,pebs,bts,rep_good,nopl,xtopology,nonstop_tsc,aperfmperf,eagerfpu,pni,pclmulqdq,dtes64,monitor,ds_cpl,vmx,smx,est,tm2,ssse3,fma,cx16,xtpr,pdcm,pcid,dca,sse4_1,sse4_2,x2apic,movbe,popcnt,tsc_deadline_timer,aes,xsave,avx,f16c,rdrand,lahf_lm,abm,ida,arat,epb,xsaveopt,pln,pts,dtherm,tpr_shadow,vnmi,flexpriority,ept,vpid,fsgsbase,tsc_adjust,bmi1,avx2,smep,bmi2,erms,invpcid,model_Nehalem,model_Conroe,model_coreduo,model_core2duo,model_Penryn,model_Westmere,model_n270,model_SandyBridge'
        cpuModel = 'Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz'
        cpuSockets = '2'
        cpuSpeed = '1613.015'
        cpuThreads = '48'
        numaNodes = {'0': {'cpus': [0,
                     '1': {'cpus': [1,
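
The cpuFlags value above mixes raw CPU flags with the model_* entries that the engine compares against the cluster CPU type. A minimal sketch of separating the two, assuming only the comma-separated format shown in this report (the helper name split_cpu_flags is hypothetical and not part of Vdsm):

```python
def split_cpu_flags(cpu_flags):
    """Split a comma-separated cpuFlags string into raw flags and model names."""
    flags, models = set(), set()
    for item in cpu_flags.split(","):
        item = item.strip()
        if not item:
            continue
        if item.startswith("model_"):
            # Strip the "model_" prefix so only the CPU model name remains
            models.add(item[len("model_"):])
        else:
            flags.add(item)
    return flags, models


# Shortened excerpt of the cpuFlags value reported above
sample = "fpu,vmx,avx2,model_Nehalem,model_Westmere,model_SandyBridge"
flags, models = split_cpu_flags(sample)
print(sorted(models))
print("Haswell" in models)
```

For the host in this report, the newest model in the list is SandyBridge and no model_Haswell entry is present, which matches the engine's complaint.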


Version-Release number of selected component (if applicable):
3.5.2

How reproducible:
Always. Try to add a host with Xeon E5-2690 v3 CPUs.

Steps to Reproduce:
1.
2.
3.

Actual results:
Not detected as haswell based cpu

Expected results:
Detected as haswell based cpu

Additional info:

Comment 1 Dan Kenigsberg 2015-05-05 15:52:50 UTC
What is the output of `virsh -r capabilities`?

And what is the output of

 echo '<cpu match="minimum"><model>Haswell</model>
       <vendor>Intel</vendor></cpu>'     > /tmp/cpu.xml
 virsh -r cpu-compare /tmp/cpu.xml
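
The same CPU description can be generated for other models (for example to test "Haswell-noTSX" once libvirt knows it). A small sketch using only the Python standard library; the function name cpu_compare_xml is an illustrative assumption, and the element layout simply mirrors the XML in the command above:

```python
import xml.etree.ElementTree as ET


def cpu_compare_xml(model, vendor="Intel", match="minimum"):
    """Build the <cpu> XML document that `virsh cpu-compare` takes as input."""
    cpu = ET.Element("cpu", {"match": match})
    ET.SubElement(cpu, "model").text = model
    ET.SubElement(cpu, "vendor").text = vendor
    return ET.tostring(cpu, encoding="unicode")


print(cpu_compare_xml("Haswell"))
```

The returned string can be written to a file such as /tmp/cpu.xml and passed to `virsh -r cpu-compare` as shown above.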

Comment 2 Rik Theys 2015-05-05 20:30:27 UTC
Created attachment 1022312 [details]
capabilities

Comment 3 Rik Theys 2015-05-05 20:30:49 UTC
Created attachment 1022313 [details]
dmidecode -t processor output

Comment 4 Rik Theys 2015-05-05 20:31:06 UTC
Created attachment 1022314 [details]
virsh compare output

Comment 5 Rik Theys 2015-05-05 20:33:30 UTC
(In reply to Dan Kenigsberg from comment #1)
> what's your `virsh -r capabilities` ?

See attachment 1022312. It seems virsh lists the CPU as SandyBridge. In the cpu tag it also lists sockets=1, although the system has two sockets in use. It does show both NUMA nodes, so it does seem to see the two sockets further on.

I've also attached the output of dmidecode -t processor to show both sockets are populated.

> 
> what is the output of
> 
>  echo '<cpu match="minimum"><model>Haswell</model>
>        <vendor>Intel</vendor></cpu>'     > /tmp/cpu.xml
>  virsh -r cpu-compare /tmp/cpu.xml

Output is attached. It indicates the CPUs are not compatible.

Are there any features that are required for the CPU to be detected as Haswell? Maybe I need to enable some specific feature in the BIOS?

Regards,

Rik

Comment 6 Rik Theys 2015-05-05 20:49:39 UTC
Hi,

Looking at other threads, this seems to be caused by a microcode update to the CPU which removed the TSX-NI feature (the hle flag).

See https://www.redhat.com/archives/libvir-list/2014-December/msg00950.html for more information.

My Debian box has libvirt 1.2.14 and seems to have a "Haswell-noTSX" cpu model in cpu_map.xml which would match with the cpu this server has.

So I guess the cpu_map needs an update and oVirt should have an additional model added?

Regards,

Rik
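
A quick way to confirm this diagnosis on a host is to look for the TSX flags in its flag list. In Linux, TSX is exposed as two flags, hle and rtm. The sketch below assumes the comma-separated cpuFlags format from getVdsCaps; the helper name missing_tsx_flags is hypothetical:

```python
# The two CPU flags through which Linux reports TSX support
TSX_FLAGS = {"hle", "rtm"}


def missing_tsx_flags(cpu_flags):
    """Return the TSX flags that are absent from a comma-separated flag string."""
    present = {flag.strip() for flag in cpu_flags.split(",")}
    return TSX_FLAGS - present


# Shortened excerpts of the flag lists quoted in this report
no_tsx_host = "fsgsbase,tsc_adjust,bmi1,avx2,smep,bmi2,erms,invpcid"
tsx_host = "fsgsbase,tsc_adjust,bmi1,hle,avx2,smep,bmi2,erms,invpcid,rtm"
print(sorted(missing_tsx_flags(no_tsx_host)))
print(sorted(missing_tsx_flags(tsx_host)))
```

An empty result means both TSX flags are present; a host that is missing them can only match a noTSX CPU model.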

Comment 7 Dan Kenigsberg 2015-05-05 23:32:47 UTC
Which version of libvirt do you have on your ovirt host? Which platform (Fedora/el) do you use?

libvirt bug 1182650 and qemu bug 1213053 introduce "Haswell-noTSX", which may need a backport to your platform.

I don't think there's anything Vdsm can do about this, but as you say, ovirt-engine may need to define a new CPU compatibility level, and refrain from creating new clusters with the now-obsoleted "Haswell-with-TSX" CPU. Engine should not just replace the expected model_Haswell with model_Haswell-noTSX, since we'd like to block live migration of Haswell-initiated VMs into rebooted hosts that have only model_Haswell-noTSX. Such hosts should be marked as non-operational.
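
The rule described here can be sketched as a simple membership test: a host stays operational only if it reports the model its cluster expects. This is an illustration of the idea, not ovirt-engine's actual code; host_status and the status strings are assumptions made for the example.

```python
def host_status(cluster_model, host_models):
    """Return the status a host would get in a cluster expecting cluster_model."""
    # No substitution of Haswell-noTSX for Haswell: the exact model must be reported
    if cluster_model in host_models:
        return "operational"
    return "non-operational"


# A host that, after the microcode update, reports only the noTSX variant
rebooted_host = {"SandyBridge", "Westmere", "Nehalem", "Haswell-noTSX"}
print(host_status("Haswell", rebooted_host))
print(host_status("Haswell-noTSX", rebooted_host))
```

Keeping the two models distinct is what prevents VMs started with a Haswell (TSX) vCPU from being migrated to a host that no longer offers those instructions.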

Comment 8 Dan Kenigsberg 2015-05-05 23:43:09 UTC
Bug 1199446 introduces "Haswell-noTSX" in el7.2; we may need to wait for it before oVirt can fully support this model.

Comment 9 Rik Theys 2015-05-06 07:18:05 UTC
(In reply to Dan Kenigsberg from comment #7)
> Which version of libvirt do you have on your ovirt host? Which platform
> (Fedora/el) do you use?

We're using CentOS 7.1 on the host.

> libvirt bug 1182650 and qemu bug 1213053 introduce "Haswell-noTSX", which
> may need a backport to your platform.
> 
> I don't think there's anything Vdsm can do about this, but as you say,
> ovirt-engine may need to define a new cpu compatibility level, and refrain
> from creating new clusters with the now-obsoleted "Haswell-with-TSX" cpu.
> Engine should not just replace the expected model_Haswell with
> model_Haswell-noTSX, since we'd like to block live migration of
> Haswell-initiated VMs into rebooted hosts that have only
> model_Haswell-noTSX. Such hosts should be marked as non-operational.

Is there a chance oVirt will ship an updated libvirt/qemu with the required patches for the el7.1 hosts? That update can later be replaced by the 7.2 update. 7.2 is still months away :-(.

If I select SandyBridge as my current cluster type, can I upgrade it to "Haswell-noTSX" once libvirt/qemu/ovirt supports it? Does this require any changes on the VM configuration? What's the procedure for this? Do I power down all VM's on the cluster, change the cluster type and boot the VM's back up? Or can I do this without taking the cluster/VM's offline?

Regards,

Rik

Comment 10 Dan Kenigsberg 2015-05-19 22:40:22 UTC
(In reply to Rik Theys from comment #9)

> Is there a chance oVirt will ship an updated libvirt/qemu with the required
> patches for the el7.1 hosts? That update can later be replaced by the 7.2
> update. 7.2 is still months away :-(.

Frankly, I think that the chances are slim (due to lack of resources). A more viable approach is for Red Hat EL7 customers to ask for a backport of Bug 1199446 to el-7.1.z.

> 
> If I select SandyBridge as my current cluster type, can I upgrade it to
> "Haswell-noTSX" once libvirt/qemu/ovirt supports it?

Yes, this is possible.

> Does this require any
> changes on the VM configuration? 

I don't think so, unless you explicitly want them to have a Haswell vCPU, which can be set only after the upgrade.

> What's the procedure for this? Do I power
> down all VM's on the cluster, change the cluster type and boot the VM's back
> up? Or can I do this without taking the cluster/VM's offline?

I believe that a cluster upgrade can be done with no downtime; if you want to change a VM's vCPU type, you would have to stop it first.

Arik may correct me or add more info about the procedure.

Comment 11 Michal Skrivanek 2015-05-20 11:09:30 UTC
Let's add the -noTSX variants of Haswell and Broadwell, and better to do it sooner (3.5.z) than later.

Comment 12 Michal Skrivanek 2015-06-01 12:40:32 UTC
We can't backport this to 3.5, since 3.5 does not have the qemu 2.3 required for the -noTSX variants.

Comment 13 Dan Kenigsberg 2015-06-01 15:32:11 UTC
(In reply to Michal Skrivanek from comment #12)
> we can't backport this to 3.5 since we do not have 3.5 qemu 2.3 required for
> -noTSX variants

Michal, which platform are you referring to?
Cole, don't you intend to backport -noTSX to el6?

Comment 14 Cole Robinson 2015-06-05 21:46:57 UTC
(In reply to Dan Kenigsberg from comment #13)
> (In reply to Michal Skrivanek from comment #12)
> > we can't backport this to 3.5 since we do not have 3.5 qemu 2.3 required for
> > -noTSX variants
> 
> Michal, which platform are you referring to?
> Cole, don't you intend to backport -noTSX to el6?

Yes, https://bugzilla.redhat.com/show_bug.cgi?id=1213053 is tracking that for fedora qemu

Comment 15 Dan Kenigsberg 2015-06-07 15:56:13 UTC
(In reply to Cole Robinson from comment #14)
> 
> Yes, https://bugzilla.redhat.com/show_bug.cgi?id=1213053 is tracking that
> for fedora qemu

I see that. I'm asking explicitly about the plans regarding el6.

Comment 16 Cole Robinson 2015-06-08 16:04:51 UTC
(In reply to Dan Kenigsberg from comment #15)
> (In reply to Cole Robinson from comment #14)
> > 
> > Yes, https://bugzilla.redhat.com/show_bug.cgi?id=1213053 is tracking that
> > for fedora qemu
> 
> I see that. I'm asking explicitly about the plans regarding el6.

Sorry, I missed the el6 part. I don't really work on the epel qemu builds, so I had no intention of doing it. But if someone does the backport and posts a patch in a Fedora bug for it, I'll be happy to do the build.

Comment 17 Dan Kenigsberg 2015-06-08 17:52:51 UTC
Actually, I worry about RHEL6, not epel. To make things clearer, I've cloned-created bug 1229432.

Comment 18 Max Kovgan 2015-06-28 14:13:22 UTC
ovirt-3.6.0-3 release

Comment 19 Ilanit Stein 2015-12-31 09:47:54 UTC
Tested on rhevm-3.6.2-2

Verified all 3 new options are displayed in cluster cpu type list:
Intel Haswell-noTSX Family
Intel Broadwell-noTSX Family
Intel Broadwell Family

Using a host whose CPU reports model_Haswell-noTSX, getVdsCaps shows:
cpuModel = 'Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz'
cpuFlags = 'fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,syscall,nx,pdpe1gb,rdtscp,lm,constant_tsc,arch_perfmon,pebs,bts,rep_good,nopl,xtopology,nonstop_tsc,aperfmperf,eagerfpu,pni,pclmulqdq,dtes64,monitor,ds_cpl,vmx,smx,est,tm2,ssse3,fma,cx16,xtpr,pdcm,pcid,dca,sse4_1,sse4_2,x2apic,movbe,popcnt,tsc_deadline_timer,aes,xsave,avx,f16c,rdrand,lahf_lm,abm,ida,arat,epb,pln,pts,dtherm,tpr_shadow,vnmi,flexpriority,ept,vpid,fsgsbase,tsc_adjust,bmi1,avx2,smep,bmi2,erms,invpcid,cqm,xsaveopt,cqm_llc,cqm_occup_llc,model_Haswell-noTSX,model_Nehalem,model_Conroe,model_coreduo,model_core2duo,model_Penryn,model_IvyBridge,model_Westmere,model_n270,model_SandyBridge'

For this host, defining cluster cpu Type as "Intel Haswell-noTSX Family" - Succeeded.

Trying to change the cluster, which contains only this host, to any of
Intel Broadwell-noTSX Family / Intel Broadwell Family / Intel Haswell Family
opens an "Operation Canceled" window with the error:
 
"Error while executing action: Cannot change Cluster CPU to higher CPU type when there are active Hosts with lower CPU type.
-Please move Hosts with lower CPU to maintenance first."
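
The behaviour verified above can be sketched as an ordering check. The CPU_LEVELS ordering below is an assumption inferred from this test (Haswell, Broadwell-noTSX and Broadwell were all rejected as "higher" than Haswell-noTSX), and can_change_cluster_cpu is a hypothetical helper, not the engine's implementation:

```python
# Assumed ordering from lowest to highest, inferred from the test results above
CPU_LEVELS = [
    "Intel SandyBridge Family",
    "Intel Haswell-noTSX Family",
    "Intel Haswell Family",
    "Intel Broadwell-noTSX Family",
    "Intel Broadwell Family",
]


def can_change_cluster_cpu(target, active_host_types):
    """Allow the change only if no active host has a lower CPU type than target."""
    target_level = CPU_LEVELS.index(target)
    return all(CPU_LEVELS.index(host) >= target_level for host in active_host_types)


active_hosts = ["Intel Haswell-noTSX Family"]
for target in CPU_LEVELS:
    print(target, can_change_cluster_cpu(target, active_hosts))
```

With a single active Haswell-noTSX host, only the SandyBridge and Haswell-noTSX targets are accepted, which matches the error reported for the three higher types.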

Comment 20 Ilanit Stein 2016-01-05 08:36:06 UTC
Tested on rhevm-3.6.2-2

Using a host whose CPU reports model_Broadwell (it has the flags rdseed, hle and rtm), getVdsCaps shows:

cpuModel = 'Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz'

cpuFlags = 'fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,syscall,nx,pdpe1gb,rdtscp,lm,constant_tsc,arch_perfmon,pebs,bts,rep_good,nopl,xtopology,nonstop_tsc,aperfmperf,eagerfpu,pni,pclmulqdq,dtes64,monitor,ds_cpl,vmx,smx,est,tm2,ssse3,fma,cx16,xtpr,pdcm,pcid,sse4_1,sse4_2,x2apic,movbe,popcnt,tsc_deadline_timer,aes,xsave,avx,f16c,rdrand,lahf_lm,abm,3dnowprefetch,ida,arat,epb,pln,pts,dtherm,tpr_shadow,vnmi,flexpriority,ept,vpid,fsgsbase,tsc_adjust,bmi1,hle,avx2,smep,bmi2,erms,invpcid,rtm,rdseed,adx,smap,xsaveopt,model_Haswell,model_Broadwell,model_Haswell-noTSX,model_Nehalem,model_Conroe,model_coreduo,model_core2duo,model_Penryn,model_IvyBridge,model_Westmere,model_n270,model_Broadwell-noTSX,model_SandyBridge'
        
This host installed successfully on the rhevm setup, in clusters with each of these CPU types:
1. Intel Broadwell Family
2. Intel Broadwell-noTSX Family

Moving to verified, as all 3 added cpu Families were tested.

Comment 21 Sandro Bonazzola 2016-01-13 14:39:21 UTC
oVirt 3.6.0 has been released, closing current release

