Bug 1854922 - spec_ctrl host feature not detected
Summary: spec_ctrl host feature not detected
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.40.22
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.4.2
Target Release: 4.40.24
Assignee: Milan Zamazal
QA Contact: Yaning Wang
URL:
Whiteboard:
Depends On:
Blocks: 1875609
Reported: 2020-07-08 12:46 UTC by Rik Theys
Modified: 2020-09-18 07:12 UTC (History)

Fixed In Version: vdsm-4.40.24
Doc Type: Bug Fix
Doc Text:
spec_ctrl CPU flag might not be set for newer Intel CPU models, resulting in problems with adding hosts with those CPUs. It has been fixed and spec_ctrl is set for those CPUs now.
Clone Of:
Clones: 1870040 1875609
Environment:
Last Closed: 2020-09-18 07:12:02 UTC
oVirt Team: Virt
Embargoed:
sbonazzo: ovirt-4.4?
sbonazzo: planning_ack?
sbonazzo: devel_ack?
sbonazzo: testing_ack?


Attachments
vdsm log (667.22 KB, application/x-xz)
2020-07-09 06:13 UTC, Rik Theys
engine log (138.25 KB, application/x-xz)
2020-07-09 06:13 UTC, Rik Theys
capabilities (6.88 KB, text/plain)
2020-07-09 06:14 UTC, Rik Theys
domcapabilities (5.59 KB, text/plain)
2020-07-09 06:15 UTC, Rik Theys
engine log (1.55 KB, text/plain)
2020-08-17 09:59 UTC, Oleh Horbachov


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 110271 0 master MERGED machinetype: Add spec_ctrl feature also to new CPUs without -IBRS 2021-02-16 13:48:18 UTC

Description Rik Theys 2020-07-08 12:46:49 UTC
Description of problem:

I've upgraded our engine to 4.4 and wanted to upgrade our host to 4.4 as well. I've reinstalled the machine with CentOS 8.2 and wanted to add it back to the cluster but it fails with a message indicating that some needed cpu flags are missing.

Initially I tried to add it to an existing 4.3 cluster with cpu type Intel Skylake Server IBRS SSBD MDS Family but this failed. I've then created a new 4.4 cluster with cpu type Secure Intel Skylake Server Family but this also fails with the message:

The host CPU does not match the Cluster CPU Type and is running in a degraded mode. It is missing the following CPU flags: vmx, ssbd, md_clear, model_Skylake-Server, spec_ctrl. Please update the host CPU microcode or change the Cluster CPU Type.

When I look at the detected features in the vdsm log:

'info': {'kvmEnabled': 'true', 'cpuCores': '10', 'cpuThreads': '20', 'cpuSockets': '1', 'onlineCpus':
'0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19', 'cpuSpeed': '1002.074', 'cpuModel': 'Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz', 'cpuFlags': 'tpr_shadow,monitor,nx,rtm,cqm_mbm_local,pat,fxsr,adx,rdtscp,pku,cat_l3,avx,smap,x2apic,pcid,mca,ht,cmov,dts,popcnt,fsgsbase,rdseed,erms,art,avx2,fma,invpcid_single,fpu,pdpe1gb,bmi1,ospke,flush_l1d,msr,sep,vmx,sse2,md_clear,tm,invpcid,clflushopt,intel_pt,smep,epb,aes,pclmulqdq,ssse3,avx512f,pts,stibp,mtrr,vme,xsaveopt,ept,pbe,flexpriority,hypervisor,hle,smx,dca,tsc_adjust,sse4_2,ibpb,rdt_a,skip-l1dfl-vmentry,clwb,sse4_1,avx512dq,cpuid_fault,ds_cpl,pdcm,arat,apic,avx512bw,de,pae,vnmi,cqm,f16c,cqm_llc,xtopology,amd-ssbd,avx512cd,pge,xtpr,constant_tsc,pse,arch-capabilities,nopl,sse,clflush,xsaves,cqm_occup_llc,pti,xgetbv1,sdbg,ss,xsave,pebs,cx16,mmx,syscall,lahf_lm,abm,ssbd,aperfmperf,cpuid,pse36,3dnowprefetch,mce,mba,dtes64,dtherm,mpx,intel_ppin,tsc_deadline_timer,tm2,vpid,nonstop_tsc,arch_perfmon,movbe,umip,md-clear,est,tsc,rdrand,cqm_mbm_total,pni,cdp_l3,cx8,acpi,rep_good,lm,bts,xsavec,bmi2,ida,pln,invtsc,avx512vl,ibrs,model_Broadwell-IBRS,model_Skylake-Server-IBRS,model_Skylake-Client-IBRS,model_n270,model_Penryn,model_Opteron_G2,model_coreduo,model_Westmere,model_Skylake-Client,model_Nehalem,model_Westmere-IBRS,model_Opteron_G1,model_qemu32,model_Nehalem-IBRS,model_SandyBridge,model_pentium2,model_SandyBridge-IBRS,model_Haswell-noTSX-IBRS,model_Haswell-IBRS,model_IvyBridge,model_qemu64,model_pentium,model_Haswell,model_kvm64,model_Broadwell-noTSX-IBRS,model_pentium3,model_Broadwell-noTSX,model_Broadwell,model_IvyBridge-IBRS,model_Conroe,model_Haswell-noTSX,model_core2duo,model_486,model_Skylake-Server,model_kvm32', 'version_name': 'Snow Man', 'software_version': '4.40.22', 'software_revision': '1', 'supportedENGINEs': ['4.2', '4.3', '4.4'], 'clusterLevels': ['4.2', '4.3', '4.4']

The "missing" features are all there except spec_ctrl. According to https://bugzilla.redhat.com/show_bug.cgi?id=1837266 this should get added automatically on IBRS CPUs, but it seems it isn't in this case.
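
The comparison above can be reproduced with a short script. This is only a minimal sketch: the helper name `missing_flags` is made up for illustration, the required list is copied from the error message, and the reported string is a shortened excerpt of the `cpuFlags` value from the vdsm log, not the full list.

```python
def missing_flags(required, reported_csv):
    """Return the required flags that do not appear in a comma-separated flag string."""
    reported = set(reported_csv.split(","))
    return [flag for flag in required if flag not in reported]


# Flags named in the engine error message.
required = ["vmx", "ssbd", "md_clear", "model_Skylake-Server", "spec_ctrl"]

# Shortened excerpt of 'cpuFlags' from the vdsm log above (illustrative subset).
reported_csv = (
    "nx,vmx,md_clear,stibp,ibpb,ssbd,ibrs,"
    "model_Skylake-Server-IBRS,model_Skylake-Server"
)

print(missing_flags(required, reported_csv))
```

With this excerpt, only spec_ctrl is left over, which matches what I see when checking the full list by hand.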

With the cpu type set to Intel Skylake Server Family it also complains about other missing features that are also clearly present in /proc/cpuinfo (such as vmx and nx)


Version-Release number of selected component (if applicable):
vdsm-4.40.22-1.el8.x86_64

How reproducible:


Steps to Reproduce:
1. Add an Intel(R) Xeon(R) Silver 4114 CPU host to a 4.4 cluster
2.
3.

Actual results:
oVirt complains about missing cpu features

Expected results:
Host added to cluster

Additional info:

Comment 1 Michal Skrivanek 2020-07-09 04:32:29 UTC
Can you please attach full vdsm.log and engine.log? And virsh domcapabilities and virsh capabilities output if you can. Thanks!

Comment 2 Rik Theys 2020-07-09 06:13:21 UTC
Created attachment 1700387 [details]
vdsm log

Comment 3 Rik Theys 2020-07-09 06:13:42 UTC
Created attachment 1700388 [details]
engine log

Comment 4 Rik Theys 2020-07-09 06:14:02 UTC
Created attachment 1700389 [details]
capabilities

Comment 5 Rik Theys 2020-07-09 06:15:00 UTC
Created attachment 1700390 [details]
domcapabilities

Hi,

I've attached the requested logs and command output.

The logs will show a lot of attempts to get this host up as I'm having multiple issues.

Regards,
Rik

Comment 6 Michal Skrivanek 2020-07-13 15:46:30 UTC
It could be because your host is Cascadelake-Server and bug 1837266  is adding it only for names ending with -IBRS...but in this case it doesn't. Milan, it may need another exception or maybe blacklist rather than a whitelist for this...
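
To illustrate the two approaches mentioned here, a rough sketch follows. It is not the Vdsm code: the function names and the contents of `LEGACY_MODELS_WITHOUT_SPEC_CTRL` are hypothetical, chosen only from model names that appear in this report, and vendor handling is ignored.

```python
# Hypothetical exception list: base models that have a separate "-IBRS"
# variant and so do not imply spec_ctrl by themselves. Illustrative only.
LEGACY_MODELS_WITHOUT_SPEC_CTRL = frozenset({
    "Skylake-Server",
    "Skylake-Client",
    "Broadwell",
    "Haswell",
})


def spec_ctrl_by_suffix(model_name):
    """Whitelist-style rule: report spec_ctrl only for models named *-IBRS."""
    return model_name.endswith("-IBRS")


def spec_ctrl_by_exception(model_name, legacy=LEGACY_MODELS_WITHOUT_SPEC_CTRL):
    """Exception-style rule: report spec_ctrl unless the model is a known legacy one."""
    return model_name not in legacy


for name in ("Skylake-Server-IBRS", "Skylake-Server", "Cascadelake-Server"):
    print(name, spec_ctrl_by_suffix(name), spec_ctrl_by_exception(name))
```

The two rules agree on the Skylake names and differ only for Cascadelake-Server, which has no -IBRS suffix and is therefore skipped by the suffix rule.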

Comment 7 Rik Theys 2020-07-14 11:15:04 UTC
Hi Michal,

(In reply to Michal Skrivanek from comment #6)
> It could be because your host is Cascadelake-Server and bug 1837266  is
> adding it only for names ending with -IBRS...but in this case it doesn't.
> Milan, it may need another exception or maybe blacklist rather than a
> whitelist for this...

Are you sure my cpu is a Cascadelake? According to the 'virsh capabilities' my cpu model is Skylake-Server-IBRS. Since it ends with -IBRS, it makes me believe the feature should have been automatically added to my feature list already.

According to https://ark.intel.com/content/www/us/en/ark/products/123550/intel-xeon-silver-4114-processor-13-75m-cache-2-20-ghz.html my cpu is a Skylake cpu.

Regards,
Rik

Comment 8 Milan Zamazal 2020-07-14 14:52:56 UTC
Hi Rik,

indeed your physical CPU model is reported as Skylake-Server-IBRS. For some reason, libvirt apparently decides to use Cascadelake model for your guests, as reported in `virsh domcapabilities'. Both the models should report spec_ctrl, but Vdsm currently reports it only for *-IBRS. So I think Michal's analysis above still applies and we need one of the suggested fixes.
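
For anyone who wants to check the same thing on their own host, here is a small sketch that pulls both model names out of the libvirt XML. The XML snippets are trimmed examples written for illustration, not the attachments from this bug, and the helper names are made up; on a real host the input would be the output of `virsh capabilities` and `virsh domcapabilities`.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative stand-in for `virsh capabilities` output.
CAPABILITIES_XML = """
<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>Skylake-Server-IBRS</model>
    </cpu>
  </host>
</capabilities>
"""

# Trimmed, illustrative stand-in for `virsh domcapabilities` output.
DOMCAPABILITIES_XML = """
<domainCapabilities>
  <cpu>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Cascadelake-Server</model>
    </mode>
  </cpu>
</domainCapabilities>
"""


def physical_cpu_model(capabilities_xml):
    """Model name of the physical host CPU, as listed under <host><cpu>."""
    root = ET.fromstring(capabilities_xml.strip())
    return root.findtext("./host/cpu/model")


def guest_host_model(domcapabilities_xml):
    """Model name libvirt would use for guests in host-model mode."""
    root = ET.fromstring(domcapabilities_xml.strip())
    return root.findtext("./cpu/mode[@name='host-model']/model")


print(physical_cpu_model(CAPABILITIES_XML))
print(guest_host_model(DOMCAPABILITIES_XML))
```

If the two names differ and the second one lacks the -IBRS suffix, the host is likely in the situation described in this bug.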

Comment 9 Yaning Wang 2020-08-11 06:22:33 UTC
Verified on:

rhv-4.4.2-2
vdsm-4.40.24-1

Steps:

1. Add an Intel(R) Xeon(R) Silver 4110 CPU host to a 4.4 cluster


Actual results:
hosts successfully added to cluster without any complaints

Comment 10 Oleh Horbachov 2020-08-17 09:58:26 UTC
I created a cluster on v4.4.0, and after upgrading I had the same problem on oVirt v4.4.1.4. I found this bug report and updated to version 4.4.2-pre, but the problem remained, with this error:

The host CPU does not match the Cluster CPU Type and is running in a degraded mode. It is missing the following CPU flags: vmx, model_Skylake-Server, nx. Please update the host CPU microcode or change the Cluster CPU Type.

In addition, I tried a reinstall.
CPU: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
ovirt-engine-4.4.2.2-1.el8.noarch
vdsm-4.40.25-1.el8.x86_64

Comment 11 Oleh Horbachov 2020-08-17 09:59:45 UTC
Created attachment 1711580 [details]
engine log

Comment 12 Oleh Horbachov 2020-08-17 10:03:40 UTC
Sorry, I missed some text.
I tried reinstalling the existing host and got the same error.

Comment 13 Michal Skrivanek 2020-08-19 11:36:28 UTC
It's not the same issue; please open a separate bug with separate logs (from the hosts, please).

Comment 14 Arik 2020-08-20 09:28:26 UTC
*** Bug 1869209 has been marked as a duplicate of this bug. ***

Comment 15 Sandro Bonazzola 2020-09-18 07:12:02 UTC
This bugzilla is included in oVirt 4.4.2 release, published on September 17th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

