Bug 1534212 - Self-Hosted Engine need to be able to run on an IBRS compatible CPU
Summary: Self-Hosted Engine need to be able to run on an IBRS compatible CPU
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: General
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.2.2
Target Release: ---
Assignee: Simone Tiraboschi
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On: 1552158 1549642 1551289 1551291
Blocks: 1458709 1534421
Reported: 2018-01-14 08:38 UTC by Yaniv Kaul
Modified: 2018-04-08 15:05 UTC
CC List: 6 users

Fixed In Version: ovirt-hosted-engine-setup-2.2.11
Doc Type: Enhancement
Doc Text:
In this release, the self-hosted engine can be installed on IBRS-compatible CPUs and the cluster's CPU type is set accordingly.
Clone Of:
Clones: 1534421
Environment:
Last Closed: 2018-03-29 11:16:49 UTC
oVirt Team: Integration
rule-engine: ovirt-4.2+
ylavi: blocker+


Attachments
sosreport from host (11.34 MB, application/x-xz)
2018-03-04 09:26 UTC, Nikolai Sednev
no flags

Description Yaniv Kaul 2018-01-14 08:38:47 UTC
Description of problem:
I'm opening this bug for both clean install and upgrade, though I assume we'll need to split it later.

We need to ensure that:
1. A clean installation on a host with IBRS-compatible fixes (CPU + kernel + qemu-kvm + libvirt) can select an IBRS-enabled vCPU type for the SHE VM.
2. We provide a procedure to switch to such a CPU type on upgrade (is this already doable today?).

Comment 2 Nikolai Sednev 2018-03-04 09:21:44 UTC
I tried to deploy with the IBRS CPU type over Gluster, and it failed with:
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": [{"address": "panther09.qa.lab.tlv.redhat.com", "affinity_labels": [], "auto_numa_status": "disable", "certificate": {"organization": "qa.lab.tlv.redhat.com", "subject": "O=qa.lab.tlv.redhat.com,CN=panther09.qa.lab.tlv.redhat.com"}, "cluster": {"href": "/ovirt-engine/api/clusters/21dcb67a-1f8b-11e8-bc70-00163eeeeeee", "id": "21dcb67a-1f8b-11e8-bc70-00163eeeeeee"}, "comment": "", "cpu": {"name": "Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz", "speed": 3021.0, "topology": {"cores": 8, "sockets": 1, "threads": 2}}, "device_passthrough": {"enabled": false}, "devices": [], "external_network_provider_configurations": [], "external_status": "ok", "hardware_information": {"manufacturer": "Dell Inc.", "product_name": "PowerEdge FC430", "serial_number": "4D50CB2", "supported_rng_sources": ["hwrng", "random"], "uuid": "4C4C4544-0044-3510-8030-B4C04F434232"}, "hooks": [], "href": "/ovirt-engine/api/hosts/cbc265b3-1b2b-484c-9f4b-36a71ab5181b", "id": "cbc265b3-1b2b-484c-9f4b-36a71ab5181b", "iscsi": {"initiator": "iqn.1994-05.com.redhat:a515d04e7e2f"}, "katello_errata": [], "kdump_status": "disabled", "ksm": {"enabled": false}, "libvirt_version": {"build": 0, "full_version": "libvirt-3.9.0-13.el7", "major": 3, "minor": 9, "revision": 0}, "max_scheduling_memory": 66862448640, "memory": 67267198976, "name": "panther09.qa.lab.tlv.redhat.com", "network_attachments": [], "nics": [], "numa_nodes": [], "numa_supported": false, "os": {"custom_kernel_cmdline": "", "reported_kernel_cmdline": "BOOT_IMAGE=/vmlinuz-3.10.0-858.el7.x86_64 root=/dev/mapper/vg0-lv_root ro rhgb quiet crashkernel=auto rd.lvm.lv=vg0/lv_root rd.lvm.lv=vg0/lv_swap console=ttyS1,115200n8 LANG=en_US.UTF-8", "type": "RHEL", "version": {"full_version": "7.5 - 8.el7", "major": 7, "minor": 5}}, "permissions": [], "port": 54321, "power_management": {"automatic_pm_enabled": true, "enabled": false, "kdump_detection": true, "pm_proxies": []}, "protocol": 
"stomp", "se_linux": {"mode": "enforcing"}, "spm": {"priority": 5, "status": "none"}, "ssh": {"fingerprint": "SHA256:pvf83fk8qaHH3w0mjHAVEDPOs6cOnGH42tPf7xKf/gk", "port": 22}, "statistics": [], "status": "non_responsive", "storage_connection_extensions": [], "summary": {"active": 1, "migrating": 0, "total": 1}, "tags": [], "transparent_huge_pages": {"enabled": true}, "type": "rhel", "unmanaged_networks": [], "update_available": false, "version": {"build": 19, "full_version": "vdsm-4.20.19-1.el7ev", "major": 4, "minor": 20, "revision": 0}}]}, "attempts": 50, "changed": false}
          Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]: 

Components on host:
ovirt-hosted-engine-setup-2.2.11-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.6-1.el7ev.noarch
Linux 3.10.0-858.el7.x86_64 #1 SMP Tue Feb 27 08:59:23 EST 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

Moving back to assigned.
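The fatal message above is an Ansible task result serialized as JSON: setup polled the engine's host list ("attempts": 50) and the host was still reported as "non_responsive". A minimal Python sketch, assuming everything after the first "=>" is a single valid JSON object (as it is in this log), pulls out each host's name and status so the failure reason is easy to spot. The function name host_statuses is hypothetical.

```python
import json


def host_statuses(ansible_error_line: str) -> dict:
    """Map each host name in an Ansible 'FAILED! => {...}' line to its status.

    Assumes the text after the first '=>' is one JSON object shaped like the
    ovirt_hosts fact shown in the log above.
    """
    _, sep, payload = ansible_error_line.partition("=>")
    if not sep:
        raise ValueError("no '=>' separator found")
    result = json.loads(payload)
    hosts = result.get("ansible_facts", {}).get("ovirt_hosts", [])
    return {host["name"]: host["status"] for host in hosts}


# Trimmed-down version of the error line above (most host fields removed).
line = ('[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": '
        '[{"name": "panther09.qa.lab.tlv.redhat.com", "status": "non_responsive"}]}, '
        '"attempts": 50, "changed": false}')
print(host_statuses(line))
```

Running it against the full log line gives the same single entry, showing the host never became responsive within the 50 polling attempts.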

Comment 3 Red Hat Bugzilla Rules Engine 2018-03-04 09:21:50 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 4 Nikolai Sednev 2018-03-04 09:26:16 UTC
Created attachment 1403702
sosreport from host

Comment 5 Yaniv Kaul 2018-03-04 10:11:49 UTC
Are you sure it was IBRS enabled? What did VDSM say?

Comment 6 Nikolai Sednev 2018-03-04 11:54:12 UTC
(In reply to Yaniv Kaul from comment #5)
> Are you sure it was IBRS enabled? What did VDSM say?

It was an IBRS-capable host.
The VDSM log is included in the attached sosreport.
Are there any additional actions required for IBRS to work?

Comment 7 Yaniv Kaul 2018-03-04 12:54:06 UTC
(In reply to Nikolai Sednev from comment #6)
> (In reply to Yaniv Kaul from comment #5)
> > Are you sure it was IBRS enabled? What did VDSM say?
> 
> It was an IBRS-capable host.
> The VDSM log is included in the attached sosreport.

From VDSM log which you've attached:
kernelFeatures': {u'IBRS': 0, u'PTI': 1, u'IBPB': 0},


> Are there any additional actions required for IBRS to work?

I think IBRS should be equal to 1.
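The kernelFeatures value quoted from the VDSM log is a Python 2 dict repr (note the u'' prefixes), where 1 means the feature is active. A minimal sketch, assuming the log line contains one kernelFeatures dict literal in exactly this form, extracts it with a regular expression and checks the IBRS flag. The helper name kernel_features is hypothetical and not part of VDSM.

```python
import ast
import re

# Matches "kernelFeatures': {...}" as it appears in the VDSM log excerpt.
_KERNEL_FEATURES = re.compile(r"kernelFeatures'?:\s*(\{[^}]*\})")


def kernel_features(log_line: str) -> dict:
    """Return the kernelFeatures dict found in a VDSM log line."""
    match = _KERNEL_FEATURES.search(log_line)
    if match is None:
        raise ValueError("no kernelFeatures entry in line")
    # ast.literal_eval parses the literal safely; u'' prefixes are still
    # valid string literals in Python 3.
    return ast.literal_eval(match.group(1))


features = kernel_features("kernelFeatures': {u'IBRS': 0, u'PTI': 1, u'IBPB': 0},")
print(features)                    # {'IBRS': 0, 'PTI': 1, 'IBPB': 0}
print(features.get("IBRS") == 1)   # False: IBRS is not active on this host
```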

Comment 8 Nikolai Sednev 2018-03-04 14:03:07 UTC
https://access.redhat.com/articles/3311301
These three debugfs tunables can be enabled or disabled on the kernel command line at boot, or at runtime via debugfs controls. The tunables control Page Table Isolation (pti), Indirect Branch Restricted Speculation (ibrs), and Indirect Branch Prediction Barriers (ibpb). Red Hat enables each of these features by default as needed to protect the architecture detected at boot.

Architectural Defaults
By default, each of the 3 tunables that apply to an architecture will be enabled automatically at boot time, based upon the architecture detected.

Intel Defaults:

pti 1 ibrs 1 ibpb 1 -> fix variant#1 #2 #3
pti 1 ibrs 0 ibpb 0 -> fix variant#1 #3 (for older Intel systems with no microcode update available)

panther09 ~]# cat /sys/kernel/debug/x86/ibrs_enabled
0

The defaults on this host differ from the documentation, even though the host was cleanly reprovisioned with RHEL 7.5.

This might partially explain the deployment failure.
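The Intel defaults quoted above can be turned into a small check. This is a sketch, assuming the three tunables are exposed as pti_enabled, ibrs_enabled and ibpb_enabled under /sys/kernel/debug/x86 (only ibrs_enabled is shown on this host); it reads them and reports which documented default, if any, the host matches. On a real host debugfs must be mounted and the code run as root; the base_dir parameter exists only so the functions can be tried against a scratch directory.

```python
from pathlib import Path

# Intel defaults as listed in the Red Hat article quoted above:
# (pti, ibrs, ibpb) -> variants fixed.
INTEL_DEFAULTS = {
    (1, 1, 1): "variants #1, #2, #3",
    (1, 0, 0): "variants #1, #3 (older Intel systems with no microcode update)",
}


def read_tunables(base_dir: str = "/sys/kernel/debug/x86") -> tuple:
    """Read (pti, ibrs, ibpb) from the debugfs *_enabled files."""
    base = Path(base_dir)
    return tuple(
        int((base / f"{name}_enabled").read_text().strip())
        for name in ("pti", "ibrs", "ibpb")
    )


def describe(tunables: tuple) -> str:
    """Name the documented Intel default matching the tunables, if any."""
    return INTEL_DEFAULTS.get(
        tunables, f"no documented Intel default matches {tunables}"
    )

# On the host (as root): print(describe(read_tunables()))
```

With ibrs_enabled reading 0, the host can at best match the "older Intel systems" row, which is consistent with the kernelFeatures reported by VDSM.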


panther09 ~]#  systemctl status microcode -l
● microcode.service - Load CPU microcode update
   Loaded: loaded (/usr/lib/systemd/system/microcode.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Sun 2018-03-04 15:54:55 IST; 4min 15s ago
  Process: 888 ExecStart=/usr/bin/bash -c grep -l GenuineIntel /proc/cpuinfo | xargs grep -l -E "model[[:space:]]*: 79$" > /dev/null || echo 1 > /sys/devices/system/cpu/microcode/reload (code=exited, status=0/SUCCESS)
 Main PID: 888 (code=exited, status=0/SUCCESS)

Mar 04 15:54:55 panther09.qa.lab.tlv.redhat.com systemd[1]: Starting Load CPU microcode update...
Mar 04 15:54:55 panther09.qa.lab.tlv.redhat.com systemd[1]: Started Load CPU microcode update.
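The ExecStart line above encodes a simple rule: if the CPU in /proc/cpuinfo is GenuineIntel and reports model 79, the late microcode reload is skipped; otherwise 1 is written to /sys/devices/system/cpu/microcode/reload. Here is a Python sketch of the same decision. It works on /proc/cpuinfo text so it can be checked without root, and the function name is hypothetical.

```python
import re

# Same pattern the unit passes to "grep -E", applied line by line:
# a "model" field equal to 79 (spaces/tabs only before the colon).
_MODEL_79 = re.compile(r"model[ \t]*: 79$", re.MULTILINE)


def should_reload_microcode(cpuinfo: str) -> bool:
    """Mirror microcode.service: skip the reload only on Intel model 79."""
    is_intel = "GenuineIntel" in cpuinfo
    return not (is_intel and _MODEL_79.search(cpuinfo) is not None)


sample = (
    "vendor_id\t: GenuineIntel\n"
    "cpu family\t: 6\n"
    "model\t\t: 63\n"
    "model name\t: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz\n"
)
print(should_reload_microcode(sample))  # True: model 63 is not excluded
```

The E5-2630 v3 on this host is model 63 (see the spectre-meltdown-checker output), so the service does attempt the reload. A successful reload only helps if a newer microcode file is installed, though.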

panther09 ~]# dmesg | grep microcode
[    0.000000] microcode: microcode updated early to revision 0x3a, date = 2017-01-30
[    2.065211] microcode: CPU0 sig=0x306f2, pf=0x1, revision=0x3a
[    2.065222] microcode: CPU1 sig=0x306f2, pf=0x1, revision=0x3a
[    2.065234] microcode: CPU2 sig=0x306f2, pf=0x1, revision=0x3a
[    2.065246] microcode: CPU3 sig=0x306f2, pf=0x1, revision=0x3a
[    2.065257] microcode: CPU4 sig=0x306f2, pf=0x1, revision=0x3a
[    2.065269] microcode: CPU5 sig=0x306f2, pf=0x1, revision=0x3a
[    2.065281] microcode: CPU6 sig=0x306f2, pf=0x1, revision=0x3a
[    2.065291] microcode: CPU7 sig=0x306f2, pf=0x1, revision=0x3a
[    2.065302] microcode: CPU8 sig=0x306f2, pf=0x1, revision=0x3a
[    2.065315] microcode: CPU9 sig=0x306f2, pf=0x1, revision=0x3a
[    2.065338] microcode: CPU10 sig=0x306f2, pf=0x1, revision=0x3a
[    2.065348] microcode: CPU11 sig=0x306f2, pf=0x1, revision=0x3a
[    2.065370] microcode: CPU12 sig=0x306f2, pf=0x1, revision=0x3a
[    2.065381] microcode: CPU13 sig=0x306f2, pf=0x1, revision=0x3a
[    2.065392] microcode: CPU14 sig=0x306f2, pf=0x1, revision=0x3a
[    2.065402] microcode: CPU15 sig=0x306f2, pf=0x1, revision=0x3a
[    2.065470] microcode: Microcode Update Driver: v2.01 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
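The dmesg output lists one signature and revision per logical CPU. A short sketch that collects the per-CPU revisions makes it easy to confirm that every CPU runs the same microcode and to compare that value with other tools. For example, libvirt's capabilities output later in this bug shows microcode version='60', which appears to be decimal for 0x3c. The regular expression assumes the exact line format shown above.

```python
import re

_CPU_LINE = re.compile(
    r"microcode: CPU(\d+) sig=(0x[0-9a-f]+), pf=0x[0-9a-f]+, revision=(0x[0-9a-f]+)"
)


def microcode_revisions(dmesg_text: str) -> dict:
    """Map CPU number -> microcode revision (as int) from dmesg lines."""
    return {
        int(cpu): int(rev, 16)
        for cpu, _sig, rev in _CPU_LINE.findall(dmesg_text)
    }


sample = """[    2.065211] microcode: CPU0 sig=0x306f2, pf=0x1, revision=0x3a
[    2.065222] microcode: CPU1 sig=0x306f2, pf=0x1, revision=0x3a"""
revs = microcode_revisions(sample)
print({cpu: hex(rev) for cpu, rev in revs.items()})  # {0: '0x3a', 1: '0x3a'}
```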

The host might need a BIOS firmware upgrade to get ibrs_enabled.

Please provide your input.

Comment 9 Yaniv Kaul 2018-03-04 14:05:28 UTC
Might be - it's your server - I don't know if you have an updated BIOS FW - or the latest Microcode from Intel for this CPU.

But the fact is, you don't have IBRS right now.

So:
1. Please test with a host we know has IBRS.
2. Were you asked at any point to choose such a CPU? If not, how did we get to this situation (the failure) in the first place?

Comment 10 Nikolai Sednev 2018-03-04 14:09:49 UTC
(In reply to Yaniv Kaul from comment #9)
> Might be - it's your server - I don't know if you have an updated BIOS FW -
> or the latest Microcode from Intel for this CPU.
> 
> But the fact is, you don't have IBRS right now.
> 
> So:
> 1. Please test with a host we know has IBRS.
> 2. Were you asked at any point to choose such a CPU? If not, how did
> we get to this situation (the failure) in the first place?

Regarding the first topic: the host is IBRS-capable. I received it from one of the QA teams specifically to verify this bug, but it appears to have an unpatched BIOS, so I've already opened a ticket for that.

Can you please rephrase your second topic? I don't quite get the point.

Comment 11 Yaniv Kaul 2018-03-04 14:45:02 UTC
(In reply to Nikolai Sednev from comment #10)
> Can you please rephrase your second topic? I don't quite get the point.

When using the Cockpit wizard, you should not be asked about the CPU type. Where were you asked? As part of command-line interactive setup / otopi / answer file?

Comment 12 Nikolai Sednev 2018-03-04 15:13:10 UTC
(In reply to Yaniv Kaul from comment #11)
> When using the Cockpit wizard, you should not be asked about the CPU type.
> Where were you asked? As part of command-line interactive setup / otopi /
> answer file?

Ah... I was running from the CLI, not from Cockpit.

Comment 13 Simone Tiraboschi 2018-03-05 08:34:04 UTC
(In reply to Nikolai Sednev from comment #12)
> Ah... I was running from the CLI, not from Cockpit.

We ask for the cluster CPU type only in the vintage (--noansible) flow, since in that case hosted-engine-setup has to start the engine VM itself, and we want that VM to be imported by the auto-import process.

In the new flow, we simply let the engine choose the CPU type by itself, according to the host capabilities.

Comment 14 Nikolai Sednev 2018-03-06 14:08:34 UTC
I see some strange behavior on the host.
On one hand, IBRS should be supported on the host; on the other hand, the capability looks like it is turned off.

By default, IBRS should be enabled on the host if its architecture supports that functionality: https://access.redhat.com/articles/3311301.

[root@panther09 ~]# cat /sys/kernel/debug/x86/ibrs_enabled
0
[root@panther09 ~]# virsh -r capabilities | head
<capabilities>

  <host>
    <uuid>4c4c4544-0044-3510-8030-b4c04f434232</uuid>
    <cpu>
      <arch>x86_64</arch>
      <model>Haswell-noTSX-IBRS</model>
      <vendor>Intel</vendor>
      <microcode version='60'/>
      <topology sockets='1' cores='8' threads='2'/>

Any ideas?

Components on host:
ovirt-hosted-engine-ha-2.2.6-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.12-1.el7ev.noarch
Red Hat Enterprise Linux Server release 7.5 (Maipo)
Linux 3.10.0-858.el7.x86_64 #1 SMP Tue Feb 27 08:59:23 EST 2018 x86_64 x86_64 x86_64 GNU/Linux


The host's BIOS has recently been upgraded.
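The virsh -r capabilities output above is cut off by head; the host CPU model and microcode revision live under capabilities/host/cpu. Here is a minimal sketch using Python's standard xml.etree.ElementTree that reads both and flags models with an "-IBRS" suffix. It assumes a complete capabilities document such as the one virsh prints. The model name presumably reflects what the CPU advertises, while whether the kernel actually uses IBRS is the separate ibrs_enabled tunable, which would explain the mismatch described above.

```python
import xml.etree.ElementTree as ET


def host_cpu_info(capabilities_xml: str) -> dict:
    """Extract the host CPU model and microcode revision from virsh capabilities."""
    root = ET.fromstring(capabilities_xml)
    cpu = root.find("./host/cpu")
    if cpu is None:
        raise ValueError("no <host><cpu> element found")
    model = cpu.findtext("model", default="")
    microcode = cpu.find("microcode")
    revision = int(microcode.get("version")) if microcode is not None else None
    return {
        "model": model,
        "ibrs_model": model.endswith("-IBRS"),
        "microcode": revision,
    }


# Closed-off version of the truncated output above.
xml_text = """<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>Haswell-noTSX-IBRS</model>
      <vendor>Intel</vendor>
      <microcode version='60'/>
    </cpu>
  </host>
</capabilities>"""
info = host_cpu_info(xml_text)
print(info["model"], info["ibrs_model"], hex(info["microcode"]))
# Haswell-noTSX-IBRS True 0x3c
```

The hex value 0x3c matches the "ucode 0x3c" that spectre-meltdown-checker reports after the BIOS upgrade, which supports reading libvirt's version attribute as decimal.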

Comment 18 Nikolai Sednev 2018-03-06 16:51:42 UTC
I manually ran "echo 2 > /sys/kernel/debug/x86/ibrs_enabled" and then tested with the script:
panther09 ~]#  ./spectre-meltdown-checker.sh
Spectre and Meltdown mitigation detection tool v0.35

Checking for vulnerabilities on current system
Kernel is Linux 3.10.0-858.el7.x86_64 #1 SMP Tue Feb 27 08:59:23 EST 2018 x86_64
CPU is Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz

Hardware check
* Hardware support (CPU microcode) for mitigation techniques
  * Indirect Branch Restricted Speculation (IBRS)
    * SPEC_CTRL MSR is available:  YES 
    * CPU indicates IBRS capability:  YES  (SPEC_CTRL feature bit)
  * Indirect Branch Prediction Barrier (IBPB)
    * PRED_CMD MSR is available:  YES 
    * CPU indicates IBPB capability:  YES  (SPEC_CTRL feature bit)
  * Single Thread Indirect Branch Predictors (STIBP)
    * SPEC_CTRL MSR is available:  YES 
    * CPU indicates STIBP capability:  YES 
  * Enhanced IBRS (IBRS_ALL)
    * CPU indicates ARCH_CAPABILITIES MSR availability:  NO 
    * ARCH_CAPABILITIES MSR advertises IBRS_ALL capability:  NO 
  * CPU explicitly indicates not being vulnerable to Meltdown (RDCL_NO):  NO 
  * CPU microcode is known to cause stability problems:  NO  (model 63 stepping 2 ucode 0x3c)
* CPU vulnerability to the three speculative execution attacks variants
  * Vulnerable to Variant 1:  YES 
  * Vulnerable to Variant 2:  YES 
  * Vulnerable to Variant 3:  YES 

CVE-2017-5753 [bounds check bypass] aka 'Spectre Variant 1'
* Mitigated according to the /sys interface:  YES  (kernel confirms that the mitigation is active)
* Kernel has array_index_mask_nospec:  NO 
* Kernel has the Red Hat/Ubuntu patch:  YES 
> STATUS:  NOT VULNERABLE  (Mitigation: Load fences)

CVE-2017-5715 [branch target injection] aka 'Spectre Variant 2'
* Mitigated according to the /sys interface:  YES  (kernel confirms that the mitigation is active)
* Mitigation 1
  * Kernel is compiled with IBRS/IBPB support:  YES 
  * Currently enabled features
    * IBRS enabled for Kernel space:  YES 
    * IBRS enabled for User space:  YES 
    * IBPB enabled:  YES 
* Mitigation 2
  * Kernel compiled with retpoline option:  YES 
  * Kernel compiled with a retpoline-aware compiler:  UNKNOWN 
> STATUS:  NOT VULNERABLE  (Mitigation: IBRS (kernel and user space))

CVE-2017-5754 [rogue data cache load] aka 'Meltdown' aka 'Variant 3'
* Mitigated according to the /sys interface:  YES  (kernel confirms that the mitigation is active)
* Kernel supports Page Table Isolation (PTI):  YES 
* PTI enabled and active:  YES 
* Running as a Xen PV DomU:  NO 
> STATUS:  NOT VULNERABLE  (Mitigation: PTI)

A false sense of security is worse than no security at all, see --disclaimer

With IBRS manually enabled, I still can't deploy SHE, because I'm blocked by:
https://bugzilla.redhat.com/show_bug.cgi?id=1551289
https://bugzilla.redhat.com/show_bug.cgi?id=1551291
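The checker's "Mitigated according to the /sys interface" lines refer to the kernel's per-vulnerability status files. Here is a minimal sketch that collects them into a dict. It assumes the files are exposed under /sys/devices/system/cpu/vulnerabilities as meltdown, spectre_v1 and spectre_v2, the usual location on kernels that provide this interface. The base_dir parameter exists only so the functions can be exercised against a scratch directory. With ibrs_enabled set to 2, the checker above reports spectre_v2 as "Mitigation: IBRS (kernel and user space)".

```python
from pathlib import Path

VULNERABILITIES = ("meltdown", "spectre_v1", "spectre_v2")


def mitigation_status(base_dir: str = "/sys/devices/system/cpu/vulnerabilities") -> dict:
    """Read each vulnerability status file; missing files are reported as None."""
    base = Path(base_dir)
    status = {}
    for name in VULNERABILITIES:
        path = base / name
        status[name] = path.read_text().strip() if path.exists() else None
    return status


def fully_mitigated(status: dict) -> bool:
    """True when every file reports a mitigation or 'Not affected'."""
    return all(
        value is not None
        and (value.startswith("Mitigation") or value == "Not affected")
        for value in status.values()
    )

# On the host: print(mitigation_status(), fully_mitigated(mitigation_status()))
```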

Comment 19 Nikolai Sednev 2018-03-20 18:28:17 UTC
I manually ran "echo 2 > /sys/kernel/debug/x86/ibrs_enabled" and deployed Node 0 over NFS:
[ INFO  ] Hosted Engine successfully deployed

Components on host:
ovirt-hosted-engine-ha-2.2.7-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.13-1.el7ev.noarch
rhvm-appliance-4.2-20180202.0.el7.noarch
Linux 3.10.0-861.el7.x86_64 #1 SMP Wed Mar 14 10:21:01 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

Works for me, moving to verified.

Comment 20 Sandro Bonazzola 2018-03-29 11:16:49 UTC
This bugzilla is included in oVirt 4.2.2 release, published on March 28th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

