Bug 1772032

Summary: [RFE] Allow ability to disable individual CPU flags via `cpu_model_extra_flags`
Product: Red Hat OpenStack Reporter: Kashyap Chamarthy <kchamart>
Component: openstack-nova Assignee: Kashyap Chamarthy <kchamart>
Status: CLOSED ERRATA QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: medium Docs Contact:
Priority: medium    
Version: 17.0 (Wallaby) CC: dasmith, egallen, eglynn, igallagh, jhakimra, jparker, jschluet, kchamart, mariel, mwitt, sbauza, sgordon, stephenfin, vromanso
Target Milestone: ga Keywords: FutureFeature, Reopened, Triaged
Target Release: 17.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-nova-23.2.1-0.20220615000408.1e37f2c.el9ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1938204 (view as bug list) Environment:
Last Closed: 2022-09-21 12:07:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version: wallaby
Embargoed:
Bug Depends On:    
Bug Blocks: 1938204    

Description Kashyap Chamarthy 2019-11-13 13:46:37 UTC
What?
-----

When using a custom CPU model, Nova currently allows enabling
individual CPU flags/features via the config attribute,
`cpu_model_extra_flags`:

    [libvirt]
    cpu_mode=custom
    cpu_model=IvyBridge
    cpu_model_extra_flags="pcid,ssbd,md-clear"

The above only lets you enable the CPU features.  This RFE is to also
allow _disabling_ individual CPU features.


Why?
----

A couple of reasons:

  - An Operator wants to generate a baseline CPU config (that facilitates
    live migration) across their Compute node pool.  However, a certain
    CPU flag is causing an intolerable performance issue for their
    guest workloads.  If the Operator has isolated the problem to _that_
    specific CPU flag, then they would like to disable the flag.

  - More importantly, a specific CPU flag might trigger a CPU
    vulnerability.  In such a case, the mitigation for it could be to
    simply _disable_ the offending CPU flag.

Allowing disabling of individual CPU flags via Nova would enable the
above use cases.


How?
----

By allowing the notion of '+' / '-' to indicate whether to enable or
disable a given CPU flag.

E.g. if you specify the below in 'nova.conf' (on the Compute nodes):

    [libvirt]
    cpu_mode=custom
    cpu_model=IvyBridge
    cpu_model_extra_flags="+pcid,-mtrr,ssbd"

Then, when you start an instance, Nova should generate the below XML:

    <cpu match='exact'> 
      <model fallback='forbid'>IvyBridge</model>
      <vendor>Intel</vendor>
      <feature policy='require' name='pcid'/>
      <feature policy='disable' name='mtrr'/>
      <feature policy='require' name='ssbd'/>
    </cpu>


Note that the requirement to specify '+' / '-' for individual flags
should be optional.  If neither is specified, then we should assume '+',
and enable the feature (as shown above for the 'ssbd' flag).
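
To make the proposed semantics concrete, here is a minimal Python sketch
of the parsing described above.  It is an illustration only, not Nova's
actual implementation: the function names `parse_extra_flags` and
`build_cpu_xml` are hypothetical, and real code would also need to
validate flag names against what the host and libvirt support.

```python
import xml.etree.ElementTree as ET


def parse_extra_flags(raw):
    """Split a comma-separated flag string into (name, policy) pairs.

    Hypothetical helper: a leading '-' maps to policy 'disable';
    a leading '+' or no prefix at all maps to policy 'require'.
    """
    flags = []
    for token in raw.split(","):
        token = token.strip().lower()
        if token.startswith("-"):
            name, policy = token[1:], "disable"
        elif token.startswith("+"):
            name, policy = token[1:], "require"
        else:
            name, policy = token, "require"
        if name:  # skip empty entries such as a trailing comma
            flags.append((name, policy))
    return flags


def build_cpu_xml(model, vendor, raw_flags):
    """Build a libvirt-style <cpu> element for a custom CPU model."""
    cpu = ET.Element("cpu", match="exact")
    ET.SubElement(cpu, "model", fallback="forbid").text = model
    ET.SubElement(cpu, "vendor").text = vendor
    for name, policy in parse_extra_flags(raw_flags):
        ET.SubElement(cpu, "feature", policy=policy, name=name)
    return ET.tostring(cpu, encoding="unicode")


if __name__ == "__main__":
    print(build_cpu_xml("IvyBridge", "Intel", "+pcid,-mtrr,ssbd"))
```

Only a leading '-' is treated as a prefix, so flag names that contain a
hyphen (e.g. 'md-clear') are left intact.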

Comment 1 Kashyap Chamarthy 2019-11-13 14:08:45 UTC
Upstream blueprint: https://blueprints.launchpad.net/nova/+spec/allow-disabling-cpu-flags

Comment 2 Eduardo Habkost 2019-11-13 21:39:15 UTC
Additional background information:

One possible mitigation for TAA (TSX Asynchronous Abort, CVE-2019-11135)[1][2] is to disable TSX using `tsx=off` in the kernel command line.

However, to be able to use `tsx=off`, customers need the ability to disable TSX in the VM CPU configurations too; otherwise, VMs might become unbootable when using `tsx=off`.

[1] https://access.redhat.com/solutions/tsx-asynchronousabort
[2] https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html

Comment 7 Kashyap Chamarthy 2021-03-08 16:19:39 UTC
This is merged upstream, based on these two commits:

(1) https://opendev.org/openstack/nova/commit/2e8e04a — libvirt: Don't drop CPU flags with policy='disable' from guest XML

(2) https://opendev.org/openstack/nova/commit/bcd6b42 — libvirt: Allow disabling CPU flags via `cpu_model_extra_flags`

Comment 18 errata-xmlrpc 2022-09-21 12:07:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543