Bug 1970949

Summary: [OSP-16.1] [Upgrades][TripleO] Add a config to disable Intel "TSX" on RHEL-8.3 kernel
Product: Red Hat OpenStack
Reporter: David Vallee Delisle <dvd>
Component: openstack-tripleo-heat-templates
Assignee: David Vallee Delisle <dvd>
Status: CLOSED DUPLICATE
QA Contact: Joe H. Rahme <jhakimra>
Severity: high
Docs Contact:
Priority: high
Version: 16.1 (Train)
CC: dvd, jhakimra, jpretori, kchamart, mburns, mschuppe
Target Milestone: z7
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Clone Of: 1923165
Last Closed: 2021-07-12 14:53:28 UTC
Target Upstream Version: Train
Bug Depends On: 1923165, 2002346

Description David Vallee Delisle 2021-06-11 14:31:47 UTC
This concerns the Intel June 2021 Microcode Update [1].

One of the things that stands out there is the "Transactional
Synchronization Extension (TSX) Deprecation" section, which says:

    "... update disables TSX by default on some platforms"

This most likely means:

    The TSX flags ('hle' and 'rtm') won't show up in /proc/cpuinfo
    anymore on hosts that have applied this CPU microcode.

But we need to test this to confirm it.

[1] https://access.redhat.com/articles/6101171#transactional-synchronization-extension-tsx-deprecation-15
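
A minimal sketch of how that check could be scripted on a host. The helper
name tsx_flags_present is hypothetical, and it assumes the usual
"flags : ..." line format of /proc/cpuinfo on x86:

```python
TSX_FLAGS = {"hle", "rtm"}


def tsx_flags_present(cpuinfo_text):
    """Return which of 'hle' and 'rtm' appear in the first 'flags' line.

    cpuinfo_text is the content of /proc/cpuinfo, passed in as a string
    so the function can be exercised without touching the real host.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return TSX_FLAGS & set(value.split())
    # No 'flags' line at all (e.g. a non-x86 host): report nothing found.
    return set()


# Illustrative input only; on a real host, read /proc/cpuinfo instead.
sample = "processor\t: 0\nflags\t\t: fpu vme hle rtm avx2\n"
print(sorted(tsx_flags_present(sample)))
```

An empty result on a host that previously reported both flags would be
consistent with the microcode (or the kernel) having disabled TSX.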

+++ This bug was initially created as a clone of Bug #1923165 +++

Description
-----------

Fast-forward upgrade from OSP-13 (RHEL-7.9) to OSP-16.2 (RHEL-8.3)
fails[1] during live migration with:

    [...] libvirt.libvirtError: operation failed: guest CPU doesn't
    match specification: missing features: hle,rtm

The failure is due to RHEL-8.3 (destination host) disabling Intel "TSX"
by default.  Disabling TSX in turn removes the 'hle' and 'rtm' CPU
features.

This was discovered during OSP fast-forward upgrades testing[+] where a
guest was being live-migrated from RHEL-7.9 (with TSX=on) to RHEL-8.3
(breaking change: TSX=off), and the migration failed with the
above-mentioned error.

[+] https://bugzilla.redhat.com/show_bug.cgi?id=1921070#c14 — Live
    migration during OSP16.2 hybrid state from RHEL7.9 to RHEL8.3 not
    working
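
For tooling that scans logs for this failure, the missing feature names
can be pulled out of the message text. This is only a sketch: the helper
name missing_cpu_features is hypothetical, and it assumes the message
keeps the "missing features: a,b" shape quoted above.

```python
import re

# Matches the comma-separated feature list that follows the
# "missing features:" marker in the libvirt error text.
_MISSING = re.compile(r"missing features:\s*([\w.,\-]+)")


def missing_cpu_features(error_message):
    """Return the feature names listed after 'missing features:'."""
    match = _MISSING.search(error_message)
    if not match:
        return []
    return [name for name in match.group(1).split(",") if name]


message = ("operation failed: guest CPU doesn't match specification: "
           "missing features: hle,rtm")
print(missing_cpu_features(message))
```

If the returned list is exactly 'hle' and 'rtm', the failure matches the
TSX case described in this bug.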


Why?
----

The RHEL-8.3 kernel disables Intel TSX by default, because it is
considered a potential security risk:

    https://bugzilla.redhat.com/show_bug.cgi?id=1828642
    kernel: Disable Intel TSX by default on newer CPUs

Still, it is not acceptable for the RHEL-8.3 kernel to break user-space
in a minor RHEL release.  (See also:
https://bugzilla.redhat.com/show_bug.cgi?id=1921070#c16)
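
To see whether a host overrides that default, the kernel command line
can be inspected for a tsx= parameter. A minimal sketch, with a
hypothetical helper name; treating the last tsx= occurrence as the
effective one is an assumption of this sketch:

```python
def tsx_cmdline_setting(cmdline):
    """Return the value of 'tsx=' from a kernel command line, or None.

    cmdline is the content of /proc/cmdline passed in as a string.
    None means no explicit setting, i.e. the kernel default applies.
    """
    setting = None
    for token in cmdline.split():
        # startswith("tsx=") deliberately skips tsx_async_abort=...
        if token.startswith("tsx="):
            setting = token[len("tsx="):]
    return setting


# Illustrative input only; on a real host, read /proc/cmdline instead.
print(tsx_cmdline_setting("BOOT_IMAGE=/vmlinuz ro tsx=on quiet"))
```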


Workaround for OSP upgrades
---------------------------

This is unpalatable, but unfortunately there's no other option currently:

(1) have a TripleO config attribute that enables TSX on the
    destination RHEL-8.3 host; set the following in /etc/default/grub:

        GRUB_CMDLINE_LINUX_DEFAULT="[...] tsx=on"

    ... and reboot the 8.3 host;

(2) live-migrate the guests from the RHEL-7.9 host to the RHEL-8.3 host;

(3) now turn off TSX on the RHEL-8.3 host kernel command-line, and
    shut down the guests;

(4) reboot the 8.3 host again, and start the guests.
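
The grub edits in steps (1) and (3) could be scripted roughly as
follows. This is a sketch, not the TripleO implementation: the helper
name set_tsx is hypothetical, it works on the file content as a string,
and it assumes "turning off" TSX means dropping the tsx= parameter so
the kernel default applies again.

```python
import re

# Matches the double-quoted GRUB_CMDLINE_LINUX_DEFAULT assignment.
_CMDLINE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")(.*)(")\s*$')


def set_tsx(grub_text, enable):
    """Return grub_text with tsx=on added (enable) or any tsx= removed."""
    out = []
    for line in grub_text.splitlines():
        m = _CMDLINE.match(line)
        if m:
            # Drop any existing tsx= argument before deciding what to add.
            args = [a for a in m.group(2).split() if not a.startswith("tsx=")]
            if enable:
                args.append("tsx=on")
            line = m.group(1) + " ".join(args) + m.group(3)
        out.append(line)
    return "\n".join(out) + "\n"


before = 'GRUB_TIMEOUT=5\nGRUB_CMDLINE_LINUX_DEFAULT="quiet"\n'
print(set_tsx(before, enable=True))   # step (1)
print(set_tsx(before, enable=False))  # step (3)
```

The change only takes effect once the GRUB configuration has been
regenerated and the host rebooted; both are outside this sketch.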

--- Additional comment from errata-xmlrpc on 2021-06-04 01:48:24 EDT ---

This bug has been added to advisory RHEA-2020:66969 by Shreshtha Joshi (shrjoshi)

--- Additional comment from errata-xmlrpc on 2021-06-04 01:48:25 EDT ---

Bug report changed to ON_QA status by Errata System.
A QE request has been submitted for advisory RHEA-2020:66969-01
https://errata.devel.redhat.com/advisory/66969

--- Additional comment from errata-xmlrpc on 2021-06-04 01:48:59 EDT ---

This bug has been added to advisory RHEA-2020:66969 by Shreshtha Joshi (shrjoshi)

--- Additional comment from Miguel Garcia on 2021-06-04 06:08:29 EDT ---

Moving to MODIFIED as not all fixed-in-versions are present in RHOS-16.2-RHEL-8-20210525.n.0