Bug 1970949 - [OSP-16.1] [Upgrades][TripleO] Add a config to disable Intel "TSX" on RHEL-8.3 kernel
Summary: [OSP-16.1] [Upgrades][TripleO] Add a config to disable Intel "TSX" on RHEL-8....
Keywords:
Status: CLOSED DUPLICATE of bug 1981432
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z7
Target Release: ---
Assignee: David Vallee Delisle
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On: 1923165 2002346
Blocks:
Reported: 2021-06-11 14:31 UTC by David Vallee Delisle
Modified: 2022-08-16 08:36 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1923165
Environment:
Last Closed: 2021-07-12 14:53:28 UTC
Target Upstream Version: Train
Embargoed:




Links
System                 ID        Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker  OSP-5134  0        None      None    None     2022-08-16 08:35:59 UTC
Red Hat Issue Tracker  UPG-3326  0        None      None    None     2021-09-08 15:23:53 UTC

Description David Vallee Delisle 2021-06-11 14:31:47 UTC
This follows from the Intel June 2021 Microcode Update [1].

One thing that stands out there is the "Transactional
Synchronization Extension (TSX) Deprecation" section, which says:

    "... update disables TSX by default on some platforms"

This most likely means:

    The TSX flags ('hle' and 'rtm') will no longer show up in
    /proc/cpuinfo on hosts that have applied this CPU microcode.

But we need to test it to find out.

[1] https://access.redhat.com/articles/6101171#transactional-synchronization-extension-tsx-deprecation-15
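One way to run the test mentioned above is to look for the two flags directly. The following is a minimal sketch, assuming the usual x86 /proc/cpuinfo layout with a "flags" line per processor; the function names are hypothetical and not part of any TripleO tooling.

```python
# Sketch only: helper names are illustrative, not from TripleO or libvirt.
TSX_FLAGS = {"hle", "rtm"}


def cpu_flags(cpuinfo_text):
    """Return the CPU flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match the key exactly so that lines such as 'vmx flags' are skipped.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_tsx_flags(cpuinfo_text):
    """Return the subset of {'hle', 'rtm'} that the CPU does not advertise."""
    return TSX_FLAGS - cpu_flags(cpuinfo_text)


def host_missing_tsx_flags(path="/proc/cpuinfo"):
    """Convenience wrapper that reads the local host's cpuinfo file."""
    with open(path) as handle:
        return missing_tsx_flags(handle.read())
```

Calling host_missing_tsx_flags() on a host before and after the microcode update should show whether 'hle' and 'rtm' disappeared; an empty set means both flags are still exposed.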

+++ This bug was initially created as a clone of Bug #1923165 +++

Description
-----------

Fast-forward upgrade from OSP-13 (RHEL-7.9) to OSP-16.2 (RHEL-8.3)
fails[1] during live migration with:

    [...] libvirt.libvirtError: operation failed: guest CPU doesn't
    match specification: missing features: hle,rtm

The failure is due to RHEL-8.3 (the destination host) disabling Intel
"TSX", which in turn disables the 'hle' and 'rtm' CPU features.

This was discovered during OSP fast-forward upgrade testing[+], where a
guest was being live-migrated from RHEL-7.9 (with TSX=on) to RHEL-8.3
(breaking change: TSX=off), and the migration failed with the
above-mentioned error.

[+] https://bugzilla.redhat.com/show_bug.cgi?id=1921070#c14 ("Live
    migration during OSP16.2 hybrid state from RHEL7.9 to RHEL8.3 not
    working")
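To tell this failure apart from other live-migration errors in logs, the missing feature names can be pulled out of the libvirt message. This is a rough sketch that assumes the message keeps the "missing features: a,b" shape quoted above; the helper names are hypothetical.

```python
import re

# Assumes the 'missing features: hle,rtm' wording seen in the error above.
_MISSING_RE = re.compile(r"missing features:\s*([\w.\-]+(?:\s*,\s*[\w.\-]+)*)")


def missing_features(error_message):
    """Return the feature names listed after 'missing features:', if any."""
    match = _MISSING_RE.search(error_message)
    if not match:
        return []
    return [name.strip() for name in match.group(1).split(",")]


def is_tsx_mismatch(error_message):
    """True when the destination host lacks one or both TSX features."""
    return bool({"hle", "rtm"} & set(missing_features(error_message)))
```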


Why?
----

The RHEL-8.3 kernel disables Intel TSX by default, because it is
considered a potential security risk:

    https://bugzilla.redhat.com/show_bug.cgi?id=1828642
    kernel: Disable Intel TSX by default on newer CPUs

Still, it is not acceptable for the RHEL-8.3 kernel to break user space
in a minor RHEL release.  (See also:
https://bugzilla.redhat.com/show_bug.cgi?id=1921070#c16)


Workaround for OSP upgrades
---------------------------

This is unpalatable, but unfortunately there is currently no other option:

(1) have a TripleO config attribute that enables TSX on the
    destination RHEL-8.3 host, i.e. set the following in
    /etc/default/grub:

        GRUB_CMDLINE_LINUX_DEFAULT="[...] tsx=on"

    ... and reboot the 8.3 host;

(2) live-migrate the guests from the RHEL-7.9 host to the RHEL-8.3
    host;

(3) turn TSX off again on the RHEL-8.3 host kernel command line, and
    shut down the guests;

(4) reboot the 8.3 host again, and start the guests.
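The edit to /etc/default/grub in steps (1) and (3) could be scripted roughly as below. This is only a sketch: set_tsx is a hypothetical helper, it assumes GRUB_CMDLINE_LINUX_DEFAULT is a single double-quoted line, and it leaves the file untouched when that variable is absent. It works on text only, so the bootloader configuration still has to be regenerated and the host rebooted separately.

```python
import re

# Assumption: the variable is defined on one line, in double quotes.
_CMDLINE_RE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")(.*)(")\s*$')


def set_tsx(grub_text, mode):
    """Return grub_text with tsx=<mode> on the default kernel command line.

    mode is e.g. "on" or "off"; passing None removes any tsx= argument.
    """
    out = []
    for line in grub_text.splitlines():
        match = _CMDLINE_RE.match(line)
        if match:
            # Drop any existing tsx= argument so it is never duplicated.
            args = [a for a in match.group(2).split() if not a.startswith("tsx=")]
            if mode is not None:
                args.append("tsx=" + mode)
            line = match.group(1) + " ".join(args) + match.group(3)
        out.append(line)
    return "\n".join(out) + "\n"
```

For step (1) this would be called with mode "on", and for step (3) with mode "off" or None, depending on whether the host should state tsx=off explicitly or fall back to the kernel default.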

--- Additional comment from errata-xmlrpc on 2021-06-04 01:48:24 EDT ---

This bug has been added to advisory RHEA-2020:66969 by Shreshtha Joshi (shrjoshi)

--- Additional comment from errata-xmlrpc on 2021-06-04 01:48:25 EDT ---

Bug report changed to ON_QA status by Errata System.
A QE request has been submitted for advisory RHEA-2020:66969-01
https://errata.devel.redhat.com/advisory/66969

--- Additional comment from errata-xmlrpc on 2021-06-04 01:48:59 EDT ---

This bug has been added to advisory RHEA-2020:66969 by Shreshtha Joshi (shrjoshi)

--- Additional comment from Miguel Garcia on 2021-06-04 06:08:29 EDT ---

Moving to MODIFIED as not all fixed-in-versions are present in RHOS-16.2-RHEL-8-20210525.n.0

