Bug 2061319

Summary: [update] 16.1 when adding a tsx kernel flag during update, the compute node reboots.
Product: Red Hat OpenStack
Reporter: Sofer Athlan-Guyot <sathlang>
Component: tripleo-ansible
Assignee: Bogdan Dobrelya <bdobreli>
Status: CLOSED ERRATA
QA Contact: Jason Grosso <jgrosso>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 16.1 (Train)
CC: bdobreli, jamsmith, jgrosso, jpretori, kchamart, slinaber, spower
Target Milestone: z8
Keywords: Triaged
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: tripleo-ansible-0.5.1-1.20220114163453.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-03-24 11:03:13 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2052411

Description Sofer Athlan-Guyot 2022-03-07 11:39:34 UTC
Description of problem:

First discovered in bug https://bugzilla.redhat.com/show_bug.cgi?id=2052411.

Running an update of 16.1 where the Overcloud was deployed without the
tsx flag, and adding that flag during the update, causes the Compute
nodes to reboot during the update.

This shouldn't happen: a check should be in place to prevent a reboot
during the update for any kernel change, as the reboot should be
deferred until after the update.

This is what we see during "openstack overcloud update run --limit
Compute --playbook all":
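
As a rough illustration of the check asked for above, here is a minimal
Python sketch of the decision only. It is not the tripleo-ansible
implementation: the function and parameter names are hypothetical, and how
the role would actually detect an already deployed node is not stated in
this report.

```python
def should_reboot_now(kernel_args_changed: bool, node_already_deployed: bool) -> bool:
    """Hypothetical guard for an immediate reboot after a kernel args change.

    Reboot right away only on a node that is being deployed for the first
    time. On an already deployed node (the update case) the reboot is
    deferred so the operator can schedule it after the update.
    """
    return kernel_args_changed and not node_already_deployed


# Fresh deployment with new kernel args: reboot immediately.
print(should_reboot_now(kernel_args_changed=True, node_already_deployed=False))
# Update of an existing Compute with new kernel args: defer the reboot.
print(should_reboot_now(kernel_args_changed=True, node_already_deployed=True))
```

With this guard, the second case is the one hit in this bug and would return
False instead of triggering a reboot.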


   2022-03-03 17:10:11 | TASK [tripleo-kernel : Reboot after kernel args update] ************************
   2022-03-03 17:10:11 | Thursday 03 March 2022  17:09:10 +0000 (0:00:00.094)       0:07:25.648 ********
   2022-03-03 17:10:11 | changed: [compute-1] => {"changed": true, "elapsed": 60, "rebooted": true}
   2022-03-03 17:21:11 |
   2022-03-03 17:21:11 | changed: [compute-0] => {"changed": true, "elapsed": 714, "rebooted": true}
   2022-03-03 17:21:11 |
   2022-03-03 17:21:11 |
   2022-03-03 17:21:11 | TASK [tripleo-kernel : Skipping reboot for deployed node] **********************
   2022-03-03 17:21:11 | Thursday 03 March 2022  17:21:05 +0000 (0:11:55.821)       0:19:21.469 ********
   2022-03-03 17:21:11 | skipping: [compute-0] => {}
   2022-03-03 17:21:11 | skipping: [compute-1] => {}
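
To spot this in a long update log, something like the following Python
sketch could list the hosts whose task result reports a reboot. It assumes
the result lines look like the excerpt above (a "changed:" or "ok:" prefix,
the host in brackets, then a JSON body); the helper name rebooted_hosts is
made up for this example.

```python
import json
import re

# Matches result lines such as:
#   changed: [compute-1] => {"changed": true, "elapsed": 60, "rebooted": true}
RESULT_RE = re.compile(
    r"(?:changed|ok): \[(?P<host>[^\]]+)\] => (?P<body>\{.*\})\s*$"
)


def rebooted_hosts(log_text):
    """Return {host: elapsed_seconds} for every result with "rebooted": true."""
    found = {}
    for line in log_text.splitlines():
        match = RESULT_RE.search(line)
        if not match:
            continue
        try:
            result = json.loads(match.group("body"))
        except json.JSONDecodeError:
            continue  # body is not valid JSON; ignore this line
        if result.get("rebooted") is True:
            found[match.group("host")] = result.get("elapsed")
    return found


log = """\
2022-03-03 17:10:11 | changed: [compute-1] => {"changed": true, "elapsed": 60, "rebooted": true}
2022-03-03 17:21:11 | changed: [compute-0] => {"changed": true, "elapsed": 714, "rebooted": true}
2022-03-03 17:21:11 | skipping: [compute-0] => {}
"""
print(rebooted_hosts(log))  # {'compute-1': 60, 'compute-0': 714}
```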


From /var/log/messages on each Compute we can see the reboot:

compute-1 starts with:

   Mar  3 06:04:51 compute-1 kernel: Linux version 4.18.0-193.68.1.el8_2.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)) #1 SMP Thu Sep 30 10:22:20 EDT 2021
   Mar  3 06:04:51 compute-1 kernel: Command line: BOOT_IMAGE=(hd0,msdos2)/boot/vmlinuz-4.18.0-193.68.1.el8_2.x86_64 root=UUID=95bf4a02-df9a-4253-81b4-0987196cbb9f ro console=ttyS0 console=ttyS0,115200n81 no_timer_check crashkernel=auto rhgb quiet

and reboots with:

   Mar  3 17:09:28 compute-1 kernel: Linux version 4.18.0-193.75.1.el8_2.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)) #1 SMP Tue Feb 8 09:21:43 EST 2022
   Mar  3 17:09:28 compute-1 kernel: Command line: BOOT_IMAGE=(hd0,msdos2)/boot/vmlinuz-4.18.0-193.75.1.el8_2.x86_64 root=UUID=95bf4a02-df9a-4253-81b4-0987196cbb9f ro console=ttyS0 console=ttyS0,115200n81 no_timer_check crashkernel=auto rhgb quiet tsx=on
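
For comparing the two command lines without reading them by eye, a small
Python sketch such as the one below can help. The helper added_args is
illustrative only; it ignores BOOT_IMAGE because that argument changes with
every kernel version.

```python
def added_args(before, after):
    """Return arguments present in `after` but not in `before`, in order.

    BOOT_IMAGE is skipped because it differs between kernel versions.
    """
    old = set(before.split())
    return [
        arg for arg in after.split()
        if arg not in old and not arg.startswith("BOOT_IMAGE=")
    ]


before = (
    "BOOT_IMAGE=(hd0,msdos2)/boot/vmlinuz-4.18.0-193.68.1.el8_2.x86_64 "
    "root=UUID=95bf4a02-df9a-4253-81b4-0987196cbb9f ro console=ttyS0 "
    "console=ttyS0,115200n81 no_timer_check crashkernel=auto rhgb quiet"
)
after = (
    "BOOT_IMAGE=(hd0,msdos2)/boot/vmlinuz-4.18.0-193.75.1.el8_2.x86_64 "
    "root=UUID=95bf4a02-df9a-4253-81b4-0987196cbb9f ro console=ttyS0 "
    "console=ttyS0,115200n81 no_timer_check crashkernel=auto rhgb quiet tsx=on"
)

print(added_args(before, after))  # ['tsx=on']
```

So besides the new kernel version, the only change to the command line is
the added tsx=on argument.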
   


Version-Release number of selected component (if applicable):

This is an update of 16.1 from z7 to passed_phase2 (RHOS-16.1-RHEL-8-20220225.n.1 puddle)

   2022-03-03T14:07:10Z SUBDEBUG Upgrade: tripleo-ansible-0.5.1-1.20220114163452.902c3c8.el8ost.noarch
   2022-03-03T14:07:11Z SUBDEBUG Upgraded: tripleo-ansible-0.5.1-1.20210713143309.el8ost.noarch


How reproducible: Always.


Steps to Reproduce:
1. Deploy without any tsx kernel flag.
2. Update the kernel flags during the update.
3. See the Compute nodes being rebooted during the update.

Actual results: Compute reboots during the update when any kernel flag is updated.


Expected results: Compute shouldn't reboot during the update when kernel flags are updated.


Additional info: well, nothing else.

Comment 17 errata-xmlrpc 2022-03-24 11:03:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.8 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0986