Bug 2061319 - [update] 16.1 when adding a tsx kernel flag during update, the compute node reboots.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z8
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Bogdan Dobrelya
QA Contact: Jason Grosso
URL:
Whiteboard:
Depends On:
Blocks: 2052411
Reported: 2022-03-07 11:39 UTC by Sofer Athlan-Guyot
Modified: 2022-03-24 11:03 UTC

Fixed In Version: tripleo-ansible-0.5.1-1.20220114163453.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-24 11:03:13 UTC
Target Upstream Version:
Embargoed:



Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker   OSP-13418       0        None      None    None     2022-03-07 11:41:42 UTC
Red Hat Product Errata  RHBA-2022:0986  0        None      None    None     2022-03-24 11:03:32 UTC

Description Sofer Athlan-Guyot 2022-03-07 11:39:34 UTC
Description of problem:

First discovered in this bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2052411.

When running an update of 16.1 where the Overcloud was deployed without
the tsx flag, adding the flag during the update causes the Compute nodes
to reboot during the update.

This shouldn't happen. A check should be in place to prevent a reboot
during the update for any kernel change, as the reboot should be deferred
until after the update.
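
To make the expected behaviour concrete, here is a minimal Python sketch of the kind of guard described above. It is only an illustration, not the tripleo-ansible implementation (the role is written as Ansible tasks); the function name `should_reboot_now` and both of its parameters are hypothetical.

```python
def should_reboot_now(kernel_args_changed, node_already_deployed):
    """Hypothetical guard for the reboot decision.

    A reboot is only triggered right away on a fresh deployment.
    On an already deployed node (the update case) the reboot is
    deferred, even when the kernel args changed.
    """
    if not kernel_args_changed:
        # Nothing changed, so there is no reason to reboot at all.
        return False
    # Kernel args changed: reboot now only if the node is not yet deployed.
    return not node_already_deployed


# Update case from this report: args changed on a deployed compute node.
print(should_reboot_now(kernel_args_changed=True, node_already_deployed=True))
```

With such a guard the update case would skip the reboot and leave it to the operator once the update has finished.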

This is what we see during "openstack overcloud update run --limit
Compute --playbook all":


   2022-03-03 17:10:11 | TASK [tripleo-kernel : Reboot after kernel args update] ************************
   2022-03-03 17:10:11 | Thursday 03 March 2022  17:09:10 +0000 (0:00:00.094)       0:07:25.648 ********
   2022-03-03 17:10:11 | changed: [compute-1] => {"changed": true, "elapsed": 60, "rebooted": true}
   2022-03-03 17:21:11 |
   2022-03-03 17:21:11 | changed: [compute-0] => {"changed": true, "elapsed": 714, "rebooted": true}
   2022-03-03 17:21:11 |
   2022-03-03 17:21:11 |
   2022-03-03 17:21:11 | TASK [tripleo-kernel : Skipping reboot for deployed node] **********************
   2022-03-03 17:21:11 | Thursday 03 March 2022  17:21:05 +0000 (0:11:55.821)       0:19:21.469 ********
   2022-03-03 17:21:11 | skipping: [compute-0] => {}
   2022-03-03 17:21:11 | skipping: [compute-1] => {}
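
For anyone triaging a similar log, the rebooted hosts can be pulled out of the update output with a short script. This is a rough sketch that assumes the per-host result lines keep the `changed: [host] => {json}` shape shown above; the helper name `rebooted_hosts` is made up for this example.

```python
import json
import re

# Matches per-host result lines such as:
#   changed: [compute-1] => {"changed": true, "elapsed": 60, "rebooted": true}
RESULT_RE = re.compile(r"changed: \[(?P<host>[^\]]+)\] => (?P<result>\{.*\})")


def rebooted_hosts(log_text):
    """Map each host that reports rebooted=true to its elapsed time in seconds."""
    hosts = {}
    for line in log_text.splitlines():
        match = RESULT_RE.search(line)
        if not match:
            continue
        result = json.loads(match.group("result"))
        if result.get("rebooted"):
            hosts[match.group("host")] = result.get("elapsed")
    return hosts


log = """\
2022-03-03 17:10:11 | changed: [compute-1] => {"changed": true, "elapsed": 60, "rebooted": true}
2022-03-03 17:21:11 | changed: [compute-0] => {"changed": true, "elapsed": 714, "rebooted": true}
2022-03-03 17:21:11 | skipping: [compute-0] => {}
"""
print(rebooted_hosts(log))
```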


From /var/log/messages on each compute we can see the reboot:

compute-1 starts with:

   Mar  3 06:04:51 compute-1 kernel: Linux version 4.18.0-193.68.1.el8_2.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)) #1 SMP Thu Sep 30 10:22:20 EDT 2021
   Mar  3 06:04:51 compute-1 kernel: Command line: BOOT_IMAGE=(hd0,msdos2)/boot/vmlinuz-4.18.0-193.68.1.el8_2.x86_64 root=UUID=95bf4a02-df9a-4253-81b4-0987196cbb9f ro console=ttyS0 console=ttyS0,115200n81 no_timer_check crashkernel=auto rhgb quiet

and reboots with:

   Mar  3 17:09:28 compute-1 kernel: Linux version 4.18.0-193.75.1.el8_2.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)) #1 SMP Tue Feb 8 09:21:43 EST 2022
   Mar  3 17:09:28 compute-1 kernel: Command line: BOOT_IMAGE=(hd0,msdos2)/boot/vmlinuz-4.18.0-193.75.1.el8_2.x86_64 root=UUID=95bf4a02-df9a-4253-81b4-0987196cbb9f ro console=ttyS0 console=ttyS0,115200n81 no_timer_check crashkernel=auto rhgb quiet tsx=on
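
The difference between the two command lines can be checked mechanically. The sketch below compares the "Command line:" entries from the two boots and reports the arguments that were added; the helper names `cmdline_args` and `added_args` are made up for this example, and BOOT_IMAGE is ignored on the assumption that only the kernel flags are of interest (the kernel version also changed between the two boots).

```python
def cmdline_args(log_line):
    """Return the set of kernel arguments found after 'Command line:'."""
    _, _, cmdline = log_line.partition("Command line:")
    return set(cmdline.split())


def added_args(before, after):
    """Arguments present in the second boot but not in the first.

    BOOT_IMAGE is skipped because it changes with every kernel update.
    """
    diff = cmdline_args(after) - cmdline_args(before)
    return {arg for arg in diff if not arg.startswith("BOOT_IMAGE=")}


before = (
    "Mar  3 06:04:51 compute-1 kernel: Command line: "
    "BOOT_IMAGE=(hd0,msdos2)/boot/vmlinuz-4.18.0-193.68.1.el8_2.x86_64 "
    "root=UUID=95bf4a02-df9a-4253-81b4-0987196cbb9f ro console=ttyS0 "
    "console=ttyS0,115200n81 no_timer_check crashkernel=auto rhgb quiet"
)
after = (
    "Mar  3 17:09:28 compute-1 kernel: Command line: "
    "BOOT_IMAGE=(hd0,msdos2)/boot/vmlinuz-4.18.0-193.75.1.el8_2.x86_64 "
    "root=UUID=95bf4a02-df9a-4253-81b4-0987196cbb9f ro console=ttyS0 "
    "console=ttyS0,115200n81 no_timer_check crashkernel=auto rhgb quiet tsx=on"
)

print(added_args(before, after))  # {'tsx=on'}
```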
   


Version-Release number of selected component (if applicable):

This is an update of 16.1 from z7 to passed_phase2 (RHOS-16.1-RHEL-8-20220225.n.1 puddle)

   2022-03-03T14:07:10Z SUBDEBUG Upgrade: tripleo-ansible-0.5.1-1.20220114163452.902c3c8.el8ost.noarch
   2022-03-03T14:07:11Z SUBDEBUG Upgraded: tripleo-ansible-0.5.1-1.20210713143309.el8ost.noarch


How reproducible: Always.


Steps to Reproduce:
1. Deploy without any tsx kernel flag.
2. Add the tsx kernel flag during the update.
3. See the compute nodes being rebooted during the update.

Actual results: Compute reboots during update when any kernel flag is updated.


Expected results: Compute shouldn't reboot during update when kernel flags are updated


Additional info: well, nothing else.

Comment 17 errata-xmlrpc 2022-03-24 11:03:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.8 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0986

