Description of problem:

Hi,

Updating from OSP16.1 to OSP16.2 requires the new kernel flag tsx=on so that we are able to migrate VMs [1]. Enabling the new parameter has the side effect of rebooting the compute node *during* the update: this causes a first data plane cut (ping loss), and the data plane does not recover afterwards.

To work around this issue we need to add a preparation step: the /etc/default/grub file must already contain the tsx parameter before the update.

Somewhat related to https://bugzilla.redhat.com/show_bug.cgi?id=1923165 but for update.

[1] See "Minor update: from RHOSP-16.1 to RHOSP-16.2" at https://access.redhat.com/solutions/6036141
Hey @skramaja, do you think there's a way to take this situation into account by mangling https://opendev.org/openstack/tripleo-ansible/src/branch/stable/train/tripleo_ansible/roles/tripleo-kernel/tasks/kernelargs.yml#L21-L27 ? I think the logic would be too complicated, but I'd rather have your input on this. The problem now is that customers have to manually update /etc/default/grub before running an update from 16.1 to 16.2, and that could be a lot of nodes. Thanks,
Hi @vgrosu, I've added some more actions to be done for the update from 16.1 to 16.2 in [1]. Basically one has to do this:

On every node of the compute role:

    grep tsx /etc/default/grub || sed -ie "s/rhgb/rhgb tsx=on/" /etc/default/grub

and then make sure to update the templates to have:

    parameter_defaults:
      ComputeParameters:
        KernelArgs: "tsx=on"

We also need a section about this in the warning part of the update from 16.1 to 16.2.

Thanks,

[1] https://access.redhat.com/node/6036141/draft
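The grub edit above can be wrapped so that it is safe to rerun and can be tried on a copy of the file first. This is only a sketch: the function name `ensure_tsx_arg` and the path parameter are hypothetical, and it assumes, like the original command, that the kernel command line in the file contains `rhgb`. It uses `-i -e` because GNU sed reads `-ie` as `-i` with a backup suffix of `e`, which leaves a stray backup file behind.

```shell
#!/bin/sh
# Hypothetical helper (not part of any shipped tooling): insert tsx=on
# after "rhgb" in a grub defaults file, unless a tsx setting already exists.
ensure_tsx_arg() {
    # Path is a parameter so the edit can be tested on a copy first.
    grub_file="${1:-/etc/default/grub}"

    # Same guard as the one-liner in the comment above: do nothing
    # when the file already mentions tsx.
    if grep -q 'tsx' "$grub_file"; then
        return 0
    fi

    # Edit in place without creating a backup file.
    sed -i -e 's/rhgb/rhgb tsx=on/' "$grub_file"
}
```

Run as root on each compute node, for example `ensure_tsx_arg /etc/default/grub`. Calling it twice leaves the file unchanged the second time.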
Hi, so the previous solution didn't solve it. To prevent the reboot during the update, the following needs to be run on every node of the compute role:

    echo "#TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS" | sudo tee -a /etc/default/grub

I've updated https://access.redhat.com/node/6036141/draft accordingly.
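Because `tee -a` appends every time it is run, a guarded variant may be useful when the step is scripted across many nodes. The following is a minimal sketch, assuming that any existing occurrence of the string `TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS` in the file means the marker is not needed again; the function name and the path parameter are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: append the marker comment to a grub defaults file
# only when the marker name does not already appear in it.
ensure_kernel_args_marker() {
    grub_file="${1:-/etc/default/grub}"
    marker_name='TRIPLEO_HEAT_TEMPLATE_KERNEL_ARGS'

    # -F: match a fixed string, -q: no output, only the exit status.
    if ! grep -Fq "$marker_name" "$grub_file"; then
        printf '#%s\n' "$marker_name" >> "$grub_file"
    fi
}
```

Run as root, for example `ensure_kernel_args_marker /etc/default/grub`.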
Hi Sofer, I've opened BZ#1975450 to track all the docs changes required for the 16.2 Keeping Red Hat OpenStack Platform Updated guide. I'll update this ticket with details of the docs draft. Many thanks, Vlada
It's unclear to me why this is a TestOnly item - yet it's assigned - can you clarify?
Applying the tsx flag is left to the discretion of the user: there is no automation, although there is a validation. It's described as a manual procedure in an already existing KB article, so this was assigned to sathlang to test and improve that article.
Started the review. Unrelated, but I noticed this:

    BZ#1872404 - restarting nodes in parallel while maintaining quorum creates an unexpected node shutdown
    Until this issue is resolved, for nodes based on composable roles, you must update the Database role first, before you can update Controller, Messaging, Compute, Ceph, and other roles.

I'm not sure where it comes from, but it doesn't look right to me. Do you want another BZ for this?
Hi Sofer,

Thank you for the review. I'll address your comments shortly.

Regarding BZ#1872404: the bug is still open and it looks like you reported it. However, if this instruction to update the nodes in a specific sequence is wrong, I'd be happy to remove it.

(In reply to Sofer Athlan-Guyot from comment #9)
> Started the review.
>
> Unrelated but I noticed:
>
> BZ#1872404 - restarting nodes in parallel while maintaining quorum creates
> an unexpected node shutdown
> Until this issue is resolved, for nodes based on composable roles, you
> must update the Database role first, before you can update Controller,
> Messaging, Compute, Ceph, and other roles.
>
> I'm not sure where it comes from but doesn't look right to me. Do you want
> another bz for this?

I have opened BZ#1975450 to update the doc for 16.2, so we can use that bug to track this change. Please feel free to comment on it.

Many thanks,
Vlada
Adding 2 patches to help with this issue:
- 801518 will give the operator the possibility to opt out of automated reboots no matter what.
- 801509 will prevent the nodes from rebooting when the only kernel argument added is tsx=xxx and the node is already provisioned (validated by the presence of nova_libvirt and nova.conf).
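The condition described for 801509 can be illustrated as follows. This is not the code from the patch, only a sketch of the decision as stated in this comment. The function name and all three parameters are hypothetical, and how the presence of nova_libvirt is detected is left to the caller because the thread does not say.

```shell
#!/bin/sh
# Hypothetical illustration of the rule described for patch 801509.
# Returns 0 (skip the reboot) only when:
#   - exactly one kernel argument was added, and it is tsx=<value>
#   - the node looks provisioned: nova_libvirt is present and nova.conf exists
should_skip_reboot() {
    added_args="$1"        # newly added kernel args, e.g. "tsx=on"
    nova_conf="$2"         # path to nova.conf on the node (assumed input)
    libvirt_present="$3"   # "yes" when nova_libvirt was found (assumed input)

    case "$added_args" in
        *" "*)  return 1 ;;   # more than one argument was added
        tsx=?*) ;;            # a single tsx=<value> argument
        *)      return 1 ;;   # some other argument, or nothing
    esac

    [ "$libvirt_present" = "yes" ] && [ -f "$nova_conf" ]
}
```

For example, `should_skip_reboot "tsx=on" "$path_to_nova_conf" yes` succeeds on a provisioned node, while adding any second kernel argument makes it fail, so the reboot would still happen.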
Hi @vgrosu, so with the patches mentioned above we have to modify the KCS article and remove the manual modification of the /etc/default/grub configuration. @dvalleed, do you need help with testing the patches mentioned above? I'm on PTO for some time, but I can prep the workaround and start testing this; @mciecier should be able to validate them.
Hi folks, I can confirm I've removed the manual steps to edit `/etc/default/grub` configuration from the solution article: https://access.redhat.com/solutions/6036141 as requested in comment #11 and comment #14. Thank you.
*** Bug 1993299 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform (RHOSP) 16.2 enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2021:3483