Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2207926

Summary: [RHOSP17.1] Fail To Boot Modified Overcloud Image
Product: Red Hat OpenStack
Reporter: Vadim Khitrin <vkhitrin>
Component: documentation
Assignee: Greg Rakauskas <gregraka>
Status: CLOSED MIGRATED
QA Contact: RHOS Documentation Team <rhos-docs>
Severity: medium
Docs Contact:
Priority: high
Version: 17.1 (Wallaby)
CC: cfontain, ekuris, elicohen, gregraka, jamsmith, jkreger, mariel, mblue, njohnston, ramishra, rdiazcam, sbaker
Target Milestone: async
Keywords: Triaged
Target Release: 17.1
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2024-12-05 11:45:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
fail boot none
fail boot none

Description Vadim Khitrin 2023-05-17 10:59:17 UTC
Description of problem:
When modifying the shipped overcloud image using virt-customize, we encountered issues booting up this image during provisioning. For example, we are injecting repositories and updating RPMs.

Version-Release number of selected component (if applicable):
Compose: RHOS-17.1-RHEL-9-20230511.n.1

rhosp-director-images-base-17.1-20230509.1.el9ost.noarch
rhosp-director-images-metadata-17.1-20230509.1.el9ost.noarch
rhosp-director-images-ipa-x86_64-17.1-20230509.1.el9ost.noarch
rhosp-director-images-x86_64-17.1-20230509.1.el9ost.noarch
rhosp-director-images-17.1-20230509.1.el9ost.noarch
rhosp-director-images-minimal-17.1-20230509.1.el9ost.noarch
rhosp-director-images-uefi-x86_64-17.1-20230509.1.el9ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Modify overcloud image with `virt-customize` (or other guestfish tools)
2. Start provisioning.

Actual results:
RHEL fails to boot properly on baremetal node.

Expected results:
RHEL boots properly on baremetal node.

Additional info:

Comment 1 Vadim Khitrin 2023-05-17 11:02:11 UTC
Created attachment 1965033 [details]
fail boot

Comment 2 Vadim Khitrin 2023-05-17 11:02:34 UTC
Created attachment 1965034 [details]
fail boot

Comment 3 Vadim Khitrin 2023-05-17 11:04:21 UTC
I believe the kernel was updated in the overcloud image; the attached pictures show the resulting boot failure.

Comment 4 Ricardo Diaz 2023-05-17 15:56:49 UTC
It looks like the initramfs for the updated kernel is not present in /boot (ccamposr found this). It looks like virt-customize doesn't install the kernel-core package when performing the update. A workaround is to explicitly install kernel-rt with virt-customize.
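
The missing-initramfs condition described above can be checked with a small shell helper. This is only a sketch: the function name `missing_initramfs` is hypothetical, and it assumes the usual RHEL naming convention of `vmlinuz-<version>` paired with `initramfs-<version>.img` in the boot directory.

```shell
#!/bin/sh
# Hypothetical helper: print the version of every kernel in a boot
# directory that has no matching initramfs image next to it.
# Assumes the vmlinuz-<version> / initramfs-<version>.img naming scheme.
missing_initramfs() {
    boot_dir=${1:-/boot}
    for kernel in "$boot_dir"/vmlinuz-*; do
        # Skip the literal pattern when the glob matched nothing.
        [ -e "$kernel" ] || continue
        # Strip the directory and the "vmlinuz-" prefix to get the version.
        version=${kernel##*/vmlinuz-}
        [ -e "$boot_dir/initramfs-$version.img" ] || echo "$version"
    done
}
```

One possible way to use it against an image rather than a running system is to copy /boot out first, for example with `virt-copy-out -a overcloud-hardened-uefi-full.qcow2 /boot ./extracted`, and then run `missing_initramfs ./extracted/boot`. Any version it prints is a kernel that would have no initramfs to boot with.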

Comment 5 Julia Kreger 2023-05-19 17:06:32 UTC
(In reply to Ricardo Diaz from comment #4)
> It looks like the initramfs for the updated kernel is not present in /boot
> (ccamposr found this). It looks like virt-customize doesn't
> install the kernel-core package when performing the update. A workaround
> is to explicitly install kernel-rt with virt-customize.

I believe that might be intentional on the part of RHEL. Regardless of that intent, this doesn't seem to be an actual bug, as the issue arises from an image created by a user-invoked virt-customize action. As such, I think this can be closed as not a bug. Please advise.

Comment 6 Vadim Khitrin 2023-05-21 06:55:35 UTC
Hey Julia,

Sounds logical to me that this is most likely the behavior of RHEL RPMs.
Let me follow up on this tomorrow with the folks who commented on this, and we will close this bug if needed.

Comment 7 Steve Baker 2023-05-22 19:37:36 UTC
Could you please provide the exact commands you ran to customise the image? A possible outcome from this bug is documentation which ensures the kernel doesn't get updated during these customise steps.

Comment 8 Ricardo Diaz 2023-05-23 07:28:27 UTC
Hi Steve, this command:

virt-customize -a overcloud-hardened-uefi-full.qcow2 --install kernel-core --selinux-relabel

Comment 9 Vadim Khitrin 2023-05-28 05:50:52 UTC
Ricardo has provided the command.

Comment 11 Steve Baker 2023-06-26 19:47:28 UTC
Setting NEEDINFO to request the full documented steps followed to do this, so we can reproduce it.

Comment 36 Steve Baker 2024-06-30 23:41:23 UTC
I just tried the rt.sh from #35, and it looks like it booted into the rt kernel just fine in a local libvirt VM:

$ uname -a
Linux localhost.localdomain 5.14.0-284.71.1.rt14.356.el9_2.x86_64 #1 SMP PREEMPT_RT Mon Jun 17 11:13:07 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux
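
The same check can be scripted instead of read by eye. The sketch below assumes that realtime kernel releases carry an `.rt<number>` component, as in the `uname -a` output above; the function name `is_rt_release` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a kernel release string contains an
# ".rt<digit>" component, as seen in 5.14.0-284.71.1.rt14.356.el9_2.x86_64.
is_rt_release() {
    case $1 in
        *.rt[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report on the kernel that is currently running.
if is_rt_release "$(uname -r)"; then
    echo "realtime kernel"
else
    echo "not a realtime kernel"
fi
```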

We also need to ensure that the subscription-manager calls are correct for the current rhel9.2 rhos17.1 repos. After acquiring the image my full list of steps was:

cp overcloud-hardened-uefi-full.qcow2 overcloud-realtime-compute.qcow2
virt-customize -a overcloud-realtime-compute.qcow2 \
    --run-command 'subscription-manager register --username=[username] --password=[password]' \
    --run-command 'subscription-manager release --set 9.2' \
    --run-command 'subscription-manager repos --enable=rhel-9-for-x86_64-nfv-e4s-rpms'
virt-customize -a overcloud-realtime-compute.qcow2 -v --run rt.sh 2>&1 | tee virt-customize.log

Also, I sourced my image from here[1] and it already has the changed /boot/efi/EFI/redhat/grub.cfg so the "# Temporary workaround before z4" block can be removed if validation happens with this image.

I think it is time to bring in a documentation person to make the actual changes. Who is docs for NFV?

[1] https://sf.hosted.upshift.rdu2.redhat.com/images/redhat9/rhos-17.1/current-tripleo/

Comment 37 Steve Baker 2024-07-01 03:03:54 UTC
I think rt.sh is small enough now that we could justify doing this as a single command, or at least a batch of virt-customize commands using only --run-command.

I have verified that the following results in an image that boots an rt kernel:

virt-customize -a overcloud-realtime-compute.qcow2 \
    --run-command 'subscription-manager register --username=[username] --password=[password]' \
    --run-command 'subscription-manager release --set 9.2' \
    --run-command 'subscription-manager repos --enable=rhel-9-for-x86_64-nfv-e4s-rpms' \
    --run-command 'dnf -v -y install kernel-rt kernel-rt-kvm tuned-profiles-nfv-host' \
    --run-command 'sed -i -e "s/UPDATEDEFAULT=.*/UPDATEDEFAULT=yes/g" -e "s/DEFAULTKERNEL=.*/DEFAULTKERNEL=kernel-rt-core/g" /etc/sysconfig/kernel' \
    --run-command 'grubby --set-default /boot/vmlinuz*rt*'

Also note that the --selinux-relabel argument no longer does anything on the virt-customize that is shipped with rhel9.2, so that step isn't required either.
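
To see what the sed step in the command above changes, the same two expressions can be applied to a scratch file instead of the real /etc/sysconfig/kernel. This is a sketch only: the wrapper name `set_rt_default` and the starting values in the scratch file are assumptions for illustration, and `sed -i` is used in its GNU form as shipped with RHEL.

```shell
#!/bin/sh
# Hypothetical wrapper around the sed expressions from the command above.
# It rewrites existing UPDATEDEFAULT= and DEFAULTKERNEL= lines in place.
set_rt_default() {
    sed -i -e "s/UPDATEDEFAULT=.*/UPDATEDEFAULT=yes/g" \
           -e "s/DEFAULTKERNEL=.*/DEFAULTKERNEL=kernel-rt-core/g" "$1"
}

# Demonstrate on a scratch file with assumed starting values.
scratch=$(mktemp)
printf 'UPDATEDEFAULT=no\nDEFAULTKERNEL=kernel-core\n' > "$scratch"
set_rt_default "$scratch"
cat "$scratch"
rm -f "$scratch"
```

Note that these expressions only rewrite lines that already exist. If the file has no `DEFAULTKERNEL=` line, nothing is added, so it may be worth checking the file inside the image before relying on this step.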

Comment 38 Steve Baker 2024-07-01 03:08:06 UTC
Also NEEDINFOing Mikey to find out who to coordinate with to make this NFV docs change