Bug 2236693

Summary: RHEL9 provisioning on Libvirt fails with error - Boot Error -[ end Kernel] panic - not syncing: Attempting to kill init!
Product: Red Hat Satellite Reporter: Gaurav Talreja <gtalreja>
Component: Compute Resources - libvirt Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED ERRATA QA Contact: Gaurav Talreja <gtalreja>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.14.0 CC: ahumbe, ekohlvan, lstejska, nalfassi, rlavi
Target Milestone: 6.15.0 Keywords: Triaged
Target Release: Unused   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: rubygem-fog-libvirt-0.12.0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2024-04-23 17:14:19 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Gaurav Talreja 2023-09-01 09:16:25 UTC
Description of problem:


Version-Release number of selected component (if applicable):
Satellite 6.14.0 Snap 13

How reproducible:
Always

Steps to Reproduce:
1. Prepare provisioning setup and sync RHEL9 OS and Kickstart repos.
2. Configure Libvirt CR
3. # hammer host create --name="pbvlzmmshx" --location="SEXSGiSTJBd" --organization="kDurvTWwY" --hostgroup="RJvMlS" --compute-resource-id="1" --compute-attributes="cpus=1, memory=6442450944, cpu_mode=default, start=1" --interface="compute_type=bridge,compute_bridge=br-1001" --volume="capacity=10" --provision-method="build"
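
As a rough sketch, the command in step 3 could be assembled in Python so that different cpu_mode values are easy to try. The flags and values are copied from the command above. The helper name `hammer_host_create_args` is hypothetical, and joining the key=value pairs with commas and no spaces is an assumption about what hammer accepts.

```python
import shlex


def hammer_host_create_args(name, location, organization, hostgroup,
                            cpu_mode="default"):
    """Build the argv list for the `hammer host create` call from step 3.

    Hypothetical helper: only the flags shown in this report are used.
    """
    compute_attributes = {
        "cpus": 1,
        "memory": 6442450944,
        "cpu_mode": cpu_mode,  # default, host-model or host-passthrough
        "start": 1,
    }
    interface = {"compute_type": "bridge", "compute_bridge": "br-1001"}

    def kv(pairs):
        # Assumption: hammer accepts "key=value,key=value" without spaces.
        return ",".join(f"{key}={value}" for key, value in pairs.items())

    return [
        "hammer", "host", "create",
        f"--name={name}",
        f"--location={location}",
        f"--organization={organization}",
        f"--hostgroup={hostgroup}",
        "--compute-resource-id=1",
        f"--compute-attributes={kv(compute_attributes)}",
        f"--interface={kv(interface)}",
        "--volume=capacity=10",
        "--provision-method=build",
    ]


if __name__ == "__main__":
    args = hammer_host_create_args(
        "pbvlzmmshx", "SEXSGiSTJBd", "kDurvTWwY", "RJvMlS",
        cpu_mode="host-passthrough",
    )
    # Print a shell-safe command line instead of running it.
    print(shlex.join(args))
```

Passing the arguments as a list (for example to `subprocess.run`) avoids shell quoting problems entirely.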

Actual results:
RHEL9 provisioning on Libvirt fails with Boot Errors

Expected results:
RHEL9 provisioning on Libvirt completes successfully.

Additional info:
This looks related to the error discussed in https://access.redhat.com/discussions/6968806, and this PR also looks related: https://github.com/fog/fog-libvirt/pull/127

After some investigation and discussion with @ekohlvan, the issue appears to be with cpu_mode=default, which sets "custom (qemu64)" on Libvirt. Other CPU modes work correctly: host-model (set as custom (EPYC-Rome)) and host-passthrough (set as host passthrough).
Also, for RHEL7 and RHEL8 provisioning, cpu_mode=default (custom (qemu64)) works.
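
To check which CPU configuration a VM actually received, the `<cpu>` element of its libvirt domain XML can be inspected. The following is a minimal sketch, assuming the XML has already been obtained (for example from `virsh dumpxml`). The function name `describe_cpu` and the sample XML strings are illustrative and not taken from this report.

```python
import xml.etree.ElementTree as ET


def describe_cpu(domain_xml: str) -> str:
    """Return a short label for the <cpu> element of a libvirt domain XML."""
    root = ET.fromstring(domain_xml)
    cpu = root.find("cpu")
    if cpu is None:
        # No <cpu> element at all: the hypervisor picks its own default.
        return "hypervisor default"
    # libvirt treats a missing mode attribute as "custom".
    mode = cpu.get("mode", "custom")
    model = cpu.findtext("model")
    if mode == "custom" and model:
        return f"custom ({model.strip()})"
    return mode


# Illustrative samples only, trimmed to the parts that matter here.
SAMPLE_QEMU64 = (
    "<domain type='kvm'><name>demo</name>"
    "<cpu mode='custom' match='exact'>"
    "<model fallback='allow'>qemu64</model></cpu></domain>"
)
SAMPLE_PASSTHROUGH = (
    "<domain type='kvm'><name>demo</name>"
    "<cpu mode='host-passthrough'/></domain>"
)

if __name__ == "__main__":
    print(describe_cpu(SAMPLE_QEMU64))
    print(describe_cpu(SAMPLE_PASSTHROUGH))
```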

Comment 3 Brad Buckingham 2023-10-30 11:29:29 UTC
Bulk setting Target Milestone = 6.15.0 where sat-6.15.0+ is set.

Comment 4 Leos Stejskal 2023-11-24 08:39:48 UTC
> After some investigation and discussion with @ekohlvan, this issue looks with cpu_mode=default which sets "custom (qemu64)" on Libvirt and when tried other cpu modes 
> like host-model(set as custom (EPYC-Rome)) and host passthrough(set as host passthrough) this works correctly.
> Also, For RHEL7 and RHEL8 provisioning cpu_mode=default custom(qemu64) works.

Does this mean that, for the fix in Foreman, we need to release a new fog-libvirt and
bump the dependency in Foreman core?

Comment 5 Ewoud Kohl van Wijngaarden 2023-11-24 09:13:37 UTC
I have done a release and packaged it in upstream nightly.

Comment 9 Ewoud Kohl van Wijngaarden 2023-12-19 11:58:01 UTC
I hope you don't mind me making the reply public, because I think this design part should be openly discussed.

(In reply to Gaurav Talreja from comment #7)
> RHEL9 provisioning works (for BIOS), when using default cpu_mode which now
> sets cpu type as host passthrough and emulated machine as pc-q35-rhel8.6.0
> on Libvirt, I'd like to ask a few observations/questions regarding this,
>  
> 1. As default cpu_mode which now sets cpu type as host passthrough, and when
> using host-passthrough cpu_mode which also sets cpu type as host passthrough
> on Libvirt, so should we keep this cpu_mode=host-passthrough for RHEL9 in
> Libvirt CR, and is this redundant option expected?

Excellent question. I didn't realize that part.

It's this part that defines CPU_MODES as either default, host-model or host-passthrough:

https://github.com/theforeman/foreman/blob/70b2144263dc6c3a1efdd5f26d16952b2e4266c3/app/models/compute_resources/foreman/model/libvirt.rb#L8

Now that the default is passthrough, I wonder if there's still a case for exposing the CPU type at all.

The QEMU documentation on the topic is https://qemu-project.gitlab.io/qemu/system/qemu-cpu-models.html and that recommends using passthrough. So now we're (finally) in line with the recommendation. The only downside is that live migration is not supported, but that's quite specialized and not something we generally support anyway.

After reading that, I'd suggest we drop the option altogether. This simplifies the UI a bit without noticeable loss of functionality.

I've raised this upstream: https://community.theforeman.org/t/drop-cpu-type-option-from-libvirt-compute-resource-provisioning/36267

> 2. RHEL9 provisioning with UEFI still doesn't work with "No bootable device"
> error now and mentioned kernel panic errors aren't seen. And, Libvirt sets
> the firmware BIOS instead of UEFI when PXELoader "Grub2 UEFI" is selected.

I think this is a known limitation now. The PXE Loader has no implication on how the (libvirt) VM is created. It's totally fair to assume that it would, but no code has been written to do so. I've discussed this before, so perhaps a separate RFE is needed.

Comment 10 Ewoud Kohl van Wijngaarden 2023-12-20 14:29:57 UTC
(In reply to Ewoud Kohl van Wijngaarden from comment #9)
> (In reply to Gaurav Talreja from comment #7)
> > 2. RHEL9 provisioning with UEFI still doesn't work with "No bootable device"
> > error now and mentioned kernel panic errors aren't seen. And, Libvirt sets
> > the firmware BIOS instead of UEFI when PXELoader "Grub2 UEFI" is selected.
> 
> I think this is a known limitation now. The PXE Loader has no implication on
> how the (libvirt) VM is created. It's totally fair to assume to do so, but
> no code has been written to do so. I've discussed this before, so perhaps a
> separate RFE is needed.

The inability to create UEFI VMs using fog-libvirt is tracked in https://github.com/fog/fog-libvirt/issues/128. Depending on the implementation, it may need additional work in Foreman to provide a selection.

Comment 11 Ewoud Kohl van Wijngaarden 2023-12-22 11:08:52 UTC
(In reply to Ewoud Kohl van Wijngaarden from comment #10)
> The inability to create UEFI VMs using fog-libvirt is tracked in
> https://github.com/fog/fog-libvirt/issues/128. Depending on the
> implementation, it may need additional work in Foreman to provide a
> selection.

I decided to play with this and wrote https://github.com/fog/fog-libvirt/pull/134 to implement it in fog-libvirt. Turns out UEFI and Secure Boot are very easy in the libvirt XML specification.

https://github.com/theforeman/foreman/pull/9965 implements it in Foreman, though I didn't test the UI yet and there is no Secure Boot support yet. By default the firmware (BIOS/UEFI) is derived from the PXE loader, still defaulting to BIOS if no loader is specified.
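
The described default (firmware derived from the PXE loader, falling back to BIOS) can be sketched as follows. This is an illustration of the idea only, not the code from either pull request: the function names are hypothetical, the substring match on the loader name is an assumption, and the XML is limited to libvirt's `firmware='efi'` attribute on the `<os>` element.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def firmware_for_pxe_loader(pxe_loader: Optional[str]) -> str:
    """Derive the firmware type from a PXE loader name.

    Assumption: loaders meant for UEFI contain "UEFI" in their name
    (for example "Grub2 UEFI"). Anything else falls back to BIOS.
    """
    if pxe_loader and "UEFI" in pxe_loader.upper():
        return "uefi"
    return "bios"


def os_element(firmware: str) -> str:
    """Render a minimal libvirt <os> element for the given firmware."""
    os_el = ET.Element("os")
    if firmware == "uefi":
        # Ask libvirt to select a matching UEFI firmware automatically.
        os_el.set("firmware", "efi")
    type_el = ET.SubElement(os_el, "type", {"arch": "x86_64"})
    type_el.text = "hvm"
    # Network boot first, as needed for PXE provisioning.
    ET.SubElement(os_el, "boot", {"dev": "network"})
    return ET.tostring(os_el, encoding="unicode")


if __name__ == "__main__":
    for loader in ("Grub2 UEFI", "PXELinux BIOS", None):
        firmware = firmware_for_pxe_loader(loader)
        print(loader, "->", firmware, os_element(firmware))
```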

Comment 12 Gaurav Talreja 2024-01-16 07:45:51 UTC
Verified.

Tested on Satellite 6.15.0 Snap 5.0
Version: rubygem-fog-libvirt-0.12.0-1.el8sat.noarch

Steps:
1. Prepare Satellite with provisioning setup and sync RHEL9 OS and Kickstart repos.
2. Configure Libvirt CR
3. # hammer host create --name="pbvlzmmshx" --location="SEXSGiSTJBd" --organization="kDurvTWwY" --hostgroup="RJvMlS" --compute-resource-id="1" --compute-attributes="cpus=1, memory=6442450944, cpu_mode=default, start=1" --interface="compute_type=bridge,compute_bridge=br-1001" --volume="capacity=10" --provision-method="build"

Observation:
Provisioning works successfully for RHEL9 on Libvirt CR, and the default cpu_mode is now set as host-passthrough.
I've reported the issues discussed in comment 7, and for UEFI support an RFE has already been created by SD/CU:
https://bugzilla.redhat.com/show_bug.cgi?id=2256234
https://bugzilla.redhat.com/show_bug.cgi?id=2258481


Thanks @ekohlvan for confirming these issues. I greatly appreciate your prompt action with community discussions and PRs.

Comment 15 errata-xmlrpc 2024-04-23 17:14:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Satellite 6.15.0 release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:2010