Bug 1903247
Summary: | Cannot build a RHEL 8.3 system via Satellite Full Host Bootdisk or Discovery kexec | |||
---|---|---|---|---|
Product: | Red Hat Satellite | Reporter: | Sayan Das <saydas> | |
Component: | Provisioning | Assignee: | Lukas Zapletal <lzap> | |
Status: | CLOSED ERRATA | QA Contact: | Roman Plevka <rplevka> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 6.7.0 | CC: | ajambhul, alsouza, avroy, ben.formosa, dleroux, inecas, jpasqual, ktordeur, lzap, mmccune, momran, mschibli, patalber, pdwyer, rbeyel, sadas, sshtein, tonay | |
Target Milestone: | 6.9.0 | Keywords: | Regression | |
Target Release: | Unused | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Clone Of: | ||||
: | 1908850 1919400 (view as bug list) | Environment: | ||
Last Closed: | 2021-04-21 13:24:18 UTC | Type: | Bug | |
Bug Depends On: | 1904099 | |||
Bug Blocks: |
Description
Sayan Das
2020-12-01 17:09:57 UTC
The problem is that, apparently, there was a change in the kernel (?) and the BOOTIF parameter in 8.3 does not accept the initial "00-" that is used in basically all our templates.

For example, the following boot line works in 8.2 (and earlier) but does not work in 8.3:

~~~
initrd=boot/red-hat-enterprise-linux-8-for-x86_64-baseos-kickstart-8-3-100-initrd.img ks=http://sat67.example.com/unattended/provision?token=9d0ca7d6-4369-40e8-adf6-04c58a420655 network ksdevice=bootif ks.device=bootif BOOTIF=00-52-54-00-84-87-51 kssendmac ks.sendmac inst.ks.sendmac
~~~

Changing the BOOTIF parameter to BOOTIF=52-54-00-84-87-51 (removing the initial "00-") makes it work as expected.

We can work around that on Satellite by changing the provisioning templates so they don't add the "00-" when the operating system is RHEL 8.3+. I tested with RHEL 8.2 and it works both ways (with the 00- and without it). RHEL 8.3 only works without it.

So far, these are the templates I have identified as needing to be addressed:

- Kickstart default PXELinux (will work around the use case reported in this BZ)
- Discovery Red Hat kexec (will work around the use case of FDI)

Example of change for Kickstart default PXELinux:

~~~
(...)
  if mac
    if os_major == 8 and os_minor > 2
      bootif = mac.gsub(':', '-')
      options.push("BOOTIF=#{bootif}")
    else
      bootif = '00-' + mac.gsub(':', '-')
      options.push("BOOTIF=#{bootif}")
    end
  end
(...)
~~~

Example of change for Discovery Red Hat kexec:

~~~
(...)
  mac = @host.facts['discovery_bootif']
  if mac
    if @host.operatingsystem.major.to_i == 8 and @host.operatingsystem.minor.to_i > 2
      bootif = mac.gsub(':', '-')
    else
      bootif = '00-' + mac.gsub(':', '-')
    end
  end
(...)
~~~

Hello Joniel,

If I am not wrong, during normal PXE booting the 00 also gets prepended to the MAC in the BOOTIF section.
If what you are saying is correct, then the network-based installation of RHEL 8.3 should also have failed, right? Or am I missing something here, since the same template is used in both scenarios?

-- Sayan

Hello Sayan

Running some tests using PXE booting, you're right about using the same template. However (and I don't know all the details about it) when booting using PXE a second BOOTIF parameter was automatically appended to my kernel line: BOOTIF=01-52-54-00-84-87-51 and the provisioning worked.

If I remove (manually) this second BOOTIF parameter it fails with the same error as when using the bootdisk. I'm doing some research to understand what that 00- or 01- is used for.

(In reply to Joniel Pasqualetto from comment #3)
> Hello Sayan
>
> Running some tests using PXE booting, you're right about using the same
> template. However (and I don't know all the details about it) when booting
> using PXE a second BOOTIF parameter was automatically appended to my kernel
> line: BOOTIF=01-52-54-00-84-87-51 and the provisioning worked.
>
> If I remove (manually) this second BOOTIF parameter it fails with the same
> error as when using the bootdisk. I'm doing some research to understand what
> that 00- or 01- is used for.

Hello Joniel,

Yes, this is the only difference I have noticed between network-based and Full Host based deployments: during network-based PXE, when I edit the PXE menu to see the kernel options, it has an additional BOOTIF parameter appended, starting with 01 but for the same MAC.

When I boot with the Full Host Image, if I halt the menu, edit the entry and add the additional BOOTIF=01-XX.XX.XX.XX.XX there, the build works just fine for RHEL 8.3 as well.

I am also checking to find out what that option actually does.

-- Sayan

The second BOOTIF is automatically added because of the option "IPAPPEND 2" from the template. This option automatically adds the BOOTIF with the information about the interface used to boot.
That's why it works with PXE but does not work with bootdisk or kexec.

From the installation guide: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/performing_an_advanced_rhel_installation/index#network_kickstart-commands-for-network-configuration

~~~
Set IPAPPEND 2 in your pxelinux.cfg file to have pxelinux set the BOOTIF variable.
~~~

(In reply to Joniel Pasqualetto from comment #5)
> The second BOOTIF is automatically added because of the option "IPAPPEND 2"
> from the template. This option automatically adds the BOOTIF with the
> information about the interface used to boot. That's why it works with PXE
> but does not work by bootdisk or kexec.
>
> From the installation guide:
> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/performing_an_advanced_rhel_installation/index#network_kickstart-commands-for-network-configuration
>
> ~~~
> Set IPAPPEND 2 in your pxelinux.cfg file to have pxelinux set the BOOTIF
> variable.
> ~~~

Joniel,

This is from one of my Full Host Images for RHEL 8.3:

~~~
$ cat /mnt/isolinux.cfg
# This file was deployed via 'Unnamed' template
DEFAULT menu
MENU TITLE Booting into OS installer (ESC to stop)
TIMEOUT 100
ONTIMEOUT installer
LABEL installer
  MENU LABEL Unnamed
  KERNEL BOOT/OS_KICKSTART_8_3_105_VMLINUZ
  APPEND initrd=BOOT/KICKSTART_8_3_105_INITRD_IMG ks=http://satellite.example.com/unattended/provision?token=81bee514-b909-497a-8268-7bcdffd54709 network ksdevice=bootif ks.device=bootif BOOTIF=00-00-50-56-80-8e-1c kssendmac ks.sendmac inst.ks.sendmac
  IPAPPEND 2
~~~

As I can see, "IPAPPEND 2" is mentioned in the isolinux configuration of the image as well. So logically, when booting from the Full Host image, shouldn't that option also add the second BOOTIF parameter, similar to what happens in PXE?

-- Sayan

No, because "IPAPPEND 2" sets the BOOTIF parameter to the NIC used to boot. When you boot using the bootdisk, you didn't use a NIC to boot.
Therefore, it's empty when using bootdisk.

I also found what that 01- is used for. It's supposed to be the ARP code for the hardware type. See here: https://wiki.syslinux.org/wiki/index.php?title=SYSLINUX

~~~
(...)
2: An option of the following format should be generated, in dash-separated hexadecimal with leading hardware type (same as for the configuration file; see doc/pxelinux.txt), and added to the kernel command line, allowing an initrd program to determine from which interface the system booted (empty for non-PXELINUX variants):
(...)
~~~

However, I don't know why we were using 00-, as the code for Ethernet is 1. See here: https://www.iana.org/assignments/arp-parameters/arp-parameters.xhtml#arp-parameters-2

Excellent work by Joniel, there's not much to add here. I came to the very same conclusion after reading an email from Satyajit.

The big question is why this is broken in RHEL 8.3. I am looking into dracut (upstream) and there is no such change in the code:

https://github.com/dracutdevs/dracut/blob/master/modules.d/40network/net-lib.sh#L223-L233

The code hasn't been changed for about 8 years now:

https://github.com/dracutdevs/dracut/blame/master/modules.d/40network/net-lib.sh

Maybe there was a complete from-scratch rewrite, let's see. I am going to file a BZ on RHEL and associate it with this one.

Does the workaround mentioned by Joniel work for your customers? We need to buy some time until the RHEL team can investigate this. Also, a fix will probably take longer if it's a bug in RHEL. Honestly, I don't think they should be introducing such a breaking change; this can be easily detected and ignored.

Hello Lukas,

I can get RHEL 8.3 working in either of these ways:

1. I need to have "BOOTIF=00-50-56-80-8e-1c", i.e. without the additional "00-" prepended to the MAC. [this is what Joniel suggested and can be applied easily]

Or,

2. I need to add another "BOOTIF=01-50-56-80-8e-1c", i.e.
starting with "01-", at the end of the line after halting the PXE menu while booting from the Full Host Image.

I tested both and they work, but I prefer the change in the following segment of "Kickstart default PXELinux", from:

###
options = ["network", "ksdevice=bootif", "ks.device=bootif"]
if mac
  bootif = '00-' + mac.gsub(':', '-')
  options.push("BOOTIF=#{bootif}")
end
###

To:

##
options = ["network", "ksdevice=bootif", "ks.device=bootif"]
if mac
  if os_major == 8 and os_minor > 2
    bootif = mac.gsub(':', '-')
    options.push("BOOTIF=#{bootif}")
  else
    bootif = '00-' + mac.gsub(':', '-')
    options.push("BOOTIF=#{bootif}")
  end
end
##

I am positive this will work for customers as well.

-- Sayan

Upstream discussion: https://community.theforeman.org/t/bootdisk-and-discovery-kexec-broken-for-el-8-3/21554

Upstream bug assigned to lzap

Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/31452 has been resolved.

Hello Sayan,

that's a good point: in older versions this snippet might not exist. However, a quick search shows only one place:

[lzap@x1 foreman]$ g checkout SATELLITE-6.8.0
Already on 'SATELLITE-6.8.0'
Your branch is up to date with 'origin/SATELLITE-6.8.0'.
[lzap@x1 foreman]$ ag "'00-'"
app/views/unattended/provisioning_templates/snippet/kickstart_kernel_options.erb
26: bootif = '00-' + mac.gsub(':', '-')

I don't see "bootif" in the Satellite 6.8 template for PXEGrub2:

[lzap@x1 foreman]$ grep bootif ./app/views/unattended/provisioning_templates/PXEGrub2/kickstart_default_pxegrub2.erb

However, for 6.7 or older there are multiple places to fix, yeah:

[lzap@x1 foreman]$ ag "'00-'"
app/views/unattended/provisioning_templates/PXELinux/kickstart_default_pxelinux.erb
25: bootif = '00-' + mac.gsub(':', '-')
app/views/unattended/provisioning_templates/PXEGrub/kickstart_default_pxegrub.erb
20: bootif = '00-' + mac.gsub(':', '-')
app/views/unattended/provisioning_templates/PXEGrub2/kickstart_default_pxegrub2.erb
25: bootif = '00-' + mac.gsub(':', '-')

For the record, NM will be updated to also accept 00- next to 01-, so the final solution for Satellite is to use 01- instead of 00- for EL 8.3+.

The fix for this bugzilla is in an early Satellite 6.9 SNAP; therefore, aligning to release and updating state.

Verified on sat6.9.0-12.0

I was able to successfully provision a RHEL 8.3 host using the following flow:

- sync RHEL 8.3 repos
- add VMware as a compute resource
- create a new host using the compute resource and the "boot disk" provisioning method
- Satellite generates the full host boot disk ISO, automatically uploads it to the VMware store and attaches it to the newly created VM
- the machine successfully booted the ISO and loaded all the files via iPXE
- anaconda successfully installed all the packages
- RHEL 8.3 successfully booted up after reboot

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Satellite 6.9 Release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1313
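
For anyone carrying customized templates, here is a minimal Ruby sketch of the approach this report settled on: prefix the MAC with 01- (the ARP hardware type for Ethernet) instead of 00-. The helper name `bootif_option` and the constant `ETHERNET_HW_TYPE` are hypothetical and are not taken from the shipped Satellite templates. The sketch assumes a colon-separated MAC, as in the snippets quoted in the comments.

```ruby
# Hypothetical helper for illustration only; not the code shipped in Satellite.
# ARP hardware type for Ethernet (IANA value 1), written as two hex digits.
ETHERNET_HW_TYPE = '01'

# Builds a BOOTIF kernel argument from a colon-separated MAC address.
# Returns nil when no MAC is known, so the caller can skip the option.
def bootif_option(mac, hw_type = ETHERNET_HW_TYPE)
  return nil if mac.nil? || mac.empty?

  "BOOTIF=#{hw_type}-#{mac.downcase.gsub(':', '-')}"
end

options = ['network', 'ksdevice=bootif', 'ks.device=bootif']
bootif = bootif_option('52:54:00:84:87:51')
options.push(bootif) if bootif

puts options.join(' ')
# prints: network ksdevice=bootif ks.device=bootif BOOTIF=01-52-54-00-84-87-51
```

The result matches the format of the BOOTIF value that "IPAPPEND 2" appended during the PXE tests in this report, which is the form RHEL 8.3 accepted.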