Bug 1903247 - Cannot build a RHEL 8.3 system via Satellite Full Host Bootdisk or Discovery kexec
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Provisioning
Version: 6.7.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: 6.9.0
Assignee: Lukas Zapletal
QA Contact: Roman Plevka
URL:
Whiteboard:
Depends On: 1904099
Blocks:
Reported: 2020-12-01 17:09 UTC by Sayan Das
Modified: 2021-08-30 13:04 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1908850 1919400
Environment:
Last Closed: 2021-04-21 13:24:18 UTC
Target Upstream Version:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Foreman Issue Tracker | 31452 | 0 | Urgent | Closed | Cannot build an EL 8.3 system via Satellite Bootdisk or Discovery kexec | 2021-02-20 01:50:13 UTC
Foreman Issue Tracker | 31573 | 0 | Normal | Closed | Drop IPAPPEND2 completely and use 01 for the hardware prefix | 2021-02-20 01:50:14 UTC
Red Hat Knowledge Base (Solution) | 5631571 | 0 | None | None | None | 2020-12-09 12:12:44 UTC
Red Hat Product Errata | RHSA-2021:1313 | 0 | None | None | None | 2021-04-21 13:24:44 UTC

Description Sayan Das 2020-12-01 17:09:57 UTC
Description of problem:

Building a RHEL 8.3 host from a Full Host Image fails, while network-based deployment and auto-attach bootdisk based deployment still work fine.


Version-Release number of selected component (if applicable):

RHEL 8.3
Satellite 6.7


How reproducible:
Always


Steps to Reproduce:

1. Build a Satellite with provisioning services and setup configured.

2. Sync RHEL 8.3 baseos and appstream kickstart repos and add them in a CV.

3. Create a VM in VMware and create a host entry in Satellite using the correct MAC and RHEL 8.3 OS selected.

4. Download the full host image, copy the same to datastore, attach to the VM and boot with that Full Host Image.


Actual results:

* After printing "Reached Target Basic Shutdown", the boot gets stuck and, after some time, starts showing "dracut-initqueue timeout" messages.

* From the dracut shell, the "ip addr show" and "ip link show" commands show that ens160 is present but is in the DOWN state and has no IP address configured.



Expected results:

* The network should be configured and the installation should proceed to completion.


Additional info:


* The same issue cannot be reproduced with RHEL 8.2 or RHEL 8.1

* The same issue cannot be reproduced with RHEL 8.3 if I use iPXE based Host Image.

* I couldn't find any difference in the kernel/initrd parameters in the pxelinux file between the network-based and Full Host Image-based deployments, yet the latter still fails.

Comment 1 Joniel Pasqualetto 2020-12-02 22:42:52 UTC
The problem is that, apparently, there was a change in the kernel (?) and, in 8.3, the BOOTIF parameter no longer accepts the initial "00-" that is used in basically all our templates.

For example, the following boot line works in 8.2 (and earlier) but does not work in 8.3:

~~~
initrd=boot/red-hat-enterprise-linux-8-for-x86_64-baseos-kickstart-8-3-100-initrd.img ks=http://sat67.example.com/unattended/provision?token=9d0ca7d6-4369-40e8-adf6-04c58a420655  network ksdevice=bootif ks.device=bootif BOOTIF=00-52-54-00-84-87-51 kssendmac ks.sendmac inst.ks.sendmac
~~~

Changing the parameter to BOOTIF=52-54-00-84-87-51 (removing the initial "00-") makes it work as expected.

We can work around that on Satellite by changing the provisioning templates so they don't add the "00-" when the operating system is RHEL 8.3+.

I tested with RHEL 8.2 and it works both ways (with the 00- and without it). RHEL 8.3 only works without it. 

So far, these are the templates which I identified as needing to be addressed:

- Kickstart default PXELinux (works around the use case reported in this BZ)
- Discovery Red Hat kexec (works around the FDI use case)


Example of change for Kickstart default PXELinux:

~~~
(...)
  if mac
    if  os_major == 8 and os_minor > 2
      bootif = mac.gsub(':', '-')
      options.push("BOOTIF=#{bootif}")
    else
      bootif = '00-' + mac.gsub(':', '-')
      options.push("BOOTIF=#{bootif}")
    end
  end
(...)
~~~

Example of change for Discovery Red Hat kexec:

~~~
(...)
  mac = @host.facts['discovery_bootif']
  if mac
    if @host.operatingsystem.major.to_i == 8 and @host.operatingsystem.minor.to_i > 2
      bootif = mac.gsub(':', '-')
    else
      bootif = '00-' + mac.gsub(':', '-')
    end
  end
(...)
~~~

Comment 2 Sayan Das 2020-12-03 07:44:00 UTC
Hello Joniel,

If I am not wrong, during normal PXE booting the 00 is also prepended to the MAC in the BOOTIF parameter.

If what you are saying is correct, then the network-based installation of RHEL 8.3 should also have failed, since the same template is used in both scenarios. Or am I missing something here?


-- Sayan

Comment 3 Joniel Pasqualetto 2020-12-03 13:23:44 UTC
Hello Sayan

Running some tests using PXE booting, you're right about using the same template. However (and I don't know all the details about it) when booting using PXE a second BOOTIF parameter was automatically appended to my kernel line: BOOTIF=01-52-54-00-84-87-51 and the provisioning worked.

If I remove (manually) this second BOOTIF parameter it fails with the same error as when using the bootdisk. I'm doing some research to understand what that 00- or 01- is used for.

Comment 4 Sayan Das 2020-12-03 13:28:34 UTC
(In reply to Joniel Pasqualetto from comment #3)
> Hello Sayan
> 
> Running some tests using PXE booting, you're right about using the same
> template. However (and I don't know all the details about it) when booting
> using PXE a second BOOTIF parameter was automatically appended to my kernel
> line: BOOTIF=01-52-54-00-84-87-51 and the provisioning worked.
> 
> If I remove (manually) this second BOOTIF parameter it fails with the same
> error as when using the bootdisk. I'm doing some research to understand what
> that 00- or 01- is used for.

Hello Joniel,

Yes, this is the only difference I have noticed between network-based and Full Host Image-based deployments: during network-based PXE, when I edit the PXE menu to see the kernel options, it has an additional BOOTIF parameter appended, starting with 01 but for the same MAC.

When I boot with the Full Host Image, halt the menu, and edit the entry to add the additional BOOTIF=01-XX.XX.XX.XX.XX, the build works just fine for RHEL 8.3 as well.


I am also checking to find out what that option does actually.



-- Sayan

Comment 5 Joniel Pasqualetto 2020-12-03 13:36:24 UTC
The second BOOTIF is automatically added because of the "IPAPPEND 2" option in the template. This option adds a BOOTIF parameter describing the interface used to boot. That's why provisioning works with PXE but not with bootdisk or kexec.

From the installation guide: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/performing_an_advanced_rhel_installation/index#network_kickstart-commands-for-network-configuration

~~~
Set IPAPPEND 2 in your pxelinux.cfg file to have pxelinux set the BOOTIF variable.
~~~

Comment 6 Sayan Das 2020-12-03 13:41:35 UTC
(In reply to Joniel Pasqualetto from comment #5)
> The second BOOTIF is automatically added because of the option "IPAPPEND 2"
> from the template. This option automatically adds the BOOTIF with the
> information about the interface used to boot. That's why it works with PXE
> but does not work by bootdisk or kexec.
> 
> From the installation guide:
> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/
> html-single/performing_an_advanced_rhel_installation/index#network_kickstart-
> commands-for-network-configuration
> 
> ~~~
> Set IPAPPEND 2 in your pxelinux.cfg file to have pxelinux set the BOOTIF
> variable.
> ~~~

Joniel,

This is from one of the Full Host Images of mine for RHEL 8.3
~~
$ cat /mnt/isolinux.cfg 
# This file was deployed via 'Unnamed' template


DEFAULT menu
MENU TITLE Booting into OS installer (ESC to stop)
TIMEOUT 100
ONTIMEOUT installer

LABEL installer
  MENU LABEL Unnamed
  KERNEL BOOT/OS_KICKSTART_8_3_105_VMLINUZ
  APPEND initrd=BOOT/KICKSTART_8_3_105_INITRD_IMG ks=http://satellite.example.com/unattended/provision?token=81bee514-b909-497a-8268-7bcdffd54709  network ksdevice=bootif ks.device=bootif BOOTIF=00-00-50-56-80-8e-1c kssendmac ks.sendmac inst.ks.sendmac
  IPAPPEND 2

~~

I can see that "IPAPPEND 2" is present in the isolinux configuration of the image as well. So, logically, when booting from the Full Host Image, shouldn't that option also add the second BOOTIF parameter, as happens with PXE?




-- Sayan

Comment 7 Joniel Pasqualetto 2020-12-03 14:01:27 UTC
No, because the "IPAPPEND 2" sets the bootif parameter to the NIC used to boot. When you boot using the bootdisk, you didn't use a NIC to boot. Therefore, it's empty when using bootdisk.
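The difference between the two boot paths can be sketched roughly as follows. This is only a model of the behaviour described above, not PXELINUX code: the helper name `effective_cmdline` and its arguments are hypothetical, and the sample values come from comments 1 and 3.

```ruby
# Model of the kernel command line that reaches the installer.
# template_args: arguments rendered by the template (the APPEND line).
# boot_nic_mac:  MAC of the NIC used for PXE boot, or nil when the system
#                booted from a bootdisk image (no NIC was used to boot).
def effective_cmdline(template_args, boot_nic_mac)
  args = template_args.dup
  # "IPAPPEND 2" only has something to add when a boot NIC exists.
  args << "BOOTIF=01-#{boot_nic_mac.downcase.tr(':', '-')}" if boot_nic_mac
  args.join(' ')
end

template_args = ['ks.device=bootif', 'BOOTIF=00-52-54-00-84-87-51']

puts effective_cmdline(template_args, '52:54:00:84:87:51') # PXE boot
puts effective_cmdline(template_args, nil)                 # bootdisk
```

In the PXE case the line ends with a second, 01-prefixed BOOTIF; in the bootdisk case only the template's 00-prefixed value remains.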

I also found what that 01- is used for. It's supposed to be the ARP code for the hardware type. See here: https://wiki.syslinux.org/wiki/index.php?title=SYSLINUX

~~~
(...)

2: An option of the following format should be generated, in dash-separated hexadecimal with leading hardware type (same as for the configuration file; see doc/pxelinux.txt), and added to the kernel command line, allowing an initrd program to determine from which interface the system booted (empty for non-PXELINUX variants):

(...)
~~~

However, I don't know why we were using 00-, since the code for Ethernet is 1. See here: https://www.iana.org/assignments/arp-parameters/arp-parameters.xhtml#arp-parameters-2
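To make the format concrete, here is a minimal sketch that splits a BOOTIF value into its hardware type and MAC. The names `parse_bootif`, `describe` and `ARP_HW_ETHERNET` are made up for illustration, and this is not how the RHEL installer parses the parameter (that code is not shown in this bug); it only illustrates the "hardware type + MAC" layout.

```ruby
# ARP hardware type for Ethernet, per the IANA registry linked above.
ARP_HW_ETHERNET = 1

# Splits a BOOTIF value into [hardware_type, mac].
# Seven dash-separated octets are read as "type + MAC"; six octets are
# read as a bare MAC, in which case hardware_type is nil.
def parse_bootif(value)
  octets = value.split('-')
  unless octets.all? { |o| o.match?(/\A\h{2}\z/) }
    raise ArgumentError, "malformed BOOTIF: #{value}"
  end

  case octets.length
  when 7 then [octets.first.to_i(16), octets.drop(1).join(':')]
  when 6 then [nil, octets.join(':')]
  else raise ArgumentError, "unexpected octet count in BOOTIF: #{value}"
  end
end

# Human-readable label for the parsed hardware type.
def describe(type)
  return 'no hardware type' if type.nil?

  type == ARP_HW_ETHERNET ? 'Ethernet' : "hardware type #{type}"
end

%w[01-52-54-00-84-87-51 00-52-54-00-84-87-51 52-54-00-84-87-51].each do |v|
  type, mac = parse_bootif(v)
  puts "#{v} -> #{mac} (#{describe(type)})"
end
```

Read this way, the 00- prefix from the templates decodes to hardware type 0, which is not the Ethernet code.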

Comment 8 Lukas Zapletal 2020-12-03 14:32:00 UTC
Excellent work by Joniel, there's nothing much to add here. I came to the very same conclusion after reading an email from Satyajit.

The big question is why this is broken in RHEL 8.3. I am looking at dracut (upstream) and there is no such change in the code:

https://github.com/dracutdevs/dracut/blob/master/modules.d/40network/net-lib.sh#L223-L233

The code hasn't been changed for about 8 years now:

https://github.com/dracutdevs/dracut/blame/master/modules.d/40network/net-lib.sh

Maybe there is some completely from-scratch rewrite, let's see. I am going to file a BZ on RHEL and associate it with this one.

Does the workaround mentioned by Joniel work for your customers? We need to buy some time until the RHEL team can investigate this. Also, a fix will probably take longer if it's a bug in RHEL. Honestly, I don't think they should be introducing such a breaking change; this can be easily detected and ignored.

Comment 9 Sayan Das 2020-12-03 14:36:44 UTC
Hello Lukas,

I can get RHEL 8.3 to work in either of these ways.


1. Use "BOOTIF=00-50-56-80-8e-1c" (the bare MAC), without the additional "00-" prepended to the MAC. [ this is what Joniel suggested and can be applied easily ]

Or, 

2. Add another parameter, "BOOTIF=01-00-50-56-80-8e-1c", i.e. starting with "01-", at the end of the line after halting the PXE menu while booting from the Full Host Image.


I tested both and they work, but I prefer the following change to this segment of "Kickstart default PXELinux":

from,
###
  options = ["network", "ksdevice=bootif", "ks.device=bootif"]
  if mac
    bootif = '00-' + mac.gsub(':', '-')
    options.push("BOOTIF=#{bootif}")
  end
###


To,

##
  options = ["network", "ksdevice=bootif", "ks.device=bootif"]
  if mac
    if  os_major == 8 and os_minor > 2
      bootif = mac.gsub(':', '-')
      options.push("BOOTIF=#{bootif}")
    else
      bootif = '00-' + mac.gsub(':', '-')
      options.push("BOOTIF=#{bootif}")
    end
  end
##



I am positive this will work for customers as well.


-- Sayan

Comment 13 Bryan Kearney 2020-12-04 16:04:18 UTC
Upstream bug assigned to lzap@redhat.com

Comment 14 Bryan Kearney 2020-12-04 16:04:21 UTC
Upstream bug assigned to lzap@redhat.com

Comment 15 Bryan Kearney 2020-12-09 20:03:55 UTC
Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/31452 has been resolved.

Comment 17 Lukas Zapletal 2020-12-21 12:48:43 UTC
Hello Sayan, that's a good point: in older versions this snippet might not exist. However, a quick search shows only one place:

[lzap@x1 foreman]$ g checkout SATELLITE-6.8.0
Already on 'SATELLITE-6.8.0'
Your branch is up to date with 'origin/SATELLITE-6.8.0'.

[lzap@x1 foreman]$ ag "'00-'"
app/views/unattended/provisioning_templates/snippet/kickstart_kernel_options.erb
26:    bootif = '00-' + mac.gsub(':', '-')

I don't see "bootif" in Satellite 6.8 template for PXEGrub2:

[lzap@x1 foreman]$ grep bootif ./app/views/unattended/provisioning_templates/PXEGrub2/kickstart_default_pxegrub2.erb

However for 6.7 or older there are multiple places to fix, yeah:

[lzap@x1 foreman]$ ag "'00-'"
app/views/unattended/provisioning_templates/PXELinux/kickstart_default_pxelinux.erb
25:    bootif = '00-' + mac.gsub(':', '-')

app/views/unattended/provisioning_templates/PXEGrub/kickstart_default_pxegrub.erb
20:    bootif = '00-' + mac.gsub(':', '-')

app/views/unattended/provisioning_templates/PXEGrub2/kickstart_default_pxegrub2.erb
25:    bootif = '00-' + mac.gsub(':', '-')
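For anyone without `ag` at hand, the same search can be approximated with a short script. This is a sketch only: the function name `find_legacy_prefix` is made up, and the default directory is assumed from the paths in the output above, relative to a Foreman checkout.

```ruby
# Lists every line in *.erb files under `root` that contains the literal
# '00-' (quotes included), similar to the `ag "'00-'"` searches above.
def find_legacy_prefix(root, needle = "'00-'")
  Dir.glob(File.join(root, '**', '*.erb')).sort.flat_map do |path|
    File.foreach(path).with_index(1)
        .select { |line, _lineno| line.include?(needle) }
        .map { |line, lineno| "#{path}:#{lineno}: #{line.strip}" }
  end
end

# Assumed default location; pass a different root as the first argument.
root = ARGV.fetch(0, 'app/views/unattended/provisioning_templates')
find_legacy_prefix(root).each { |hit| puts hit }
```

Run from the repository root, it prints one `path:line: text` entry per match and nothing when no template contains the prefix.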

Comment 19 Lukas Zapletal 2021-01-05 12:42:12 UTC
For the record, NM (NetworkManager) will be updated to also accept 00- alongside 01-, so the final solution for Satellite is to use 01- instead of 00- for EL 8.3+.
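As a rough sketch of that rule (not the shipped template code, which may differ), the prefix selection could look like the following. The helper name `bootif_for` is hypothetical, and treating major versions above 8 the same as 8.3+ is an assumption.

```ruby
# Returns the BOOTIF value for a host, using the "01-" (Ethernet) hardware
# prefix on EL 8.3 and newer and the legacy "00-" prefix otherwise.
# Assumption: major versions above 8 are treated like 8.3+.
def bootif_for(mac, os_major, os_minor)
  el83_or_newer = os_major > 8 || (os_major == 8 && os_minor >= 3)
  prefix = el83_or_newer ? '01-' : '00-'
  prefix + mac.downcase.tr(':', '-')
end

puts bootif_for('52:54:00:84:87:51', 8, 3) # prints 01-52-54-00-84-87-51
puts bootif_for('52:54:00:84:87:51', 7, 9) # prints 00-52-54-00-84-87-51
```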

Comment 21 Brad Buckingham 2021-01-08 22:48:08 UTC
The fix for this bugzilla is in an early Satellite 6.9 SNAP; therefore, aligning to release and updating state.

Comment 23 Roman Plevka 2021-02-11 12:49:40 UTC
Verified on sat6.9.0-12.0.

I was able to successfully provision a RHEL 8.3 host using the following flow:
- sync rhel8.3 repos
- add vmware as a compute resource
- create new host using the compute resource and "boot disk" provisioning method
  - satellite generates the full host boot disk iso and automatically uploads it to vmware store + attaches it to the newly created VM
- machine successfully booted the iso and loaded all the files via ipxe
- anaconda successfully installed all the packages
- RHEL8.3 successfully booted up after reboot.

Comment 28 errata-xmlrpc 2021-04-21 13:24:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Satellite 6.9 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1313

