Description of problem:
During a bare-metal IPI installation of OpenShift 4.4.3, master nodes successfully boot via PXE and then reboot into RHCOS for the first time so that their Ignition configuration can be applied. On this very first reboot into RHCOS, a bare metal machine can take more than an hour to complete its first boot. These machines have multiple network interface cards, and the messages on the RHCOS console suggest that:

(a) the DHCP client tries each NIC *in series*
(b) the DHCP client waits 3-5 minutes before timing out and moving on to the next NIC

On these machines, most NICs have no cables attached. We must wait until (something like) the first 7 NICs have timed out waiting for a DHCP lease before the correct NIC is attempted. It would be useful to have all of these NICs attempted in parallel, so that we are not waiting in series for DHCP leases.

Version-Release number of the following components:
OpenShift 4.4.3

How reproducible:
Every time
To limit the number of interfaces DHCP is tried on, it should be sufficient to replace the `ip=dhcp` argument on the kernel command line with `ip=$NIC:dhcp`, where $NIC is the name of the single NIC you want DHCP on (e.g. ens2). You should be able to apply this kernel parameter for the "install boot" (the boot where you are doing the bare metal install) via PXE, and it will get propagated to the first boot of the installed system.
@Dusty, sorry if this is a trivial question, but do you have an idea of where we should specify this so that it is taken into account during the deployment of OpenShift BM IPI?
I assume you're using kargs to do the install over PXE, where you have to specify things like coreos.inst=yes. In that same place where you are specifying kargs, you most likely also have an `ip=dhcp` kernel argument. Change it to `ip=$NIC:dhcp`.
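For illustration only, here is a rough sketch of what such a PXE entry might look like with that change; the kernel/initramfs filenames, URLs, install device, and the NIC name (ens2) are all placeholders, not values taken from this environment:

```
# Hypothetical pxelinux.cfg entry; every filename, URL, and device below is a placeholder.
# Note that a real syslinux/pxelinux APPEND must be a single line.
LABEL rhcos-install
  KERNEL rhcos-installer-kernel
  APPEND initrd=rhcos-installer-initramfs.img coreos.inst=yes coreos.inst.install_dev=sda coreos.inst.image_url=http://example.com/rhcos-metal.raw.gz coreos.inst.ignition_url=http://example.com/master.ign ip=ens2:dhcp
```

The only change relative to an existing entry is replacing `ip=dhcp` with `ip=ens2:dhcp`; everything else stays whatever the current PXE config already uses.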
Do we need to generate the manifests with openshift-baremetal-install to get to this setting, or is it something we can configure in install-config.yaml? I checked the docs and my environment but did not find any information on how to pass this parameter to the OpenShift install.
Hey Jean-Francois - it is the RHCOS install, which I think should be separate from the openshift-installer (though I admittedly don't have experience with IPI). In IPI, do you ever execute a step like this: https://docs.openshift.com/container-platform/4.4/installing/installing_bare_metal/installing-bare-metal.html#installation-user-infra-machines-pxe_installing-bare-metal ?
@Dusty, no, this step is automated by the installer when doing a baremetal IPI deployment. You create an install-config.yaml file, run "openshift-baremetal-install create cluster", and it launches everything it needs. I have tried to find a way to customize this parameter but have not found one yet.
For additional clarity: the image that gets deployed is a disk image, written after the network boot operation has completed. In other words, there is no ability to specify a parameter for the initial ramdisk's processing.
@julia, the suggested workaround (https://bugzilla.redhat.com/show_bug.cgi?id=1836248#c10) is to change the karg for the install, which is the "network boot operation" you are referring to. That will get propagated forward into the first boot of the machine (i.e. the boot from disk). Basically we need the ability to tweak some of the kernel arguments in the PXE config. Can you advise on that front?
This is basically https://github.com/coreos/ignition/issues/979
FWIW, dracut can also be configured via DHCP timeout and retry parameters such that we only send a single DHCP discover message and wait a short time for a response, which would accelerate this process significantly.
For reference: rd.net.timeout.dhcp and rd.net.dhcp.retry are the kernel command line parameters that direct dracut in how long to wait for a DHCP response and how many times to retry.
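As a purely illustrative example (the values are arbitrary, not tested recommendations), these could be appended to the same kernel command line as the ip= argument:

```
# Illustrative values only: wait at most 10 seconds for a DHCP response
# and make at most 1 attempt per interface before dracut gives up.
rd.net.timeout.dhcp=10 rd.net.dhcp.retry=1
```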
This bug has not been selected for work in the current sprint.
We are having a parallel discussion/debate on this here https://github.com/coreos/ignition/issues/979
Suggestion for possible future IPI deployer behavior: https://github.com/coreos/ignition/issues/979#issuecomment-646725569
From what I understand there are a few changes coming that could attack this problem from different angles. First, there is the discussion going on in https://github.com/coreos/ignition/issues/979 about the future of IPI provisioning RHCOS. I'm not sure if all of that will land in 4.6, but most of it should. Second, in 4.6 we already landed a change that moves to using NetworkManager to do network bringup in the initrd. It appears that NM does try to bring up all interfaces in parallel. I just performed some tests and I believe this will solve the customer's immediate need.

The original description also states: "On these machines, most NICs have not cables attached to them." I did some more testing and verified that if there is no network cable plugged in then NM won't even try DHCP. The way I tested this was by using a VM and simulating unplugging the network cables (see https://unix.stackexchange.com/questions/81044/emulate-unplugging-a-network-cable-with-qemu-kvm).

I started a machine on a bridge without DHCP:

```
$ sudo virsh net-dumpxml nodhcp
<network>
  <name>nodhcp</name>
  <uuid>626e6e74-49c3-4eb2-87f9-4539f944888e</uuid>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='virbr100' stp='on' delay='0'/>
  <mac address='52:54:00:3a:b3:4d'/>
  <ip address='192.168.130.1' netmask='255.255.255.0'>
  </ip>
</network>
```

Then started a VM with 8 interfaces on that network:

```
virt-install --import --name tester --cpu host-passthrough --ram 2048 --vcpus 2 \
    --boot menu=on,useserial=on --accelerate --graphics none --force \
    --qemu-commandline="-fw_cfg name=opt/com.coreos/config,file=/var/b/images/fcct-auto-login-ttyS0.ign" \
    --disk /var/b/images/rhcos-46.82.202006221550-0-qemu.x86_64.qcow2 --rng random \
    --network bridge=virbr100,model=virtio --network bridge=virbr100,model=virtio \
    --network bridge=virbr100,model=virtio --network bridge=virbr100,model=virtio \
    --network bridge=virbr100,model=virtio --network bridge=virbr100,model=virtio \
    --network bridge=virbr100,model=virtio --network bridge=virbr100,model=virtio
```

I stopped the VM at the grub menu and ran a command for each interface to "unplug" the cable:

```
virsh dumpxml tester | grep -i mac
virsh domif-setlink tester 52:54:00:3d:07:22 down
...
...
```

I then pressed enter at the grub prompt to continue boot. I was presented with a Login prompt in around 20 seconds. If I don't "unplug" the cable from the interfaces then DHCP is attempted we have to wait for timeouts before the boot continues (as is the experience of the customer here in this issue).

Jean-Francois Saucier - I think the immediate need for this will be solved with the move to NetworkManager in 4.6, and in the future (maybe in 4.6) we'll have other pieces in place so networking won't even attempt to be brought up in the initramfs. Can you confirm that what I'm proposing here is sufficient?
(In reply to Dusty Mabe from comment #40)

> I then pressed enter at the grub prompt to continue boot. I was presented
> with a Login prompt in around 20 seconds. If I don't "unplug" the cable from
> the interfaces then DHCP is attempted we have to wait for timeouts before
> the boot continues (as is the experience of the customer here in this issue).

Ignore that 2nd sentence, it's inaccurate.
This is being worked on, but is currently awaiting more investigation or more information and won't be completed this sprint.
RHCOS/OCP 4.6 will use NetworkManager to bring up the network in the initramfs, which will make the long timeout issue go away as described in https://bugzilla.redhat.com/show_bug.cgi?id=1836248#c40 . Moving this bug to MODIFIED.
(In reply to Dusty Mabe from comment #41)
> (In reply to Dusty Mabe from comment #40)
>
> > I then pressed enter at the grub prompt to continue boot. I was presented
> > with a Login prompt in around 20 seconds. If I don't "unplug" the cable from
> > the interfaces then DHCP is attempted we have to wait for timeouts before
> > the boot continues (as is the experience of the customer here in this issue).
>
> Ignore that 2nd sentence, it's inaccurate.

To further clarify, the statement should have read:

I then pressed enter at the grub prompt to continue boot. I was presented with a Login prompt in around 20 seconds. If I don't "unplug" the cable from the interfaces then we are still OK because DHCP is attempted in parallel. The DHCP attempts will time out, but since they all happen in parallel, the time to get to the login prompt is <60 seconds for my trivial VM test case. That is much more reasonable than the hour-long wait the BZ opener reported.
Verified on RHCOS 46.82.202007062141-0, which is a part of 4.6.0-0.nightly-2020-07-07-083718, using steps from https://bugzilla.redhat.com/show_bug.cgi?id=1836248#c40

```
# cat << EOF > nodhcp.xml
<network>
  <name>nodhcp</name>
  <uuid>626e6e74-49c3-4eb2-87f9-4539f944888e</uuid>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='virbr100' stp='on' delay='0'/>
  <mac address='52:54:00:3a:b3:4d'/>
  <ip address='192.168.131.1' netmask='255.255.255.0'>
  </ip>
</network>
EOF

# virsh net-create -f nodhcp.xml

# virt-install --import --name tester --cpu host-passthrough --ram 2048 --vcpus 2 \
    --boot menu=on,useserial=on --accelerate --graphics none --force \
    --qemu-commandline="-fw_cfg name=opt/com.coreos/config,file=/var/lib/libvirt/images/rhah/ignition" \
    --disk /var/lib/libvirt/images/rhah/rhcos-46.82.202007071437-0-qemu.x86_64.qcow2 --rng random \
    --network bridge=virbr100,model=virtio --network bridge=virbr100,model=virtio \
    --network bridge=virbr100,model=virtio --network bridge=virbr100,model=virtio \
    --network bridge=virbr100,model=virtio --network bridge=virbr100,model=virtio \
    --network bridge=virbr100,model=virtio --network bridge=virbr100,model=virtio
```

*interrupt grub menu*

```
# virsh dumpxml tester | grep -i 'mac address' | cut -d\' -f2 | xargs -I % sudo virsh domif-setlink tester % down
```

*continue boot*

Results: With the links down, there is no additional wait for unplugged interfaces. With the links up, there is a single wait of 45 seconds for DHCP to time out, regardless of how many interfaces there are.
This was actually verified on rhcos 46.82.202007071437-0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196