Description of problem:
Unable to install RHCOS when booted from PXE with a tagged VLAN configuration. On RHEL 7/8 the kernel arguments would look something like this:

vlan=eth0.vlan100:vlan100 ip=192.168.1.100::192.168.1.1:255.255.255.0:localhost:vlan100:none

However, the same arguments don't work for RHCOS when booted from PXE.

Version-Release number of selected component (if applicable):

Environment:
OpenShift 4.3
OS: RHCOS

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
RHCOS doesn't boot from PXE with a tagged VLAN.

Expected results:
RHCOS should boot from PXE with a tagged VLAN.

Additional info:
The onsite consultant worked around the issue with the following steps:
1. Downloaded the Fedora CoreOS live CD.
2. Booted from the CD and ran coreos-installer after setting up the IP address and the VLAN tag manually.
3. Edited the first boot from the server's hard disk, adding the ip=vlan01:dhcp vlan=vlan01:eth0 arguments.

Those steps eventually solved the issue, and we were able to continue with the cluster deployment from that point. I truly believe there is an issue with the PXE initramfs.img with regard to VLAN tagging, because the initramfs image receives the arguments as expected.
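The RHEL 7/8 `ip=` example in the description uses dracut's static form, which dracut.cmdline(7) documents as `ip=<client-IP>:[<peer>]:<gateway-IP>:<netmask>:<client_hostname>:<interface>:{none|off|dhcp|on|any|dhcp6|auto6|ibft}[:[<mtu>][:<macaddr>]]`. As a minimal sketch, assuming IPv4 only (bracketed IPv6 addresses are not handled), the hypothetical helper `parse_static_ip` below splits such an argument into those named fields, which makes it easy to check which interface the address is actually bound to.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StaticIp:
    """Fields of dracut's static ip= form, in command-line order."""
    client_ip: str
    peer: str
    gateway: str
    netmask: str
    hostname: str
    interface: str
    autoconf: str
    mtu: Optional[str] = None
    macaddr: Optional[str] = None


def parse_static_ip(karg: str) -> StaticIp:
    """Split an IPv4 static 'ip=' kernel argument into named fields.

    Hypothetical helper for illustration; IPv6 addresses are not supported.
    """
    if not karg.startswith("ip="):
        raise ValueError(f"not an ip= argument: {karg!r}")
    # maxsplit=8 keeps a trailing MAC address (which itself contains ':') intact
    parts = karg[len("ip="):].split(":", 8)
    if len(parts) < 7:
        raise ValueError(f"static form needs at least 7 fields, got {len(parts)}")
    # Optional trailing fields; empty strings are treated as "not set"
    mtu = parts[7] if len(parts) > 7 and parts[7] else None
    mac = parts[8] if len(parts) > 8 and parts[8] else None
    return StaticIp(*parts[:7], mtu=mtu, macaddr=mac)


ip = parse_static_ip(
    "ip=192.168.1.100::192.168.1.1:255.255.255.0:localhost:vlan100:none"
)
print(ip.interface, ip.autoconf)  # vlan100 none
```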
We are currently working on higher priority bugs and features in RHCOS. The BZ has been targeted for the future 4.6 release of RHCOS/OCP and will be evaluated more thoroughly in the near future.
This bug has not been selected for work in the current sprint.
Hi Abhinit, Frédéric,

I'm trying to isolate the issue, but I lack the appropriate infrastructure to do so. I have a setup with VLANs in my local environment using VMs, but I don't have the PXE component. I have confirmed that I can boot with the appropriate kernel args for VLAN and have it do the install off of a VLAN-tagged interface. The only piece I'm missing is the PXE boot.

Abhinit, you said the consultant worked around this by booting into the ISO, setting up the network, and then running the install. Can you test something similar with the ISO, but using kargs instead? This should get us a lot closer to the PXE workflow without actually introducing PXE yet. Here is what I did that worked:

1. Boot the ISO.
2. Stop at the grub prompt.
3. Update the kargs to something like:
   console=ttyS0 coreos.inst.install_dev=sda coreos.inst.ignition_url=http://192.168.201.100:8000/config.ign coreos.inst.image_url=http://192.168.201.100:8000/rhcos-4.3.8-x86_64-metal.x86_64.raw.gz vlan=ens2.100:ens2 ip=192.168.201.101::192.168.201.1:255.255.255.0:localhost:ens2.100:none
4. Press Enter and watch the install complete.
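The karg line in step 3 can also be generated from its parts, which helps keep the VLAN name consistent between `vlan=` and `ip=`. The sketch below is a hypothetical helper (`build_install_kargs` is not part of any RHCOS tooling) that reproduces the example: it names the VLAN interface `<phys_dev>.<vlan_id>` (the `eth0.5` style, one of the VLAN naming styles dracut.cmdline(7) lists), declares it with `vlan=<vlanname>:<phydevice>`, and binds the static `ip=` to the VLAN interface rather than to the physical NIC. The `coreos.inst.*` argument names are copied from the example and apply to the installer used there; other releases may use different names.

```python
def build_install_kargs(
    install_dev: str,
    ignition_url: str,
    image_url: str,
    phys_dev: str,
    vlan_id: int,
    client_ip: str,
    gateway: str,
    netmask: str,
    hostname: str = "localhost",
    console: str = "ttyS0",
) -> str:
    """Compose ISO kernel arguments for a static-IP install over a tagged VLAN.

    Hypothetical helper; argument names mirror the example kargs above.
    """
    # VLAN interface named "<device>.<vid>", e.g. ens2.100
    vlan_if = f"{phys_dev}.{vlan_id}"
    kargs = [
        f"console={console}",
        f"coreos.inst.install_dev={install_dev}",
        f"coreos.inst.ignition_url={ignition_url}",
        f"coreos.inst.image_url={image_url}",
        # vlan=<vlanname>:<phydevice> creates the tagged interface
        f"vlan={vlan_if}:{phys_dev}",
        # ip=<client-IP>:[<peer>]:<gateway-IP>:<netmask>:<hostname>:<interface>:none
        # The address goes on the VLAN interface, not on the physical NIC.
        f"ip={client_ip}::{gateway}:{netmask}:{hostname}:{vlan_if}:none",
    ]
    return " ".join(kargs)


print(build_install_kargs(
    install_dev="sda",
    ignition_url="http://192.168.201.100:8000/config.ign",
    image_url="http://192.168.201.100:8000/rhcos-4.3.8-x86_64-metal.x86_64.raw.gz",
    phys_dev="ens2",
    vlan_id=100,
    client_ip="192.168.201.101",
    gateway="192.168.201.1",
    netmask="255.255.255.0",
))
```

With these inputs the printed line matches the kargs in step 3 exactly.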
My customer was successful configuring VLAN tagging with RHCOS without PXE boot. That said, it is still a major issue for them, as it is not practical to manage multiple large bare-metal installations without PXE boot.

Using PXE boot, the customer was able to get to grub with the Fedora shim.efi and grubx64.efi binaries, but *grub itself* seems unable to load the grub.conf via the VLAN.

The customer setup is:
- The nodes have two bonded 10G network interfaces that they want to use for load balancing and failover (both connections active, no LACP).
- They have VLANs configured, so the PXE request has to carry a VLAN tag. This can be configured in the BIOS when using UEFI mode.
- They want to use UEFI for booting (see the previous point).
- The final node configuration has to be a bond interface with a VLAN configured.
Hi Frederic, Abhinit,

It looks like only certain hardware can set the VLAN in the UEFI settings, so I'm a bit limited in trying to reproduce this locally. So we'll ask a few questions blind and see if we can work towards a solution.

Does this work when trying to PXE boot in a UEFI environment with plain RHEL 8? Some searching landed me on this old mailing list thread, which makes me wonder whether support for this was ever added to grub: https://help-grub.gnu.narkive.com/iSM0NEe0/uefi-pxe-boot-to-grub2-with-bios-configured-vlan-tagging
Booting from the ISO and adding the "ip=vlan01:dhcp vlan=vlan01:eth0" arguments on the kernel command line works fine, and that is the workaround the onsite consultant followed for the installation. The problem is passing this information via PXE.
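In the workaround arguments, `vlan=vlan01:eth0` asks dracut to create the tagged interface `vlan01` on top of `eth0`, and `ip=vlan01:dhcp` runs DHCP on that tagged interface, not on `eth0` itself. A minimal sketch of that pairing, assuming plain whitespace-separated kargs (no quoting) and IPv4 only: the hypothetical `ip_interface_parents` maps each interface named by an `ip=` argument to the parent NIC declared by `vlan=`, or to `None` when no `vlan=` declares it (an ordinary, untagged device).

```python
import ipaddress
from typing import Dict, List, Optional


def _is_ipv4(text: str) -> bool:
    """Return True if text parses as a dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False


def ip_interface_parents(cmdline: str) -> Dict[str, Optional[str]]:
    """Map each interface named by an ip= karg to its vlan= parent NIC (or None).

    Hypothetical helper for illustration only.
    """
    vlans: Dict[str, str] = {}
    ip_ifaces: List[str] = []
    for arg in cmdline.split():  # no shell-style quoting support
        if arg.startswith("vlan="):
            # vlan=<vlanname>:<phydevice>
            name, _, parent = arg[len("vlan="):].partition(":")
            vlans[name] = parent
        elif arg.startswith("ip="):
            fields = arg[len("ip="):].split(":")
            if _is_ipv4(fields[0]):
                # Static form: <client-IP>:[<peer>]:<gw>:<netmask>:<host>:<interface>:...
                if len(fields) >= 7:
                    ip_ifaces.append(fields[5])
            elif len(fields) >= 2:
                # Interface form: <interface>:{dhcp|on|any|dhcp6|auto6|link6}[...]
                ip_ifaces.append(fields[0])
            # A bare "ip=dhcp" names no interface and is skipped.
    return {iface: vlans.get(iface) for iface in ip_ifaces}


print(ip_interface_parents("ip=vlan01:dhcp vlan=vlan01:eth0"))  # {'vlan01': 'eth0'}
```

A check like this only inspects the kargs themselves. These arguments take effect once dracut runs in the initramfs, so it says nothing about whether the firmware or grub can reach the PXE server over the VLAN.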
(In reply to Dusty Mabe from comment #16)
> Hi Frederic, Abhinit,
>
> It looks like only certain hardware has the ability to set the VLAN in the
> UEFI settings, so I'm a bit limited in trying to reproduce this locally. So
> we'll try to ask a few questions blind and see if we can work towards a
> solution.
>
> Does this work when trying to PXE boot in a UEFI environment with plain
> RHEL8? Some searching landed me on this old mailing list thread which makes
> me wonder if support to grub was ever added for this:
> https://help-grub.gnu.narkive.com/iSM0NEe0/uefi-pxe-boot-to-grub2-with-bios-configured-vlan-tagging

Hello Dusty,

Below is the comment from the onsite consultant after trying to boot the same configuration with RHEL 8.

~~~
it looks like a VLAN TAG issue with the kernel cmdline argument we are providing is not working as expected.
we tried to install RHEL (vlan=eth0.005:eth0) on those servers in the same matter and RHEL was able to deploy successfully.
~~~
(In reply to Abhinit Kumar from comment #19)
>
> Hello Dusty,
>
> Below is the comment from onsite consultant when tried to boot the same
> configuration with RHEL 8.
>
> ~~~
> it looks like a VLAN TAG issue with the kernel cmdline argument we are
> providing is not working as expected.
> we tried to install RHEL (vlan=eth0.005:eth0) on those servers in the same
> matter and RHEL was able to deploy successfully.
> ~~~

Thanks Abhinit. I'm a bit confused. From reading this bug report, my summary of the problem is:

- The servers have a special setting that tells the NICs to use a VLAN tag during early boot (i.e., PXE).
- There is an issue when trying to PXE boot on a VLAN-tagged network, where PXE on UEFI either:
  - successfully pulls the kernel and initrd, but then gets stuck in grub,
  - OR never gets the kernel and initrd at all.

From that, it seems like kargs such as `vlan=` have no impact on the problem, because those aren't applied until dracut runs, and we're getting stuck before that.
After discussing this with several grub developers, it turns out this feature does not currently exist in grub (neither upstream nor in RHEL). The feature does exist in RHEL for the ppc64le architecture because, at the time it was implemented, that was the only platform that passed VLAN information along.

I am going to close this BZ as NOTABUG, since the feature never existed in the first place. For customers currently affected by this problem:

1. I have opened an RFE against grub for this feature: BZ1857410. Please follow BZ1857410 and add more information there so the grub team can make appropriate prioritization decisions.
2. Unfortunately, for now you will most likely want to work around this by performing PXE-based operations on a network that is not VLAN-tagged.

I am happy to answer questions if anyone would like to discuss this further.