Bug 1842887
| Summary: | [UPI] [Baremetal] RHCOS 4.3 Installation doesn't work when using tagged vlan [need info] | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Abhinit Kumar <abhinkum> |
| Component: | RHCOS | Assignee: | Dusty Mabe <dustymabe> |
| Status: | CLOSED NOTABUG | QA Contact: | Michael Nguyen <mnguyen> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4.3.0 | CC: | apjagtap, bbreard, dornelas, fgiloux, hyoskim, imcleod, jligon, miabbott, nstielau |
| Target Milestone: | --- | ||
| Target Release: | 4.6.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-07-15 19:57:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1186913 | ||
|
Description
Abhinit Kumar
2020-06-02 10:36:25 UTC
We are currently working on higher priority bugs and features in RHCOS. The BZ has been targeted for the future 4.6 release of RHCOS/OCP and will be evaluated more thoroughly in the near future.

This bug has not been selected for work in the current sprint.

Hi Abhinit, Frédéric,
I'm trying to isolate the issue, but I lack the appropriate infrastructure to do so. I have a setup with VLANs in my local environment using VMs, but I don't have the PXE component. I have confirmed that I can boot with appropriate kernel args for the VLAN and have it do the install off of a VLAN tagged interface. The only piece I'm missing is the PXE boot.
Abhinit,
I see you said the consultant did a workaround by booting into the ISO, setting up the network and then running the install. Can you test doing something similar with the ISO except use kargs instead? This should get us a lot closer to the PXE workflow, but without actually introducing PXE yet. Here is what I did that worked:
1. Boot ISO
2. Stop at grub prompt
3. Update kargs to something like:
- console=ttyS0 coreos.inst.install_dev=sda coreos.inst.ignition_url=http://192.168.201.100:8000/config.ign coreos.inst.image_url=http://192.168.201.100:8000/rhcos-4.3.8-x86_64-metal.x86_64.raw.gz vlan=ens2.100:ens2 ip=192.168.201.101::192.168.201.1:255.255.255.0:localhost:ens2.100:none
4. Press Enter and watch the install complete
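The network portion of the kargs in step 3 can be assembled programmatically. The following is a minimal sketch, assuming the dracut forms `vlan=<vlanname>:<phydevice>` and `ip=<client-IP>:[<peer>]:<gateway-IP>:<netmask>:<hostname>:<interface>:<autoconf>`. The function names are hypothetical and not part of any RHCOS tooling; the values in the usage block are the ones from the comment above.

```python
def vlan_karg(vlan_name: str, parent: str) -> str:
    """Return a dracut-style `vlan=<vlanname>:<phydevice>` argument."""
    return f"vlan={vlan_name}:{parent}"


def static_ip_karg(client_ip: str, gateway: str, netmask: str, hostname: str,
                   interface: str, peer: str = "", autoconf: str = "none") -> str:
    """Return a dracut-style static `ip=` argument (seven colon-separated fields)."""
    fields = [client_ip, peer, gateway, netmask, hostname, interface, autoconf]
    return "ip=" + ":".join(fields)


def build_kargs(install_dev: str, ignition_url: str, image_url: str,
                parent: str, vlan_id: int, client_ip: str, gateway: str,
                netmask: str, hostname: str) -> str:
    """Assemble installer and network kargs for a VLAN-tagged static install."""
    vlan_name = f"{parent}.{vlan_id}"  # e.g. ens2.100, matching the thread's naming
    return " ".join([
        f"coreos.inst.install_dev={install_dev}",
        f"coreos.inst.ignition_url={ignition_url}",
        f"coreos.inst.image_url={image_url}",
        vlan_karg(vlan_name, parent),
        static_ip_karg(client_ip, gateway, netmask, hostname, vlan_name),
    ])


if __name__ == "__main__":
    print(build_kargs(
        install_dev="sda",
        ignition_url="http://192.168.201.100:8000/config.ign",
        image_url="http://192.168.201.100:8000/rhcos-4.3.8-x86_64-metal.x86_64.raw.gz",
        parent="ens2",
        vlan_id=100,
        client_ip="192.168.201.101",
        gateway="192.168.201.1",
        netmask="255.255.255.0",
        hostname="localhost",
    ))
```

The empty second field in `ip=` is the optional peer address, which is why the example in step 3 contains `::`.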
My customer was successful configuring VLAN tagging with RHCOS without PXE boot. That said, it is still a major issue for them, as it is not practical to manage multiple large bare-metal installations without PXE boot. The customer was able to PXE boot into grub using the Fedora shim.efi and grubx64.efi binaries, but *grub itself* seems unable to load grub.conf over the VLAN.
The customer setup is:
- The nodes have two bonded 10G network interfaces that they want to use for load balancing and failover (both connections active, no LACP).
- They have VLANs configured, so the PXE request has to come with a VLAN tag. This can be configured in the BIOS when using UEFI mode.
- They want to use UEFI for booting (see previous point).
- The final node configuration has to be a bond interface with a VLAN configured.

Hi Frederic, Abhinit,
It looks like only certain hardware has the ability to set the VLAN in the UEFI settings, so I'm a bit limited in trying to reproduce this locally. So we'll try to ask a few questions blind and see if we can work towards a solution.
Does this work when trying to PXE boot in a UEFI environment with plain RHEL8? Some searching landed me on this old mailing list thread, which makes me wonder if support for this was ever added to grub: https://help-grub.gnu.narkive.com/iSM0NEe0/uefi-pxe-boot-to-grub2-with-bios-configured-vlan-tagging

Booting with the ISO and adding the "ip=vlan01:dhcp vlan=vlan01:eth0" arguments on the kernel command line works fine, and that is the workaround the onsite consultant followed for the installation. The problem occurs when passing this information via PXE.

(In reply to Dusty Mabe from comment #16)
> Hi Frederic, Abhinit,
>
> It looks like only certain hardware has the ability to set the VLAN in the
> UEFI settings, so I'm a bit limited in trying to reproduce this locally. So
> we'll try to ask a few questions blind and see if we can work towards a
> solution.
>
> Does this work when trying to PXE boot in a UEFI environment with plain
> RHEL8? Some searching landed me on this old mailing list thread which makes
> me wonder if support to grub was ever added for this:
> https://help-grub.gnu.narkive.com/iSM0NEe0/uefi-pxe-boot-to-grub2-with-bios-configured-vlan-tagging
Hello Dusty,
Below is the comment from the onsite consultant, who tried to boot the same configuration with RHEL 8.
~~~
it looks like a VLAN TAG issue with the kernel cmdline argument we are providing is not working as expected.
we tried to install RHEL (vlan=eth0.005:eth0) on those servers in the same matter and RHEL was able to deploy successfully.
~~~

(In reply to Abhinit Kumar from comment #19)
> Hello Dusty,
>
> Below is the comment from onsite consultant when tried to boot the same
> configuration with RHEL 8.
>
> ~~~
> it looks like a VLAN TAG issue with the kernel cmdline argument we are
> providing is not working as expected.
> we tried to install RHEL (vlan=eth0.005:eth0) on those servers in the same
> matter and RHEL was able to deploy successfully.
> ~~~
Thanks Abhinit. I'm a bit confused. From reading this bug report, the summary of the problem that I'm coming away with is:
- The servers have a special setting for telling NICs to use a VLAN tag during early boot (i.e., PXE).
- There is an issue when trying to PXE boot on a VLAN tagged network where PXE on UEFI either:
  - successfully pulls the kernel and initrd, but then things get stuck in grub
  - OR never gets the kernel and initrd at all
From that it seems like kargs like `vlan=` have no impact on the problem, because those don't apply until dracut and we're getting stuck before that.

After discussing this with several grub developers, it turns out this feature does not currently exist in grub (neither upstream nor in RHEL). The feature does happen to exist in RHEL for the ppc64le architecture because, at the time it was implemented, it was the only platform that passed VLAN information along.
For this particular BZ I am going to close it as NOTABUG (since the feature never existed in the first place). For those customers currently affected by this problem:
1. I have opened an RFE against grub for this feature to exist: BZ1857410. Please follow and add more information to BZ1857410 so the grub team can make appropriate prioritization decisions.
2. Unfortunately, for now you will most likely want to work around this by performing PXE-based operations on a non-VLAN-tagged network.
I will be free to answer questions if anyone would like to discuss this further. |
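For reference, the final node configuration described in this thread (a bond interface with a VLAN on top) could be expressed with dracut-style kargs roughly as sketched below. This is an illustration under stated assumptions, not a tested RHCOS configuration: the function name, interface names, VLAN ID, and bond options are hypothetical, and the thread does not confirm that the RHCOS 4.3 installer honors `bond=`. These kargs only take effect once the kernel and initramfs are running, so they do not address the grub limitation discussed above.

```python
from typing import Sequence


def bond_vlan_kargs(bond: str, members: Sequence[str], vlan_id: int,
                    bond_options: str = "mode=balance-xor,miimon=100") -> str:
    """Build dracut-style kargs for a VLAN on top of a bond, using DHCP.

    Assumed forms (from dracut's cmdline syntax):
      bond=<bondname>:<member1>,<member2>:<options>
      vlan=<vlanname>:<phydevice>
      ip=<interface>:dhcp
    The default bond_options value is an assumption, chosen only because the
    customer asked for both links active without LACP.
    """
    if not members:
        raise ValueError("a bond needs at least one member interface")
    vlan_name = f"{bond}.{vlan_id}"
    return " ".join([
        f"bond={bond}:{','.join(members)}:{bond_options}",
        f"vlan={vlan_name}:{bond}",
        f"ip={vlan_name}:dhcp",
    ])


if __name__ == "__main__":
    # Hypothetical interface names and VLAN ID, for illustration only.
    print(bond_vlan_kargs("bond0", ["ens1f0", "ens1f1"], 100))
```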