Bug 1940871
Summary: | Unable to use a bonded device (bond0) on a vlan via UPI install of node workers | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Neeraj <nbhatt> |
Component: | RHCOS | Assignee: | Luca BRUNO <lucab> |
Status: | CLOSED DUPLICATE | QA Contact: | Michael Nguyen <mnguyen> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 4.6 | CC: | abhinkum, acai, bbreard, dornelas, dustymabe, imcleod, itiwana, jligon, lsantill, mharris, miabbott, mnguyen, nkaushik, nstielau, rgregory, sferguso, smolnar, vchoudha |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | 4.8.0 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | 1897660 | Environment: | |
Last Closed: | 2021-04-20 09:35:42 UTC | Type: | --- |
Bug Depends On: | 1897660 | ||
Bug Blocks: |
Description
Neeraj
2021-03-19 12:47:47 UTC
The above scenario is reproducible mostly when deploying on bare-metal nodes (when testing in libvirt/KVM it tends to work just fine). The network configuration can be passed in via kernel params or set up with nmcli; the result does not change.

Could you share the kernel args used to configure the network interfaces?

Please provide the output of `cat /etc/NetworkManager/system-connections/*` after the `nmcli` commands were performed in the live ISO.

Please provide the serial console/journal for the system after the `coreos-installer` command has been run and the system has been rebooted.

There is not yet enough information in this report to determine what has gone wrong.

Hello,

1) The following kernel args were passed when booting the live ISO:

kernel /images/pxeboot/vmlinuz
append initrd=/images/pxeboot/initrd.img,/images/ignition.img random.trust_cpu=on rd.luks.options=discard coreos.liveiso=RHCOS-CustomIso ignition.firstboot ignition.platform.id=metal ip=10.141.97.10::10.141.97.1:255.255.255.0:worker7.ocp-lab.menalab.corp.local:bond0.2225:none vlan=bond0.2225:bond0 bond=bond0:eno1,eno2:mode=802.3ad,lacp_rate=fast,miimon=100 nameserver=172.24.109.51 coreos.inst.install_dev=sda coreos.inst.ignition_url=http://10.141.96.8:80/ignition/worker.ign

2) We don't have the system-connections at the moment, but the network was accessible throughout this stage (though nmcli was giving an object error) and later, during the boot-from-disk stage, for about 30 seconds. We will try to pull this from the next install.

3) We will get this fresh from the system install in 2).
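To make the relationship between the `ip=`, `vlan=`, `bond=` and `nameserver=` arguments quoted above easier to see, here is a minimal Python sketch that splits them into named fields and follows the device chain from the addressed interface down to the bond members. It is only an illustration, not how dracut or NetworkManager parse the command line: the function names (`parse_network_kargs`, `device_chain`) are made up for this example, and it assumes the static IPv4 form of `ip=` with seven colon-separated fields (bracketed IPv6 addresses and the short `ip=dhcp` form are not handled).

```python
# Network-related arguments copied from the kernel command line in this report.
CMDLINE = (
    "ip=10.141.97.10::10.141.97.1:255.255.255.0:"
    "worker7.ocp-lab.menalab.corp.local:bond0.2225:none "
    "vlan=bond0.2225:bond0 "
    "bond=bond0:eno1,eno2:mode=802.3ad,lacp_rate=fast,miimon=100 "
    "nameserver=172.24.109.51"
)

# Field order of the static ip= form (assumption: IPv4 only, no trailing DNS fields).
IP_FIELDS = ["address", "peer", "gateway", "netmask",
             "hostname", "interface", "autoconf"]


def parse_network_kargs(cmdline):
    """Split ip=, vlan=, bond= and nameserver= arguments into named fields."""
    parsed = {"ip": [], "vlan": [], "bond": [], "nameserver": []}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if not sep or key not in parsed:
            continue  # not one of the arguments this sketch looks at
        if key == "ip":
            parsed["ip"].append(dict(zip(IP_FIELDS, value.split(":"))))
        elif key == "vlan":
            name, _, parent = value.partition(":")
            parsed["vlan"].append({"name": name, "parent": parent})
        elif key == "bond":
            # bond=<name>:<members>:<options>[:<mtu>]; pad missing parts.
            parts = value.split(":", 3) + [""] * 3
            name, members, options = parts[0], parts[1], parts[2]
            parsed["bond"].append({
                "name": name,
                "members": [m for m in members.split(",") if m],
                "options": dict(o.split("=", 1)
                                for o in options.split(",") if "=" in o),
            })
        else:
            parsed["nameserver"].append(value)
    return parsed


def device_chain(parsed, interface):
    """Follow interface -> VLAN parent -> bond members, as far as defined."""
    vlans = {v["name"]: v["parent"] for v in parsed["vlan"]}
    bonds = {b["name"]: b["members"] for b in parsed["bond"]}
    chain = [interface]
    if interface in vlans:
        interface = vlans[interface]
        chain.append(interface)
    if interface in bonds:
        chain.append(",".join(bonds[interface]))
    return chain


parsed = parse_network_kargs(CMDLINE)
print(device_chain(parsed, parsed["ip"][0]["interface"]))
```

For the arguments in this report, the chain runs from the VLAN device `bond0.2225` through `bond0` to the members `eno1` and `eno2`, which is the bond+VLAN layout under discussion.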
Possibly related: 4.6 releases had a bug in the handling of `nameserver=` kernel arguments: https://bugzilla.redhat.com/show_bug.cgi?id=1882781

On top of what Micah already asked, it would be great to perform this cluster installation directly on 4.7, or to rework the kernel arguments to avoid hitting the above bug.

Speculatively pointing to https://bugzilla.redhat.com/show_bug.cgi?id=1882781 as the root cause for this, due to matching conditions/triggers and the lack of actionable logs to investigate further. Closing as a duplicate.

We deem RHCOS 4.7 to be generally fine for setups in a bond+VLAN environment. If there are further cases of failures using 4.7, please open a dedicated ticket with full installation details and journal logs from the RHCOS node.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.
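For anyone opening the dedicated ticket requested above, the following Python sketch shows one possible way to gather the data asked for in this report (kernel command line, NetworkManager keyfiles, journal of the current boot) into a single directory. It is an assumption-laden helper, not a Red Hat tool: the function name `collect_diagnostics` and the output file names are hypothetical, it needs to run as root to read the keyfiles, and it does not capture the serial console. Keyfiles can contain secrets, so the output should be reviewed before it is attached to a ticket.

```python
import shutil
import subprocess
from pathlib import Path


def collect_diagnostics(out_dir,
                        keyfile_dir="/etc/NetworkManager/system-connections",
                        cmdline_path="/proc/cmdline",
                        with_journal=True):
    """Copy the requested data into out_dir and return the files written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []

    # Kernel arguments the node actually booted with.
    cmdline = Path(cmdline_path)
    if cmdline.is_file():
        target = out / "kernel-cmdline.txt"
        target.write_text(cmdline.read_text())
        written.append(target)

    # Equivalent of `cat /etc/NetworkManager/system-connections/*`,
    # with a header line naming each keyfile.
    keyfiles = Path(keyfile_dir)
    if keyfiles.is_dir():
        target = out / "system-connections.txt"
        with target.open("w") as handle:
            for keyfile in sorted(keyfiles.iterdir()):
                if keyfile.is_file():
                    handle.write(f"### {keyfile.name}\n{keyfile.read_text()}\n")
        written.append(target)

    # Journal of the current boot, only if journalctl is available.
    if with_journal and shutil.which("journalctl"):
        target = out / "journal-current-boot.txt"
        with target.open("w") as handle:
            subprocess.run(["journalctl", "-b", "--no-pager"],
                           stdout=handle, check=False)
        written.append(target)

    return written
```

A call such as `collect_diagnostics("/var/tmp/bond-vlan-report")` (the path is only an example) would be run once in the live ISO after the `nmcli` commands and again after the first boot from disk, matching the two stages described in this report.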