Description of problem:
When deploying a baremetal cluster using IPI installation, we cannot apply a machine config post-deployment to set up bonding.

Version-Release number of selected component (if applicable):
4.6.4

How reproducible:

Scenario: 3 masters, 3 workers.
- first nic  (enp1s0): provisioning
- second nic (enp2s0): baremetal
- third nic  (enp3s0): baremetal -> initially unused on purpose, to build the bond afterwards

I'm testing on a virtual lab (libvirt for the VMs and vbmc to simulate IPMI); using other bonding modes gives the same results.

Steps to Reproduce:
1. Deploy OCP 4.6.4 with baremetal IPI: first nic for provisioning, second nic for baremetal, third nic unused
2. Apply a machine config for the workers or masters to set up bond0 with the second and third nics (a hypothetical sketch of such a machine config is included at the end of this comment)
3. Wait for the machine config to be applied and the servers to reboot

Actual results:
- Everything keeps working as before, but the network is not set up as defined.
- The ifcfg-* files for the bond and both nics are written as defined in the machine config.
- The bond is created with only one slave, enp3s0 (the initially unused nic).
- br-ex still uses enp2s0 directly, not the bond.

Expected results:
- The bond should have both defined nics (enp2s0 and enp3s0) as slaves.
- br-ex should use the bond instead of enp2s0 alone.

Additional info:

Example from a worker before applying the machine config:

[core@worker-2 ~]$ nmcli con show
NAME              UUID                                  TYPE           DEVICE
ovs-if-br-ex      4b32228e-85df-4646-83c8-df7bea727415  ovs-interface  br-ex
Wired Connection  29379185-7639-46f8-a1a9-8349a6a03256  ethernet       enp1s0
br-ex             d6446250-26dd-4e41-a3e2-2a0081bfe482  ovs-bridge     br-ex
ovs-if-phys0      2e06c1e9-39c9-4b62-9818-98ff3dd2f75f  ethernet       enp2s0
ovs-port-br-ex    3787fe2e-fc7a-46fa-9569-39b4cddb4f92  ovs-port       br-ex
ovs-port-phys0    eea4945e-4be5-48ad-b4be-95067e8f709e  ovs-port       enp2s0

After applying the machine config and rebooting:

[core@worker-2 ~]$ nmcli con show
NAME              UUID                                  TYPE           DEVICE
bond0             ad33d8b0-1f7b-cab9-9447-ba07f855b143  bond           bond0
ovs-if-br-ex      4b32228e-85df-4646-83c8-df7bea727415  ovs-interface  br-ex
Wired Connection  29379185-7639-46f8-a1a9-8349a6a03256  ethernet       enp1s0
br-ex             d6446250-26dd-4e41-a3e2-2a0081bfe482  ovs-bridge     br-ex
ovs-if-phys0      2e06c1e9-39c9-4b62-9818-98ff3dd2f75f  ethernet       enp2s0
ovs-port-br-ex    3787fe2e-fc7a-46fa-9569-39b4cddb4f92  ovs-port       br-ex
ovs-port-phys0    eea4945e-4be5-48ad-b4be-95067e8f709e  ovs-port       enp2s0
System enp3s0     63aa2036-8665-f54d-9a92-c3035bad03f7  ethernet       enp3s0
System enp2s0     8c6fd7b1-ab62-a383-5b96-46e083e04bb1  ethernet       --

The OVS database still shows the second nic as the only physical port in the bridge:

[core@worker-2 ~]$ sudo ovs-vsctl list-ports br-ex
enp2s0
patch-br-ex_worker-2-to-br-int

The bond only has one leg, the initially unused nic:

[core@worker-2 ~]$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup) (fail_over_mac follow)
Primary Slave: None
Currently Active Slave: enp3s0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

Slave Interface: enp3s0
MII Status: up
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: 52:54:00:98:00:32
Slave queue ID: 0

The ifcfg-* files are built just fine:

[core@worker-2 ~]$ ls -l /etc/sysconfig/network-scripts/
total 12
-rw-r--r--. 1 root root 332 Nov 26 17:00 ifcfg-bond0
-rw-r--r--. 1 root root  77 Nov 26 17:00 ifcfg-enp2s0
-rw-r--r--. 1 root root  77 Nov 26 17:00 ifcfg-enp3s0

[core@worker-2 ~]$ cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
TYPE=Bond
NAME=bond0
BONDING_MASTER=yes
BOOTPROTO=dhcp
ONBOOT=yes
MTU=1500
BONDING_OPTS="mode=active-backup miimon=100 fail_over_mac=follow"
AUTOCONNECT_SLAVES=yes
IPV6INIT=no
DHCPV6C=no
IPV6INIT=no
IPV6_AUTOCONF=no
IPV6_DEFROUTE=no
IPV6_PEERDNS=no
IPV6_PEERROUTES=no
IPV6_FAILURE_FATAL=no
IPV4_DHCP_TIMEOUT=2147483647

[core@worker-2 ~]$ cat /etc/sysconfig/network-scripts/ifcfg-enp2s0
TYPE=Ethernet
DEVICE=enp2s0
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes

[core@worker-2 ~]$ cat /etc/sysconfig/network-scripts/ifcfg-enp3s0
TYPE=Ethernet
DEVICE=enp3s0
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes

When logging in to the worker after the reboot, I also noticed that NetworkManager-wait-online had failed:

[root@worker-2 ~]# systemctl status NetworkManager-wait-online.service
● NetworkManager-wait-online.service - Network Manager Wait Online
   Loaded: loaded (/usr/lib/systemd/system/NetworkManager-wait-online.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2020-11-26 17:01:15 UTC; 36min ago
     Docs: man:nm-online(1)
  Process: 1409 ExecStart=/usr/bin/nm-online -s -q --timeout=30 (code=exited, status=1/FAILURE)
 Main PID: 1409 (code=exited, status=1/FAILURE)
      CPU: 112ms

Nov 26 17:00:45 localhost systemd[1]: Starting Network Manager Wait Online...
Nov 26 17:01:15 worker-2 systemd[1]: NetworkManager-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Nov 26 17:01:15 worker-2 systemd[1]: NetworkManager-wait-online.service: Failed with result 'exit-code'.
Nov 26 17:01:15 worker-2 systemd[1]: Failed to start Network Manager Wait Online.
Nov 26 17:01:15 worker-2 systemd[1]: NetworkManager-wait-online.service: Consumed 112ms CPU time

I have attached some other outputs, the machine config templates, and sosreports just in case. Please let me know if I can provide any other helpful information.

Thanks,
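For reference, a minimal sketch of the kind of machine config meant in step 2, assuming the ifcfg-* files shown above are shipped as Ignition storage files (the object name is a placeholder and the data URLs are abbreviated here; the real template is in the attached "machine config template for workers"):

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-bond0                  # placeholder name
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      files:
        # Each entry carries the URL-encoded contents of one ifcfg-* file shown above
        - path: /etc/sysconfig/network-scripts/ifcfg-bond0
          mode: 0644
          overwrite: true
          contents:
            source: data:,DEVICE%3Dbond0%0ATYPE%3DBond%0ANAME%3Dbond0%0A...
        - path: /etc/sysconfig/network-scripts/ifcfg-enp2s0
          mode: 0644
          overwrite: true
          contents:
            source: data:,TYPE%3DEthernet%0ADEVICE%3Denp2s0%0A...
        - path: /etc/sysconfig/network-scripts/ifcfg-enp3s0
          mode: 0644
          overwrite: true
          contents:
            source: data:,TYPE%3DEthernet%0ADEVICE%3Denp3s0%0A...

Applying such an object with "oc apply -f" is what makes the machine-config operator roll it out and reboot the affected nodes.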
Created attachment 1733851 [details] Installation template
Created attachment 1733852 [details] machine config template for workers
Created attachment 1733853 [details] machine config operator outputs
Created attachment 1733857 [details] sosreport with only command outputs, /var and /etc directories
I would like to add the following test results in case they help narrow down the problem. I noticed that the network type in use has an impact; to summarize:

- Using OVNKubernetes in 4.6.4 and setting up bonding with MC templates post-install does not work (as described above in this BZ).
- Using OVNKubernetes in 4.6.4 and passing the MC templates as manifests in a fresh install results in the bond coming up in round-robin mode instead of the mode defined in the templates; see BZ 1899350.
- Using OpenShiftSDN in 4.6.4 and passing the MC templates as manifests in a fresh install works: the bond preserves the options we passed in the templates. (The manifest-based flow is sketched after this comment.)

In the end we want to use OVNKubernetes; it doesn't really matter to us whether the bond is set up post-install or during the installation. We also tested with nmstate but got errors there as well; I'll follow up on that in a different BZ.
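For context, the "passing the MC templates as manifests in a fresh install" flow mentioned above is roughly the following, assuming the standard openshift-install manifest workflow (directory and file names are placeholders):

# Render the installation manifests from install-config.yaml
openshift-install create manifests --dir=ocp-install

# Drop the machine config template(s) into the generated openshift/ directory
cp 99-worker-bond0.yaml ocp-install/openshift/

# Run the installation; the extra manifests are applied during cluster bring-up
openshift-install create cluster --dir=ocp-install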
Hi, I see the BZ has the needinfo flag set. Is there any information you would like me to include? Thanks!
Doing some BZ cleanup, closing this one out as it is over a year old. If this is still an issue, please open a new BZ and provide details. Thanks.