Bug 2013438

Summary: OCP 4.7 bond network fails to link properly in mode 1, defaults to round-robin, MC override fails despite successful ifconfig update on master nodes
Product: OpenShift Container Platform
Reporter: Will Russell <wrussell>
Component: Networking
Assignee: Ben Nemec <bnemec>
Networking sub component: runtime-cfg
QA Contact: Victor Voronkov <vvoronko>
Status: CLOSED DUPLICATE
Severity: high
Priority: unspecified
CC: aos-bugs, bpickard, vpickard
Version: 4.7
Target Milestone: ---
Target Release: 4.10.0
Hardware: x86_64
OS: Linux
Doc Type: If docs needed, set a value
Type: Bug
Regression: ---
Last Closed: 2021-10-27 22:21:02 UTC

Description Will Russell 2021-10-12 20:32:44 UTC
Description of problem:
OCP 4.7
Cluster deployment on bare-metal hosts fails with a timeout error if the network bond is defined with mode 1.
The cluster will only spin up if the mode is left undefined, which defaults the bond to round-robin and does not engage the interface.

When mode 1 is specified via an MC (as stipulated by the docs), the MC deploys successfully to the master nodes, but the contents of bond0 are as follows:

cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

Slave Interface: ens2f0
MII Status: up
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: <redacted>
Slave queue ID: 0

Slave Interface: ens2f1
MII Status: up
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: <redacted>
Slave queue ID: 0
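
For comparison, with mode=1 actually in effect the bonding driver would be expected to report active-backup rather than round-robin, along the lines of the following (illustrative sketch only; which slave ends up active is an assumption):

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: ens2f0
MII Status: up
MII Polling Interval (ms): 100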

network config for bond:
[root@master-0 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0 
DEVICE=bond0
TYPE=Bond
NAME=bond0
BONDING_MASTER=yes
BOOTPROTO=dhcp
ONBOOT=yes
MTU=9100
IPV4_DHCP_TIMEOUT=2147483647
IPV6INIT=no
DHCPV6C=no
IPV6INIT=no
IPV6_AUTOCONF=no
IPV6_DEFROUTE=no
IPV6_PEERDNS=no
IPV6_PEERROUTES=no
IPV6_FAILURE_FATAL=no
BONDING_OPTS="mode=1 miimon=100"
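
As a sanity check independent of the ifcfg file, the kernel's live bond mode can be read from sysfs; on the affected nodes this presumably still shows round-robin (a minimal sketch, assuming the bond device is named bond0 as above):

# mode currently used by the kernel, reported as "<name> <number>"
cat /sys/class/net/bond0/bonding/mode
# expected after a correct mode=1 apply: active-backup 1
# matching the /proc output above:      balance-rr 0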

Version-Release number of selected component (if applicable):
4.7.11 and 4.7.24 (tested on both releases; the issue occurs on both).

How reproducible:
every time

[network is preconfigured for bond link]


Steps to Reproduce:
1. Install the cluster with the bond mode defined; observe the deployment timeout.
2. Install the cluster with the workaround of leaving the bond mode undefined, then adjust it via an MC deployment once the cluster is up/stable (see the MachineConfig sketch below); observe no change in the bond network interface even after node reboots.
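
For reference, the MachineConfig used in step 2 would be along these lines (a minimal sketch reconstructed from the ifcfg-bond0 contents above; the object name, Ignition version, and file encoding are assumptions rather than a copy of the attached MC):

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  # hypothetical name/role; the actual MC targeted the master pool
  name: 99-master-bond0-mode1
  labels:
    machineconfiguration.openshift.io/role: master
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/sysconfig/network-scripts/ifcfg-bond0
          mode: 0644
          overwrite: true
          contents:
            # base64 of the ifcfg-bond0 shown above, including
            # BONDING_OPTS="mode=1 miimon=100"
            source: data:text/plain;charset=utf-8;base64,<base64 of ifcfg-bond0>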


Actual results:
The bond is never activated in the desired mode 1 (active-backup); it stays in round-robin and the link remains disconnected.


Expected results:
The bond link should come up in mode 1 (active-backup) as configured.

Additional info:
Case details are linked to this BZ for additional uploads (must-gather/SOS reports).

Comment 3 Will Russell 2021-10-27 22:21:02 UTC
This bug is now being tracked in the following new bug, which is the heart of the problem. Closing this case as a duplicate. https://bugzilla.redhat.com/show_bug.cgi?id=2018003

*** This bug has been marked as a duplicate of bug 2018003 ***