Description of problem:
Nodes get tainted when additional interfaces are enabled. The additional interfaces have a lower MTU than the initial interfaces.

Version-Release number of selected component (if applicable):
4.5.7

How reproducible:
1 out of 1 attempts

Steps to Reproduce:
1. deploy OCP on hosts with multiple interfaces; configure only one iface, connected to the PLAN, with jumbo frames enabled (in our case MTU=5550)
2. after the installation, verify the cluster is operational (OpenShift SDN's MTU was automatically configured to 5500)
3. configure an additional interface on the compute nodes, connected to the management network with standard MTU (1500)

Actual results:
all the nodes with the management interface configured get tainted with network.openshift.io/mtu-too-small:NoSchedule

Expected results:
the nodes with an additional interface configured are not tainted and remain schedulable without tolerations

Additional info:
- the tainted nodes are schedulable with pod tolerations
- removing the taint only lasts for a couple of seconds until an operator sets it again
- overriding the taint with network.openshift.io/mtu-too-small:PreferNoSchedule makes the nodes schedulable for daemonsets without adding tolerations
- this taint is not overridden by the operator
- our environment is bare metal

Iface overview:
PLAN interface: eno4
Mgmt interface: eno1np0

[core@compute1 ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether b0:26:28:12:60:88 brd ff:ff:ff:ff:ff:ff
3: eno1np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether b0:26:28:12:60:8a brd ff:ff:ff:ff:ff:ff
    inet 10.76.34.24/23 brd 10.76.35.255 scope global dynamic noprefixroute eno1np0
       valid_lft 27001sec preferred_lft 27001sec
4: eno4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 5550 qdisc mq state UP group default qlen 1000
    link/ether b0:26:28:12:60:89 brd ff:ff:ff:ff:ff:ff
    inet 192.168.51.36/24 brd 192.168.51.255 scope global dynamic noprefixroute eno4
       valid_lft 459813sec preferred_lft 459813sec
    inet6 fe80::b226:28ff:fe12:6089/64 scope link
       valid_lft forever preferred_lft forever
5: eno2np1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether b0:26:28:12:60:8b brd ff:ff:ff:ff:ff:ff
10: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether f6:91:1c:4f:ea:30 brd ff:ff:ff:ff:ff:ff
11: br0: <BROADCAST,MULTICAST> mtu 5500 qdisc noop state DOWN group default qlen 1000
    link/ether d2:00:ec:99:09:42 brd ff:ff:ff:ff:ff:ff
...

[core@compute1 ~]$ ip route
default via 10.76.34.1 dev eno1np0 proto dhcp metric 100
default via 192.168.51.32 dev eno4 proto dhcp metric 500
10.76.34.0/23 dev eno1np0 proto kernel scope link src 10.76.34.24 metric 100
10.128.0.0/14 dev tun0 scope link
172.30.0.0/16 dev tun0
192.168.51.0/24 dev eno4 proto kernel scope link src 192.168.51.36 metric 500

[root@compute1 network-scripts]# grep '.' ifcfg-*
ifcfg-eno1np0:# management network
ifcfg-eno1np0:NAME=eno1np0
ifcfg-eno1np0:DEVICE=eno1np0
ifcfg-eno1np0:BROWSER_ONLY=no
ifcfg-eno1np0:DEFROUTE=yes
ifcfg-eno1np0:IPV4_FAILURE_FATAL="no"
ifcfg-eno1np0:IPV4_ROUTE_METRIC=100
ifcfg-eno1np0:METRIC=100
ifcfg-eno1np0:NM_CONTROLLED=yes
ifcfg-eno1np0:ONBOOT=yes
ifcfg-eno1np0:PEERDNS=no
ifcfg-eno1np0:PEERROUTES=yes
ifcfg-eno1np0:TYPE=Ethernet
ifcfg-eno1np0:PROXY_METHOD=none
ifcfg-eno1np0:IPV6_DISABLED=yes
ifcfg-eno1np0:BOOTPROTO=dhcp
ifcfg-eno4:DEVICE=eno4
ifcfg-eno4:ONBOOT=yes
ifcfg-eno4:BOOTPROTO=dhcp
ifcfg-eno4:IPV6INIT=no
ifcfg-eno4:IPV6_AUTOCONF=no
ifcfg-eno4:TYPE=Ethernet
ifcfg-eno4:NAME="OpenShift Private VLAN"
ifcfg-eno4:METRIC=500
ifcfg-eno4:PROXY_METHOD=none
ifcfg-eno4:BROWSER_ONLY=no
ifcfg-eno4:DEFROUTE=yes
ifcfg-eno4:IPV4_FAILURE_FATAL=yes
ifcfg-eno4:IPV4_ROUTE_METRIC=500
ifcfg-eno4:MTU=5550

SDN pod log:
$ oc logs sdn-ghgns | grep -i mtu
I0908 08:42:16.881304 3065 node.go:245] Checking default interface MTU
I0908 10:36:08.521126 2782 node.go:245] Checking default interface MTU
I0908 10:48:53.378994 2723 node.go:245] Checking default interface MTU
I0908 11:00:51.293230 2754 node.go:245] Checking default interface MTU
I0908 14:31:11.635782 3956 node.go:245] Checking default interface MTU
I0908 14:31:11.651405 3956 node.go:296] Default interface MTU is less than VXLAN overhead, tainting node...
I0908 14:34:08.655947 2917 node.go:245] Checking default interface MTU

No idea how the default interface is determined (alphabetically?). METRIC does not seem to play a role.
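For reference, this is roughly the kind of taint handling described under "Additional info"; a sketch only, assuming standard oc tooling (compute1 is one of the affected nodes, the exact commands are not reproduced from our shell history):

# show the taint the SDN pod keeps re-applying
oc describe node compute1 | grep -i taint

# removing it only helps for a few seconds before the SDN pod sets it again
oc adm taint nodes compute1 network.openshift.io/mtu-too-small:NoSchedule-

# downgrading the effect to PreferNoSchedule lets daemonset pods land again
oc adm taint nodes compute1 network.openshift.io/mtu-too-small:PreferNoSchedule

# alternatively, a pod-level toleration lets workloads schedule despite the taint:
#   tolerations:
#   - key: network.openshift.io/mtu-too-small
#     operator: Exists
#     effect: NoSchedule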
I was wrong about this one: "this taint is not overridden by the operator". The OpenShift SDN container taints the node on its startup. To amend that, we are now using this work-around: https://gist.github.com/miminar/1399627ef114f96245f011185fa3747b
openshift-sdn does not currently support installation on systems with multiple interfaces where the VXLAN interface does not have the lowest-metric (i.e. most preferred) default route. It has always been this way, and there are already RFEs to support multiple interfaces, but other features have been a priority.
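For illustration (not part of the comment above): one way to see which default route wins, and, assuming the NetworkManager-managed ifcfg profiles shown in the report, to lower the route metric of the VXLAN-facing connection so it becomes the preferred default route. The connection name comes from ifcfg-eno4; the metric value 90 is an arbitrary example.

# lowest metric wins
ip route show default

# make the PLAN/VXLAN interface's default route the most preferred one
nmcli connection modify "OpenShift Private VLAN" ipv4.route-metric 90
nmcli connection up "OpenShift Private VLAN"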
(In reply to Dan Williams from comment #9)
> openshift-sdn does not currently support installation on systems with
> multiple interfaces, where the VXLAN interface does not have the
> lowest-metric (eg most preferred) default route.

Actually, the MTU-tainting code is buggy; it taints the node if there is _any_ interface with a default route and a too-small MTU, even if it's not the interface that will actually get used. It probably ought to only check the interface that holds the primary node IP, since that's guaranteed to be the interface that at least inbound VXLAN traffic will use. If someone has a cluster with zany asymmetric routing such that outbound VXLAN uses a different interface, then they can just be responsible for sanity-checking the MTU on that interface themselves.
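A rough shell illustration of the check being proposed here, not the actual node.go code: look only at the interface that holds the primary node IP and compare its MTU against the overlay MTU plus the VXLAN encapsulation overhead. The node IP and overlay MTU values are taken from this report; the ~50-byte overhead is the usual IPv4 VXLAN figure and is an assumption here.

# find the interface that carries the primary node IP
NODE_IP=192.168.51.36
IFACE=$(ip -o addr show to "$NODE_IP" | awk '{print $2}')
IF_MTU=$(cat /sys/class/net/"$IFACE"/mtu)

# compare against the SDN overlay MTU (br0 is 5500 on this node) plus VXLAN overhead
OVERLAY_MTU=5500
if [ "$IF_MTU" -lt $((OVERLAY_MTU + 50)) ]; then
    echo "$IFACE MTU $IF_MTU is too small for overlay MTU $OVERLAY_MTU plus VXLAN overhead"
fi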
@vpickard Taking this, assuming you are not working on it; feel free to ping in case you are.
@fpaoline Thanks for taking this one!
Verified this bug on 4.8.0-0.nightly-2021-05-07-075528

There are two interfaces, eno1 and eno2; eno2 is set with a lower MTU of 1400 than eno1's 1500.

step 1:
sh-4.4# ip a show eno1
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether dc:f4:01:e7:5d:84 brd ff:ff:ff:ff:ff:ff
    inet 10.73.116.54/23 brd 10.73.117.255 scope global dynamic noprefixroute eno1
       valid_lft 34688sec preferred_lft 34688sec
    inet6 2620:52:0:4974:25d3:20de:2f60:293/64 scope global dynamic noprefixroute
       valid_lft 2591941sec preferred_lft 604741sec
    inet6 fe80::8e:f470:4074:97e9/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
sh-4.4# ip a show eno2
3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc mq state UP group default qlen 1000
    link/ether dc:f4:01:e7:5d:85 brd ff:ff:ff:ff:ff:ff
    inet 192.168.222.112/24 brd 192.168.222.255 scope global dynamic noprefixroute eno2
       valid_lft 1524sec preferred_lft 1524sec
    inet6 fe80::def4:1ff:fee7:5d85/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
sh-4.4# ip a show br0
14: br0: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default qlen 1000
    link/ether b6:5f:07:08:a2:4a brd ff:ff:ff:ff:ff:ff
sh-4.4# ip route
default via 192.168.222.101 dev eno2
default via 10.73.117.254 dev eno1 proto dhcp metric 100
10.73.116.0/23 dev eno1 proto kernel scope link src 10.73.116.54 metric 100
10.128.0.0/14 dev tun0 scope link
172.30.0.0/16 dev tun0
192.168.222.0/24 dev eno2 proto kernel scope link src 192.168.222.112 metric 101

step 2: Delete the sdn pod so it gets recreated on this node

step 3: Check the logs of the newly created sdn pod
# oc logs sdn-lf7xb -n openshift-sdn -c sdn | grep -i mtu
I0508 14:06:25.729983 398667 node.go:247] Checking default interface MTU

step 4: Create one rc, scale it to 20 pods, and confirm the pods can be scheduled to this node.
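For completeness, steps 2-4 boil down to commands along these lines; the pod, rc, and node names in angle brackets are placeholders, not the ones actually used during verification:

# step 2: delete the sdn pod on the node so the daemonset recreates it
oc delete pod <old-sdn-pod> -n openshift-sdn

# step 3: grep the recreated pod's logs for the MTU check / taint message
oc logs sdn-lf7xb -n openshift-sdn -c sdn | grep -i mtu

# step 4: scale an rc to 20 replicas and confirm the pods land on this node
oc scale rc <rc-name> --replicas=20
oc get pods -o wide | grep <node-name>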
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438