Description of problem:

4.5 to 4.6 upgrade fails when the external network is configured on a bond device: the ovs-configuration service fails and the node becomes unreachable.

This issue was observed on a baremetal IPI deployment with the nodes having the following NIC layout:

nic1: provisioning network
nic2: bond0 member
nic3: bond0 member
bond0: external network

Example from one of the master nodes:

2: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:b0:88:f0 brd ff:ff:ff:ff:ff:ff
    inet6 fd00:1101::3/64 scope global dynamic
       valid_lft 10sec preferred_lft 10sec
    inet6 fe80::fb54:adad:caa3:615d/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: enp5s0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc fq_codel master bond0 state UP group default qlen 1000
    link/ether 52:54:00:b6:79:9c brd ff:ff:ff:ff:ff:ff
4: enp6s0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc fq_codel master bond0 state UP group default qlen 1000
    link/ether 52:54:00:b6:79:9c brd ff:ff:ff:ff:ff:ff
5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:b6:79:9c brd ff:ff:ff:ff:ff:ff
    inet 192.168.123.121/24 brd 192.168.123.255 scope global dynamic noprefixroute bond0
       valid_lft 3124sec preferred_lft 3124sec
    inet6 fe80::5054:ff:feb6:799c/64 scope link
       valid_lft forever preferred_lft forever

When upgrading to 4.6, during the machine-config operator upgrade, after the reboot of the first node the upgrade process is blocked because the worker node loses connectivity over the external network. Looking at the worker logs, the failure is caused by the ovs-configuration service failing:

-- Logs begin at Mon 2020-10-12 15:30:04 UTC, end at Mon 2020-10-12 18:34:43 UTC. --
Oct 12 16:48:44 worker-0-0 systemd[1]: Starting Configures OVS with proper host networking configuration...
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: + touch /var/run/ovs-config-executed
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: + '[' OVNKubernetes == OVNKubernetes ']'
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: + NM_CONN_PATH=/etc/NetworkManager/system-connections
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: + iface=
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: + counter=0
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: + '[' 0 -lt 12 ']'
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: ++ ip route show default
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: ++ awk '{if ($4 == "dev") print $5; exit}'
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: + iface=bond0
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: + [[ -n bond0 ]]
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: + echo 'IPv4 Default gateway interface found: bond0'
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: IPv4 Default gateway interface found: bond0
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: + break
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: + '[' bond0 = br-ex ']'
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: + '[' -z bond0 ']'
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: + iface_mac=52:54:00:1f:c4:d7
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: + echo 'MAC address found for iface: bond0: 52:54:00:1f:c4:d7'
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: MAC address found for iface: bond0: 52:54:00:1f:c4:d7
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: ++ awk '{print $5; exit}'
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: ++ ip link show bond0
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: + iface_mtu=1500
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: + [[ -z 1500 ]]
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: + echo 'MTU found for iface: bond0: 1500'
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: MTU found for iface: bond0: 1500
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: + nmcli connection show br-ex
Oct 12 16:48:44 worker-0-0 configure-ovs.sh[2095]: + nmcli c add type ovs-bridge conn.interface br-ex con-name br-ex 802-3-ethernet.mtu 1500 802-3-ethernet.cloned-mac-address 52:54:00:1f:c4:d7
Oct 12 16:48:45 worker-0-0 configure-ovs.sh[2095]: Connection 'br-ex' (f7dfecd3-2069-411c-b2b3-eaceed0c0fa4) successfully added.
Oct 12 16:48:45 worker-0-0 configure-ovs.sh[2095]: ++ nmcli --fields UUID,DEVICE conn show --active
Oct 12 16:48:45 worker-0-0 configure-ovs.sh[2095]: ++ awk '{print $1}'
Oct 12 16:48:45 worker-0-0 configure-ovs.sh[2095]: ++ grep bond0
Oct 12 16:48:45 worker-0-0 configure-ovs.sh[2095]: + old_conn=ad33d8b0-1f7b-cab9-9447-ba07f855b143
Oct 12 16:48:45 worker-0-0 configure-ovs.sh[2095]: + nmcli connection show ovs-port-phys0
Oct 12 16:48:45 worker-0-0 configure-ovs.sh[2095]: + nmcli c add type ovs-port conn.interface bond0 master br-ex con-name ovs-port-phys0
Oct 12 16:48:45 worker-0-0 configure-ovs.sh[2095]: Connection 'ovs-port-phys0' (0d8ba368-c8c0-4cd7-b796-8f6a061cb1bf) successfully added.
Oct 12 16:48:45 worker-0-0 configure-ovs.sh[2095]: + nmcli connection show ovs-port-br-ex
Oct 12 16:48:45 worker-0-0 configure-ovs.sh[2095]: + nmcli c add type ovs-port conn.interface br-ex master br-ex con-name ovs-port-br-ex
Oct 12 16:48:45 worker-0-0 configure-ovs.sh[2095]: Connection 'ovs-port-br-ex' (5620c242-73ec-4d6c-af4e-fd8827bc92f8) successfully added.
Oct 12 16:48:45 worker-0-0 configure-ovs.sh[2095]: + nmcli device disconnect bond0
Oct 12 16:48:45 worker-0-0 configure-ovs.sh[2095]: Device 'bond0' successfully disconnected.
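The detection steps traced in the configure-ovs.sh log above (find the default-gateway interface, then read its MTU) are plain awk one-liners over `ip` output. A minimal sketch of that logic, run against sample lines built from values in this report instead of the live routing table (the gateway address 192.168.123.1 is illustrative, not taken from the report):

```shell
# Sample command outputs as they would appear on the affected node.
# The gateway IP below is a hypothetical value on the reported 192.168.123.0/24 subnet.
route_line='default via 192.168.123.1 dev bond0 proto dhcp metric 300'
link_line='5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000'

# Same awk as in the log: when field 4 is "dev", field 5 is the interface name.
iface="$(echo "$route_line" | awk '{if ($4 == "dev") print $5; exit}')"

# In "ip link show" output, "mtu" is field 4 and its value is field 5.
iface_mtu="$(echo "$link_line" | awk '{print $5; exit}')"

echo "iface=$iface mtu=$iface_mtu"
```

With these inputs the sketch reports `iface=bond0 mtu=1500`, matching the values the script logs before it starts creating the br-ex connections.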
Oct 12 16:48:45 localhost.localdomain configure-ovs.sh[2095]: + nmcli connection show ovs-if-phys0
Oct 12 16:48:45 localhost.localdomain configure-ovs.sh[2095]: + nmcli c add type 802-3-ethernet conn.interface bond0 master ovs-port-phys0 con-name ovs-if-phys0 connection.autoconnect-priority 100 802-3-ethernet.mtu 1500
Oct 12 16:48:45 localhost.localdomain configure-ovs.sh[2095]: Connection 'ovs-if-phys0' (e510ddbf-a2eb-4c10-a914-c4d080d78801) successfully added.
Oct 12 16:48:45 localhost.localdomain configure-ovs.sh[2095]: + nmcli conn up ovs-if-phys0
Oct 12 16:48:45 localhost.localdomain configure-ovs.sh[2095]: Error: Connection activation failed: No suitable device found for this connection (device enp4s0 not available because profile is not compatible with device (mismatching interface name)).
Oct 12 16:48:45 localhost.localdomain systemd[1]: ovs-configuration.service: Main process exited, code=exited, status=4/NOPERMISSION
Oct 12 16:48:45 localhost.localdomain systemd[1]: ovs-configuration.service: Failed with result 'exit-code'.
Oct 12 16:48:45 localhost.localdomain systemd[1]: Failed to start Configures OVS with proper host networking configuration.
Oct 12 16:48:45 localhost.localdomain systemd[1]: ovs-configuration.service: Consumed 285ms CPU time

Version-Release number of selected component (if applicable):
4.6.0-rc.2

How reproducible:
100%

Steps to Reproduce:
1. Deploy 4.5 via the baremetal IPI flow with nodes having the external network configured on top of a bond device.
Initial bond configuration was set up via Ignition files:

{
  "ignition": { "version": "2.3.0" },
  "storage": {
    "files": [
      {
        "path": "/etc/sysconfig/network-scripts/ifcfg-enp5s0",
        "filesystem": "root",
        "mode": 436,
        "contents": {
          "source": "data:text/plain;charset=utf-8;base64,REVWSUNFPWVucDVzMApCT09UUFJPVE89bm9uZQpPTkJPT1Q9eWVzCk1BU1RFUj1ib25kMApTTEFWRT15ZXM="
        }
      },
      {
        "path": "/etc/sysconfig/network-scripts/ifcfg-enp6s0",
        "filesystem": "root",
        "mode": 436,
        "contents": {
          "source": "data:text/plain;charset=utf-8;base64,REVWSUNFPWVucDZzMApCT09UUFJPVE89bm9uZQpPTkJPT1Q9eWVzCk1BU1RFUj1ib25kMApTTEFWRT15ZXMK"
        }
      },
      {
        "path": "/etc/sysconfig/network-scripts/ifcfg-bond0",
        "filesystem": "root",
        "mode": 436,
        "contents": {
          "source": "data:text/plain;charset=utf-8;base64,Qk9ORElOR19PUFRTPWRvd25kZWxheT0wIGxhY3BfcmF0ZT1mYXN0IG1paW1vbj0xMDAgbW9kZT04MDIuM2FkIHVwZGVsYXk9MApUWVBFPUJvbmQKQk9ORElOR19NQVNURVI9eWVzCkJPT1RQUk9UTz1kaGNwCk5BTUU9Ym9uZDAKREVWSUNFPWJvbmQwCk9OQk9PVD15ZXM="
        }
      }
    ]
  }
}

2. Upgrade to 4.6.0-rc.2

Actual results:
Upgrade fails and leaves one of the worker nodes without external network connectivity.

Expected results:
Upgrade succeeds.

Additional info:
nmcli and ip a output from the worker node which lost connectivity:

[core@localhost ~]$ nmcli con
NAME                UUID                                  TYPE        DEVICE
Wired connection 1  95ea186e-e3b7-3399-811a-80ea135e5e82  ethernet    enp4s0
br-ex               f7dfecd3-2069-411c-b2b3-eaceed0c0fa4  ovs-bridge  br-ex
ovs-port-br-ex      5620c242-73ec-4d6c-af4e-fd8827bc92f8  ovs-port    br-ex
ovs-port-phys0      0d8ba368-c8c0-4cd7-b796-8f6a061cb1bf  ovs-port    bond0
bond0               ad33d8b0-1f7b-cab9-9447-ba07f855b143  bond        --
ovs-if-phys0        e510ddbf-a2eb-4c10-a914-c4d080d78801  ethernet    --
System enp5s0       9310e179-14b6-430a-6843-6491c047d532  ethernet    --
System enp6s0       b43fa2aa-5a85-7b0a-9a20-469067dba6d6  ethernet    --

[core@localhost ~]$ nmcli dev
DEVICE  TYPE        STATE         CONNECTION
enp4s0  ethernet    connected     Wired connection 1
br-ex   ovs-bridge  connected     br-ex
bond0   ovs-port    connected     ovs-port-phys0
br-ex   ovs-port    connected     ovs-port-br-ex
enp5s0  ethernet    disconnected  --
enp6s0  ethernet    disconnected  --
lo      loopback    unmanaged     --

[core@localhost ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:39:7a:a9 brd ff:ff:ff:ff:ff:ff
    inet6 fd00:1101::54/128 scope global dynamic noprefixroute
       valid_lft 2893sec preferred_lft 2893sec
    inet6 fe80::cdb1:e04b:f7c:b20e/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:1f:c4:d7 brd ff:ff:ff:ff:ff:ff
4: enp6s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:d5:b0:0f brd ff:ff:ff:ff:ff:ff

find /etc/NetworkManager/system-connections/ -type f -print -exec cat {} \;

/etc/NetworkManager/system-connections/br-ex.nmconnection
[connection]
id=br-ex
uuid=f7dfecd3-2069-411c-b2b3-eaceed0c0fa4
type=ovs-bridge
interface-name=br-ex
permissions=

[ethernet]
cloned-mac-address=52:54:00:1F:C4:D7
mac-address-blacklist=
mtu=1500

[ovs-bridge]

[ipv4]
dns-search=
method=auto

[ipv6]
addr-gen-mode=stable-privacy
dns-search=
method=auto

[proxy]

/etc/NetworkManager/system-connections/ovs-port-phys0.nmconnection
[connection]
id=ovs-port-phys0
uuid=0d8ba368-c8c0-4cd7-b796-8f6a061cb1bf
type=ovs-port
interface-name=bond0
master=br-ex
permissions=
slave-type=ovs-bridge

[ovs-port]

/etc/NetworkManager/system-connections/ovs-port-br-ex.nmconnection
[connection]
id=ovs-port-br-ex
uuid=5620c242-73ec-4d6c-af4e-fd8827bc92f8
type=ovs-port
interface-name=br-ex
master=br-ex
permissions=
slave-type=ovs-bridge

[ovs-port]
/etc/NetworkManager/system-connections/ovs-if-phys0.nmconnection
[connection]
id=ovs-if-phys0
uuid=e510ddbf-a2eb-4c10-a914-c4d080d78801
type=ethernet
autoconnect-priority=100
interface-name=bond0
master=0d8ba368-c8c0-4cd7-b796-8f6a061cb1bf
permissions=
slave-type=ovs-port

[ethernet]
mac-address-blacklist=
mtu=1500

[ovs-interface]
type=system
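For reference, the base64 payloads in the Ignition config under "Steps to Reproduce" are ordinary ifcfg files and can be inspected with the standard base64 decoder. Decoding the first payload (ifcfg-enp5s0, copied verbatim from the Ignition file above):

```shell
# Decode the ifcfg-enp5s0 payload embedded in the Ignition config above.
enp5s0_b64='REVWSUNFPWVucDVzMApCT09UUFJPVE89bm9uZQpPTkJPT1Q9eWVzCk1BU1RFUj1ib25kMApTTEFWRT15ZXM='
enp5s0_cfg="$(echo "$enp5s0_b64" | base64 -d)"
echo "$enp5s0_cfg"
```

This prints a plain bond-slave definition (DEVICE=enp5s0, BOOTPROTO=none, ONBOOT=yes, MASTER=bond0, SLAVE=yes); the ifcfg-bond0 payload decodes the same way to the 802.3ad DHCP bond definition (BONDING_OPTS=downdelay=0 lacp_rate=fast miimon=100 mode=802.3ad updelay=0, BOOTPROTO=dhcp).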
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633