Description of problem:
OCP 4.7.24 BareMetal IPI installation with a bond network.

Version-Release number of selected component (if applicable):

How reproducible:
80% of the time. The install succeeds on one master node and fails on two, and it is the same nodes every time: master-1 ALWAYS succeeds, master-0 and master-2 always fail.

Steps to Reproduce:
1. Configure the IPI manifests, prepare the dnsmasq values/service, and deploy the cluster with the defined bond interface details.
2. Bootstrap initializes the nodes; all 6 NICs are granted IP addresses and default routes successfully. Restart #1 succeeds and the NICs remain active. configure-ovs.sh kicks off during the provisioning step, OVN comes online, and the IP data for the interfaces is lost: the IP configuration for ens2f0 and ens2f1 is wiped, the network fails to provision, and the deployment subsequently times out.
3. SSH into the nodes via the provisioning network and observe the loss of network data. The configure-ovs.sh output differs between master-1 (succeeded/online) and master-0/master-2 (failed/offline).

Actual results:
Deployment fails. IP addresses and routes are lost for the primary interface ens2f0 when OVS comes up.

Expected results:
ens2f0 should retain its baremetal IP information and provision successfully.

Additional info:
The linked case includes the following data: output for the failed service on master-0 and the successful service on master-1 for comparison, the dnsmasq provisioning data, and more.

We have been able to reproduce this behavior on roughly 10 separate installs with several changes. Each time, the consistent issue is that the interface carrying the default route loses its network connectivity/IP/route information when OVS comes up. The deployment then hangs: OVS defines the bond as the primary connection, so the node needs a working link on it before it can communicate, and the deploys fail out. We are unable to re-run the script successfully, even after running a reset-ovs.sh script to rebuild the baseline connection.
Every time the script at `/usr/local/bin/configure-ovs.sh` is run, the connections will clear their values and the bond will fail to provision.

https://github.com/openshift/machine-config-operator/blob/release-4.7/templates/common/_base/files/configure-ovs-network.yaml#L259

~~~
Oct 27 16:28:30 master-0 configure-ovs.sh[3721]: + grep manual
Oct 27 16:28:30 master-0 configure-ovs.sh[3721]: + nmcli c add type ovs-interface slave-type ovs-port conn.interface br-ex master ovs-port-br-ex con-name ovs-if-br-ex 802-3-ethernet.mtu 9100 802-3-ethernet.cloned>
Oct 27 16:28:30 master-0 configure-ovs.sh[3721]: Connection 'ovs-if-br-ex' (<UUID>) successfully added.
Oct 27 16:28:30 master-0 configure-ovs.sh[3721]: + counter=0
Oct 27 16:28:30 master-0 configure-ovs.sh[3721]: + '[' 0 -lt 5 ']'
Oct 27 16:28:30 master-0 configure-ovs.sh[3721]: + sleep 5
Oct 27 16:28:35 master-0 configure-ovs.sh[3721]: + nmcli --fields GENERAL.STATE conn show ovs-if-br-ex
Oct 27 16:28:35 master-0 configure-ovs.sh[3721]: + grep -i activated
Oct 27 16:28:35 master-0 configure-ovs.sh[3721]: + counter=1
Oct 27 16:28:35 master-0 configure-ovs.sh[3721]: + '[' 1 -lt 5 ']'
Oct 27 16:28:35 master-0 configure-ovs.sh[3721]: + sleep 5
Oct 27 16:28:40 master-0 configure-ovs.sh[3721]: + nmcli --fields GENERAL.STATE conn show ovs-if-br-ex
Oct 27 16:28:40 master-0 configure-ovs.sh[3721]: + grep -i activated
Oct 27 16:28:40 master-0 configure-ovs.sh[3721]: + counter=2
Oct 27 16:28:40 master-0 configure-ovs.sh[3721]: + '[' 2 -lt 5 ']'
Oct 27 16:28:40 master-0 configure-ovs.sh[3721]: + sleep 5
Oct 27 16:28:45 master-0 configure-ovs.sh[3721]: + nmcli --fields GENERAL.STATE conn show ovs-if-br-ex
Oct 27 16:28:45 master-0 configure-ovs.sh[3721]: + grep -i activated
Oct 27 16:28:45 master-0 configure-ovs.sh[3721]: + counter=3
Oct 27 16:28:45 master-0 configure-ovs.sh[3721]: + '[' 3 -lt 5 ']'
Oct 27 16:28:45 master-0 configure-ovs.sh[3721]: + sleep 5
Oct 27 16:28:50 master-0 configure-ovs.sh[3721]: + nmcli --fields GENERAL.STATE conn show ovs-if-br-ex
Oct 27 16:28:50 master-0 configure-ovs.sh[3721]: + grep -i activated
Oct 27 16:28:50 master-0 configure-ovs.sh[3721]: + counter=4
Oct 27 16:28:50 master-0 configure-ovs.sh[3721]: + '[' 4 -lt 5 ']'
Oct 27 16:28:50 master-0 configure-ovs.sh[3721]: + sleep 5
Oct 27 16:28:55 master-0 configure-ovs.sh[3721]: + nmcli --fields GENERAL.STATE conn show ovs-if-br-ex
Oct 27 16:28:55 master-0 configure-ovs.sh[3721]: + grep -i activated
Oct 27 16:28:55 master-0 configure-ovs.sh[3721]: + counter=5
Oct 27 16:28:55 master-0 configure-ovs.sh[3721]: + '[' 5 -lt 5 ']'
Oct 27 16:28:55 master-0 configure-ovs.sh[3721]: + echo 'WARN: OVS did not succesfully activate NM connection. Attempting to bring up connections'
Oct 27 16:28:55 master-0 configure-ovs.sh[3721]: WARN: OVS did not succesfully activate NM connection. Attempting to bring up connections
Oct 27 16:28:55 master-0 configure-ovs.sh[3721]: + counter=0
Oct 27 16:28:55 master-0 configure-ovs.sh[3721]: + '[' 0 -lt 5 ']'
Oct 27 16:28:55 master-0 configure-ovs.sh[3721]: + nmcli conn up ovs-if-br-ex
Oct 27 16:29:40 master-0 configure-ovs.sh[3721]: Error: Connection activation failed: IP configuration could not be reserved (no available address, timeout, etc.)
Oct 27 16:29:40 master-0 configure-ovs.sh[3721]: Hint: use 'journalctl -xe NM_CONNECTION=<UUID> + NM_DEVICE=br-ex' to get more details.
Oct 27 16:29:40 master-0 configure-ovs.sh[3721]: + sleep 5
~~~
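For readers following the trace above, the polling pattern it shows can be sketched roughly as below. This is a reconstruction from the xtrace output only, not the actual `configure-ovs.sh` source: the function names `wait_for_activation` and `check_br_ex` are hypothetical, and only the `nmcli`/`grep` commands, the limit of 5 attempts, and the 5-second sleep are taken from the log.

```shell
#!/usr/bin/env bash
# Sketch of the wait loop seen in the trace (assumption: reconstructed from
# the log, not copied from configure-ovs.sh).

# wait_for_activation ATTEMPTS DELAY CMD [ARGS...]
# Sleeps DELAY seconds, then runs CMD; repeats up to ATTEMPTS times.
# Returns 0 as soon as CMD succeeds, 1 if every attempt fails.
wait_for_activation() {
  local attempts=$1
  local delay=$2
  shift 2
  local counter=0
  while [ "$counter" -lt "$attempts" ]; do
    sleep "$delay"
    if "$@"; then
      return 0
    fi
    counter=$((counter + 1))
  done
  return 1
}

# Hypothetical helper wrapping the exact check that appears in the trace.
check_br_ex() {
  nmcli --fields GENERAL.STATE conn show ovs-if-br-ex | grep -qi activated
}
```

On an affected node this could be invoked as `wait_for_activation 5 5 check_br_ex || echo "ovs-if-br-ex never reached the activated state"`. In the log, all five checks fail between 16:28:35 and 16:28:55, after which the script falls back to `nmcli conn up ovs-if-br-ex`, and that call returns the "IP configuration could not be reserved" error 45 seconds later.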
*** Bug 2013438 has been marked as a duplicate of this bug. ***
*** This bug has been marked as a duplicate of bug 1975174 ***