Bug 2018003 - ovs-configuration service fails when external network is configured via dnsmasq on a bond device on a baremetal IPI deployment
Summary: ovs-configuration service fails when external network is configured via dnsma...
Keywords:
Status: CLOSED DUPLICATE of bug 1975174
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Jaime Caamaño Ruiz
QA Contact: Anurag saxena
URL:
Whiteboard:
Duplicates: 2013438
Depends On:
Blocks:
Reported: 2021-10-27 22:19 UTC by Will Russell
Modified: 2021-12-03 15:36 UTC (History)
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-03 18:47:11 UTC
Target Upstream Version:
Embargoed:


Description Will Russell 2021-10-27 22:19:09 UTC
Description of problem:

OCP 4.7.24
BareMetal IPI installation
Bond Network

Version-Release number of selected component (if applicable):


How reproducible:
80% of the time. The service succeeds on one master node and fails on the other two, and it is always the same nodes: master-1 always succeeds, while master-0 and master-2 always fail.

Steps to Reproduce:
1. Configure the IPI manifests, prepare the dnsmasq values/service, and deploy the cluster with the bond interface details defined.
2. Bootstrap initializes the nodes; all 6 NICs are granted IP addresses and default routes successfully. Restart #1 succeeds and the NICs remain active. configure-ovs.sh kicks off during the provisioning step; OVN comes online and the IP data for the interfaces is lost. The IP configuration of ens2f0 and ens2f1 is wiped, the network fails to provision, and the deployment subsequently times out.
3. SSH into the nodes via the provisioning network and observe the loss of network data, with configure-ovs.sh output differing between master-1 (succeeded/online) and master-0/2 (failed/offline).
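
The check in step 3 could be scripted roughly as follows. This is a minimal sketch, not something attached to the case: the helper names `iface_has_ipv4` and `iface_has_default_route` are hypothetical, and they only parse the text printed by `ip -o -4 addr show` and `ip route show default`. The report does not say which interface is expected to hold the address once configure-ovs.sh has run, so the interface name is a parameter.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of configure-ovs.sh) for comparing the
# network state of a working node and a failing node.

# iface_has_ipv4 IFACE
# Reads `ip -o -4 addr show` output on stdin; succeeds if IFACE has an
# IPv4 address in that output.
iface_has_ipv4() {
  local iface="$1"
  awk -v dev="$iface" '
    $2 == dev && $3 == "inet" { found = 1 }
    END { exit !found }
  '
}

# iface_has_default_route IFACE
# Reads `ip route show default` output on stdin; succeeds if a default
# route leaves through IFACE.
iface_has_default_route() {
  local iface="$1"
  awk -v dev="$iface" '
    $1 == "default" {
      for (i = 2; i < NF; i++)
        if ($i == "dev" && $(i + 1) == dev) found = 1
    }
    END { exit !found }
  '
}

# Example use on a node (interface name taken from the report):
#   ip -o -4 addr show | iface_has_ipv4 ens2f0 \
#     || echo "ens2f0 has no IPv4 address"
#   ip route show default | iface_has_default_route ens2f0 \
#     || echo "no default route via ens2f0"
```

Running the same two checks on master-1 and on master-0 or master-2 gives a quick side-by-side view of what was lost.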

Actual results:

The deployment fails: the IP address and routes are lost when OVS comes up on the primary interface, ens2f0.

Expected results:

ens2f0 should retain its baremetal IP information and the node should provision successfully.

Additional info:

The linked case includes the following data:

Output of the failed service on master-0 and of the successful service on master-1 for comparison, the dnsmasq provisioning data, and more. We have reproduced this behavior on about 10 separate installs with several changes between them. Each time, the consistent issue is that the interface carrying the default route loses its network connectivity along with its IP and route information when OVS comes up. Because OVS defines the bond as the primary connection, the deployment needs a working link on it before it can communicate, so it hangs and the deploy eventually fails. We cannot successfully re-run the script, even after running a reset-ovs.sh script to rebuild the baseline connection: every time `/usr/local/bin/configure-ovs.sh` runs, the connections clear their values and the bond fails to provision.

https://github.com/openshift/machine-config-operator/blob/release-4.7/templates/common/_base/files/configure-ovs-network.yaml#L259

~~~
Oct 27 16:28:30 master-0 configure-ovs.sh[3721]: + grep manual
Oct 27 16:28:30 master-0 configure-ovs.sh[3721]: + nmcli c add type ovs-interface slave-type ovs-port conn.interface br-ex master ovs-port-br-ex con-name ovs-if-br-ex 802-3-ethernet.mtu 9100 802-3-ethernet.cloned>
Oct 27 16:28:30 master-0 configure-ovs.sh[3721]: Connection 'ovs-if-br-ex' (<UUID>) successfully added.
Oct 27 16:28:30 master-0 configure-ovs.sh[3721]: + counter=0
Oct 27 16:28:30 master-0 configure-ovs.sh[3721]: + '[' 0 -lt 5 ']'
Oct 27 16:28:30 master-0 configure-ovs.sh[3721]: + sleep 5
Oct 27 16:28:35 master-0 configure-ovs.sh[3721]: + nmcli --fields GENERAL.STATE conn show ovs-if-br-ex
Oct 27 16:28:35 master-0 configure-ovs.sh[3721]: + grep -i activated
Oct 27 16:28:35 master-0 configure-ovs.sh[3721]: + counter=1
Oct 27 16:28:35 master-0 configure-ovs.sh[3721]: + '[' 1 -lt 5 ']'
Oct 27 16:28:35 master-0 configure-ovs.sh[3721]: + sleep 5
Oct 27 16:28:40 master-0 configure-ovs.sh[3721]: + nmcli --fields GENERAL.STATE conn show ovs-if-br-ex
Oct 27 16:28:40 master-0 configure-ovs.sh[3721]: + grep -i activated
Oct 27 16:28:40 master-0 configure-ovs.sh[3721]: + counter=2
Oct 27 16:28:40 master-0 configure-ovs.sh[3721]: + '[' 2 -lt 5 ']'
Oct 27 16:28:40 master-0 configure-ovs.sh[3721]: + sleep 5
Oct 27 16:28:45 master-0 configure-ovs.sh[3721]: + nmcli --fields GENERAL.STATE conn show ovs-if-br-ex
Oct 27 16:28:45 master-0 configure-ovs.sh[3721]: + grep -i activated
Oct 27 16:28:45 master-0 configure-ovs.sh[3721]: + counter=3
Oct 27 16:28:45 master-0 configure-ovs.sh[3721]: + '[' 3 -lt 5 ']'
Oct 27 16:28:45 master-0 configure-ovs.sh[3721]: + sleep 5
Oct 27 16:28:50 master-0 configure-ovs.sh[3721]: + nmcli --fields GENERAL.STATE conn show ovs-if-br-ex
Oct 27 16:28:50 master-0 configure-ovs.sh[3721]: + grep -i activated
Oct 27 16:28:50 master-0 configure-ovs.sh[3721]: + counter=4
Oct 27 16:28:50 master-0 configure-ovs.sh[3721]: + '[' 4 -lt 5 ']'
Oct 27 16:28:50 master-0 configure-ovs.sh[3721]: + sleep 5
Oct 27 16:28:55 master-0 configure-ovs.sh[3721]: + nmcli --fields GENERAL.STATE conn show ovs-if-br-ex
Oct 27 16:28:55 master-0 configure-ovs.sh[3721]: + grep -i activated
Oct 27 16:28:55 master-0 configure-ovs.sh[3721]: + counter=5
Oct 27 16:28:55 master-0 configure-ovs.sh[3721]: + '[' 5 -lt 5 ']'
Oct 27 16:28:55 master-0 configure-ovs.sh[3721]: + echo 'WARN: OVS did not succesfully activate NM connection. Attempting to bring up connections'
Oct 27 16:28:55 master-0 configure-ovs.sh[3721]: WARN: OVS did not succesfully activate NM connection. Attempting to bring up connections
Oct 27 16:28:55 master-0 configure-ovs.sh[3721]: + counter=0
Oct 27 16:28:55 master-0 configure-ovs.sh[3721]: + '[' 0 -lt 5 ']'
Oct 27 16:28:55 master-0 configure-ovs.sh[3721]: + nmcli conn up ovs-if-br-ex
Oct 27 16:29:40 master-0 configure-ovs.sh[3721]: Error: Connection activation failed: IP configuration could not be reserved (no available address, timeout, etc.)
Oct 27 16:29:40 master-0 configure-ovs.sh[3721]: Hint: use 'journalctl -xe NM_CONNECTION=<UUID> + NM_DEVICE=br-ex' to get more details.
Oct 27 16:29:40 master-0 configure-ovs.sh[3721]: + sleep 5
~~~
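
For readers less used to `set -x` output, the trace above corresponds to a bounded polling loop: sleep, check whether the connection is activated, and give up after five attempts before falling back to `nmcli conn up`. The following is a sketch reconstructed from the trace only, not the source of configure-ovs.sh; `wait_until` and `is_activated` are hypothetical names, while the `nmcli` invocations are the ones visible in the log.

```shell
#!/usr/bin/env bash
# Sketch of the retry pattern visible in the trace. Reconstructed from
# the log output, not copied from configure-ovs.sh.

# wait_until ATTEMPTS DELAY CMD [ARGS...]
# Sleeps DELAY seconds, then runs CMD; repeats up to ATTEMPTS times.
# Returns 0 as soon as CMD succeeds, 1 if every attempt fails.
wait_until() {
  local attempts="$1"
  local delay="$2"
  shift 2
  local counter=0
  while [ "$counter" -lt "$attempts" ]; do
    sleep "$delay"
    if "$@"; then
      return 0
    fi
    counter=$((counter + 1))
  done
  return 1
}

# is_activated CONN
# Succeeds when NetworkManager reports CONN as activated, using the same
# nmcli query that appears in the trace.
is_activated() {
  nmcli --fields GENERAL.STATE conn show "$1" | grep -qi activated
}

# Usage mirroring the trace (5 attempts, 5 seconds apart):
#   if ! wait_until 5 5 is_activated ovs-if-br-ex; then
#     echo "WARN: connection not activated, trying to bring it up"
#     nmcli conn up ovs-if-br-ex
#   fi
```

In the failing run, all five checks come back without an `activated` state, and the explicit `nmcli conn up ovs-if-br-ex` then fails after about 45 seconds with the "IP configuration could not be reserved" error.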

Comment 1 Will Russell 2021-10-27 22:21:02 UTC
*** Bug 2013438 has been marked as a duplicate of this bug. ***

Comment 10 Will Russell 2021-11-03 18:47:11 UTC

*** This bug has been marked as a duplicate of bug 1975174 ***

