Description of problem:
After the second boot, when the MCO tries to configure the network, the configure-ovs.sh script fails.

Version-Release number of selected component (if applicable):
OpenShift 4.6.16

How reproducible:

Steps to Reproduce:
1. Provision the node as described in the documentation, with kernel parameters.
2. Wait for the image to be written to disk; after the reboot, the configure-ovs.sh script fails.

Actual results:
The configure-ovs.sh script fails and prevents the OVN br-ex bridge from being created.

Expected results:
The script ends successfully.

Additional info:
Dracut puts `master={BOND_UUID}` in the config files of the slave interfaces. I tested replacing the UUID with the interface name (to avoid the script crash), but after a restart NetworkManager did not bring the bond up. The UUID seems to be the proper way to refer to the master.

The configure-ovs.sh script, at line 183, selects interface files by grepping for the bond UUID: ` if egrep -l --include=*.nmconnection $old_conn ${NM_CONN_PATH}/*; then`. This matches 3 files (the bond and its two slaves), because `master={BOND_UUID}` is written in each slave file. The script then fails at line 202, `cp -f ${old_conn_file} ${new_conn_file}`, because it cannot copy 3 files into a single destination file.
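The over-matching described above can be reproduced outside the cluster. The following is a minimal sketch, not the script itself: the directory, file names and UUIDs are made up for illustration, and only the two grep patterns (bare UUID versus `uuid=` prefix) mirror what configure-ovs.sh does.

```shell
#!/bin/bash
# Sketch only: throwaway keyfiles with hypothetical names and UUIDs.
demo_dir=$(mktemp -d)
bond_uuid="11111111-2222-3333-4444-555555555555"

# Bond connection: the UUID appears in its own uuid= key.
printf '[connection]\nid=bond0\nuuid=%s\ntype=bond\n' \
  "$bond_uuid" > "$demo_dir/bond0.nmconnection"

# Two slave connections: each has its own uuid= and refers to the bond via master=.
for n in 1 2; do
  printf '[connection]\nid=slave%s\nuuid=aaaaaaaa-0000-0000-0000-00000000000%s\ntype=ethernet\nmaster=%s\nslave-type=bond\n' \
    "$n" "$n" "$bond_uuid" > "$demo_dir/slave$n.nmconnection"
done

# Bare UUID pattern (as reported at line 183): also hits the master= lines.
unanchored=$(grep -El -- "$bond_uuid" "$demo_dir"/*.nmconnection | wc -l | tr -d ' ')

# Pattern prefixed with uuid=: only the bond's own file matches.
anchored=$(grep -El -- "uuid=$bond_uuid" "$demo_dir"/*.nmconnection | wc -l | tr -d ' ')

echo "bare UUID pattern matched $unanchored file(s)"
echo "uuid= pattern matched $anchored file(s)"

rm -rf "$demo_dir"
```

With the bare pattern the variable holding the result ends up with three file names, which is what makes the later single-destination `cp -f` fail.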
Verified script changes landed in 4.8.0-0.nightly-2021-03-03-072205.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-03-03-072205   True        False         17m     Cluster version is 4.8.0-0.nightly-2021-03-03-072205

$ oc get node
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-136-105.us-west-2.compute.internal   Ready    worker   29m   v1.20.0+2ce2be0
ip-10-0-138-238.us-west-2.compute.internal   Ready    master   38m   v1.20.0+2ce2be0
ip-10-0-173-154.us-west-2.compute.internal   Ready    master   38m   v1.20.0+2ce2be0
ip-10-0-177-34.us-west-2.compute.internal    Ready    worker   29m   v1.20.0+2ce2be0
ip-10-0-210-181.us-west-2.compute.internal   Ready    worker   30m   v1.20.0+2ce2be0
ip-10-0-220-196.us-west-2.compute.internal   Ready    master   38m   v1.20.0+2ce2be0

$ oc debug node/ip-10-0-210-181.us-west-2.compute.internal
Starting pod/ip-10-0-210-181us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# cat /usr/local/bin/configure-ovs.sh
#!/bin/bash
set -eux

# Workaround to ensure OVS is installed due to bug in systemd Requires:
# https://bugzilla.redhat.com/show_bug.cgi?id=1888017
copy_nm_conn_files() {
  src_path="/etc/NetworkManager/system-connections-merged"
  dst_path="/etc/NetworkManager/system-connections"
  if [ -d $src_path ]; then
    echo "$src_path exists"
    fileList=$(echo {br-ex,ovs-if-br-ex,ovs-port-br-ex,ovs-if-phys0,ovs-port-phys0}.nmconnection)
    for file in ${fileList[*]}; do
      if [ ! -f $dst_path/$file ]; then
        cp $src_path/$file $dst_path/$file
      else
        echo "Skipping $file since it exists in $dst_path"
      fi
    done
  fi
}

if ! rpm -qa | grep -q openvswitch; then
  echo "Warning: Openvswitch package is not installed!"
  exit 1
fi

if [ "$1" == "OVNKubernetes" ]; then
  # Configures NICs onto OVS bridge "br-ex"
  # Configuration is either auto-detected or provided through a config file written already in Network Manager
  # key files under /etc/NetworkManager/system-connections/
  # Managing key files is outside of the scope of this script

  # if the interface is of type vmxnet3 add multicast capability for that driver
  # REMOVEME: Once BZ:1854355 is fixed, this needs to get removed.
  function configure_driver_options {
    intf=$1
    driver=$(cat "/sys/class/net/${intf}/device/uevent" | grep DRIVER | awk -F "=" '{print $2}')
    echo "Driver name is" $driver
    if [ "$driver" = "vmxnet3" ]; then
      ifconfig "$intf" allmulti
    fi
  }

  if [ -d "/etc/NetworkManager/system-connections-merged" ]; then
    NM_CONN_PATH="/etc/NetworkManager/system-connections-merged"
  else
    NM_CONN_PATH="/etc/NetworkManager/system-connections"
  fi

  iface=""
  counter=0
  # find default interface
  while [ $counter -lt 12 ]; do
    # check ipv4
    iface=$(ip route show default | awk '{ if ($4 == "dev") { print $5; exit } }')
    if [[ -n "$iface" ]]; then
      echo "IPv4 Default gateway interface found: ${iface}"
      break
    fi
    # check ipv6
    iface=$(ip -6 route show default | awk '{ if ($4 == "dev") { print $5; exit } }')
    if [[ -n "$iface" ]]; then
      echo "IPv6 Default gateway interface found: ${iface}"
      break
    fi
    counter=$((counter+1))
    echo "No default route found on attempt: ${counter}"
    sleep 5
  done

  if [ "$iface" = "br-ex" ]; then
    # handle vlans and bonds etc if they have already been
    # configured via nm key files and br-ex is already up
    ifaces=$(ovs-vsctl list-ifaces ${iface})
    for intf in $ifaces; do configure_driver_options $intf; done
    echo "Networking already configured and up for br-ex!"
    # remove bridges created by openshift-sdn
    ovs-vsctl --timeout=30 --if-exists del-br br0
    exit 0
  fi

  if [ -z "$iface" ]; then
    echo "ERROR: Unable to find default gateway interface"
    exit 1
  fi

  # find the MAC from OVS config or the default interface to use for OVS internal port
  # this prevents us from getting a different DHCP lease and dropping connection
  if ! iface_mac=$(<"/sys/class/net/${iface}/address"); then
    echo "Unable to determine default interface MAC"
    exit 1
  fi

  echo "MAC address found for iface: ${iface}: ${iface_mac}"

  # find MTU from original iface
  iface_mtu=$(ip link show "$iface" | awk '{print $5; exit}')
  if [[ -z "$iface_mtu" ]]; then
    echo "Unable to determine default interface MTU, defaulting to 1500"
    iface_mtu=1500
  else
    echo "MTU found for iface: ${iface}: ${iface_mtu}"
  fi

  # store old conn for later
  old_conn=$(nmcli --fields UUID,DEVICE conn show --active | awk "/\s${iface}\s*\$/ {print \$1}")

  extra_brex_args=""
  # check for dhcp client ids
  dhcp_client_id=$(nmcli --get-values ipv4.dhcp-client-id conn show ${old_conn})
  if [ -n "$dhcp_client_id" ]; then
    extra_brex_args+="ipv4.dhcp-client-id ${dhcp_client_id} "
  fi

  dhcp6_client_id=$(nmcli --get-values ipv6.dhcp-duid conn show ${old_conn})
  if [ -n "$dhcp6_client_id" ]; then
    extra_brex_args+="ipv6.dhcp-duid ${dhcp6_client_id} "
  fi

  # create bridge; use NM's ethernet device default route metric (100)
  if ! nmcli connection show br-ex &> /dev/null; then
    nmcli c add type ovs-bridge \
        con-name br-ex \
        conn.interface br-ex \
        802-3-ethernet.mtu ${iface_mtu} \
        802-3-ethernet.cloned-mac-address ${iface_mac} \
        ipv4.route-metric 100 \
        ipv6.route-metric 100 \
        ${extra_brex_args}
  fi

  # find default port to add to bridge
  if ! nmcli connection show ovs-port-phys0 &> /dev/null; then
    nmcli c add type ovs-port conn.interface ${iface} master br-ex con-name ovs-port-phys0
  fi

  if ! nmcli connection show ovs-port-br-ex &> /dev/null; then
    nmcli c add type ovs-port conn.interface br-ex master br-ex con-name ovs-port-br-ex
  fi

  extra_phys_args=""
  # check if this interface is a vlan, bond, or ethernet type
  if [ $(nmcli --get-values connection.type conn show ${old_conn}) == "vlan" ]; then
    iface_type=vlan
    vlan_id=$(nmcli --get-values vlan.id conn show ${old_conn})
    if [ -z "$vlan_id" ]; then
      echo "ERROR: unable to determine vlan_id for vlan connection: ${old_conn}"
      exit 1
    fi
    vlan_parent=$(nmcli --get-values vlan.parent conn show ${old_conn})
    if [ -z "$vlan_parent" ]; then
      echo "ERROR: unable to determine vlan_parent for vlan connection: ${old_conn}"
      exit 1
    fi
    extra_phys_args="dev ${vlan_parent} id ${vlan_id}"
  elif [ $(nmcli --get-values connection.type conn show ${old_conn}) == "bond" ]; then
    iface_type=bond
    # check bond options
    bond_opts=$(nmcli --get-values bond.options conn show ${old_conn})
    if [ -n "$bond_opts" ]; then
      extra_phys_args+="bond.options ${bond_opts} "
    fi
  else
    iface_type=802-3-ethernet
  fi

  # bring down any old iface
  nmcli device disconnect $iface

  if ! nmcli connection show ovs-if-phys0 &> /dev/null; then
    nmcli c add type ${iface_type} conn.interface ${iface} master ovs-port-phys0 con-name ovs-if-phys0 \
      connection.autoconnect-priority 100 802-3-ethernet.mtu ${iface_mtu} ${extra_phys_args}
  fi

  nmcli conn up ovs-if-phys0

  if ! nmcli connection show ovs-if-br-ex &> /dev/null; then
    if nmcli --fields ipv4.method,ipv6.method conn show $old_conn | grep manual; then
      echo "Static IP addressing detected on default gateway connection: ${old_conn}"
      # find and copy the old connection to get the address settings
      if egrep -l --include=*.nmconnection uuid=$old_conn ${NM_CONN_PATH}/*; then
        old_conn_file=$(egrep -l --include=*.nmconnection uuid=$old_conn ${NM_CONN_PATH}/*)
        cloned=false
      else
        echo "WARN: unable to find NM configuration file for conn: ${old_conn}. Attempting to clone conn"
        old_conn_file=${NM_CONN_PATH}/${old_conn}-clone.nmconnection
        nmcli conn clone ${old_conn} ${old_conn}-clone
        cloned=true
        if [ ! -f "$old_conn_file" ]; then
          echo "ERROR: unable to locate cloned conn file: ${old_conn_file}"
          exit 1
        fi
        echo "Successfully cloned conn to ${old_conn_file}"
      fi
      echo "old connection file found at: ${old_conn_file}"
      new_conn_file=${NM_CONN_PATH}/ovs-if-br-ex.nmconnection
      if [ -f "$new_conn_file" ]; then
        echo "WARN: existing br-ex interface file found: $new_conn_file, which is not loaded in NetworkManager...overwriting"
      fi
      cp -f ${old_conn_file} ${new_conn_file}
      restorecon ${new_conn_file}
      if $cloned; then
        nmcli conn delete ${old_conn}-clone
        rm -f ${old_conn_file}
      fi
      ovs_port_conn=$(nmcli --fields connection.uuid conn show ovs-port-br-ex | awk '{print $2}')
      br_iface_uuid=$(cat /proc/sys/kernel/random/uuid)
      # modify file to work with OVS and have unique settings
      sed -i '/^\[connection\]$/,/^\[/ s/^uuid=.*$/uuid='"$br_iface_uuid"'/' ${new_conn_file}
      sed -i '/^multi-connect=.*$/d' ${new_conn_file}
      sed -i '/^\[connection\]$/,/^\[/ s/^type=.*$/type=ovs-interface/' ${new_conn_file}
      sed -i '/^\[connection\]$/,/^\[/ s/^id=.*$/id=ovs-if-br-ex/' ${new_conn_file}
      sed -i '/^\[connection\]$/a slave-type=ovs-port' ${new_conn_file}
      sed -i '/^\[connection\]$/a master='"$ovs_port_conn" ${new_conn_file}
      if grep 'interface-name=' ${new_conn_file} &> /dev/null; then
        sed -i '/^\[connection\]$/,/^\[/ s/^interface-name=.*$/interface-name=br-ex/' ${new_conn_file}
      else
        sed -i '/^\[connection\]$/a interface-name=br-ex' ${new_conn_file}
      fi
      if ! grep 'cloned-mac-address=' ${new_conn_file} &> /dev/null; then
        sed -i '/^\[ethernet\]$/,/^\[/ s/^cloned-mac-address=.*$/cloned-mac-address='"$iface_mac"'/' ${new_conn_file}
      else
        sed -i '/^\[ethernet\]$/a cloned-mac-address='"$iface_mac" ${new_conn_file}
      fi
      if grep 'mtu=' ${new_conn_file} &> /dev/null; then
        sed -i '/^\[ethernet\]$/,/^\[/ s/^mtu=.*$/mtu='"$iface_mtu"'/' ${new_conn_file}
      else
        sed -i '/^\[ethernet\]$/a mtu='"$iface_mtu" ${new_conn_file}
      fi
      cat <<EOF >> ${new_conn_file}
[ovs-interface]
type=internal
EOF
      nmcli c load ${new_conn_file}
      echo "Loaded new ovs-if-br-ex connection file: ${new_conn_file}"
    else
      nmcli c add type ovs-interface slave-type ovs-port conn.interface br-ex master ovs-port-br-ex con-name \
        ovs-if-br-ex 802-3-ethernet.mtu ${iface_mtu} 802-3-ethernet.cloned-mac-address ${iface_mac} \
        ipv4.route-metric 100 ipv6.route-metric 100
    fi
  fi

  # wait for DHCP to finish, verify connection is up
  counter=0
  while [ $counter -lt 5 ]; do
    sleep 5
    # check if connection is active
    if nmcli --fields GENERAL.STATE conn show ovs-if-br-ex | grep -i "activated"; then
      echo "OVS successfully configured"
      copy_nm_conn_files
      ip a show br-ex
      ip route show
      configure_driver_options ${iface}
      exit 0
    fi
    counter=$((counter+1))
  done

  echo "WARN: OVS did not succesfully activate NM connection. Attempting to bring up connections"
  counter=0
  while [ $counter -lt 5 ]; do
    if nmcli conn up ovs-if-br-ex; then
      echo "OVS successfully configured"
      copy_nm_conn_files
      ip a show br-ex
      ip route show
      configure_driver_options ${iface}
      exit 0
    fi
    sleep 5
    counter=$((counter+1))
  done

  echo "ERROR: Failed to activate ovs-if-br-ex NM connection"
  # if we made it here networking isnt coming up, revert for debugging
  set +e
  nmcli conn down ovs-if-br-ex
  nmcli conn down ovs-if-phys0
  nmcli conn up $old_conn
  exit 1
elif [ "$1" == "OpenShiftSDN" ]; then
  # Revert changes made by /usr/local/bin/configure-ovs.sh.
  # Remove OVS bridge "br-ex". Use the default NIC for cluster network.
  iface=""
  if nmcli connection show ovs-port-phys0 &> /dev/null; then
    iface=$(nmcli --get-values connection.interface-name connection show ovs-port-phys0)
    nmcli c del ovs-port-phys0
  fi

  if nmcli connection show ovs-if-phys0 &> /dev/null; then
    nmcli c del ovs-if-phys0
  fi

  if nmcli connection show ovs-port-br-ex &> /dev/null; then
    nmcli c del ovs-port-br-ex
  fi

  if nmcli connection show ovs-if-br-ex &> /dev/null; then
    nmcli c del ovs-if-br-ex
  fi

  if nmcli connection show br-ex &> /dev/null; then
    nmcli c del br-ex
  fi

  rm -f /etc/NetworkManager/system-connections/{br-ex,ovs-if-br-ex,ovs-port-br-ex,ovs-if-phys0,ovs-port-phys0}.nmconnection

  # remove bridges created by ovn-kubernetes, try to delete br-ex again in case NM fail to talk to ovsdb
  ovs-vsctl --timeout=30 --if-exists del-br br-int -- --if-exists del-br br-local -- --if-exists del-br br-ex

  if [[ -n "$iface" ]]; then
    nmcli device connect $iface
  fi
fi
sh-4.4#
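Rather than reading the whole script, the same verification can probably be narrowed to the one changed lookup. This is a rough sketch under assumptions: `has_uuid_anchor` is a hypothetical helper name, and it only checks for the literal text `uuid=$old_conn` that appears in the script output above.

```shell
#!/bin/bash
# has_uuid_anchor is a hypothetical helper, not part of configure-ovs.sh.
# It succeeds when the given script contains the literal text uuid=$old_conn,
# i.e. the connection-file lookup is prefixed with uuid=.
has_uuid_anchor() {
  local script_path=$1
  # -F: fixed string, so $ and = are not treated as regex syntax
  grep -Fq -- 'uuid=$old_conn' "$script_path"
}

# Default path is the one shown in the transcript; pass another path as $1 to override.
script=${1:-/usr/local/bin/configure-ovs.sh}
if [ -r "$script" ]; then
  if has_uuid_anchor "$script"; then
    echo "$script: lookup is prefixed with uuid="
  else
    echo "$script: lookup still greps for the bare UUID"
  fi
fi
```

It would be run from the `chroot /host` shell of the debug pod, on any node of interest.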
KCS article created to describe this problem: https://access.redhat.com/solutions/5995781.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days