This bug was initially created as a copy of Bug #2022641 I am copying this bug because: ------------------------------ It's needed for back porting to previous releases Description of problem: ------------------------ ovs-configuration.service fails with error: systemctl status ovs-configuration.service ● ovs-configuration.service - Configures OVS with proper host networking configuration Loaded: loaded (/etc/systemd/system/ovs-configuration.service; enabled; vendor preset: disabled) Active: failed (Result: exit-code) since Fri 2021-11-12 02:38:09 UTC; 6h ago Main PID: 5173 (code=exited, status=10) CPU: 1.321s Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ens4f1-slave-ovs-clone 82423461-b0b1-4cd6-a8ec-6a4dde3f0114 ethernet -- Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + nmcli conn down ovs-if-phys0 Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: Connection 'ovs-if-phys0' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/8) Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + nmcli conn up 52eecf5a-df5e-30ae-9ca1-6297f0239027 Nov 12 02:38:09 openshift-worker-3 configure-ovs.sh[5173]: Connection successfully activated (master waiting for slaves) (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/10) Nov 12 02:38:09 openshift-worker-3 configure-ovs.sh[5173]: + exit 10 Nov 12 02:38:09 openshift-worker-3 systemd[1]: ovs-configuration.service: Main process exited, code=exited, status=10/n/a Nov 12 02:38:09 openshift-worker-3 systemd[1]: ovs-configuration.service: Failed with result 'exit-code'. Nov 12 02:38:09 openshift-worker-3 systemd[1]: Failed to start Configures OVS with proper host networking configuration. Nov 12 02:38:09 openshift-worker-3 systemd[1]: ovs-configuration.service: Consumed 1.321s CPU time Problem is that qarning message sent by 'clone_slave_connection' function ends up in a 'new_uuid' variable breaking following command: Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + '[' bond0 '!=' bond0 ']' Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + local new_uuid Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ clone_slave_connection 8950883b-a416-360a-a597-cb308946aaa0 Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ local uuid=8950883b-a416-360a-a597-cb308946aaa0 Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ local old_name Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: +++ nmcli -g connection.id connection show uuid 8950883b-a416-360a-a597-cb308946aaa0 Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ old_name=ens4f0 Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ local new_name=ens4f0-slave-ovs-clone Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ nmcli connection show id ens4f0-slave-ovs-clone Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ echo 'WARN: existing ovs slave ens4f0-slave-ovs-clone connection profile file found, overwriting...' Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ nmcli connection delete id ens4f0-slave-ovs-clone Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ nmcli connection clone 8950883b-a416-360a-a597-cb308946aaa0 ens4f0-slave-ovs-clone Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ nmcli -g connection.uuid connection show ens4f0-slave-ovs-clone Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + new_uuid='WARN: existing ovs slave ens4f0-slave-ovs-clone connection profile file found, overwriting... Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: fa5e54c8-c34b-4f4d-8fff-3653f1ed084f' Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + nmcli conn mod uuid WARN: existing ovs slave ens4f0-slave-ovs-clone connection profile file found, overwriting... fa5e54c8-c34b-4f4d-8fff-3653f1ed084f connectio> Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: Error: unknown connection 'WARN:'. Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + handle_exit_error Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + e=10 Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + '[' 10 -eq 0 ']' Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + set +e Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + nmcli c show Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: NAME UUID TYPE DEVICE Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: bond0 52eecf5a-df5e-30ae-9ca1-6297f0239027 bond bond0 Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ovs-if-br-ex 72596cc5-698c-4b27-a494-0bbb06fb6862 ovs-interface br-ex Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: br-ex 66168247-7db4-4466-8997-370639c70d54 ovs-bridge br-ex Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ens4f0 8950883b-a416-360a-a597-cb308946aaa0 ethernet ens4f0 Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ens4f1 9b808a8b-13ba-3749-8b8b-6f6f208666a2 ethernet ens4f1 Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ovs-if-phys0 a3798cf3-e56b-4a52-893f-41846723f0f5 vlan bond0.373 Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ovs-port-br-ex 8207e9c4-0312-4622-ace6-f9c92b9d8773 ovs-port br-ex Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ovs-port-phys0 df4e486e-3e2d-49a0-a5fa-142fa8c8a5ca ovs-port bond0.373 Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: Wired connection 1 e22a8deb-684a-3a4f-9e1d-567e3f4546bd ethernet -- Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: bond0.373 3b3cccb0-d7a9-3546-b076-dbc701181e95 vlan -- Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ens4f0-slave-ovs-clone fa5e54c8-c34b-4f4d-8fff-3653f1ed084f ethernet -- Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ens4f1-slave-ovs-clone 82423461-b0b1-4cd6-a8ec-6a4dde3f0114 ethernet -- Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + nmcli conn down ovs-if-phys0 Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: Connection 'ovs-if-phys0' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/8) Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + nmcli conn up 52eecf5a-df5e-30ae-9ca1-6297f0239027 Nov 12 02:38:09 openshift-worker-3 configure-ovs.sh[5173]: Connection successfully activated (master waiting for slaves) (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/10) Nov 12 02:38:09 openshift-worker-3 configure-ovs.sh[5173]: + exit 10 Steps to Reproduce: ------------------- 1. Deploy baremetal IPI cluster 2. Perform OVN migration to 2nd interface 3. After node reboot configure-ovs might fail Actual results: --------------- MCO fails to proceed and node stays in SchedulingDisabled state oc logs -n openshift-machine-config-operator -c machine-config-daemon machine-config-daemon-brrhj Trace[1804756265]: [30.001843433s] [30.001843433s] END E1112 09:02:11.116566 16951 reflector.go:138] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.MachineConfig: failed to list *v1.MachineConfig: Get "https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1/machineconfigs?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout I1112 09:02:57.376779 16951 trace.go:205] Trace[1042946614]: "Reflector ListAndWatch" name:k8s.io/client-go/informers/factory.go:134 (12-Nov-2021 09:02:27.375) (total time: 30001ms): Trace[1042946614]: [30.001337111s] [30.001337111s] END E1112 09:02:57.376828 16951 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://172.30.0.1:443/api/v1/nodes?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout Expected results: ----------------- configure-ovs succeeds
Yurii - it looks like you created the fix for this so reassigning this to you.
Verified with 4.10.0-0.nightly-2021-11-15-034648 Fix present in configure-ovs.sh: ``` if nmcli connection show id "${new_name}" &> /dev/null; then echo "WARN: existing ovs slave ${new_name} connection profile file found, overwriting..." >&2 nmcli connection delete id "${new_name}" &> /dev/null fi ``` Rebooting nodes multiple times and issue didn't reproduce systemctl status ovs-configuration.service ● ovs-configuration.service - Configures OVS with proper host networking configuration Loaded: loaded (/etc/systemd/system/ovs-configuration.service; enabled; vendor preset: disabled) Active: inactive (dead) since Thu 2021-11-18 12:24:20 UTC; 1min 0s ago Main PID: 4125 (code=exited, status=0/SUCCESS) CPU: 190ms Nov 18 12:24:20 worker-0-0 configure-ovs.sh[4125]: + '[' -f /etc/ovnk/extra_bridge ']' Nov 18 12:24:20 worker-0-0 configure-ovs.sh[4125]: + nmcli connection show br-ex1 Nov 18 12:24:20 worker-0-0 configure-ovs.sh[4125]: + nmcli connection show ovs-if-phys1 Nov 18 12:24:20 worker-0-0 configure-ovs.sh[4125]: + '[' '!' -f /etc/ovnk/extra_bridge ']' Nov 18 12:24:20 worker-0-0 configure-ovs.sh[4125]: + ovs-vsctl --timeout=30 --if-exists del-br br0 Nov 18 12:24:20 worker-0-0 ovs-vsctl[4147]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 --if-exists del-br br0 Nov 18 12:24:20 worker-0-0 configure-ovs.sh[4125]: + exit 0 Nov 18 12:24:20 worker-0-0 systemd[1]: ovs-configuration.service: Succeeded. Nov 18 12:24:20 worker-0-0 systemd[1]: Started Configures OVS with proper host networking configuration. Nov 18 12:24:20 worker-0-0 systemd[1]: ovs-configuration.service: Consumed 190ms CPU time
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056