+++ This bug was initially created as a clone of Bug #2022641 +++

Description of problem:
------------------------
ovs-configuration.service fails with error:

systemctl status ovs-configuration.service
● ovs-configuration.service - Configures OVS with proper host networking configuration
   Loaded: loaded (/etc/systemd/system/ovs-configuration.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2021-11-12 02:38:09 UTC; 6h ago
 Main PID: 5173 (code=exited, status=10)
      CPU: 1.321s

Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ens4f1-slave-ovs-clone 82423461-b0b1-4cd6-a8ec-6a4dde3f0114 ethernet --
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + nmcli conn down ovs-if-phys0
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: Connection 'ovs-if-phys0' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/8)
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + nmcli conn up 52eecf5a-df5e-30ae-9ca1-6297f0239027
Nov 12 02:38:09 openshift-worker-3 configure-ovs.sh[5173]: Connection successfully activated (master waiting for slaves) (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/10)
Nov 12 02:38:09 openshift-worker-3 configure-ovs.sh[5173]: + exit 10
Nov 12 02:38:09 openshift-worker-3 systemd[1]: ovs-configuration.service: Main process exited, code=exited, status=10/n/a
Nov 12 02:38:09 openshift-worker-3 systemd[1]: ovs-configuration.service: Failed with result 'exit-code'.
Nov 12 02:38:09 openshift-worker-3 systemd[1]: Failed to start Configures OVS with proper host networking configuration.
Nov 12 02:38:09 openshift-worker-3 systemd[1]: ovs-configuration.service: Consumed 1.321s CPU time

The problem is that the warning message printed by the 'clone_slave_connection' function ends up in the 'new_uuid' variable, which breaks the following command. As the trace shows, the warning is echoed inside the command substitution, so it is captured together with the UUID:

Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + '[' bond0 '!=' bond0 ']'
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + local new_uuid
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ clone_slave_connection 8950883b-a416-360a-a597-cb308946aaa0
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ local uuid=8950883b-a416-360a-a597-cb308946aaa0
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ local old_name
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: +++ nmcli -g connection.id connection show uuid 8950883b-a416-360a-a597-cb308946aaa0
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ old_name=ens4f0
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ local new_name=ens4f0-slave-ovs-clone
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ nmcli connection show id ens4f0-slave-ovs-clone
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ echo 'WARN: existing ovs slave ens4f0-slave-ovs-clone connection profile file found, overwriting...'
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ nmcli connection delete id ens4f0-slave-ovs-clone
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ nmcli connection clone 8950883b-a416-360a-a597-cb308946aaa0 ens4f0-slave-ovs-clone
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ nmcli -g connection.uuid connection show ens4f0-slave-ovs-clone
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + new_uuid='WARN: existing ovs slave ens4f0-slave-ovs-clone connection profile file found, overwriting...
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: fa5e54c8-c34b-4f4d-8fff-3653f1ed084f'
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + nmcli conn mod uuid WARN: existing ovs slave ens4f0-slave-ovs-clone connection profile file found, overwriting... fa5e54c8-c34b-4f4d-8fff-3653f1ed084f connectio>
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: Error: unknown connection 'WARN:'.
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + handle_exit_error
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + e=10
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + '[' 10 -eq 0 ']'
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + set +e
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + nmcli c show
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: NAME                    UUID                                  TYPE           DEVICE
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: bond0                   52eecf5a-df5e-30ae-9ca1-6297f0239027  bond           bond0
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ovs-if-br-ex            72596cc5-698c-4b27-a494-0bbb06fb6862  ovs-interface  br-ex
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: br-ex                   66168247-7db4-4466-8997-370639c70d54  ovs-bridge     br-ex
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ens4f0                  8950883b-a416-360a-a597-cb308946aaa0  ethernet       ens4f0
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ens4f1                  9b808a8b-13ba-3749-8b8b-6f6f208666a2  ethernet       ens4f1
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ovs-if-phys0            a3798cf3-e56b-4a52-893f-41846723f0f5  vlan           bond0.373
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ovs-port-br-ex          8207e9c4-0312-4622-ace6-f9c92b9d8773  ovs-port       br-ex
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ovs-port-phys0          df4e486e-3e2d-49a0-a5fa-142fa8c8a5ca  ovs-port       bond0.373
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: Wired connection 1      e22a8deb-684a-3a4f-9e1d-567e3f4546bd  ethernet       --
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: bond0.373               3b3cccb0-d7a9-3546-b076-dbc701181e95  vlan           --
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ens4f0-slave-ovs-clone  fa5e54c8-c34b-4f4d-8fff-3653f1ed084f  ethernet       --
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ens4f1-slave-ovs-clone  82423461-b0b1-4cd6-a8ec-6a4dde3f0114  ethernet       --
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + nmcli conn down ovs-if-phys0
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: Connection 'ovs-if-phys0' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/8)
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + nmcli conn up 52eecf5a-df5e-30ae-9ca1-6297f0239027
Nov 12 02:38:09 openshift-worker-3 configure-ovs.sh[5173]: Connection successfully activated (master waiting for slaves) (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/10)
Nov 12 02:38:09 openshift-worker-3 configure-ovs.sh[5173]: + exit 10

Steps to Reproduce:
-------------------
1. Deploy baremetal IPI cluster
2. Perform OVN migration to 2nd interface
3. After node reboot configure-ovs might fail

Actual results:
---------------
MCO fails to proceed and node stays in SchedulingDisabled state

oc logs -n openshift-machine-config-operator -c machine-config-daemon machine-config-daemon-brrhj
Trace[1804756265]: [30.001843433s] [30.001843433s] END
E1112 09:02:11.116566 16951 reflector.go:138] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.MachineConfig: failed to list *v1.MachineConfig: Get "https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1/machineconfigs?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout
I1112 09:02:57.376779 16951 trace.go:205] Trace[1042946614]: "Reflector ListAndWatch" name:k8s.io/client-go/informers/factory.go:134 (12-Nov-2021 09:02:27.375) (total time: 30001ms):
Trace[1042946614]: [30.001337111s] [30.001337111s] END
E1112 09:02:57.376828 16951 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://172.30.0.1:443/api/v1/nodes?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout

Expected results:
-----------------
configure-ovs succeeds

--- Additional comment from bfournie on 2021-11-15 00:35:08 UTC ---

Yurii - it looks like you created the fix for this so reassigning this.
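
For readers unfamiliar with the failure mode in the trace above, here is a minimal sketch that reproduces it without nmcli. The function names `clone_buggy` and `clone_fixed` are hypothetical stand-ins for `clone_slave_connection`, and the UUID is simply copied from the log. Sending the warning to stderr is one common way to keep diagnostics out of a command substitution; it is shown as an illustration and is not necessarily the change that was shipped for this bug.

```shell
#!/bin/bash
# Hypothetical stand-in for clone_slave_connection: the warning and the
# UUID are both written to stdout, so $(...) captures both lines.
clone_buggy() {
  echo "WARN: existing ovs slave connection profile file found, overwriting..."
  echo "fa5e54c8-c34b-4f4d-8fff-3653f1ed084f"
}

# Same function with the warning sent to stderr; only the UUID is left
# on stdout for the caller to capture.
clone_fixed() {
  echo "WARN: existing ovs slave connection profile file found, overwriting..." >&2
  echo "fa5e54c8-c34b-4f4d-8fff-3653f1ed084f"
}

buggy_uuid=$(clone_buggy)
# stderr is discarded here only to keep the demo output quiet.
fixed_uuid=$(clone_fixed 2>/dev/null)

printf 'buggy new_uuid: %s\n' "$buggy_uuid"
printf 'fixed new_uuid: %s\n' "$fixed_uuid"
```

With the first variant, `new_uuid` holds two lines, so a later unquoted use such as `nmcli conn mod uuid $new_uuid ...` is split into words and nmcli sees `WARN:` as the connection identifier, which matches the "unknown connection 'WARN:'" error in the log.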
Verified on:

[kni@provisionhost-0-0 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-11-29-140653   True        False         7h37m   Cluster version is 4.8.0-0.nightly-2021-11-29-140653

with bond and migration of OVN to 2nd interface performed.
After evacuating and rebooting all nodes one by one, no WARN spotted and ovs-configuration.service ok as well as ovs-migration.service on master-0-0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.8.23 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:4881