Bug 2022641 - configure-ovs.sh failure - Error: unknown connection 'WARN:'
Summary: configure-ovs.sh failure - Error: unknown connection 'WARN:'
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.8
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: 4.9.z
Assignee: Yurii Prokulevych
QA Contact: Victor Voronkov
URL:
Whiteboard:
Depends On: 2022646
Blocks: 2026085
TreeView+ depends on / blocked
 
Reported: 2021-11-12 09:06 UTC by Yurii Prokulevych
Modified: 2021-11-29 10:53 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-29 10:53:45 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 2785 0 None open [release-4.9] Send WARN message to stderr 2021-11-12 09:14:54 UTC
Red Hat Product Errata RHBA-2021:4834 0 None None None 2021-11-29 10:53:58 UTC

Description Yurii Prokulevych 2021-11-12 09:06:06 UTC
Description of problem:
------------------------
ovs-configuration.service fails with error:

 systemctl status ovs-configuration.service
● ovs-configuration.service - Configures OVS with proper host networking configuration
   Loaded: loaded (/etc/systemd/system/ovs-configuration.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2021-11-12 02:38:09 UTC; 6h ago
 Main PID: 5173 (code=exited, status=10)
      CPU: 1.321s

Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ens4f1-slave-ovs-clone  82423461-b0b1-4cd6-a8ec-6a4dde3f0114  ethernet       --
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + nmcli conn down ovs-if-phys0
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: Connection 'ovs-if-phys0' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/8)
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + nmcli conn up 52eecf5a-df5e-30ae-9ca1-6297f0239027
Nov 12 02:38:09 openshift-worker-3 configure-ovs.sh[5173]: Connection successfully activated (master waiting for slaves) (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/10)
Nov 12 02:38:09 openshift-worker-3 configure-ovs.sh[5173]: + exit 10
Nov 12 02:38:09 openshift-worker-3 systemd[1]: ovs-configuration.service: Main process exited, code=exited, status=10/n/a
Nov 12 02:38:09 openshift-worker-3 systemd[1]: ovs-configuration.service: Failed with result 'exit-code'.
Nov 12 02:38:09 openshift-worker-3 systemd[1]: Failed to start Configures OVS with proper host networking configuration.
Nov 12 02:38:09 openshift-worker-3 systemd[1]: ovs-configuration.service: Consumed 1.321s CPU time

Problem is that qarning message sent by 'clone_slave_connection' function ends up in a 'new_uuid' variable breaking following command:

Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + '[' bond0 '!=' bond0 ']'
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + local new_uuid
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ clone_slave_connection 8950883b-a416-360a-a597-cb308946aaa0
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ local uuid=8950883b-a416-360a-a597-cb308946aaa0
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ local old_name
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: +++ nmcli -g connection.id connection show uuid 8950883b-a416-360a-a597-cb308946aaa0
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ old_name=ens4f0
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ local new_name=ens4f0-slave-ovs-clone
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ nmcli connection show id ens4f0-slave-ovs-clone
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ echo 'WARN: existing ovs slave ens4f0-slave-ovs-clone connection profile file found, overwriting...'
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ nmcli connection delete id ens4f0-slave-ovs-clone
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ nmcli connection clone 8950883b-a416-360a-a597-cb308946aaa0 ens4f0-slave-ovs-clone
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ nmcli -g connection.uuid connection show ens4f0-slave-ovs-clone
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + new_uuid='WARN: existing ovs slave ens4f0-slave-ovs-clone connection profile file found, overwriting...
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: fa5e54c8-c34b-4f4d-8fff-3653f1ed084f'
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + nmcli conn mod uuid WARN: existing ovs slave ens4f0-slave-ovs-clone connection profile file found, overwriting... fa5e54c8-c34b-4f4d-8fff-3653f1ed084f connectio>
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: Error: unknown connection 'WARN:'.
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + handle_exit_error
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + e=10
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + '[' 10 -eq 0 ']'
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + set +e
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + nmcli c show
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: NAME                    UUID                                  TYPE           DEVICE
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: bond0                   52eecf5a-df5e-30ae-9ca1-6297f0239027  bond           bond0
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ovs-if-br-ex            72596cc5-698c-4b27-a494-0bbb06fb6862  ovs-interface  br-ex
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: br-ex                   66168247-7db4-4466-8997-370639c70d54  ovs-bridge     br-ex
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ens4f0                  8950883b-a416-360a-a597-cb308946aaa0  ethernet       ens4f0
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ens4f1                  9b808a8b-13ba-3749-8b8b-6f6f208666a2  ethernet       ens4f1
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ovs-if-phys0            a3798cf3-e56b-4a52-893f-41846723f0f5  vlan           bond0.373
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ovs-port-br-ex          8207e9c4-0312-4622-ace6-f9c92b9d8773  ovs-port       br-ex
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ovs-port-phys0          df4e486e-3e2d-49a0-a5fa-142fa8c8a5ca  ovs-port       bond0.373
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: Wired connection 1      e22a8deb-684a-3a4f-9e1d-567e3f4546bd  ethernet       --
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: bond0.373               3b3cccb0-d7a9-3546-b076-dbc701181e95  vlan           --
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ens4f0-slave-ovs-clone  fa5e54c8-c34b-4f4d-8fff-3653f1ed084f  ethernet       --
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ens4f1-slave-ovs-clone  82423461-b0b1-4cd6-a8ec-6a4dde3f0114  ethernet       --
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + nmcli conn down ovs-if-phys0
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: Connection 'ovs-if-phys0' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/8)
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + nmcli conn up 52eecf5a-df5e-30ae-9ca1-6297f0239027
Nov 12 02:38:09 openshift-worker-3 configure-ovs.sh[5173]: Connection successfully activated (master waiting for slaves) (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/10)
Nov 12 02:38:09 openshift-worker-3 configure-ovs.sh[5173]: + exit 10


Steps to Reproduce:
-------------------
1. Deploy baremetal IPI cluster
2. Perform OVN migration to 2nd interface
3. After node reboot configure-ovs might fail

Actual results:
---------------
MCO fails to proceed and node stays in SchedulingDisabled state

oc logs -n openshift-machine-config-operator -c machine-config-daemon machine-config-daemon-brrhj

Trace[1804756265]: [30.001843433s] [30.001843433s] END
E1112 09:02:11.116566   16951 reflector.go:138] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.MachineConfig: failed to list *v1.MachineConfig: Get "https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1/machineconfigs?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout
I1112 09:02:57.376779   16951 trace.go:205] Trace[1042946614]: "Reflector ListAndWatch" name:k8s.io/client-go/informers/factory.go:134 (12-Nov-2021 09:02:27.375) (total time: 30001ms):
Trace[1042946614]: [30.001337111s] [30.001337111s] END
E1112 09:02:57.376828   16951 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://172.30.0.1:443/api/v1/nodes?limit=500&resourceVersion=0": dial tcp
172.30.0.1:443: i/o timeout

Expected results:
-----------------
configure-ovs succeeds

Comment 1 Bob Fournier 2021-11-15 00:35:08 UTC
Yurii - it looks like you created the fix for this so reassigning this.

Comment 8 errata-xmlrpc 2021-11-29 10:53:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.9 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4834


Note You need to log in before you can comment on or make changes to this bug.