Bug 2022646 - configure-ovs.sh failure - Error: unknown connection 'WARN:'
Summary: configure-ovs.sh failure - Error: unknown connection 'WARN:'
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.8
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: 4.10.0
Assignee: Yurii Prokulevych
QA Contact: Yurii Prokulevych
URL:
Whiteboard:
Depends On:
Blocks: 2022641
TreeView+ depends on / blocked
 
Reported: 2021-11-12 09:24 UTC by Yurii Prokulevych
Modified: 2022-03-10 16:27 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: warning message sent by 'clone_slave_connection' function was stored in a 'new_uuid' variable which is intended to store just connection's UUID Consequence: nmcli commands that reference 'new_uuid' variable as input were failing due to incorrect value being stored in the 'new_uuid' variable Fix: warning message from 'clone_slave_connection' function is redirected to stderr Result: nmcli commands that reference 'new_uuid' variable don't fail due to unknown connection
Clone Of:
Environment:
Last Closed: 2022-03-10 16:26:48 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 2774 0 None Merged Send WARN message to stderr 2021-11-12 09:24:36 UTC
Red Hat Product Errata RHSA-2022:0056 0 None None None 2022-03-10 16:27:10 UTC

Description Yurii Prokulevych 2021-11-12 09:24:37 UTC
This bug was initially created as a copy of Bug #2022641

I am copying this bug because:
------------------------------
It's needed for back porting to previous releases



Description of problem:
------------------------
ovs-configuration.service fails with error:

 systemctl status ovs-configuration.service
● ovs-configuration.service - Configures OVS with proper host networking configuration
   Loaded: loaded (/etc/systemd/system/ovs-configuration.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2021-11-12 02:38:09 UTC; 6h ago
 Main PID: 5173 (code=exited, status=10)
      CPU: 1.321s

Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ens4f1-slave-ovs-clone  82423461-b0b1-4cd6-a8ec-6a4dde3f0114  ethernet       --
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + nmcli conn down ovs-if-phys0
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: Connection 'ovs-if-phys0' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/8)
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + nmcli conn up 52eecf5a-df5e-30ae-9ca1-6297f0239027
Nov 12 02:38:09 openshift-worker-3 configure-ovs.sh[5173]: Connection successfully activated (master waiting for slaves) (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/10)
Nov 12 02:38:09 openshift-worker-3 configure-ovs.sh[5173]: + exit 10
Nov 12 02:38:09 openshift-worker-3 systemd[1]: ovs-configuration.service: Main process exited, code=exited, status=10/n/a
Nov 12 02:38:09 openshift-worker-3 systemd[1]: ovs-configuration.service: Failed with result 'exit-code'.
Nov 12 02:38:09 openshift-worker-3 systemd[1]: Failed to start Configures OVS with proper host networking configuration.
Nov 12 02:38:09 openshift-worker-3 systemd[1]: ovs-configuration.service: Consumed 1.321s CPU time

Problem is that qarning message sent by 'clone_slave_connection' function ends up in a 'new_uuid' variable breaking following command:

Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + '[' bond0 '!=' bond0 ']'
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + local new_uuid
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ clone_slave_connection 8950883b-a416-360a-a597-cb308946aaa0
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ local uuid=8950883b-a416-360a-a597-cb308946aaa0
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ local old_name
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: +++ nmcli -g connection.id connection show uuid 8950883b-a416-360a-a597-cb308946aaa0
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ old_name=ens4f0
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ local new_name=ens4f0-slave-ovs-clone
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ nmcli connection show id ens4f0-slave-ovs-clone
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ echo 'WARN: existing ovs slave ens4f0-slave-ovs-clone connection profile file found, overwriting...'
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ nmcli connection delete id ens4f0-slave-ovs-clone
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ nmcli connection clone 8950883b-a416-360a-a597-cb308946aaa0 ens4f0-slave-ovs-clone
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ++ nmcli -g connection.uuid connection show ens4f0-slave-ovs-clone
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + new_uuid='WARN: existing ovs slave ens4f0-slave-ovs-clone connection profile file found, overwriting...
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: fa5e54c8-c34b-4f4d-8fff-3653f1ed084f'
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + nmcli conn mod uuid WARN: existing ovs slave ens4f0-slave-ovs-clone connection profile file found, overwriting... fa5e54c8-c34b-4f4d-8fff-3653f1ed084f connectio>
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: Error: unknown connection 'WARN:'.
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + handle_exit_error
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + e=10
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + '[' 10 -eq 0 ']'
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + set +e
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + nmcli c show
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: NAME                    UUID                                  TYPE           DEVICE
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: bond0                   52eecf5a-df5e-30ae-9ca1-6297f0239027  bond           bond0
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ovs-if-br-ex            72596cc5-698c-4b27-a494-0bbb06fb6862  ovs-interface  br-ex
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: br-ex                   66168247-7db4-4466-8997-370639c70d54  ovs-bridge     br-ex
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ens4f0                  8950883b-a416-360a-a597-cb308946aaa0  ethernet       ens4f0
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ens4f1                  9b808a8b-13ba-3749-8b8b-6f6f208666a2  ethernet       ens4f1
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ovs-if-phys0            a3798cf3-e56b-4a52-893f-41846723f0f5  vlan           bond0.373
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ovs-port-br-ex          8207e9c4-0312-4622-ace6-f9c92b9d8773  ovs-port       br-ex
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ovs-port-phys0          df4e486e-3e2d-49a0-a5fa-142fa8c8a5ca  ovs-port       bond0.373
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: Wired connection 1      e22a8deb-684a-3a4f-9e1d-567e3f4546bd  ethernet       --
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: bond0.373               3b3cccb0-d7a9-3546-b076-dbc701181e95  vlan           --
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ens4f0-slave-ovs-clone  fa5e54c8-c34b-4f4d-8fff-3653f1ed084f  ethernet       --
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: ens4f1-slave-ovs-clone  82423461-b0b1-4cd6-a8ec-6a4dde3f0114  ethernet       --
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + nmcli conn down ovs-if-phys0
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: Connection 'ovs-if-phys0' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/8)
Nov 12 02:38:08 openshift-worker-3 configure-ovs.sh[5173]: + nmcli conn up 52eecf5a-df5e-30ae-9ca1-6297f0239027
Nov 12 02:38:09 openshift-worker-3 configure-ovs.sh[5173]: Connection successfully activated (master waiting for slaves) (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/10)
Nov 12 02:38:09 openshift-worker-3 configure-ovs.sh[5173]: + exit 10


Steps to Reproduce:
-------------------
1. Deploy baremetal IPI cluster
2. Perform OVN migration to 2nd interface
3. After node reboot configure-ovs might fail

Actual results:
---------------
MCO fails to proceed and node stays in SchedulingDisabled state

oc logs -n openshift-machine-config-operator -c machine-config-daemon machine-config-daemon-brrhj

Trace[1804756265]: [30.001843433s] [30.001843433s] END
E1112 09:02:11.116566   16951 reflector.go:138] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.MachineConfig: failed to list *v1.MachineConfig: Get "https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1/machineconfigs?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout
I1112 09:02:57.376779   16951 trace.go:205] Trace[1042946614]: "Reflector ListAndWatch" name:k8s.io/client-go/informers/factory.go:134 (12-Nov-2021 09:02:27.375) (total time: 30001ms):
Trace[1042946614]: [30.001337111s] [30.001337111s] END
E1112 09:02:57.376828   16951 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://172.30.0.1:443/api/v1/nodes?limit=500&resourceVersion=0": dial tcp
172.30.0.1:443: i/o timeout

Expected results:
-----------------
configure-ovs succeeds

Comment 2 Bob Fournier 2021-11-15 00:34:08 UTC
Yurii - it looks like you created the fix for this so reassigning this to you.

Comment 3 Yurii Prokulevych 2021-11-18 12:29:01 UTC
Verified with 4.10.0-0.nightly-2021-11-15-034648

Fix present in configure-ovs.sh:

```
  if nmcli connection show id "${new_name}" &> /dev/null; then
    echo "WARN: existing ovs slave ${new_name} connection profile file found, overwriting..." >&2
    nmcli connection delete id "${new_name}" &> /dev/null
  fi
```

Rebooting nodes multiple times and issue didn't reproduce

systemctl status ovs-configuration.service
● ovs-configuration.service - Configures OVS with proper host networking configuration
   Loaded: loaded (/etc/systemd/system/ovs-configuration.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Thu 2021-11-18 12:24:20 UTC; 1min 0s ago
 Main PID: 4125 (code=exited, status=0/SUCCESS)
      CPU: 190ms

Nov 18 12:24:20 worker-0-0 configure-ovs.sh[4125]: + '[' -f /etc/ovnk/extra_bridge ']'
Nov 18 12:24:20 worker-0-0 configure-ovs.sh[4125]: + nmcli connection show br-ex1
Nov 18 12:24:20 worker-0-0 configure-ovs.sh[4125]: + nmcli connection show ovs-if-phys1
Nov 18 12:24:20 worker-0-0 configure-ovs.sh[4125]: + '[' '!' -f /etc/ovnk/extra_bridge ']'
Nov 18 12:24:20 worker-0-0 configure-ovs.sh[4125]: + ovs-vsctl --timeout=30 --if-exists del-br br0
Nov 18 12:24:20 worker-0-0 ovs-vsctl[4147]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=30 --if-exists del-br br0
Nov 18 12:24:20 worker-0-0 configure-ovs.sh[4125]: + exit 0
Nov 18 12:24:20 worker-0-0 systemd[1]: ovs-configuration.service: Succeeded.
Nov 18 12:24:20 worker-0-0 systemd[1]: Started Configures OVS with proper host networking configuration.
Nov 18 12:24:20 worker-0-0 systemd[1]: ovs-configuration.service: Consumed 190ms CPU time

Comment 6 errata-xmlrpc 2022-03-10 16:26:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056


Note You need to log in before you can comment on or make changes to this bug.