Bug 2000671 - [OVN-Migration] ovs-configuration - Connection activation failed: Could not create a software link
Keywords:
Status: VERIFIED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Peng Liu
QA Contact: Weibin Liang
URL:
Whiteboard:
Depends On: 1975174
Blocks:
 
Reported: 2021-09-02 16:40 UTC by Yurii Prokulevych
Modified: 2022-11-11 19:18 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:
pliu: needinfo-



Description Yurii Prokulevych 2021-09-02 16:40:34 UTC
Description of problem:
-----------------------
After performing OVN migration to the 2nd interface and rebooting the node, 'ovs-configuration.service' fails:

systemctl status ovs-configuration.service
● ovs-configuration.service - Configures OVS with proper host networking configuration
   Loaded: loaded (/etc/systemd/system/ovs-configuration.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2021-09-02 16:26:55 UTC; 29s ago
 Main PID: 3485 (code=exited, status=4)
      CPU: 843ms

Sep 02 16:26:55 master-0-1 configure-ovs.sh[3485]: ++ nmcli -g connection.master connection show uuid 7eb2a2e6-accf-4a33-8f77-94e1859a199c
Sep 02 16:26:55 master-0-1 configure-ovs.sh[3485]: + '[' 5dc2327c-9420-49f5-a1f7-1924f359987b '!=' 4ddf3d3a-db51-362c-a691-d4b49b53a4de ']'
Sep 02 16:26:55 master-0-1 configure-ovs.sh[3485]: + continue
Sep 02 16:26:55 master-0-1 configure-ovs.sh[3485]: + nmcli conn up ovs-if-phys0
Sep 02 16:26:55 master-0-1 configure-ovs.sh[3485]: Error: Connection activation failed: Could not create a software link
Sep 02 16:26:55 master-0-1 configure-ovs.sh[3485]: Hint: use 'journalctl -xe NM_CONNECTION=7eb2a2e6-accf-4a33-8f77-94e1859a199c + NM_DEVICE=bond0.373' to get more details.
Sep 02 16:26:55 master-0-1 systemd[1]: ovs-configuration.service: Main process exited, code=exited, status=4/NOPERMISSION
Sep 02 16:26:55 master-0-1 systemd[1]: ovs-configuration.service: Failed with result 'exit-code'.
Sep 02 16:26:55 master-0-1 systemd[1]: Failed to start Configures OVS with proper host networking configuration.
Sep 02 16:26:55 master-0-1 systemd[1]: ovs-configuration.service: Consumed 843ms CPU time


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
OCP - 4.7.24

ovn2.13-20.12.0-24.el8fdp.x86_64
ovn2.13-vtep-20.12.0-24.el8fdp.x86_64
ovn2.13-host-20.12.0-24.el8fdp.x86_64
ovn2.13-central-20.12.0-24.el8fdp.x86_64



Steps to Reproduce:
-------------------
1. Install disconnected BM IPI cluster with bonded interfaces
2. Create a vlan interface on top of bond, using custom MCP
3. Deliver script/service to perform OVN migration to nodes
4. After OVN migration is finished - drain and reboot a node

Actual results:
---------------
ovs-configuration.service failed


Expected results:
-----------------
ovs-configuration.service starts


Additional info:
----------------
Virtual setup - 3 masters + 2 workers

Comment 3 Jaime Caamaño Ruiz 2021-09-03 07:28:29 UTC
After performing OVN migration and a subsequent reboot, ovs-configuration runs again. It is mostly a no-op, but it does run, in quick succession, `nmcli c down bond0` to disconnect bond0 and `nmcli c up ovs-if-phys0` to connect bond0.373. `nmcli c down bond0` causes bond0.373 to disconnect as well, but not fast enough, so `nmcli c up ovs-if-phys0` fails because bond0.373 still exists.

It looks like this could be handled better by NM.

With the fixes related to https://bugzilla.redhat.com/show_bug.cgi?id=1975174 these operations will no longer be performed, so the issue most probably won't happen. It should therefore not affect 4.9.

@yprokule can you try the PR attached to that BZ? Thank you.
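
For illustration only, the race described above could in principle be avoided by waiting for the VLAN device to disappear before re-activating its connection. The sketch below is not the fix merged for bz1975174 (which stops performing these operations altogether); the helper name `wait_for_link_gone` and the `SYSFS_NET` override are hypothetical and exist only for this example.

```shell
#!/bin/bash
# Sketch only: wait until a network device has disappeared before
# re-activating the connection that recreates it.
# SYSFS_NET is a hypothetical override (default: /sys/class/net) so the
# helper can be exercised without real devices.

wait_for_link_gone() {
  local dev=$1
  local timeout=${2:-10}   # seconds to wait before giving up
  local waited=0
  while [ -e "${SYSFS_NET:-/sys/class/net}/${dev}" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1             # device still present after the timeout
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0                 # device is gone
}

# Intended use (not executed here):
#   nmcli c down bond0
#   wait_for_link_gone bond0.373 10 && nmcli c up ovs-if-phys0
```

Polling for the device entry keeps the two nmcli calls ordered without relying on a fixed sleep, at the cost of a bounded delay when the device lingers.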

Comment 4 Yurii Prokulevych 2021-09-06 14:51:01 UTC
(In reply to Jaime Caamaño Ruiz from comment #3)
> After performing OVN migration and a subsequent reboot, ovs-configuration
> runs again. It is mostly a noop but it does do in fast sequence `nmcli c
> down bond0` to disconnect bond0 and `nmcli c up ovs-if-phys0` to connect
> bond0.373. `nmcli c down bond0` causes bond0.373 to disconnect as well but
> not fast enough and `nmcli c up ovs-if-phys0` fails because bond0.373
> already exists. 
> 
> It looks like this could be handled better by NM.
> 
> With fixes related to https://bugzilla.redhat.com/show_bug.cgi?id=1975174
> these operations won't be done any longer and most probably the issue won't
> happen. So it should not affect 4.9.
> 
> @yprokule can you try the PR attached to that BZ? Thank you.

Hey Jaime,

I replaced /usr/local/bin/configure-ovs.sh on all cluster nodes with the content of https://github.com/openshift/machine-config-operator/blob/68b2b47baf87aa15afabc3a6e40317f23628da43/templates/common/_base/files/configure-ovs-network.yaml, ran the OVN migration, and did not see the error.

Comment 5 Peng Liu 2021-11-15 03:33:54 UTC
Moving to 'modified' as the fix for bz1975174 has been merged.

Comment 6 Weibin Liang 2022-11-11 19:18:42 UTC
Tested and verified in 4.7.0-0.nightly-2022-11-01-171947
with OVN migration to the 2nd interface (bond0 interface):

17: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:da:d2:f3 brd ff:ff:ff:ff:ff:ff
    inet 192.168.123.68/24 brd 192.168.123.255 scope global dynamic noprefixroute bond0
       valid_lft 1943sec preferred_lft 1943sec
23: bond0.373@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovs-system state UP group default qlen 1000
    link/ether 52:54:00:da:d2:f3 brd ff:ff:ff:ff:ff:ff
24: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 52:54:00:da:d2:f3 brd ff:ff:ff:ff:ff:ff
    inet 198.19.0.18/19 brd 198.19.31.255 scope global dynamic noprefixroute br-ex
       valid_lft 1207945sec preferred_lft 1207945sec
    inet6 fe80::5054:ff:feda:d2f3/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
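
A check like the one above can also be scripted. This is a minimal sketch, assuming the `ip link` header-line format shown in the output; the helper name `link_master` is hypothetical.

```shell
#!/bin/bash
# Sketch only: print the master device named in `ip link` header lines
# read from stdin; prints nothing for a line without a "master" field.
link_master() {
  sed -n 's/.* master \([^ ]*\) .*/\1/p'
}

# Intended use (not executed here):
#   ip -o link show dev bond0.373 | link_master
```

For the bond0.373 line in the output above this prints `ovs-system`. For the bond0 line it prints nothing, because the uppercase MASTER flag is not a `master` field.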

