Bug 2000671
| Summary: | [OVN-Migration] ovs-configuration - Connection activation failed: Could not create a software link | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Yurii Prokulevych <yprokule> |
| Component: | Networking | Assignee: | Peng Liu <pliu> |
| Networking sub component: | ovn-kubernetes | QA Contact: | Weibin Liang <weliang> |
| Status: | CLOSED WONTFIX | Docs Contact: | |
| Severity: | high | ||
| Priority: | high | CC: | achernet, anbhat, eglottma, jcaamano, mcornea, mzamot, ncocker, pliu, zzhao |
| Version: | 4.7 | Flags: | pliu: needinfo- |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2024-04-30 18:04:53 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1975174 | ||
| Bug Blocks: | |||
After performing OVN migration and a subsequent reboot, ovs-configuration runs again. It is mostly a no-op, but it does run, in quick succession, `nmcli c down bond0` to disconnect bond0 and `nmcli c up ovs-if-phys0` to connect bond0.373. `nmcli c down bond0` causes bond0.373 to disconnect as well, but not fast enough, so `nmcli c up ovs-if-phys0` fails because bond0.373 still exists.

It looks like this could be handled better by NM.

With the fixes related to https://bugzilla.redhat.com/show_bug.cgi?id=1975174 these operations will no longer be performed, so the issue most probably won't happen. It should therefore not affect 4.9.

@yprokule can you try the PR attached to that BZ? Thank you.

(In reply to Jaime Caamaño Ruiz from comment #3)

Hey Jaime, I replaced /usr/local/bin/configure-ovs.sh on all cluster nodes with the content of https://github.com/openshift/machine-config-operator/blob/68b2b47baf87aa15afabc3a6e40317f23628da43/templates/common/_base/files/configure-ovs-network.yaml, ran the OVN migration, and didn't notice the error.

Moving to 'modified' as the fix for bz1975174 has been merged.

Tested and verified in 4.7.0-0.nightly-2022-11-01-171947
OVN migration to 2nd interface/bond0 interface
17: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:da:d2:f3 brd ff:ff:ff:ff:ff:ff
    inet 192.168.123.68/24 brd 192.168.123.255 scope global dynamic noprefixroute bond0
       valid_lft 1943sec preferred_lft 1943sec
23: bond0.373@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovs-system state UP group default qlen 1000
    link/ether 52:54:00:da:d2:f3 brd ff:ff:ff:ff:ff:ff
24: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 52:54:00:da:d2:f3 brd ff:ff:ff:ff:ff:ff
    inet 198.19.0.18/19 brd 198.19.31.255 scope global dynamic noprefixroute br-ex
       valid_lft 1207945sec preferred_lft 1207945sec
    inet6 fe80::5054:ff:feda:d2f3/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
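
The race described in the analysis (bond0.373 still present when `nmcli c up ovs-if-phys0` runs) could in principle be worked around by waiting for the VLAN device to disappear before activating the new connection. The following is only a minimal sketch of that idea, not the fix that was merged for bz1975174 (which avoids these operations altogether). The function name `wait_for_device_gone` and the `SYSFS_NET` variable are hypothetical, and the sketch assumes NetworkManager removes the VLAN device once its connection is deactivated.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of configure-ovs.sh.
# Polls sysfs until a network device no longer exists, or a timeout expires.
# SYSFS_NET is overridable only so the function can be tried without root.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

wait_for_device_gone() {
  local dev="$1"
  local timeout="${2:-10}"   # seconds to wait before giving up
  local waited=0
  while [ -e "${SYSFS_NET}/${dev}" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timeout: ${dev} still present after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Usage sketch, with the connection and device names from this report:
#   nmcli c down bond0
#   wait_for_device_gone bond0.373 10 && nmcli c up ovs-if-phys0
```

Polling sysfs keeps the sketch independent of nmcli output formats; a production script would more likely rely on NetworkManager itself to serialize the two operations.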
OCP is no longer using Bugzilla and this bug appears to have been left in an orphaned state. If the bug is still relevant, please open a new issue in the OCPBUGS Jira project: https://issues.redhat.com/projects/OCPBUGS/summary
Description of problem:
-----------------------
After performing OVN migration to the 2nd interface and rebooting the node, 'ovs-configuration.service' fails:

systemctl status ovs-configuration.service
● ovs-configuration.service - Configures OVS with proper host networking configuration
   Loaded: loaded (/etc/systemd/system/ovs-configuration.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2021-09-02 16:26:55 UTC; 29s ago
 Main PID: 3485 (code=exited, status=4)
      CPU: 843ms

Sep 02 16:26:55 master-0-1 configure-ovs.sh[3485]: ++ nmcli -g connection.master connection show uuid 7eb2a2e6-accf-4a33-8f77-94e1859a199c
Sep 02 16:26:55 master-0-1 configure-ovs.sh[3485]: + '[' 5dc2327c-9420-49f5-a1f7-1924f359987b '!=' 4ddf3d3a-db51-362c-a691-d4b49b53a4de ']'
Sep 02 16:26:55 master-0-1 configure-ovs.sh[3485]: + continue
Sep 02 16:26:55 master-0-1 configure-ovs.sh[3485]: + nmcli conn up ovs-if-phys0
Sep 02 16:26:55 master-0-1 configure-ovs.sh[3485]: Error: Connection activation failed: Could not create a software link
Sep 02 16:26:55 master-0-1 configure-ovs.sh[3485]: Hint: use 'journalctl -xe NM_CONNECTION=7eb2a2e6-accf-4a33-8f77-94e1859a199c + NM_DEVICE=bond0.373' to get more details.
Sep 02 16:26:55 master-0-1 systemd[1]: ovs-configuration.service: Main process exited, code=exited, status=4/NOPERMISSION
Sep 02 16:26:55 master-0-1 systemd[1]: ovs-configuration.service: Failed with result 'exit-code'.
Sep 02 16:26:55 master-0-1 systemd[1]: Failed to start Configures OVS with proper host networking configuration.
Sep 02 16:26:55 master-0-1 systemd[1]: ovs-configuration.service: Consumed 843ms CPU time

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
OCP - 4.7.24
ovn2.13-20.12.0-24.el8fdp.x86_64
ovn2.13-vtep-20.12.0-24.el8fdp.x86_64
ovn2.13-host-20.12.0-24.el8fdp.x86_64
ovn2.13-central-20.12.0-24.el8fdp.x86_64

Steps to Reproduce:
-------------------
1. Install a disconnected BM IPI cluster with bonded interfaces
2. Create a VLAN interface on top of the bond, using a custom MCP
3. Deliver the script/service that performs the OVN migration to the nodes
4. After the OVN migration is finished, drain and reboot a node

Actual results:
---------------
ovs-configuration.service failed

Expected results:
-----------------
ovs-configuration.service starts

Additional info:
----------------
Virtual setup - 3 masters + 2 workers
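
When triaging a failure like the one in the log above, it can help to pull the connection UUID and device name out of the nmcli "Hint:" line so they can be fed back into `journalctl`. The snippet below is a small sketch of that; `extract_nm_hint` is a hypothetical helper name, and the pattern assumes the hint line has exactly the shape shown in this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads log lines on stdin and prints
# "<connection-uuid> <device>" for every nmcli hint line of the form
#   Hint: use 'journalctl -xe NM_CONNECTION=<uuid> + NM_DEVICE=<dev>' ...
extract_nm_hint() {
  sed -n "s/.*NM_CONNECTION=\([0-9a-f-]*\) + NM_DEVICE=\([^']*\)'.*/\1 \2/p"
}

# Usage sketch on an affected node:
#   journalctl -u ovs-configuration.service -b --no-pager | extract_nm_hint
```

The printed pair can then be used with the `journalctl -xe NM_CONNECTION=... + NM_DEVICE=...` command that the hint itself suggests.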