Bug 1934443 - Installation of OCP 4.6.13 fails when teaming interface is used with OVNKubernetes [NEEDINFO]
Summary: Installation of OCP 4.6.13 fails when teaming interface is used with OVNKuber...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Mohamed Mahmoud
QA Contact: Ross Brattain
URL:
Whiteboard:
Depends On:
Blocks: 1977424 1977426
Reported: 2021-03-03 09:52 UTC by Vinu K
Modified: 2021-10-18 17:29 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1977424 (view as bug list)
Environment:
Last Closed: 2021-10-18 17:29:21 UTC
Target Upstream Version:
zzhao: needinfo? (vkochuku)


Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 2512 0 None open Bug 1934443: Fix ovs-configure script to detect team interface 2021-04-05 13:00:03 UTC
Github openshift machine-config-operator pull 2645 0 None open Bug 1934443: Fix team config JSON format for nmcli command 2021-06-29 04:53:42 UTC
Red Hat Product Errata RHSA-2021:3759 0 None None None 2021-10-18 17:29:59 UTC

Description Vinu K 2021-03-03 09:52:45 UTC
Description of problem:
ovs-configuration.service fails when a teaming interface is used with OVNKubernetes during OCP installation.

Version-Release number of selected component (if applicable):
4.6.13

How reproducible:
Easy

Steps to Reproduce:
1. Install OCP 4.6.13 with teaming + OVNKubernetes
2. The ovs-configuration systemd unit fails with the error:
   ---
   Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: ++ nmcli --get-values connection.type conn show 702de3eb-2e80-897c-fd52-cd0494dd8123
Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: + '[' team == vlan ']'
Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: ++ nmcli --get-values connection.type conn show 702de3eb-2e80-897c-fd52-cd0494dd8123
Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: + '[' team == bond ']'
Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: + iface_type=802-3-ethernet
Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: + nmcli device disconnect team0
Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: Device 'team0' successfully disconnected.
Feb 28 23:35:35 localhost.localdomain configure-ovs.sh[1956]: + nmcli connection show ovs-if-phys0
Feb 28 23:35:35 localhost.localdomain configure-ovs.sh[1956]: + nmcli c add type 802-3-ethernet conn.interface team0 master ovs-port-phys0 con-name ovs-if-phys0 connection.autoconnect-priority 100 802-3-ethernet.mtu 1500
Feb 28 23:35:35 localhost.localdomain configure-ovs.sh[1956]: Connection 'ovs-if-phys0' (f4b169a8-01b3-42fb-ae21-26e087e048df) successfully added.
Feb 28 23:35:35 localhost.localdomain configure-ovs.sh[1956]: + nmcli conn up ovs-if-phys0
Feb 28 23:35:35 localhost.localdomain configure-ovs.sh[1956]: Error: Connection activation failed: No suitable device found for this connection (device ens161 not available because profile is not compatible with device (mismatching interface name)).
Feb 28 23:35:35 localhost.localdomain systemd[1]: ovs-configuration.service: Main process exited, code=exited, status=4/NOPERMISSION
Feb 28 23:35:35 localhost.localdomain systemd[1]: ovs-configuration.service: Failed with result 'exit-code'.
Feb 28 23:35:35 localhost.localdomain systemd[1]: Failed to start Configures OVS with proper host networking configuration.
Feb 28 23:35:35 localhost.localdomain systemd[1]: ovs-configuration.service: Consumed 1.307s CPU time
   ---

Actual results:
Installation fails.

Expected results:
Cluster should be installed successfully.

Additional info:
The cluster installs successfully with the OpenShiftSDN network type.

Comment 4 Vinu K 2021-03-18 08:39:05 UTC
Hello Team,

Any update on the issue?

Thanks,
Vinu K

Comment 7 Thomas Haller 2021-03-22 06:55:50 UTC
> Feb 28 23:35:35 localhost.localdomain configure-ovs.sh[1956]: 
> + nmcli c add type 802-3-ethernet conn.interface team0 master ovs-port-phys0 con-name ovs-if-phys0 connection.autoconnect-priority 100 802-3-ethernet.mtu 1500
> 
> Feb 28 23:35:35 localhost.localdomain configure-ovs.sh[1956]: Connection 'ovs-if-phys0' (f4b169a8-01b3-42fb-ae21-26e087e048df) successfully added.
> Feb 28 23:35:35 localhost.localdomain configure-ovs.sh[1956]: 
> + nmcli conn up ovs-if-phys0
> 
> Feb 28 23:35:35 localhost.localdomain configure-ovs.sh[1956]: Error: Connection activation failed: No suitable device found for this connection (device ens161 not available because profile is not compatible with device (mismatching interface name)).

The error message seems self-explanatory. Does an ethernet interface named "team0" exist?

What do `nmcli device` and `ip link` show at that time?
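
One way to gather the requested state on the affected node is sketched below. The function name `collect_net_state` is hypothetical and is not part of configure-ovs.sh; the sketch only wraps standard `nmcli` and `ip` queries, and it assumes the interface of interest is `team0` unless another name is passed.

```shell
#!/bin/bash
# Hypothetical helper (not part of configure-ovs.sh): print the device state
# seen by NetworkManager and by the kernel for one interface.
collect_net_state() {
  local iface="${1:-team0}"

  if command -v nmcli >/dev/null 2>&1; then
    echo "== nmcli device =="
    nmcli device || true
    echo "== NetworkManager device type of ${iface} =="
    nmcli --get-values GENERAL.TYPE device show "$iface" || true
  else
    echo "nmcli not found, skipping NetworkManager queries"
  fi

  if command -v ip >/dev/null 2>&1; then
    echo "== ip link =="
    ip link || true
    echo "== ip -details link show dev ${iface} =="
    ip -details link show dev "$iface" || true
  else
    echo "ip not found, skipping kernel link queries"
  fi

  # Always succeed so the helper can be used for best-effort log collection.
  return 0
}
```

Running `collect_net_state team0` right after the failure would show whether NetworkManager and the kernel report `team0` as a team device rather than an ethernet one.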

Comment 8 Vinu K 2021-04-02 13:00:05 UTC
Hello Team,

---
[core@master1 ~]$ sudo tail -n 30  /etc/sysconfig/network-scripts/*
==> /etc/sysconfig/network-scripts/ifcfg-team0 <==
NAME=team0
DEVICE=team0
DEVICETYPE=Team
TEAM_CONFIG='{"runner": {"name": "activebackup"}}'
BOOTPROTO=none
ONBOOT=yes
AUTOCONNECT_PRIORITY=100
AUTOCONNECT_RETRIES=0
IPADDR=192.168.125.11
PREFIX=24
GATEWAY=192.168.125.254
DNS1=192.168.125.10
DEFROUTE=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
==> /etc/sysconfig/network-scripts/ifcfg-team0-ens192 <==
NAME=team0-ens192
DEVICE=ens192
DEVICETYPE=TeamPort
TEAM_MASTER=team0
TEAM_PORT_CONFIG='{"sticky": true}'
ONBOOT=yes

==> /etc/sysconfig/network-scripts/ifcfg-team0-ens224 <==
NAME=team0-ens224
DEVICE=ens224
DEVICETYPE=TeamPort
TEAM_MASTER=team0
TEAM_PORT_CONFIG='{"sticky": true}'
ONBOOT=yes
---

---
[core@master1 ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens161: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0c:29:07:4b:5f brd ff:ff:ff:ff:ff:ff
3: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 state UP group default qlen 1000
    link/ether 00:0c:29:07:4b:4b brd ff:ff:ff:ff:ff:ff
4: ens224: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 state UP group default qlen 1000
    link/ether 00:0c:29:07:4b:4b brd ff:ff:ff:ff:ff:ff
5: ens256: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0c:29:07:4b:55 brd ff:ff:ff:ff:ff:ff
7: team0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:0c:29:07:4b:4b brd ff:ff:ff:ff:ff:ff
    inet 192.168.125.11/24 brd 192.168.125.255 scope global noprefixroute team0
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe07:4b4b/64 scope link
       valid_lft forever preferred_lft forever
---

Everything works fine at first. The ovs-configuration.service fails when the network operator is being installed.

I followed the tips from https://bugzilla.redhat.com/show_bug.cgi?id=1758162#c11.

Thanks,
Vinu K

Comment 9 Beniamino Galvani 2021-04-02 13:40:11 UTC
> Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: ++ nmcli --get-values connection.type conn show 702de3eb-2e80-897c-fd52-cd0494dd8123
> Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: + '[' team == vlan ']'
> Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: ++ nmcli --get-values connection.type conn show 702de3eb-2e80-897c-fd52-cd0494dd8123
> Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: + '[' team == bond ']'
> Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: + iface_type=802-3-ethernet

> Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: + nmcli device disconnect team0
> Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: Device 'team0' successfully disconnected.
> Feb 28 23:35:35 localhost.localdomain configure-ovs.sh[1956]: + nmcli connection show ovs-if-phys0
> Feb 28 23:35:35 localhost.localdomain configure-ovs.sh[1956]: + nmcli c add type 802-3-ethernet conn.interface team0 master ovs-port-phys0 con-name ovs-if-phys0 connection.autoconnect-priority 100 802-3-ethernet.mtu 1500

Here the script tries to determine the type of connection to add. It should be 'team', but the script only supports 'vlan' or 'bond', and eventually falls back to 'ethernet'.

I think this issue is similar to bug 1887545 (fixed by https://github.com/openshift/machine-config-operator/pull/2152 ).
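
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the actual configure-ovs.sh code: the function name `pick_iface_type` is hypothetical, and the `team` branch shows one possible way to avoid the fallback to `802-3-ethernet`.

```shell
#!/bin/bash
# Hypothetical helper (not the real configure-ovs.sh logic): map the
# NetworkManager connection.type of the uplink profile to the type used
# when creating the ovs-if-phys0 profile.
pick_iface_type() {
  local conn_type="$1"
  case "$conn_type" in
    # Types that must be preserved on the new profile. Without "team" in
    # this list, a team uplink would fall through to the ethernet default.
    vlan|bond|team) echo "$conn_type" ;;
    # Anything else is treated as plain ethernet.
    *)              echo "802-3-ethernet" ;;
  esac
}

# On a real node the input would come from NetworkManager, for example:
#   conn_type=$(nmcli --get-values connection.type conn show "$uuid")
for t in vlan bond team 802-3-ethernet; do
  printf '%s -> %s\n' "$t" "$(pick_iface_type "$t")"
done
```

With a mapping like this, the later `nmcli c add type ...` call would receive `team` for a team uplink, so the new profile would match the `team0` device instead of searching for an ethernet device with that name.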

@Tim, can you please have a look?

Comment 14 zhaozhanqi 2021-04-15 12:43:19 UTC
@yprokule@redhat.com Hi, do you know if the ocp-edge-virt job can set up a 'team' interface type with ovn-kubernetes to verify this bug?

Comment 16 zhaozhanqi 2021-04-19 02:39:33 UTC
@vkochuku@redhat.com Could you help verify this fix? QE does not currently have this kind of cluster readily available.

Comment 17 Vinu K 2021-04-23 18:09:21 UTC
Hello @zhaozhanqi,

Thank you for your update. I will check if the setup is available and update you.

Thanks,
Vinu K

Comment 22 zhaozhanqi 2021-07-29 13:11:12 UTC
Hi Vinu, could you check again with the new PR?

Comment 24 Ross Brattain 2021-09-02 17:33:34 UTC
openshift/machine-config-operator/pull/2706 merged, setting to Verified.

Comment 25 Ross Brattain 2021-09-02 17:48:30 UTC
Verified the 4.8 backport with teaming as well: https://github.com/openshift/machine-config-operator/pull/2644

RHCOS teaming static IP after reboot http://file.rdu.redhat.com/~rbrattai/logs/ovs-config-172.31.248.217

Aug 31 21:17:08 compute-0 configure-ovs.sh[1392]: + '[' team == team ']'
Aug 31 21:17:08 compute-0 configure-ovs.sh[1392]: + iface_type=team
Aug 31 21:17:08 compute-0 configure-ovs.sh[1392]: ++ nmcli --get-values team.config -e no conn show 702de3eb-2e80-897c-fd52-cd0494dd8123
Aug 31 21:17:08 compute-0 configure-ovs.sh[1392]: + team_config_opts='{"runner": {"name": "activebackup"}, "link_watch": {"name": "ethtool"}}'
Aug 31 21:17:08 compute-0 configure-ovs.sh[1392]: + '[' -n '{"runner": {"name": "activebackup"}, "link_watch": {"name": "ethtool"}}' ']'
Aug 31 21:17:08 compute-0 configure-ovs.sh[1392]: + extra_phys_args+=(team.config "${team_config_opts//[[:space:]]/}")
Aug 31 21:17:08 compute-0 configure-ovs.sh[1392]: + nmcli connection show ovs-if-phys0
Aug 31 21:17:08 compute-0 configure-ovs.sh[1392]: + nmcli c add type team conn.interface team0 master ovs-port-phys0 con-name ovs-if-phys0 connection.autoconnect-priority 100 802-3-ethernet.mtu 1500 team.config '{"runner":{"name":"activebackup"},"link_watch":{"name":"ethtool"}}'
Aug 31 21:17:08 compute-0 configure-ovs.sh[1392]: Connection 'ovs-if-phys0' (1a11fc58-3ac6-4917-8c80-e5a44ad54c1f) successfully added.
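
The trace above passes the team configuration to `nmcli` with all whitespace removed, using the bash expansion `${team_config_opts//[[:space:]]/}`. The short sketch below reproduces only that step in isolation, with the JSON value copied from the log. The variable names match the trace, but the surrounding script is an assumption and not the real configure-ovs.sh.

```shell
#!/bin/bash
# Team config as returned by NetworkManager in the log above.
team_config_opts='{"runner": {"name": "activebackup"}, "link_watch": {"name": "ethtool"}}'

# Extra arguments that would be appended to the "nmcli c add" call.
extra_phys_args=()

if [ -n "$team_config_opts" ]; then
  # ${var//pattern/} deletes every match of the pattern; [[:space:]] matches
  # spaces, tabs and newlines, so the JSON collapses to a single compact token.
  extra_phys_args+=(team.config "${team_config_opts//[[:space:]]/}")
fi

# Print one argument per line to show how the array is split.
printf '%s\n' "${extra_phys_args[@]}"
```

One caveat of this approach is that it also removes whitespace inside JSON string values. That is harmless for the runner and link watcher names shown here, but it would alter any value that legitimately contains a space.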

Comment 28 errata-xmlrpc 2021-10-18 17:29:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

