Description of problem:
ovs-configuration.service fails when using teaming + OVNKubernetes while installing OCP.

Version-Release number of selected component (if applicable):
4.6.13

How reproducible:
Easy

Steps to Reproduce:
1. Install OCP 4.6.13 with teaming + OVNKubernetes
2. The ovs-configuration systemd unit fails with the error:

---
Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: ++ nmcli --get-values connection.type conn show 702de3eb-2e80-897c-fd52-cd0494dd8123
Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: + '[' team == vlan ']'
Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: ++ nmcli --get-values connection.type conn show 702de3eb-2e80-897c-fd52-cd0494dd8123
Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: + '[' team == bond ']'
Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: + iface_type=802-3-ethernet
Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: + nmcli device disconnect team0
Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: Device 'team0' successfully disconnected.
Feb 28 23:35:35 localhost.localdomain configure-ovs.sh[1956]: + nmcli connection show ovs-if-phys0
Feb 28 23:35:35 localhost.localdomain configure-ovs.sh[1956]: + nmcli c add type 802-3-ethernet conn.interface team0 master ovs-port-phys0 con-name ovs-if-phys0 connection.autoconnect-priority 100 802-3-ethernet.mtu 1500
Feb 28 23:35:35 localhost.localdomain configure-ovs.sh[1956]: Connection 'ovs-if-phys0' (f4b169a8-01b3-42fb-ae21-26e087e048df) successfully added.
Feb 28 23:35:35 localhost.localdomain configure-ovs.sh[1956]: + nmcli conn up ovs-if-phys0
Feb 28 23:35:35 localhost.localdomain configure-ovs.sh[1956]: Error: Connection activation failed: No suitable device found for this connection (device ens161 not available because profile is not compatible with device (mismatching interface name)).
Feb 28 23:35:35 localhost.localdomain systemd[1]: ovs-configuration.service: Main process exited, code=exited, status=4/NOPERMISSION
Feb 28 23:35:35 localhost.localdomain systemd[1]: ovs-configuration.service: Failed with result 'exit-code'.
Feb 28 23:35:35 localhost.localdomain systemd[1]: Failed to start Configures OVS with proper host networking configuration.
Feb 28 23:35:35 localhost.localdomain systemd[1]: ovs-configuration.service: Consumed 1.307s CPU time
---

Actual results:
Installation fails.

Expected results:
Cluster is installed successfully.

Additional info:
Cluster is installed successfully with the OpenShiftSDN network.
Hello Team, Any update on the issue? Thanks, Vinu K
> Feb 28 23:35:35 localhost.localdomain configure-ovs.sh[1956]: + nmcli c add type 802-3-ethernet conn.interface team0 master ovs-port-phys0 con-name ovs-if-phys0 connection.autoconnect-priority 100 802-3-ethernet.mtu 1500
> Feb 28 23:35:35 localhost.localdomain configure-ovs.sh[1956]: Connection 'ovs-if-phys0' (f4b169a8-01b3-42fb-ae21-26e087e048df) successfully added.
> Feb 28 23:35:35 localhost.localdomain configure-ovs.sh[1956]: + nmcli conn up ovs-if-phys0
> Feb 28 23:35:35 localhost.localdomain configure-ovs.sh[1956]: Error: Connection activation failed: No suitable device found for this connection (device ens161 not available because profile is not compatible with device (mismatching interface name)).

The error message seems self-explanatory. Does an ethernet interface named "team0" exist? What do `nmcli device` and `ip link` show at that time?
Hello Team,

---
[core@master1 ~]$ sudo tail -n 30 /etc/sysconfig/network-scripts/*

==> /etc/sysconfig/network-scripts/ifcfg-team0 <==
NAME=team0
DEVICE=team0
DEVICETYPE=Team
TEAM_CONFIG='{"runner": {"name": "activebackup"}}'
BOOTPROTO=none
ONBOOT=yes
AUTOCONNECT_PRIORITY=100
AUTOCONNECT_RETRIES=0
IPADDR=192.168.125.11
PREFIX=24
GATEWAY=192.168.125.254
DNS1=192.168.125.10
DEFROUTE=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no

==> /etc/sysconfig/network-scripts/ifcfg-team0-ens192 <==
NAME=team0-ens192
DEVICE=ens192
DEVICETYPE=TeamPort
TEAM_MASTER=team0
TEAM_PORT_CONFIG='{"sticky": true}'
ONBOOT=yes

==> /etc/sysconfig/network-scripts/ifcfg-team0-ens224 <==
NAME=team0-ens224
DEVICE=ens224
DEVICETYPE=TeamPort
TEAM_MASTER=team0
TEAM_PORT_CONFIG='{"sticky": true}'
ONBOOT=yes
---

---
[core@master1 ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens161: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0c:29:07:4b:5f brd ff:ff:ff:ff:ff:ff
3: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 state UP group default qlen 1000
    link/ether 00:0c:29:07:4b:4b brd ff:ff:ff:ff:ff:ff
4: ens224: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 state UP group default qlen 1000
    link/ether 00:0c:29:07:4b:4b brd ff:ff:ff:ff:ff:ff
5: ens256: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0c:29:07:4b:55 brd ff:ff:ff:ff:ff:ff
7: team0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:0c:29:07:4b:4b brd ff:ff:ff:ff:ff:ff
    inet 192.168.125.11/24 brd 192.168.125.255 scope global noprefixroute team0
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe07:4b4b/64 scope link
       valid_lft forever preferred_lft forever
---

At the beginning everything works fine; ovs-configuration.service fails while the network operator is being installed. I followed the tips from https://bugzilla.redhat.com/show_bug.cgi?id=1758162#c11.

Thanks,
Vinu K
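For reference, an equivalent teaming setup can also be expressed with nmcli instead of ifcfg files. This is a sketch, assuming stock NetworkManager team support; the connection names, interfaces, and addresses are taken from the ifcfg files above:

```shell
# team0 master with the same activebackup runner and static IPv4 config
nmcli connection add type team con-name team0 ifname team0 \
    team.config '{"runner": {"name": "activebackup"}}' \
    ipv4.method manual ipv4.addresses 192.168.125.11/24 \
    ipv4.gateway 192.168.125.254 ipv4.dns 192.168.125.10 \
    ipv6.method disabled connection.autoconnect yes

# the two team ports (equivalent of the TeamPort ifcfg files)
nmcli connection add type ethernet con-name team0-ens192 ifname ens192 \
    master team0 slave-type team
nmcli connection add type ethernet con-name team0-ens224 ifname ens224 \
    master team0 slave-type team
```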
> Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: ++ nmcli --get-values connection.type conn show 702de3eb-2e80-897c-fd52-cd0494dd8123
> Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: + '[' team == vlan ']'
> Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: ++ nmcli --get-values connection.type conn show 702de3eb-2e80-897c-fd52-cd0494dd8123
> Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: + '[' team == bond ']'
> Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: + iface_type=802-3-ethernet
> Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: + nmcli device disconnect team0
> Feb 28 23:35:35 master2.ocp4.vlan125.mcp configure-ovs.sh[1956]: Device 'team0' successfully disconnected.
> Feb 28 23:35:35 localhost.localdomain configure-ovs.sh[1956]: + nmcli connection show ovs-if-phys0
> Feb 28 23:35:35 localhost.localdomain configure-ovs.sh[1956]: + nmcli c add type 802-3-ethernet conn.interface team0 master ovs-port-phys0 con-name ovs-if-phys0 connection.autoconnect-priority 100 802-3-ethernet.mtu 1500

Here the script tries to determine the type of connection to add. It should be 'team', but the script only supports 'vlan' or 'bond' and eventually falls back to 'ethernet'. I think this issue is similar to bug 1887545 (fixed by https://github.com/openshift/machine-config-operator/pull/2152). @Tim, can you please have a look?
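The fallback described above can be sketched as a small bash function (this is an illustration of the logic visible in the trace, not the actual configure-ovs.sh source): only 'vlan' and 'bond' are recognized, so a 'team' connection falls through to plain ethernet.

```shell
# Map a NetworkManager connection type (as returned by
# `nmcli --get-values connection.type conn show <uuid>`)
# to the interface type the script passes to `nmcli c add type ...`.
iface_type_for() {
  local conn_type="$1"
  if [ "$conn_type" == "vlan" ]; then
    echo "vlan"
  elif [ "$conn_type" == "bond" ]; then
    echo "bond"
  else
    # Fallback: anything else is treated as plain ethernet,
    # which is wrong for 'team' and produces the failure above.
    echo "802-3-ethernet"
  fi
}

iface_type_for team   # prints 802-3-ethernet, matching the failing log
```

Activating an ethernet-typed `ovs-if-phys0` then fails because no ethernet device named `team0` exists, which is the "mismatching interface name" error in the log.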
@yprokule Hi, do you know if the ocp-edge-virt job can set up a 'team' interface type with ovn-kubernetes for verifying this bug?
@vkochuku Could you help verify this fix? QE does not currently have this kind of cluster available.
Hello @zhaozhanqi, Thank you for your update. I will check if the setup is available and update you. Thanks, Vinu K
Hi Vinu, could you check again with the new PR?
openshift/machine-config-operator/pull/2706 merged, setting to Verified.
Verified the 4.8 backport with teaming as well: https://github.com/openshift/machine-config-operator/pull/2644

RHCOS teaming static IP after reboot: http://file.rdu.redhat.com/~rbrattai/logs/ovs-config-172.31.248.217

Aug 31 21:17:08 compute-0 configure-ovs.sh[1392]: + '[' team == team ']'
Aug 31 21:17:08 compute-0 configure-ovs.sh[1392]: + iface_type=team
Aug 31 21:17:08 compute-0 configure-ovs.sh[1392]: ++ nmcli --get-values team.config -e no conn show 702de3eb-2e80-897c-fd52-cd0494dd8123
Aug 31 21:17:08 compute-0 configure-ovs.sh[1392]: + team_config_opts='{"runner": {"name": "activebackup"}, "link_watch": {"name": "ethtool"}}'
Aug 31 21:17:08 compute-0 configure-ovs.sh[1392]: + '[' -n '{"runner": {"name": "activebackup"}, "link_watch": {"name": "ethtool"}}' ']'
Aug 31 21:17:08 compute-0 configure-ovs.sh[1392]: + extra_phys_args+=(team.config "${team_config_opts//[[:space:]]/}")
Aug 31 21:17:08 compute-0 configure-ovs.sh[1392]: + nmcli connection show ovs-if-phys0
Aug 31 21:17:08 compute-0 configure-ovs.sh[1392]: + nmcli c add type team conn.interface team0 master ovs-port-phys0 con-name ovs-if-phys0 connection.autoconnect-priority 100 802-3-ethernet.mtu 1500 team.config '{"runner":{"name":"activebackup"},"link_watch":{"name":"ethtool"}}'
Aug 31 21:17:08 compute-0 configure-ovs.sh[1392]: Connection 'ovs-if-phys0' (1a11fc58-3ac6-4917-8c80-e5a44ad54c1f) successfully added.
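The `${team_config_opts//[[:space:]]/}` expansion visible in the trace is standard bash pattern substitution: it deletes all whitespace from the team.config JSON before it is handed to `nmcli c add`, which is why the JSON in the final nmcli command appears compacted. A minimal sketch of just that step:

```shell
# Same JSON as in the verified log above.
team_config_opts='{"runner": {"name": "activebackup"}, "link_watch": {"name": "ethtool"}}'

# ${var//pattern/replacement} with an empty replacement removes every
# match of the pattern; [[:space:]] matches any whitespace character.
echo "${team_config_opts//[[:space:]]/}"
# prints: {"runner":{"name":"activebackup"},"link_watch":{"name":"ethtool"}}
```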
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759