Bug 1758162 - [RFE] support team (teamd) interfaces in RHCOS
Summary: [RFE] support team (teamd) interfaces in RHCOS
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.2.z
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.4.0
Assignee: Dusty Mabe
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks: 1186913
 
Reported: 2019-10-03 13:02 UTC by Dave Cain
Modified: 2023-12-15 16:48 UTC (History)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: Network teaming support was requested in RHEL CoreOS. Reason: This was previously not supported because the teamd and NetworkManager-team rpms were not included in RHEL CoreOS. Result: The teamd and NetworkManager-team rpms were added to RHEL CoreOS. Setting up and managing teamed network devices is now possible.
Clone Of:
Environment:
Last Closed: 2020-05-04 11:13:57 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:0581 0 None None None 2020-05-04 11:14:26 UTC

Description Dave Cain 2019-10-03 13:02:13 UTC
Description of problem:
Enterprise customers and service providers deploying OpenShift on baremetal wish to leverage high availability features in the Layer 2 host networking layer with channel bonding, specifically 802.3ad link aggregation via LACP to aggregate separate physical interfaces on servers themselves.

This is possible with the kernel "bonding" driver today, but not with the teaming driver.  It looks to me as though RHCOS does NOT include the teamd user space binary (or the teamdctl tooling) in the 4.2 dev preview as of version 42.80.20190930.0, kernel 4.18.0-80.11.2.el8_0.x86_64.  The "team" kernel module does appear to be included.

Given the emphasis the RHEL 8 docs place on teaming, and the lower performance overhead it offers, can we see what it would take to support it in RHCOS?


Version-Release number of selected component (if applicable):
4.2.z

How reproducible:
RFE

Steps to Reproduce:
1. Build OCP4.2 cluster with custom ignition files or pass kernel parameters specifying teaming instead of bonding
2. Interfaces don't come up
3. Trouble


Expected results:
Support for teamed interfaces

Additional info:
https://www.redhat.com/en/blog/if-you-bonding-you-will-love-teaming

Comment 1 Micah Abbott 2019-10-03 15:47:48 UTC
4.2 is in code freeze and I don't see this as blocking release; we will evaluate for 4.3

Comment 2 Ben Breard 2019-10-07 14:00:15 UTC
Dave,

Networking is configured at provisioning time w/ RHCOS. We support bonding and teamed interfaces via the dracut cmdline options. When you stand up a node, simply append the desired ip= & team= info for your environment. From the dracut man page:
 team=<teammaster>:<teamslaves>
           Setup team device <teammaster> on top of <teamslaves>.
           <teamslaves> is a comma-separated list of physical (ethernet)
           interfaces.


Adding teamdctl to RHCOS would encourage users to execute commands on the host post provisioning, and we're working to avoid that and keep the host minimal where it makes sense. Is there anything lacking for your use case w/ the dracut workflow?
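
For reference, a minimal Python sketch of how the `team=` and `ip=` arguments described above could be assembled for a node. The helper name `team_kargs` is hypothetical, and the `ip=<device>:dhcp` form is taken from the arguments tried in Comment 6. Note that Comment 6 reports this form failing to provision and Comment 11 explains why, so treat this as an illustration of the syntax only.

```python
def team_kargs(team, ports, ip_mode="dhcp"):
    """Build dracut-style team= and ip= kernel arguments.

    team    -- name of the team device, e.g. "team0"
    ports   -- physical interfaces to place under the team
    ip_mode -- value used after the device in ip= (assumed "dhcp" here)
    """
    if not ports:
        raise ValueError("a team needs at least one port interface")
    # dracut expects the port list to be comma-separated.
    return f"team={team}:{','.join(ports)} ip={team}:{ip_mode}"


print(team_kargs("team0", ["enp6s0", "ens15"]))
# prints: team=team0:enp6s0,ens15 ip=team0:dhcp
```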

Comment 4 Micah Abbott 2019-11-08 19:07:51 UTC
Providing teaming information to the kernel command line should work the same as it does with RHEL.  The RHCOS team has not had the chance to specifically test a teaming config, but we believe it should be a non-issue.

However, we are going to push the investigation of the teaming config in RHCOS to 4.4.

@dcain Please let us know if you have had a chance to test further.

Comment 5 Dave Cain 2019-11-08 21:33:59 UTC
I will test this out, just give me some time.

Comment 6 Dave Cain 2019-12-19 04:35:13 UTC
This doesn't appear to work for me with 4.3 nightly builds, specifically dracut-43.81.201912171100.0, dracut-049-27.git20190906.el8_1.1.

My test environment is a baremetal system that *does* work with bonding when passed the following parameters:
bond=bond0:enp6s0,ens15:mode=active-backup,miimon=100,primary=enp6s0 ip=bond0:dhcp

However, when using teaming, the system never appears to finish provisioning.  Are there other options I can pass to RHCOS to get a better idea of what is happening behind the scenes so I can debug further?  This fails:
team=team0:enp6s0,ens15 ip=team0:dhcp

Also, I'm not certain how dracut supports teamd runner configuration, since only the interfaces are specified (as Ben says in Comment 2). The runner configuration is what selects a mode such as activebackup, roundrobin, or lacp, as one can do with Kickstart:

...
network --device=team0 --activate --bootproto=dhcp --teamslaves="enp6s0,ens15" --teamconfig="{\"runner\": {\"name\": \"activebackup\"}}"
...
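
As an illustration of the runner configuration in the Kickstart line above, here is a small Python sketch that builds the teamd JSON for a chosen runner and escapes it for use inside `--teamconfig="..."`. The function names `teamconfig` and `kickstart_escape` are hypothetical helpers, not part of any tool mentioned in this report.

```python
import json


def teamconfig(runner):
    """Return a minimal teamd JSON config that selects the given runner."""
    return json.dumps({"runner": {"name": runner}})


def kickstart_escape(config):
    """Escape double quotes so the JSON can sit inside --teamconfig="..."."""
    return config.replace('"', '\\"')


cfg = teamconfig("activebackup")
print(cfg)                    # {"runner": {"name": "activebackup"}}
print(kickstart_escape(cfg))  # {\"runner\": {\"name\": \"activebackup\"}}
```

Passing a different runner name, such as "lacp", changes only the `name` value in the generated JSON.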

Comment 11 Dusty Mabe 2020-03-05 20:34:12 UTC
Hey Dave,

I did some investigation into this. The teaming userspace components didn't exist in RHCOS or Fedora CoreOS, so we added them to both. However, dracut's support for teaming interfaces does not seem sufficient. For example, the teamd process started by dracut doesn't survive the switch to the real root (which I assume is expected), so you end up in a half-configured state. In the near future we'd like to run NetworkManager in the initramfs, which we think will handle corner cases like this. Until then, the short-term solution is to use a single NIC in the initramfs to fetch the ignition config and then have ignition lay down a networking configuration for teaming. For example, on the local node I'm testing with, I have ignition create these three files:

- etc/sysconfig/network-scripts/ifcfg-team0
- etc/sysconfig/network-scripts/ifcfg-ens2
- etc/sysconfig/network-scripts/ifcfg-ens3

with contents:

```
$ tail -n 30  etc/sysconfig/network-scripts/*
==> etc/sysconfig/network-scripts/ifcfg-ens2 <==
DEVICE=ens2
DEVICETYPE=TeamPort
ONBOOT=yes
TEAM_MASTER=team0
TEAM_PORT_CONFIG='{"prio": 100}'

==> etc/sysconfig/network-scripts/ifcfg-ens3 <==
DEVICE=ens3
DEVICETYPE=TeamPort
ONBOOT=yes
TEAM_MASTER=team0
TEAM_PORT_CONFIG='{"prio": 100}'

==> etc/sysconfig/network-scripts/ifcfg-team0 <==
DEVICE=team0
NAME=team0
DEVICETYPE=Team
ONBOOT=yes
BOOTPROTO=dhcp
TEAM_CONFIG='{"runner": {"name": "activebackup"}, "link_watch": {"name": "ethtool"}}'
```
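
A minimal Python sketch of how one of the port files above could be rendered and wrapped as an Ignition file entry. It assumes the spec 2.x entry shape (`filesystem`, `path`, `contents.source` as a base64 data URL) that the Ignition configs in Comment 14 use. The helper names `ifcfg_port`, `data_url`, and `ignition_file` are hypothetical.

```python
import base64
import json


def ifcfg_port(device, team, prio=100):
    """Render an ifcfg file for one team port, mirroring the layout above."""
    port_config = json.dumps({"prio": prio})
    return (
        f"DEVICE={device}\n"
        "DEVICETYPE=TeamPort\n"
        "ONBOOT=yes\n"
        f"TEAM_MASTER={team}\n"
        f"TEAM_PORT_CONFIG='{port_config}'\n"
    )


def data_url(text):
    """Encode file contents as a base64 data URL for contents.source."""
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return f"data:text/plain;base64,{encoded}"


def ignition_file(path, text):
    """Build one storage.files entry (shape assumed from Comment 14)."""
    return {
        "filesystem": "root",
        "path": path,
        "contents": {"source": data_url(text)},
    }


entry = ignition_file(
    "/etc/sysconfig/network-scripts/ifcfg-ens2",
    ifcfg_port("ens2", "team0"),
)
print(json.dumps(entry, indent=2))
```

Called with `enp1s0` instead of `ens2`, `data_url(ifcfg_port(...))` should produce the same base64 payload as the `ifcfg-enp1s0` entry in Comment 14.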



Alternatively, if you'd like to use NM key files you can create the following 3 files using ignition:


- etc/NetworkManager/system-connections/team0.nmconnection
- etc/NetworkManager/system-connections/team0-slave-ens2.nmconnection
- etc/NetworkManager/system-connections/team0-slave-ens3.nmconnection

with contents something like:

```
$ tail -n 30 etc/NetworkManager/system-connections/team0* 
==> etc/NetworkManager/system-connections/team0-slave-ens2.nmconnection <==
[connection]
id=team0-slave-ens2
type=ethernet
interface-name=ens2
master=team0
slave-type=team
[team-port]
config={"prio": 100}

==> etc/NetworkManager/system-connections/team0-slave-ens3.nmconnection <==
[connection]
id=team0-slave-ens3
type=ethernet
interface-name=ens3
master=team0
slave-type=team
[team-port]
config={"prio": 100}

==> etc/NetworkManager/system-connections/team0.nmconnection <==
[connection]
id=team0
type=team
interface-name=team0
[team]
config={"runner": {"name": "activebackup"}, "link_watch": {"name": "ethtool"}}
```
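
The key files above can also be generated rather than written by hand. The following Python sketch uses the standard-library `configparser` to render the same sections and keys. The function names are hypothetical, and the output has a blank line between sections, which differs cosmetically from the listing above.

```python
import configparser
import io
import json


def _render(sections):
    """Write the given {section: {key: value}} mapping in key=value form."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_dict(sections)
    buf = io.StringIO()
    # Keyfiles conventionally use "key=value" with no padding around "=".
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


def port_keyfile(team, device, prio=100):
    """Render a keyfile for one team port, following the layout above."""
    return _render({
        "connection": {
            "id": f"{team}-slave-{device}",
            "type": "ethernet",
            "interface-name": device,
            "master": team,
            "slave-type": "team",
        },
        "team-port": {"config": json.dumps({"prio": prio})},
    })


def team_keyfile(team, runner="activebackup"):
    """Render the keyfile for the team device itself."""
    config = {"runner": {"name": runner}, "link_watch": {"name": "ethtool"}}
    return _render({
        "connection": {"id": team, "type": "team", "interface-name": team},
        "team": {"config": json.dumps(config)},
    })


print(port_keyfile("team0", "ens2"))
print(team_keyfile("team0"))
```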

Comment 12 Micah Abbott 2020-03-06 00:47:07 UTC
The required packages landed in RHCOS 44.81.202003052104-0 and will be available in all future builds.

Comment 14 Michael Nguyen 2020-03-06 17:24:06 UTC
Verified on RHCOS 44.81.202003052104-0 with nm key files and network scripts (ignition files below) on libvirt.

[core@localhost ~]$ rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
* ostree://7dbeaecd35b1b336815f70ee2eb2a9ac56b2a027705c5fd46f117fbd4892f261
                   Version: 44.81.202003052104-0 (2020-03-05T21:09:26Z)

[core@localhost ~]$ rpm -qa | grep team
NetworkManager-team-1.20.0-5.el8_1.x86_64
libteam-1.28-4.el8.x86_64
teamd-1.28-4.el8.x86_64

[core@localhost ~]$ ip addr show team0
4: team0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:f6:f3:41 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.82/24 brd 192.168.122.255 scope global dynamic noprefixroute team0
       valid_lft 3433sec preferred_lft 3433sec
    inet6 fe80::cd1d:f7a2:ea7d:4cf5/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever


{
  "ignition": {
    "version": "2.2.0"
  },
  "storage": {
    "files": [
      {
        "filesystem": "root",
        "path": "/etc/sudoers",
        "append": true,
        "contents": {
          "source": "data:,core%20ALL=(ALL)%20NOPASSWD:%20ALL"
        }
      },
      {
        "filesystem": "root",
        "path": "/etc/NetworkManager/system-connections/team0-slave-enp1s0.nmconnection",
        "contents": {
          "source": "data:text/plain;base64,W2Nvbm5lY3Rpb25dCmlkPXRlYW0wLXNsYXZlLWVucDFzMAp0eXBlPWV0aGVybmV0CmludGVyZmFjZS1uYW1lPWVucDFzMAptYXN0ZXI9dGVhbTAKc2xhdmUtdHlwZT10ZWFtClt0ZWFtLXBvcnRdCmNvbmZpZz17InByaW8iOiAxMDB9Cg=="
        }
      },
      {
        "filesystem": "root",
        "path": "/etc/NetworkManager/system-connections/team0-slave-enp2s0.nmconnection",
        "contents": {
          "source": "data:text/plain;base64,W2Nvbm5lY3Rpb25dCmlkPXRlYW0wLXNsYXZlLWVucDJzMAp0eXBlPWV0aGVybmV0CmludGVyZmFjZS1uYW1lPWVucDJzMAptYXN0ZXI9dGVhbTAKc2xhdmUtdHlwZT10ZWFtClt0ZWFtLXBvcnRdCmNvbmZpZz17InByaW8iOiAxMDB9Cg=="
        }
      },
      {
        "filesystem": "root",
        "path": "/etc/NetworkManager/system-connections/team0.nmconnection",
        "contents": {
          "source": "data:text/plain;base64,W2Nvbm5lY3Rpb25dCmlkPXRlYW0wCnR5cGU9dGVhbQppbnRlcmZhY2UtbmFtZT10ZWFtMApbdGVhbV0KY29uZmlnPXsicnVubmVyIjogeyJuYW1lIjogImFjdGl2ZWJhY2t1cCJ9LCAibGlua193YXRjaCI6IHsibmFtZSI6ICJldGh0b29sIn19Cg=="
        }
      }
    ]
  },
  "passwd": {
    "users": [
      {
        "name": "core",
        "passwordHash": "$6$PJRvsSuFKHM6A57Y$9iwTakmSLiEzgCuJ1T40jpqby2Q3pB/sZE2KZym1x3yd3rJZFSD8oE5bmjXBEsOpea/5MtWdn6QgzxCpbWM7J.",
        "sshAuthorizedKeys": [
          "ssh-rsa AAA.."
        ]
      }
    ]
  }
}


{
    "ignition": {"version": "2.2.0"},
    "storage": {
    "files": [
      {
        "filesystem": "root",
        "path": "/etc/sudoers",
        "append": true,
        "contents": { "source": "data:,core%20ALL=(ALL)%20NOPASSWD:%20ALL" }
      },
      {
        "filesystem": "root",
        "path": "/etc/sysconfig/network-scripts/ifcfg-enp1s0",
        "contents": { "source": "data:text/plain;base64,REVWSUNFPWVucDFzMApERVZJQ0VUWVBFPVRlYW1Qb3J0Ck9OQk9PVD15ZXMKVEVBTV9NQVNURVI9dGVhbTAKVEVBTV9QT1JUX0NPTkZJRz0neyJwcmlvIjogMTAwfScK" }
      },
      {
        "filesystem": "root",
        "path": "/etc/sysconfig/network-scripts/ifcfg-enp2s0",
        "contents": { "source": "data:text/plain;base64,REVWSUNFPWVucDJzMApERVZJQ0VUWVBFPVRlYW1Qb3J0Ck9OQk9PVD15ZXMKVEVBTV9NQVNURVI9dGVhbTAKVEVBTV9QT1JUX0NPTkZJRz0neyJwcmlvIjogMTAwfScK" }
      },
      {
        "filesystem": "root",
        "path": "/etc/sysconfig/network-scripts/ifcfg-team0",
        "contents": { "source": "data:text/plain;base64,REVWSUNFPXRlYW0wCk5BTUU9dGVhbTAKREVWSUNFVFlQRT1UZWFtCk9OQk9PVD15ZXMKQk9PVFBST1RPPWRoY3AKVEVBTV9DT05GSUc9J3sicnVubmVyIjogeyJuYW1lIjogImFjdGl2ZWJhY2t1cCJ9LCAibGlua193YXRjaCI6IHsibmFtZSI6ICJldGh0b29sIn19Jwo=" }
      }
    ]
    },
    "passwd": {
      "users": [
        {
          "name": "core",
          "passwordHash": "$6$PJRvsSuFKHM6A57Y$9iwTakmSLiEzgCuJ1T40jpqby2Q3pB/sZE2KZym1x3yd3rJZFSD8oE5bmjXBEsOpea/5MtWdn6QgzxCpbWM7J.",
          "sshAuthorizedKeys": [
            "ssh-rsa AAA.."
          ]
        }
      ]
    }
}
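
Because the file contents in the Ignition configs above are base64-encoded, they are hard to review by eye. The Python sketch below decodes the `data:` URLs in a config so the embedded files can be inspected. The function names are hypothetical, only `data:` URLs are handled, and the sample reuses two `source` values from the configs above.

```python
import base64
import json
from urllib.parse import unquote


def decode_source(source):
    """Decode a data: URL as used in contents.source and return its text."""
    if not source.startswith("data:"):
        raise ValueError("only data: URLs are handled in this sketch")
    header, _, payload = source[len("data:"):].partition(",")
    if header.endswith(";base64"):
        return base64.b64decode(payload).decode("utf-8")
    # Without ;base64 the payload is percent-encoded text.
    return unquote(payload)


def embedded_files(config_text):
    """Map each storage.files path to its decoded contents."""
    config = json.loads(config_text)
    files = config.get("storage", {}).get("files", [])
    return {f["path"]: decode_source(f["contents"]["source"]) for f in files}


sample = json.dumps({
    "ignition": {"version": "2.2.0"},
    "storage": {"files": [
        {"filesystem": "root", "path": "/etc/sudoers", "append": True,
         "contents": {"source": "data:,core%20ALL=(ALL)%20NOPASSWD:%20ALL"}},
        {"filesystem": "root",
         "path": "/etc/sysconfig/network-scripts/ifcfg-enp1s0",
         "contents": {"source": "data:text/plain;base64,REVWSUNFPWVucDFzMApERVZJQ0VUWVBFPVRlYW1Qb3J0Ck9OQk9PVD15ZXMKVEVBTV9NQVNURVI9dGVhbTAKVEVBTV9QT1JUX0NPTkZJRz0neyJwcmlvIjogMTAwfScK"}},
    ]},
})

for path, text in embedded_files(sample).items():
    print(f"==> {path} <==")
    print(text)
```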

Comment 15 Dusty Mabe 2020-03-07 04:48:49 UTC
related docs bug: https://bugzilla.redhat.com/show_bug.cgi?id=1811047

Comment 17 errata-xmlrpc 2020-05-04 11:13:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

