1986921 – Team interface fails to come up in RHCOS 4.6.8

Keywords:
Status:	CLOSED DUPLICATE of bug 1994246
Alias:	None
Product:	Red Hat Enterprise Linux 8
Classification:	Red Hat
Component:	NetworkManager
Sub Component:
Version:	8.2
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	beta
Target Release:	---
Assignee:	NetworkManager Development Team
QA Contact:	Desktop QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-07-28 14:25 UTC by Akash Semil
Modified:	2021-08-17 06:30 UTC
CC List:	15 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-08-17 06:30:40 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:
Flags:	miabbott: needinfo-

Attachments
Kernel command line (167 bytes, text/plain) 2021-07-28 14:32 UTC, Akash Semil
Serial console Logs (158.16 KB, application/octet-stream) 2021-07-28 14:41 UTC, Akash Semil
ens192.nmconnection (191 bytes, text/plain) 2021-07-28 14:42 UTC, Akash Semil
ens224.nmconnection (191 bytes, text/plain) 2021-07-28 14:43 UTC, Akash Semil
team0.nmconnection (218 bytes, text/plain) 2021-07-28 14:45 UTC, Akash Semil
Serial Console Log 2 (126.33 KB, text/plain) 2021-07-29 14:33 UTC, Akash Semil
dist-git patch for backport of "fix team in initrd" (18.11 KB, patch) 2021-08-11 14:12 UTC, Thomas Haller

Description Akash Semil 2021-07-28 14:25:58 UTC

OCP Version at Install Time: 4.6.40
RHCOS Version at Install Time: 4.6.8 -- 46.82.202012051820-0
Platform: bare-metal installation on vSphere
Architecture: x86_64

What is your use case?

Installing a worker node with network teaming configuration.

Process Followed:

> Booting into RHCOS live
> Configuring Network using nmcli / nmtui
> Installing RHCOS using coreos-installer
> Reboot
> Team interface is failing to come up


EXPECTATION: 

> The team interface should come up, DHCP should assign an IP address to the node, and the node should then fetch its machine config from the machine config server.


ENVIRONMENT:

DHCP / DNS / HTTP Server (Ignition File) : 10.2.10.254 (ddns.example.com)
Gateway: 10.2.10.1

Node: 

CPU: 4
Memory: 16 G
Storage: 120 G
Network Interface: ens192 , ens224 (1Gbps)

Steps to REPRODUCE: 

~~~

> Boot into live RHCOS
> Network Configuration:
  
// Delete existing connections
  # nmcli con del 'Wired connection'

// Create team
  # nmcli con add con-name team0 ifname team0 type team

// Create team-slave ens192
  # nmcli con add con-name ens192 ifname ens192 type ethernet master team0

// Create team-slave ens224
  # nmcli con add con-name ens224 ifname ens224 type ethernet master team0

// Starting connection
  # nmcli con up team0

// Installing coreos
  # coreos-installer install --insecure-ignition --copy-network --ignition-url=http://ddns.example.com/pxe/rhcos/worker.ign /dev/sda

//REBOOT
  # reboot

~~~

The full contents of the serial console showing disk initialization, network configuration, and Ignition stage:
- FILE: serial.log

Kernel command line (`cat /proc/cmdline`)
- FILE: cmdline

Contents of `/etc/NetworkManager/system-connections/`
- FILES: ens192.nmconnection | ens224.nmconnection | team0.nmconnection

Contents of `/etc/sysconfig/network-scripts/`
- NO FILES : DIR is empty

Comment 1 Akash Semil 2021-07-28 14:32:02 UTC

Created attachment 1806744 [details]
Kernel command line

Comment 2 Akash Semil 2021-07-28 14:41:16 UTC

Created attachment 1806745 [details]
Serial console Logs

Comment 3 Akash Semil 2021-07-28 14:42:42 UTC

Created attachment 1806746 [details]
ens192.nmconnection

Comment 4 Akash Semil 2021-07-28 14:43:54 UTC

Created attachment 1806747 [details]
ens224.nmconnection

Comment 5 Akash Semil 2021-07-28 14:45:26 UTC

Created attachment 1806748 [details]
team0.nmconnection

Comment 6 Akash Semil 2021-07-28 14:51:39 UTC

The serial console logs (serial.log) were captured after installing RHCOS and rebooting.

Comment 7 Luca BRUNO 2021-07-29 11:58:38 UTC

From the serial logs, I think the network setup is failing at this point:
```
[   15.161401] teamd_team0[857]: teamd_init() failed.
[   15.177451] teamd_team0[857]: Failed: Invalid argument
[   15.212931] NetworkManager[849]: <warn>  [1627477740.5617] device (team0): teamd process 857 quit unexpectedly; failing activation
```

However I'm not sure what is causing that, and I don't think we have seen anything similar so far.

Before further digging into the issue, there are actually newer 4.6 bootimages which may contain fresher NetworkManager packages and possibly improve the situation.
Can you please grab the latest 4.6.40 live ISO and try to install from there?
If that one too fails to provision the teaming, please attach the new generated connection profiles and the new serial log, thanks!
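
The failure lines quoted above can be pulled out of a saved serial log with a short filter. This is only a sketch: the function name is hypothetical, and it assumes the serial console output has been saved to a local file (for example the `serial.log` attachment). The patterns are taken from the messages seen in this bug and may not cover every teamd failure mode.

```shell
# Hypothetical helper: print teamd-related failure lines from a saved serial log.
# $1: path to the saved serial console log (assumed to be a plain-text file).
find_teamd_failures() {
    # Patterns match the messages quoted in this bug report.
    grep -E 'teamd_init\(\) failed|teamd process [0-9]+ quit unexpectedly|Failed to init dbus' "$1"
}
```

Usage might look like `find_teamd_failures serial.log`. The function returns a non-zero status when no matching line is found.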

Comment 8 Akash Semil 2021-07-29 14:19:08 UTC

As requested I tried the same with RHCOS 4.6.40.

OUTPUT from the live boot:

OUTPUT - /etc/os-release

~~~
NAME="Red Hat Enterprise Linux CoreOS"
VERSION="46.82.202106161040-0"
VERSION_ID="4.6"
OPENSHIFT_VERSION="4.6"
RHEL_VERSION="8.2"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 46.82.202106161040-0 (Ootpa)"
ID="rhcos"
ID_LIKE="rhel fedora"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::coreos"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.6"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.6"
OSTREE_VERSION='46.82.202106161040-0'
~~~

OUTPUT - Network Configuration - /etc/NetworkManager/system-connections/ 

FILE - ens192.nmconnection

~~~
[connection]
id=ens192
uuid=7b0b1069-1758-4140-87a1-2537f1f8280f
type=ethernet
interface-name=ens192
master=team0
permissions=
slave-type=team

[ethernet]
mac-address-blacklist=

[team-port]
~~~

FILE - ens224.nmconnection

~~~
[connection]
id=ens224
uuid=af298f6f-8d3d-4baa-9842-0f8a123c151f
type=ethernet
interface-name=ens224
master=team0
permissions=
slave-type=team

[ethernet]
mac-address-blacklist=

[team-port]
~~~

FILE - team0.nmconnection

~~~
[connection]
id=team0
uuid=a529e158-43f1-410e-9a7d-551848c3a1b5
type=team
interface-name=team0
permissions=

[team]

[ipv4]
dns-search=
method=auto

[ipv6]
addr-gen-mode=stable-privacy
dns-search=
method=auto

[proxy]
~~~


OUTPUT - IP Address while in Live Boot 

~~~
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 state UP group default qlen 1000
    link/ether 00:50:56:a3:9f:7e brd ff:ff:ff:ff:ff:ff
3: ens224: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 state UP group default qlen 1000
    link/ether 00:50:56:a3:9f:7e brd ff:ff:ff:ff:ff:ff
4: team0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:50:56:a3:9f:7e brd ff:ff:ff:ff:ff:ff
    inet 10.2.10.79/24 brd 10.2.10.255 scope global dynamic noprefixroute team0
       valid_lft 268sec preferred_lft 268sec
    inet6 fe80::194b:9664:3a45:e7fd/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
~~~

OUTPUT - dig output ($ dig ddns.example.com) while in Live Boot

~~~
; <<>> DiG 9.11.13-RedHat-9.11.13-6.el8_2.3 <<>> ddns.example.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29868
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 303f6174f815ad0d6d2d4c756102ac3001f6a96d05031b03 (good)
;; QUESTION SECTION:
;ddns.example.com.	IN	A

;; ANSWER SECTION:
ddns.example.com. 10800	IN	A	10.2.10.254

;; AUTHORITY SECTION:
example.com.	10800	IN	NS	ddns.example.com.

;; Query time: 1 msec
;; SERVER: 10.2.10.254#53(10.2.10.254)
;; WHEN: Thu Jul 29 13:25:05 UTC 2021
;; MSG SIZE  rcvd: 110
~~~

OUTPUT - teamd state ($ teamdctl team0 state view)

~~~
setup:
  runner: roundrobin
ports:
  ens192
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: ethtool
        link: up
        down count: 0
  ens224
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: ethtool
        link: up
        down count: 0
~~~


OUTPUT - coreos-installer output ($ coreos-installer install --insecure-ignition --copy-network --ignition-url=http://ddns.example.com/pxe/rhcos/worker.ign /dev/sda)

~~~
Installing Red Hat Enterprise Linux CoreOS 46.82.202106161040-0 (Ootpa) x86_64 (512-byte sectors)
Read disk 3.3 GiB/3.3 GiB (100%)
Writing Ignition config
Copying networking configuration from /etc/NetworkManager/system-connections/
Copying /etc/NetworkManager/system-connections/team0.nmconnection to installed system
Copying /etc/NetworkManager/system-connections/ens192.nmconnection to installed system
Copying /etc/NetworkManager/system-connections/ens224.nmconnection to installed system
Install complete.
~~~

After this, I rebooted the system; RHCOS now boots from the hard drive.

Attaching the serial logs: FILE - Serial Console Log 2

Comment 9 Akash Semil 2021-07-29 14:33:01 UTC

Created attachment 1807145 [details]
Serial Console Log 2

Comment 10 Micah Abbott 2021-07-30 13:16:53 UTC

Still seeing similar errors in the latest attempt with the new 4.6 boot media (using NetworkManager version 1.22.8-7.el8_2)

```
[   13.266838] teamd_team0[875]: dbus: Could not acquire the system bus: org.freedesktop.DBus.Error.FileNotFound - Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory
[   13.275233] NetworkManager[866]: <info>  [1627565985.8321] manager: NetworkManager state is now CONNECTING
[   13.278671] dracut-initqueue[850]: Daemon not running
[   13.280942] dracut-initqueue[850]: Daemon not running
[   13.282283] teamd_team0[875]: Failed to init dbus.
[   13.286941] NetworkManager[866]: <info>  [1627565985.8321] device (ens224): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
[   13.292494] NetworkManager[866]: <info>  [1627565985.8322] device (team0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
[   13.301516] NetworkManager[866]: <info>  [1627565985.8335] device (team0): Activation: (team) started teamd [pid 875]...
[   13.306011] NetworkManager[866]: <info>  [1627565985.8337] device (ens192): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
[   13.313891] NetworkManager[866]: <info>  [1627565985.8343] device (ens224): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
[   13.321557] teamd_team0[875]: teamd_init() failed.
[   13.324630] NetworkManager[866]: <info>  [1627565985.8347] device (ens192): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
[   13.332611] teamd_team0[875]: Failed: Invalid argument
[   13.335035] NetworkManager[866]: <info>  [1627565985.8348] device (ens224): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
[   13.342934] ignition[833]: GET https://api-int.ocp.example.com:22623/config/worker: attempt #3
[   13.346144] systemd-udevd[871]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
[   13.350634] ignition[833]: GET error: Get "https://api-int.ocp.example.com:22623/config/worker": dial tcp: lookup api-int.ocp.example.com on [::1]:53: read udp [::1]:54304->[::1]:53: read: connection refused
[   13.358690] systemd-udevd[871]: Could not generate persistent MAC address for team0: No such file or directory
[   13.364556] NetworkManager[866]: <warn>  [1627565985.9074] device (team0): teamd process 875 quit unexpectedly; failing activation
[   13.369469] NetworkManager[866]: <info>  [1627565985.9074] device (team0): state change: prepare -> failed (reason 'teamd-control-failed', sys-iface-state: 'managed')
```

Not sure if the DBus errors are fatal, but that looks concerning.

@Beniamino do you think you could have a look?

Comment 11 Micah Abbott 2021-07-30 13:32:20 UTC

Based on feedback from the other folks on the CoreOS team, we believe this is a problem with the interaction between NetworkManager and teamd.

Sending this to the NetworkManager team directly for additional triage.

Comment 13 Thomas Haller 2021-07-30 14:00:35 UTC

> Still seeing similar errors in the latest attempt with the new 4.6 boot media (using NetworkManager version 1.22.8-7.el8_2)

With that version?


You'd need the fix for bug 1784363 (upstream 1.26.0 [1]).

[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/b74c333413dfeaff5fb6007981c41c5fcfe8cb4c


This is a dupe of 1784363.

Comment 16 Thomas Haller 2021-08-02 17:44:29 UTC

(In reply to Akash Semil from comment #14)
> Will this issue be resolved in any new RHCOS 4.6.X version?

I am not familiar with RHCOS version numbers.

The issue will be fixed if you use NetworkManager 1.26.0 or newer.

The issue might be fixed, if we decide to do a backport for older versions (Z-stream). Such a backport was not done nor is it planned -- so far.
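
The "1.26.0 or newer" condition can be checked with a small version comparison. This is a minimal sketch, not an official check: the function name is hypothetical, and it relies on GNU `sort -V` being available. It only compares the upstream version number, so a downstream build that carries a backport of the fix on an older version would still be reported as too old.

```shell
# Hypothetical helper: succeed if the given NetworkManager version is >= 1.26.0.
# $1: upstream version string, e.g. "1.22.8".
nm_has_team_initrd_fix() {
    min="1.26.0"
    # sort -V orders version strings; if the minimum sorts first (or is equal),
    # the given version is new enough.
    [ "$(printf '%s\n' "$min" "$1" | sort -V | head -n 1)" = "$min" ]
}
```

On an RPM-based system the installed version could be passed in with `nm_has_team_initrd_fix "$(rpm -q --queryformat '%{VERSION}' NetworkManager)"`.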

Comment 17 Gris Ge 2021-08-09 07:52:20 UTC

Hi Thomas,

Could you provide an 8.2 backport RPM scratch build for Micah to do an initial tryout?

Comment 18 Thomas Haller 2021-08-11 14:12:03 UTC

Created attachment 1813161 [details]
dist-git patch for backport of "fix team in initrd"

Dist-git patch for rhel-8.2.0, NetworkManager-1.22.8-8.rh1986921.1, applies on f3219afe27c5eca35c95f143479beacfb4302e81.


Find a scratch build of this here:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=38885784

Comment 19 Thomas Haller 2021-08-11 14:13:40 UTC

@miabbott any chance you could test the scratch build from comment 18?

Comment 20 Micah Abbott 2021-08-12 15:13:34 UTC

- Built a custom RHCOS 4.6 using the scratch NetworkManager build provided in comment #18
- Ran through reproducer steps in comment #1
- Confirmed `team0` activation in installed system

```
[core@localhost ~]$ rpm-ostree status 
State: idle
Deployments:
* ostree://113b8fc449600115990b96639b0dd757eaef6cc5788e334e6890c253c5b3d36e
                   Version: 46.82.202108121350-0 (2021-08-12T13:52:42Z)
[core@localhost ~]$ rpm -q NetworkManager
NetworkManager-1.22.8-8.rh1986921.1.el8_2.x86_64
[core@localhost ~]$ ls -l /etc/NetworkManager/system-connections/
total 12
-rw-------. 1 root root 191 Aug 12 15:08 enp1s0.nmconnection
-rw-------. 1 root root 191 Aug 12 15:08 enp2s0.nmconnection
-rw-------. 1 root root 218 Aug 12 15:08 team0.nmconnection
[core@localhost ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master team0 state UP group default qlen 1000
    link/ether 52:54:00:2d:a4:5d brd ff:ff:ff:ff:ff:ff
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master team0 state UP group default qlen 1000
    link/ether 52:54:00:2d:a4:5d brd ff:ff:ff:ff:ff:ff
4: team0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:2d:a4:5d brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.200/24 brd 192.168.122.255 scope global dynamic noprefixroute team0
       valid_lft 3405sec preferred_lft 3405sec
    inet6 fe80::29:7541:e676:ace5/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
[core@localhost ~]$ nmcli con show
NAME    UUID                                  TYPE      DEVICE 
team0   af9dc856-c26f-4b48-b39f-26a47b447675  team      team0  
enp1s0  774eb31c-bdbe-4a07-a8d2-b3244e374d84  ethernet  enp1s0 
enp2s0  5cffac48-d13a-4f76-98f7-e624b6e768fa  ethernet  enp2s0 
[core@localhost ~]$ sudo teamdctl team0 state
setup:
  runner: roundrobin
ports:
  enp1s0
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: ethtool
        link: up
        down count: 0
  enp2s0
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: ethtool
        link: up
        down count: 0
```
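
The manual check of `team0` in the `ip a` output above could be scripted along these lines. This is a sketch under assumptions: the function name is hypothetical, and it expects `ip a` or `ip link` style text on standard input, in the format shown in this comment. It only looks at the link state and does not verify that DHCP assigned an address.

```shell
# Hypothetical helper: succeed if the named interface is reported as "state UP".
# $1: interface name; reads `ip a` / `ip link` style output on stdin.
team_link_is_up() {
    # Match the header line of the interface itself, not ports that list it as master.
    grep -Eq "^[0-9]+: $1: <.*> .* state UP( |\$)"
}
```

For example, `ip a | team_link_is_up team0` would succeed on the installed system shown above.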

Comment 21 Gris Ge 2021-08-17 06:30:40 UTC

Thanks for the test feedback!

The 8.2.0.z has been approved at https://bugzilla.redhat.com/show_bug.cgi?id=1994246


Closing as dup of it.

*** This bug has been marked as a duplicate of bug 1994246 ***
