Bug 1986921
| Summary: | Team interface fails to come up in RHCOS 4.6.8 | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Akash Semil <asemil> |
| Component: | NetworkManager | Assignee: | NetworkManager Development Team <nm-team> |
| Status: | CLOSED DUPLICATE | QA Contact: | Desktop QE <desktop-qa-list> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 8.2 | CC: | asemil, atragler, bgalvani, dornelas, fge, jligon, lrintel, lucab, miabbott, mrussell, nstielau, rkhan, sukulkar, thaller, till |
| Target Milestone: | beta | Flags: | miabbott: needinfo-, pm-rhel: mirror+ |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-08-17 06:30:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Akash Semil
2021-07-28 14:25:58 UTC
Created attachment 1806744 [details]
Kernel command line
Created attachment 1806745 [details]
Serial console Logs
Created attachment 1806746 [details]
ens192.nmconnection
Created attachment 1806747 [details]
ens224.nmconnection
Created attachment 1806748 [details]
team0.nmconnection
The serial console logs (serial.log) were captured after installing RHCOS and rebooting.

From the serial logs, I think the network setup is failing at this point:

```
[ 15.161401] teamd_team0[857]: teamd_init() failed.
[ 15.177451] teamd_team0[857]: Failed: Invalid argument
[ 15.212931] NetworkManager[849]: <warn> [1627477740.5617] device (team0): teamd process 857 quit unexpectedly; failing activation
```

However, I'm not sure what is causing that, and I don't think we have seen anything similar so far.

Before digging further into the issue: there are newer 4.6 bootimages which may contain fresher NetworkManager packages and possibly improve the situation. Can you please grab the latest 4.6.40 live ISO and try to install from there? If that one also fails to provision the teaming, please attach the newly generated connection profiles and the new serial log, thanks!

As requested, I tried the same with RHCOS 4.6.40.

OUTPUT from the live boot:

OUTPUT - /etc/os-release

~~~
NAME="Red Hat Enterprise Linux CoreOS"
VERSION="46.82.202106161040-0"
VERSION_ID="4.6"
OPENSHIFT_VERSION="4.6"
RHEL_VERSION="8.2"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 46.82.202106161040-0 (Ootpa)"
ID="rhcos"
ID_LIKE="rhel fedora"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::coreos"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.6"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.6"
OSTREE_VERSION='46.82.202106161040-0'
~~~

OUTPUT - Network Configuration - /etc/NetworkManager/system-connections/

FILE - ens192.nmconnection

~~~
[connection]
id=ens192
uuid=7b0b1069-1758-4140-87a1-2537f1f8280f
type=ethernet
interface-name=ens192
master=team0
permissions=
slave-type=team

[ethernet]
mac-address-blacklist=

[team-port]
~~~

FILE - ens224.nmconnection

~~~
[connection]
id=ens224
uuid=af298f6f-8d3d-4baa-9842-0f8a123c151f
type=ethernet
interface-name=ens224
master=team0
permissions=
slave-type=team

[ethernet]
mac-address-blacklist=

[team-port]
~~~

FILE - team0.nmconnection

~~~
[connection]
id=team0
uuid=a529e158-43f1-410e-9a7d-551848c3a1b5
type=team
interface-name=team0
permissions=

[team]

[ipv4]
dns-search=
method=auto

[ipv6]
addr-gen-mode=stable-privacy
dns-search=
method=auto

[proxy]
~~~

OUTPUT - IP Address while in Live Boot

~~~
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 state UP group default qlen 1000
    link/ether 00:50:56:a3:9f:7e brd ff:ff:ff:ff:ff:ff
3: ens224: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 state UP group default qlen 1000
    link/ether 00:50:56:a3:9f:7e brd ff:ff:ff:ff:ff:ff
4: team0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:50:56:a3:9f:7e brd ff:ff:ff:ff:ff:ff
    inet 10.2.10.79/24 brd 10.2.10.255 scope global dynamic noprefixroute team0
       valid_lft 268sec preferred_lft 268sec
    inet6 fe80::194b:9664:3a45:e7fd/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
~~~

OUTPUT - dig output ($ dig ddns.example.com) while in Live Boot

~~~
; <<>> DiG 9.11.13-RedHat-9.11.13-6.el8_2.3 <<>> ddns.example.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29868
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 303f6174f815ad0d6d2d4c756102ac3001f6a96d05031b03 (good)
;; QUESTION SECTION:
;ddns.example.com. IN A

;; ANSWER SECTION:
ddns.example.com. 10800 IN A 10.2.10.254

;; AUTHORITY SECTION:
example.com. 10800 IN NS ddns.example.com.

;; Query time: 1 msec
;; SERVER: 10.2.10.254#53(10.2.10.254)
;; WHEN: Thu Jul 29 13:25:05 UTC 2021
;; MSG SIZE rcvd: 110
~~~

OUTPUT - teamd state ($ teamdctl team0 state view)

~~~
setup:
  runner: roundrobin
ports:
  ens192
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: ethtool
        link: up
        down count: 0
  ens224
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: ethtool
        link: up
        down count: 0
~~~

OUTPUT - coreos-installer output ($ coreos-installer install --insecure-ignition --copy-network --ignition-url=http://ddns.example.com/pxe/rhcos/worker.ign /dev/sda)

~~~
Installing Red Hat Enterprise Linux CoreOS 46.82.202106161040-0 (Ootpa) x86_64 (512-byte sectors)
Read disk 3.3 GiB/3.3 GiB (100%)
Writing Ignition config
Copying networking configuration from /etc/NetworkManager/system-connections/
Copying /etc/NetworkManager/system-connections/team0.nmconnection to installed system
Copying /etc/NetworkManager/system-connections/ens192.nmconnection to installed system
Copying /etc/NetworkManager/system-connections/ens224.nmconnection to installed system
Install complete.
~~~

After this, I rebooted the system, and RHCOS now boots from the hard drive.

Attaching the serial logs:

FILE - Serial Console Log 2

Created attachment 1807145 [details]
Serial Console Log 2
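For anyone comparing the copied profiles, the relationship between the port profiles and the team profile can be checked with a minimal Python sketch. It parses keyfile text with the standard-library `configparser`, which is only an approximation of NetworkManager's keyfile format but is enough for profiles this simple. The helper names `parse_keyfile` and `is_port_of` are hypothetical, the profile text is abridged from the files listed above, and the check assumes `master` holds the team's interface name, as it does in these profiles.

```python
import configparser

# Abridged copies of the profiles from this report (uuid and empty keys omitted).
TEAM0 = """\
[connection]
id=team0
type=team
interface-name=team0
"""

ENS192 = """\
[connection]
id=ens192
type=ethernet
interface-name=ens192
master=team0
slave-type=team
"""


def parse_keyfile(text):
    """Parse keyfile text with the standard-library INI parser (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def is_port_of(port_text, team_text):
    """Return True if the port profile names the team profile's interface as its master.

    Assumption: 'master' is an interface name, as in the profiles in this report.
    """
    port = parse_keyfile(port_text)["connection"]
    team = parse_keyfile(team_text)["connection"]
    return (
        team.get("type") == "team"
        and port.get("slave-type") == "team"
        and port.get("master") == team.get("interface-name")
    )


print(is_port_of(ENS192, TEAM0))
```

A check like this only confirms that the profiles are consistent with each other; it says nothing about whether teamd can start at boot.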
Still seeing similar errors in the latest attempt with the new 4.6 boot media (using NetworkManager version 1.22.8-7.el8_2):

```
[ 13.266838] teamd_team0[875]: dbus: Could not acquire the system bus: org.freedesktop.DBus.Error.FileNotFound - Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory
[ 13.275233] NetworkManager[866]: <info> [1627565985.8321] manager: NetworkManager state is now CONNECTING
[ 13.278671] dracut-initqueue[850]: Daemon not running
[ 13.280942] dracut-initqueue[850]: Daemon not running
[ 13.282283] teamd_team0[875]: Failed to init dbus.
[ 13.286941] NetworkManager[866]: <info> [1627565985.8321] device (ens224): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
[ 13.292494] NetworkManager[866]: <info> [1627565985.8322] device (team0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
[ 13.301516] NetworkManager[866]: <info> [1627565985.8335] device (team0): Activation: (team) started teamd [pid 875]...
[ 13.306011] NetworkManager[866]: <info> [1627565985.8337] device (ens192): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
[ 13.313891] NetworkManager[866]: <info> [1627565985.8343] device (ens224): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
[ 13.321557] teamd_team0[875]: teamd_init() failed.
[ 13.324630] NetworkManager[866]: <info> [1627565985.8347] device (ens192): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
[ 13.332611] teamd_team0[875]: Failed: Invalid argument
[ 13.335035] NetworkManager[866]: <info> [1627565985.8348] device (ens224): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
[ 13.342934] ignition[833]: GET https://api-int.ocp.example.com:22623/config/worker: attempt #3
[ 13.346144] systemd-udevd[871]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
[ 13.350634] ignition[833]: GET error: Get "https://api-int.ocp.example.com:22623/config/worker": dial tcp: lookup api-int.ocp.example.com on [::1]:53: read udp [::1]:54304->[::1]:53: read: connection refused
[ 13.358690] systemd-udevd[871]: Could not generate persistent MAC address for team0: No such file or directory
[ 13.364556] NetworkManager[866]: <warn> [1627565985.9074] device (team0): teamd process 875 quit unexpectedly; failing activation
[ 13.369469] NetworkManager[866]: <info> [1627565985.9074] device (team0): state change: prepare -> failed (reason 'teamd-control-failed', sys-iface-state: 'managed')
```

Not sure if the DBus errors are fatal, but that looks concerning.

@Beniamino do you think you could have a look?

Based on feedback from the other folks on the CoreOS team, we believe this is a problem with the interaction between NetworkManager and teamd. Sending this to the NetworkManager team directly for additional triage.

> Still seeing similar errors in the latest attempt with the new 4.6 boot media (using NetworkManager version 1.22.8-7.el8_2)

With that version? You'd need the fix for bug 1784363 (upstream 1.26.0 [1]).

[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/b74c333413dfeaff5fb6007981c41c5fcfe8cb4c

This is a dupe of 1784363.

(In reply to Akash Semil from comment #14)
> Will this issue will be resolved in any new RHCOS 4.6.X version.

I am not familiar with RHCOS version numbers. The issue will be fixed if you use NetworkManager 1.26.0 or newer. It might also be fixed in older versions if we decide to do a backport (Z-stream). So far, such a backport has not been done, nor is one planned.

Hi Thomas,

Could you provide an 8.2 backport RPM scratch build for Micah to do an initial tryout?

Created attachment 1813161 [details]
dist-git patch for backport of "fix team in initrd"

Dist-git patch for rhel-8.2.0, NetworkManager-1.22.8-8.rh1986921.1, applies on f3219afe27c5eca35c95f143479beacfb4302e81.

Find a scratch build of this here: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=38885784

@miabbott any chance to test the scratch build from comment 18?

- Built a custom RHCOS 4.6 using the scratch NetworkManager build provided in comment #18
- Ran through reproducer steps in comment #1
- Confirmed `team0` activation in installed system

```
[core@localhost ~]$ rpm-ostree status
State: idle
Deployments:
* ostree://113b8fc449600115990b96639b0dd757eaef6cc5788e334e6890c253c5b3d36e
                   Version: 46.82.202108121350-0 (2021-08-12T13:52:42Z)

[core@localhost ~]$ rpm -q NetworkManager
NetworkManager-1.22.8-8.rh1986921.1.el8_2.x86_64

[core@localhost ~]$ ls -l /etc/NetworkManager/system-connections/
total 12
-rw-------. 1 root root 191 Aug 12 15:08 enp1s0.nmconnection
-rw-------. 1 root root 191 Aug 12 15:08 enp2s0.nmconnection
-rw-------. 1 root root 218 Aug 12 15:08 team0.nmconnection

[core@localhost ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master team0 state UP group default qlen 1000
    link/ether 52:54:00:2d:a4:5d brd ff:ff:ff:ff:ff:ff
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master team0 state UP group default qlen 1000
    link/ether 52:54:00:2d:a4:5d brd ff:ff:ff:ff:ff:ff
4: team0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:2d:a4:5d brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.200/24 brd 192.168.122.255 scope global dynamic noprefixroute team0
       valid_lft 3405sec preferred_lft 3405sec
    inet6 fe80::29:7541:e676:ace5/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

[core@localhost ~]$ nmcli con show
NAME    UUID                                  TYPE      DEVICE
team0   af9dc856-c26f-4b48-b39f-26a47b447675  team      team0
enp1s0  774eb31c-bdbe-4a07-a8d2-b3244e374d84  ethernet  enp1s0
enp2s0  5cffac48-d13a-4f76-98f7-e624b6e768fa  ethernet  enp2s0

[core@localhost ~]$ sudo teamdctl team0 state
setup:
  runner: roundrobin
ports:
  enp1s0
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: ethtool
        link: up
        down count: 0
  enp2s0
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: ethtool
        link: up
        down count: 0
```

Thanks for the test feedback!

The 8.2.0.z has been approved at https://bugzilla.redhat.com/show_bug.cgi?id=1994246

Closing as dup of it.

*** This bug has been marked as a duplicate of bug 1994246 ***
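For triaging other saved serial logs against this report, the failure signature quoted in the comments above can be searched for with a short Python sketch. The regular expression is derived only from the NetworkManager warning line shown in this bug, so it is an assumption that other NetworkManager versions word the message identically. The function name `find_teamd_failures` is hypothetical.

```python
import re

# Signature taken from the NetworkManager <warn> line quoted in this report.
TEAMD_QUIT = re.compile(
    r"device \((?P<device>[^)]+)\): teamd process (?P<pid>\d+) quit unexpectedly"
)


def find_teamd_failures(log_text):
    """Return a (device, pid) pair for each 'teamd process ... quit unexpectedly' warning."""
    return [
        (match.group("device"), int(match.group("pid")))
        for match in TEAMD_QUIT.finditer(log_text)
    ]


# One line copied from the second serial log in this report.
SAMPLE = (
    "[ 13.364556] NetworkManager[866]: <warn> [1627565985.9074] device (team0): "
    "teamd process 875 quit unexpectedly; failing activation\n"
)

print(find_teamd_failures(SAMPLE))  # [('team0', 875)]
```

An empty result on a system running the backported build would be consistent with the verification above, although it does not by itself prove that the team interface came up.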