1273052 – teamd fails to start after reboot


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	libteam
Sub Component:
Version:	7.1
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Marcelo Ricardo Leitner
QA Contact:	Amit Supugade
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1301628 1313485

Reported:	2015-10-19 13:09 UTC by ctcard
Modified:	2016-11-04 01:00 UTC (History)
CC List:	7 users (show)
Fixed In Version:	libteam-1.25-4.el7
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-11-04 01:00:31 UTC
Target Upstream Version:
Embargoed:
Flags:	lxin: needinfo-

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2016:2219	0	normal	SHIPPED_LIVE	libteam bug fix update	2016-11-03 13:25:19 UTC

Description ctcard 2015-10-19 13:09:26 UTC

Description of problem:
We are occasionally seeing the teaming daemon fail to start after a reboot on CentOS 7 VMs. Here is an example (from /var/log/messages.minor):
Oct  6 23:36:21 ****** ovs-ctl[623]: Starting ovsdb-server [  OK  ]
Oct  6 23:36:22 ****** ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait -- init -- set Open_vSwitch . db-version=7.6.2
Oct  6 23:36:22 ****** ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait set Open_vSwitch . ovs-version=2.3.1 "external-ids:system-id=\"47ff9309-5609-47e0-819c-b9055b25edbb\"" "system-type=\"CentOS\"" "system-version=\"7.1.1503-Core\""
Oct  6 23:36:22 ****** ovs-ctl[623]: Configuring Open vSwitch system IDs [  OK  ]
Oct  6 23:36:22 ****** network[733]: Bringing up loopback interface:  [  OK  ]
Oct  6 23:36:22 ****** kernel: [    6.158533] gre: GRE over IPv4 demultiplexor driver
Oct  6 23:36:22 ****** systemd[1]: Starting system-teamd.slice.
Oct  6 23:36:22 ****** systemd[1]: Created slice system-teamd.slice.
Oct  6 23:36:22 ****** systemd[1]: Starting Team Daemon for device bond0...
Oct  6 23:36:22 ****** kernel: [    6.199635] openvswitch: Open vSwitch switching datapath
Oct  6 23:36:22 ****** ovs-ctl[623]: Inserting openvswitch module [  OK  ]
Oct  6 23:36:22 ****** kernel: [    6.338577] device ovs-system entered promiscuous mode
Oct  6 23:36:22 ****** kernel: [    6.340086] openvswitch: netlink: Unknown key attribute (type=62, max=21).
Oct  6 23:36:22 ****** kernel: [    6.385293] device br-ex entered promiscuous mode
Oct  6 23:36:22 ****** kernel: [    6.426511] device br-int entered promiscuous mode
Oct  6 23:36:22 ****** teamd[857]: Failed to get interface information list.
Oct  6 23:36:22 ****** teamd[857]: Failed to init interface information list.
Oct  6 23:36:22 ****** teamd[857]: Team init failed.
Oct  6 23:36:22 ****** teamd[857]: teamd_init() failed.
Oct  6 23:36:22 ****** teamd[857]: Failed: Invalid argument
Oct  6 23:36:22 ****** systemd[1]: teamd: main process exited, code=exited, status=1/FAILURE
Oct  6 23:36:22 ****** network[733]: Bringing up interface bond0:  Job for teamd failed. See 'systemctl status teamd' and 'journalctl -xn' for details.
Oct  6 23:36:22 ****** kernel: [    6.433515] device br-tun entered promiscuous mode
Oct  6 23:36:22 ****** systemd[1]: Unit teamd entered failed state.
Oct  6 23:36:22 ****** ovs-ctl[623]: Starting ovs-vswitchd [  OK  ]
Oct  6 23:36:22 ****** network[733]: [FAILED]
Oct  6 23:36:22 ****** ovs-ctl[623]: Enabling remote OVSDB managers [  OK  ] 

Version-Release number of selected component (if applicable):
teamd-1.15-1.el7.centos.x86_64
libteam-1.15-1.el7.centos.x86_64


How reproducible:
Only happens occasionally, not reproducible on demand


Steps to Reproduce:
1. reboot a VM
2. after reboot teamd fails to start with error "Failed to get interface information list."

Actual results:


Expected results:


Additional info:
Investigation has shown that teamd fails because the libteam code in ifinfo.c does not handle the NLE_DUMP_INTR error returned by nl_recvmsgs() (part of libnl3).
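
For illustration only, here is a minimal sketch of the usual way to cope with an interrupted netlink dump: issue the request again until it completes. The kernel flags a dump as interrupted when the interface list changes while it is being read, which is consistent with the log above, where Open vSwitch creates its bridges at the same moment teamd starts. This is Python rather than libteam's C, and every name in it (fetch_with_retry, make_flaky_dump, the max_retries limit, the interface names) is hypothetical. The numeric value used for NLE_DUMP_INTR is an assumption about libnl3's netlink/errno.h. It is not the libteam code.

```python
# Assumed value of libnl3's NLE_DUMP_INTR; check <netlink/errno.h>.
NLE_DUMP_INTR = 33


def fetch_with_retry(request_dump, max_retries=5):
    """Re-issue a dump request while it reports -NLE_DUMP_INTR.

    request_dump is a hypothetical callable returning (err, links),
    where err is 0 on success or a negative error code.
    Returns (links, attempts_used).
    """
    for attempt in range(1, max_retries + 1):
        err, links = request_dump()
        if err == -NLE_DUMP_INTR:
            # The interface list changed mid-dump: discard and ask again.
            continue
        if err < 0:
            # Any other error is not retried.
            raise OSError("dump failed with error %d" % err)
        return links, attempt
    raise RuntimeError("dump still interrupted after %d attempts" % max_retries)


def make_flaky_dump(interruptions):
    """Simulate a dump that is interrupted a fixed number of times."""
    remaining = [interruptions]

    def request_dump():
        if remaining[0] > 0:
            remaining[0] -= 1
            return -NLE_DUMP_INTR, []
        return 0, ["lo", "eth0", "br-int"]

    return request_dump


links, attempts = fetch_with_retry(make_flaky_dump(2))
print(links, attempts)
```

Bounding the number of retries is a design choice in this sketch, so that a system whose interfaces keep changing cannot stall startup forever.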

Comment 2 Jiri Benc 2015-10-20 10:05:58 UTC

Adding upstream maintainer to CC.

Comment 3 Xin Long 2015-10-20 11:53:27 UTC

Hi, can you provide the commands used to start the VM and the network config file from the guest?

Comment 4 Jiri Pirko 2015-12-25 16:13:12 UTC

This is fixed by:

https://github.com/jpirko/libteam/commit/8e44b17159522e6afecd64a507cdfae3ed341257

Comment 5 Xin Long 2015-12-26 09:06:40 UTC

(In reply to Jiri Pirko from comment #4)
> This is fixed by:
> 
> https://github.com/jpirko/libteam/commit/8e44b17159522e6afecd64a507cdfae3ed341257

ok, thanks, Jiri.

Comment 6 Marcelo Ricardo Leitner 2016-01-20 15:35:09 UTC

Fix is prepared for 7.3.
Flagging 7.2.z as there is no workaround for this issue.

Comment 7 Marcelo Ricardo Leitner 2016-03-11 19:16:36 UTC

Oops, this should really be MODIFIED, as libteam is updated to 1.23, which contains that commit.

Comment 9 Amit Supugade 2016-06-28 15:29:24 UTC

Verified on-
libteam-1.23-1.el7.x86_64
teamd-1.23-1.el7.x86_64

Ran test multiple times.
LOG-

:: [   PASS   ] :: Command 'virsh reboot vm1' (Expected 0, got 0)
:: [   LOG    ] :: Duration: 2m 54s
:: [   LOG    ] :: Assertions: 12 good, 0 bad
:: [   PASS   ] :: RESULT: Start VM and upgrade kernel

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: Test
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [   LOG    ] :: Output of 'vmsh run_cmd vm1 'ip a | grep team0:'':
:: [   LOG    ] :: --------------- OUTPUT START ---------------
:: [   LOG    ] :: spawn virsh console vm1
:: [   LOG    ] :: 
:: [   LOG    ] :: Connected to domain vm1
:: [   LOG    ] :: 
:: [   LOG    ] :: Escape character is ^]
:: [   LOG    ] :: 
:: [   LOG    ] :: 
:: [   LOG    ] :: 
:: [   LOG    ] :: 
:: [   LOG    ] :: Red Hat Enterprise Linux Server 7.2 Beta (Maipo)
:: [   LOG    ] :: 
:: [   LOG    ] :: Kernel 3.10.0-451.el7.x86_64 on an x86_64
:: [   LOG    ] :: 
:: [   LOG    ] :: 
:: [   LOG    ] :: 
:: [   LOG    ] :: localhost login: root
:: [   LOG    ] :: 
:: [   LOG    ] :: 
:: [   LOG    ] :: Password: 
:: [   LOG    ] :: 
:: [   LOG    ] :: Last login: Tue Jun 28 10:02:56 on ttyS0
:: [   LOG    ] :: 
:: [   LOG    ] :: [root@localhost ~]# ip a | grep team0:
:: [   LOG    ] :: 
:: [   LOG    ] :: 4: team0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
:: [   LOG    ] :: 
:: [   LOG    ] :: [root@localhost ~]# echo $?
:: [   LOG    ] :: 
:: [   LOG    ] :: 0
:: [   LOG    ] :: 
:: [   LOG    ] :: [root@localhost ~]# logout
:: [   LOG    ] :: 
:: [   LOG    ] :: 
:: [   LOG    ] :: 
:: [   LOG    ] :: 
:: [   LOG    ] :: Red Hat Enterprise Linux Server 7.2 Beta (Maipo)
:: [   LOG    ] :: 
:: [   LOG    ] :: Kernel 3.10.0-451.el7.x86_64 on an x86_64
:: [   LOG    ] :: 
:: [   LOG    ] :: 
:: [   LOG    ] :: 
:: [   LOG    ] :: localhost login: 
:: [   LOG    ] :: 
:: [   LOG    ] :: ---------------  OUTPUT END  ---------------
:: [   PASS   ] :: Command 'vmsh run_cmd vm1 'ip a | grep team0:'' (Expected 0, got 0)
:: [   PASS   ] :: There should not be an error and Team should initialise without errors (Assert: '0' should equal '0')
:: [   LOG    ] :: Output of 'ping -c 5 192.168.1.22':
:: [   LOG    ] :: --------------- OUTPUT START ---------------
:: [   LOG    ] :: PING 192.168.1.22 (192.168.1.22) 56(84) bytes of data.
:: [   LOG    ] :: 64 bytes from 192.168.1.22: icmp_seq=1 ttl=64 time=0.328 ms
:: [   LOG    ] :: 64 bytes from 192.168.1.22: icmp_seq=2 ttl=64 time=0.118 ms
:: [   LOG    ] :: 64 bytes from 192.168.1.22: icmp_seq=3 ttl=64 time=0.192 ms
:: [   LOG    ] :: 64 bytes from 192.168.1.22: icmp_seq=4 ttl=64 time=0.118 ms
:: [   LOG    ] :: 64 bytes from 192.168.1.22: icmp_seq=5 ttl=64 time=0.112 ms
:: [   LOG    ] :: 
:: [   LOG    ] :: --- 192.168.1.22 ping statistics ---
:: [   LOG    ] :: 5 packets transmitted, 5 received, 0% packet loss, time 3999ms
:: [   LOG    ] :: rtt min/avg/max/mdev = 0.112/0.173/0.328/0.083 ms
:: [   LOG    ] :: ---------------  OUTPUT END  ---------------
:: [   PASS   ] :: Command 'ping -c 5 192.168.1.22' (Expected 0, got 0)
:: [   LOG    ] :: Duration: 13s
:: [   LOG    ] :: Assertions: 3 good, 0 bad
:: [   PASS   ] :: RESULT: Test

Comment 13 errata-xmlrpc 2016-11-04 01:00:31 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2219.html
