Bug 1273052 - teamd fails to start after reboot
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libteam
Version: 7.1
Hardware: x86_64 Linux
Priority: high   Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Marcelo Ricardo Leitner
QA Contact: Amit Supugade
Depends On:
Blocks: 1301628 1313485
Reported: 2015-10-19 09:09 EDT by ctcard
Modified: 2016-11-03 21:00 EDT (History)
7 users

See Also:
Fixed In Version: libteam-1.25-4.el7
Doc Type: Bug Fix
Last Closed: 2016-11-03 21:00:31 EDT
Type: Bug


Attachments: None
Description ctcard 2015-10-19 09:09:26 EDT
Description of problem:
We occasionally see the teaming daemon fail to start after a reboot on CentOS 7 VMs. Here is an example (from /var/log/messages.minor):
Oct  6 23:36:21 ****** ovs-ctl[623]: Starting ovsdb-server [  OK  ]
Oct  6 23:36:22 ****** ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait -- init -- set Open_vSwitch . db-version=7.6.2
Oct  6 23:36:22 ****** ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait set Open_vSwitch . ovs-version=2.3.1 "external-ids:system-id=\"47ff9309-5609-47e0-819c-b9055b25edbb\"" "system-type=\"CentOS\"" "system-version=\"7.1.1503-Core\""
Oct  6 23:36:22 ****** ovs-ctl[623]: Configuring Open vSwitch system IDs [  OK  ]
Oct  6 23:36:22 ****** network[733]: Bringing up loopback interface:  [  OK  ]
Oct  6 23:36:22 ****** kernel: [    6.158533] gre: GRE over IPv4 demultiplexor driver
Oct  6 23:36:22 ****** systemd[1]: Starting system-teamd.slice.
Oct  6 23:36:22 ****** systemd[1]: Created slice system-teamd.slice.
Oct  6 23:36:22 ****** systemd[1]: Starting Team Daemon for device bond0...
Oct  6 23:36:22 ****** kernel: [    6.199635] openvswitch: Open vSwitch switching datapath
Oct  6 23:36:22 ****** ovs-ctl[623]: Inserting openvswitch module [  OK  ]
Oct  6 23:36:22 ****** kernel: [    6.338577] device ovs-system entered promiscuous mode
Oct  6 23:36:22 ****** kernel: [    6.340086] openvswitch: netlink: Unknown key attribute (type=62, max=21).
Oct  6 23:36:22 ****** kernel: [    6.385293] device br-ex entered promiscuous mode
Oct  6 23:36:22 ****** kernel: [    6.426511] device br-int entered promiscuous mode
Oct  6 23:36:22 ****** teamd[857]: Failed to get interface information list.
Oct  6 23:36:22 ****** teamd[857]: Failed to init interface information list.
Oct  6 23:36:22 ****** teamd[857]: Team init failed.
Oct  6 23:36:22 ****** teamd[857]: teamd_init() failed.
Oct  6 23:36:22 ****** teamd[857]: Failed: Invalid argument
Oct  6 23:36:22 ****** systemd[1]: teamd@bond0.service: main process exited, code=exited, status=1/FAILURE
Oct  6 23:36:22 ****** network[733]: Bringing up interface bond0:  Job for teamd@bond0.service failed. See 'systemctl status teamd@bond0.service' and 'journalctl -xn' for details.
Oct  6 23:36:22 ****** kernel: [    6.433515] device br-tun entered promiscuous mode
Oct  6 23:36:22 ****** systemd[1]: Unit teamd@bond0.service entered failed state.
Oct  6 23:36:22 ****** ovs-ctl[623]: Starting ovs-vswitchd [  OK  ]
Oct  6 23:36:22 ****** network[733]: [FAILED]
Oct  6 23:36:22 ****** ovs-ctl[623]: Enabling remote OVSDB managers [  OK  ] 

Version-Release number of selected component (if applicable):
teamd-1.15-1.el7.centos.x86_64
libteam-1.15-1.el7.centos.x86_64


How reproducible:
Only happens occasionally, not reproducible on demand


Steps to Reproduce:
1. reboot a VM
2. after reboot teamd fails to start with error "Failed to get interface information list."

Actual results:


Expected results:


Additional info:
Investigation has shown that teamd fails because the libteam code in ifinfo.c does not handle the NLE_DUMP_INTR error returned by nl_recvmsgs() (part of libnl3). NLE_DUMP_INTR indicates that a netlink dump was interrupted by concurrent interface changes (here, Open vSwitch creating devices during boot) and should be retried rather than treated as fatal.
Comment 2 Jiri Benc 2015-10-20 06:05:58 EDT
Adding upstream maintainer to CC.
Comment 3 Xin Long 2015-10-20 07:53:27 EDT
Hi, can you offer the starting commands of VM and the network config file in guest?
Comment 5 Xin Long 2015-12-26 04:06:40 EST
(In reply to Jiri Pirko from comment #4)
> This is fixed by:
> 
> https://github.com/jpirko/libteam/commit/8e44b17159522e6afecd64a507cdfae3ed341257

ok, thanks, Jiri.
Comment 6 Marcelo Ricardo Leitner 2016-01-20 10:35:09 EST
Fix is prepared for 7.3.
Flagging 7.2.z as there is no workaround for this issue.
Comment 7 Marcelo Ricardo Leitner 2016-03-11 14:16:36 EST
Oops, this should really be in Modified, as libteam has been updated to 1.23, which contains that commit.
Comment 9 Amit Supugade 2016-06-28 11:29:24 EDT
Verified on-
libteam-1.23-1.el7.x86_64
teamd-1.23-1.el7.x86_64

Ran test multiple times.
LOG-

:: [   PASS   ] :: Command 'virsh reboot vm1' (Expected 0, got 0)
:: [   LOG    ] :: Duration: 2m 54s
:: [   LOG    ] :: Assertions: 12 good, 0 bad
:: [   PASS   ] :: RESULT: Start VM and upgrade kernel

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: Test
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [   LOG    ] :: Output of 'vmsh run_cmd vm1 'ip a | grep team0:'':
:: [   LOG    ] :: --------------- OUTPUT START ---------------
:: [   LOG    ] :: spawn virsh console vm1
:: [   LOG    ] :: 
:: [   LOG    ] :: Connected to domain vm1
:: [   LOG    ] :: 
:: [   LOG    ] :: Escape character is ^]
:: [   LOG    ] :: 
:: [   LOG    ] :: 
:: [   LOG    ] :: 
:: [   LOG    ] :: 
:: [   LOG    ] :: Red Hat Enterprise Linux Server 7.2 Beta (Maipo)
:: [   LOG    ] :: 
:: [   LOG    ] :: Kernel 3.10.0-451.el7.x86_64 on an x86_64
:: [   LOG    ] :: 
:: [   LOG    ] :: 
:: [   LOG    ] :: 
:: [   LOG    ] :: localhost login: root
:: [   LOG    ] :: 
:: [   LOG    ] :: 
:: [   LOG    ] :: Password: 
:: [   LOG    ] :: 
:: [   LOG    ] :: Last login: Tue Jun 28 10:02:56 on ttyS0
:: [   LOG    ] :: 
:: [   LOG    ] :: [root@localhost ~]# ip a | grep team0:
:: [   LOG    ] :: 
:: [   LOG    ] :: 4: team0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
:: [   LOG    ] :: 
:: [   LOG    ] :: [root@localhost ~]# echo $?
:: [   LOG    ] :: 
:: [   LOG    ] :: 0
:: [   LOG    ] :: 
:: [   LOG    ] :: [root@localhost ~]# logout
:: [   LOG    ] :: 
:: [   LOG    ] :: 
:: [   LOG    ] :: 
:: [   LOG    ] :: 
:: [   LOG    ] :: Red Hat Enterprise Linux Server 7.2 Beta (Maipo)
:: [   LOG    ] :: 
:: [   LOG    ] :: Kernel 3.10.0-451.el7.x86_64 on an x86_64
:: [   LOG    ] :: 
:: [   LOG    ] :: 
:: [   LOG    ] :: 
:: [   LOG    ] :: localhost login: 
:: [   LOG    ] :: 
:: [   LOG    ] :: ---------------  OUTPUT END  ---------------
:: [   PASS   ] :: Command 'vmsh run_cmd vm1 'ip a | grep team0:'' (Expected 0, got 0)
:: [   PASS   ] :: There should not be an error and Team should initialise without errors (Assert: '0' should equal '0')
:: [   LOG    ] :: Output of 'ping -c 5 192.168.1.22':
:: [   LOG    ] :: --------------- OUTPUT START ---------------
:: [   LOG    ] :: PING 192.168.1.22 (192.168.1.22) 56(84) bytes of data.
:: [   LOG    ] :: 64 bytes from 192.168.1.22: icmp_seq=1 ttl=64 time=0.328 ms
:: [   LOG    ] :: 64 bytes from 192.168.1.22: icmp_seq=2 ttl=64 time=0.118 ms
:: [   LOG    ] :: 64 bytes from 192.168.1.22: icmp_seq=3 ttl=64 time=0.192 ms
:: [   LOG    ] :: 64 bytes from 192.168.1.22: icmp_seq=4 ttl=64 time=0.118 ms
:: [   LOG    ] :: 64 bytes from 192.168.1.22: icmp_seq=5 ttl=64 time=0.112 ms
:: [   LOG    ] :: 
:: [   LOG    ] :: --- 192.168.1.22 ping statistics ---
:: [   LOG    ] :: 5 packets transmitted, 5 received, 0% packet loss, time 3999ms
:: [   LOG    ] :: rtt min/avg/max/mdev = 0.112/0.173/0.328/0.083 ms
:: [   LOG    ] :: ---------------  OUTPUT END  ---------------
:: [   PASS   ] :: Command 'ping -c 5 192.168.1.22' (Expected 0, got 0)
:: [   LOG    ] :: Duration: 13s
:: [   LOG    ] :: Assertions: 3 good, 0 bad
:: [   PASS   ] :: RESULT: Test
Comment 13 errata-xmlrpc 2016-11-03 21:00:31 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2219.html
