Bug 674051

Summary: RHELS5.6 x64 bonding kernel BUG causes server reboot
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.6
Hardware: x86_64
OS: Linux
Status: CLOSED DUPLICATE
Severity: high
Priority: unspecified
Target Milestone: rc
Reporter: Andre ten Bohmer <andre.tenbohmer>
Assignee: Andy Gospodarek <agospoda>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: ivecera, jarod, peterm, syamazak
Doc Type: Bug Fix
Last Closed: 2011-02-22 16:01:17 UTC

Description Andre ten Bohmer 2011-01-31 13:22:44 UTC
Description of problem:
Due to configuration changes on HP Virtual Connect, the NIC links sometimes drop for a few seconds. This is not desirable, but the RHEL 6.0 systems in this blade enclosure come back online without problems. This specific RHEL 5.6 system crashed with a kernel BUG and was rebooted by HP ASR because it hung after the crash.

Version-Release number of selected component (if applicable):


How reproducible:
Hopefully not reproducible: last time I 'only' had to restart the networking service via the iLO console to restore networking. This is the first time it actually crashed the system.

Expected results:
Link down and up events are detected without losing network connectivity or causing a complete system failure.

Additional info:
Manufacturer: HP
Product Name: ProLiant BL460c G7
SKU Number: 603718-B21      
Family: ProLiant

]# /var/log/messages
Jan 31 13:49:00 scomp1101 kernel: bonding: bond0: link status definitely up for interface eth1.
Jan 31 13:49:00 scomp1101 kernel: ----------- [cut here ] --------- [please bite here ] ---------
Jan 31 13:49:00 scomp1101 kernel: Kernel BUG at drivers/net/bonding/bonding.h:135
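To see how the link events line up with the crash, something like the following could pull the relevant lines out of the log. This is only a sketch: the function name bond_events is made up for illustration, and the patterns are taken from the two messages quoted above, so other bonding messages may need extra patterns.

```shell
# Print bonding link-status messages and kernel BUG lines from a syslog file.
# The default path is the one used in this report; pass another file as $1.
bond_events() {
    grep -E 'bonding: .*link status|Kernel BUG at' "${1:-/var/log/messages}"
}

# Example: bond_events /var/log/messages
```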

]# cat /etc/modprobe.conf
alias eth0 be2net
alias eth1 be2net
alias eth2 be2net
alias eth3 be2net
alias eth4 be2net
alias eth5 be2net
alias eth6 be2net
alias eth7 be2net
alias bond0 bonding
options bond0 miimon=100 mode=active-backup primary=eth0
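To confirm that the running driver actually picked up these options, the values can probably be read back from sysfs. A rough sketch, assuming the bonding driver on this kernel exposes /sys/class/net/<bond>/bonding/ (not verified on this system); the function names and the SYSFS_ROOT override are hypothetical helpers, not part of any tool.

```shell
# Read one bonding option as reported by the driver.
# SYSFS_ROOT is only a hook for pointing the helper at a copy of the tree.
bond_setting() {
    cat "${SYSFS_ROOT:-/sys}/class/net/$1/bonding/$2"
}

# Print the three options set in modprobe.conf for the given bond (default bond0).
show_bond_settings() (
    bond="${1:-bond0}"
    for opt in mode miimon primary; do
        printf '%s=%s\n' "$opt" "$(bond_setting "$bond" "$opt")"
    done
)

# Example: show_bond_settings bond0
```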

# /etc/rc.local
# Increasing The Transmit Queue Length from 1000 to 10000
for iFace in `ifconfig | grep eth | cut -f 1 -d" "` ; do ifconfig $iFace txqueuelen 10000 ; done
unset iFace

# ifconfig  eth0
eth0      Link encap:Ethernet  HWaddr D4:85:64:57:0B:08  
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:2378 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1363 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:10000 
          RX bytes:210611 (205.6 KiB)  TX bytes:516464 (504.3 KiB)

]# ethtool eth0
Settings for eth0:
	Supported ports: [ TP ]
	Supported link modes:   1000baseT/Full 
                               10000baseT/Full 
	Supports auto-negotiation: Yes
	Advertised link modes:  1000baseT/Full 
	                        10000baseT/Full 
	Advertised auto-negotiation: No
	Speed: 5000Mb/s
	Duplex: Full
	Port: Twisted Pair
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: on
	Supports Wake-on: g
	Wake-on: d
	Link detected: yes

]# lsb_release -a
LSB Version:	:core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch
Distributor ID:	RedHatEnterpriseServer
Description:	Red Hat Enterprise Linux Server release 5.6 (Tikanga)
Release:	5.6
Codename:	Tikanga

]# uname -a
Linux scomp1101.wurnet.nl 2.6.18-238.1.1.el5 #1 SMP Tue Jan 4 13:32:19 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

]# modinfo bonding
filename:       /lib/modules/2.6.18-238.1.1.el5/kernel/drivers/net/bonding/bonding.ko
author:         Thomas Davis, tadavis and many others
description:    Ethernet Channel Bonding Driver, v3.4.0-1
version:        3.4.0-1
license:        GPL
srcversion:     956FDE3FEBDD81E105B7727
depends:        ipv6
vermagic:       2.6.18-238.1.1.el5 SMP mod_unload gcc-4.1

]# modinfo  be2net
filename:       /lib/modules/2.6.18-238.1.1.el5/kernel/drivers/net/benet/be2net.ko
license:        GPL
author:         ServerEngines Corporation
description:    ServerEngines BladeEngine 10Gbps NIC Driver 2.102.518r
version:        2.102.518r
srcversion:     76890C397EB8D93CCC6B539

]# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.4.0-1 (October 7, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: eth0 (primary_reselect always)
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Speed: 100 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d4:85:64:57:0b:08

Slave Interface: eth1
MII Status: up
Speed: 100 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d4:85:64:57:0b:0c

Slave Interface: eth2
MII Status: down
Speed: 100 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d4:85:64:57:0b:09

Interfaces eth2 through eth7 are all down, as configured.
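For watching the slaves while the Virtual Connect links flap, the per-slave MII status can be reduced to one line per interface. A minimal sketch based only on the file layout shown above; the function name bond_slave_status is invented here.

```shell
# Print "<slave> <MII status>" for each slave listed in the bonding proc file.
# The bond-level "MII Status" line comes before the first slave and is skipped.
bond_slave_status() {
    awk -F': ' '
        /^Slave Interface:/           { slave = $2 }
        /^MII Status:/ && slave != "" { print slave, $2 }
    ' "${1:-/proc/net/bonding/bond0}"
}

# Example: bond_slave_status /proc/net/bonding/bond0
```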

Comment 1 Andre ten Bohmer 2011-02-01 09:23:21 UTC
Correction:
The link drops do not last a few seconds; all links are down for 8 to 10 minutes.

Extra info:
This is our first ProLiant BL460c G7 blade. Instead of the Broadcom chipset found in the ProLiant BL460c G6 series (Ethernet controller: Broadcom Corporation NetXtreme II BCM57711E 10-Gigabit PCIe), it has an Emulex NIC chipset (Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01)).

Comment 3 Andy Gospodarek 2011-02-22 16:01:17 UTC
This is likely a duplicate of bug 671595.  That bug will be fixed in RHEL5.7 and in RHEL5.6 errata kernel version 2.6.18-238.4.1.el5.

Please test that kernel and reopen if it does not resolve the issue.
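Before retesting, it may help to check that the box is really running the errata kernel or something newer. A sketch only: kernel_at_least is a hypothetical helper that compares the numeric fields of two release strings from left to right, which is simpler than a full RPM version comparison.

```shell
# Return 0 if release $1 is at least release $2.
# Fields are split on "." and "-"; non-numeric fields in $2 (such as "el5")
# are skipped, and a missing or non-numeric field in $1 counts as 0.
kernel_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        nh = split(have, h, /[.-]/)
        nw = split(want, w, /[.-]/)
        for (i = 1; i <= nw; i++) {
            if (w[i] !~ /^[0-9]+$/) continue
            hv = (i <= nh && h[i] ~ /^[0-9]+$/) ? h[i] + 0 : 0
            if (hv > w[i] + 0) exit 0
            if (hv < w[i] + 0) exit 1
        }
        exit 0
    }'
}

# Example:
# kernel_at_least "$(uname -r)" 2.6.18-238.4.1.el5 && echo "errata kernel or newer"
```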

Thanks!

*** This bug has been marked as a duplicate of bug 671595 ***