Bug 674051 - RHELS5.6 x64 bonding kernel BUG causes server reboot
Summary: RHELS5.6 x64 bonding kernel BUG causes server reboot
Keywords:
Status: CLOSED DUPLICATE of bug 671595
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Assignee: Andy Gospodarek
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-01-31 13:22 UTC by Andre ten Bohmer
Modified: 2014-06-29 23:03 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-02-22 16:01:17 UTC
Target Upstream Version:
Embargoed:



Description Andre ten Bohmer 2011-01-31 13:22:44 UTC
Description of problem:
Due to configuration changes on HP Virtual Connect, the NIC links sometimes drop for a few seconds. This is not desirable, but RHELS 6.0 systems in this blade enclosure come back online without problems. This specific RHELS 5.6 system crashed with a kernel BUG and was rebooted by HP ASR because it hung after the crash.

Version-Release number of selected component (if applicable):


How reproducible:
Hopefully not reproducible: last time I 'only' had to restart the networking service via the iLO console to restore networking. This is the first time it actually crashed the system.

Expected results:
Links going down and coming up again are detected without losing network connectivity in the end and without a complete system failure.

Additional info:
Manufacturer: HP
Product Name: ProLiant BL460c G7
SKU Number: 603718-B21      
Family: ProLiant

]# /var/log/messages
Jan 31 13:49:00 scomp1101 kernel: bonding: bond0: link status definitely up for interface eth1.
Jan 31 13:49:00 scomp1101 kernel: ----------- [cut here ] --------- [please bite here ] ---------
Jan 31 13:49:00 scomp1101 kernel: Kernel BUG at drivers/net/bonding/bonding.h:135

]# cat /etc/modprobe.conf
alias eth0 be2net
alias eth1 be2net
alias eth2 be2net
alias eth3 be2net
alias eth4 be2net
alias eth5 be2net
alias eth6 be2net
alias eth7 be2net
alias bond0 bonding
options bond0 miimon=100 mode=active-backup primary=eth0

# /etc/rc.local
# Increasing The Transmit Queue Length from 1000 to 10000
for iFace in `ifconfig | grep eth | cut -f 1 -d" "` ; do ifconfig $iFace txqueuelen 10000 ; done
unset iFace
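
The loop above takes interface names from `ifconfig` output. A minimal alternative sketch reads the names from sysfs instead, assuming `/sys/class/net` is mounted as usual. The helper name `list_eth_ifaces` is hypothetical. Unlike plain `ifconfig`, which lists only interfaces that are up, this also returns interfaces that are down.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every ethN entry found under a
# sysfs-style net directory (default: /sys/class/net), one per line.
list_eth_ifaces() {
    for path in "${1:-/sys/class/net}"/eth*; do
        [ -e "$path" ] || continue   # the glob matched nothing
        echo "${path##*/}"           # strip the directory part
    done
}
```

It could then be used in rc.local as `for iFace in $(list_eth_ifaces); do ifconfig "$iFace" txqueuelen 10000; done`.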

# ifconfig  eth0
eth0      Link encap:Ethernet  HWaddr D4:85:64:57:0B:08  
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:2378 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1363 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:10000 
          RX bytes:210611 (205.6 KiB)  TX bytes:516464 (504.3 KiB)

]# ethtool eth0
Settings for eth0:
	Supported ports: [ TP ]
	Supported link modes:   1000baseT/Full 
                               10000baseT/Full 
	Supports auto-negotiation: Yes
	Advertised link modes:  1000baseT/Full 
	                        10000baseT/Full 
	Advertised auto-negotiation: No
	Speed: 5000Mb/s
	Duplex: Full
	Port: Twisted Pair
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: on
	Supports Wake-on: g
	Wake-on: d
	Link detected: yes

]# lsb_release -a
LSB Version:	:core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch
Distributor ID:	RedHatEnterpriseServer
Description:	Red Hat Enterprise Linux Server release 5.6 (Tikanga)
Release:	5.6
Codename:	Tikanga

]# uname -a
Linux scomp1101.wurnet.nl 2.6.18-238.1.1.el5 #1 SMP Tue Jan 4 13:32:19 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

]# modinfo bonding
filename:       /lib/modules/2.6.18-238.1.1.el5/kernel/drivers/net/bonding/bonding.ko
author:         Thomas Davis, tadavis and many others
description:    Ethernet Channel Bonding Driver, v3.4.0-1
version:        3.4.0-1
license:        GPL
srcversion:     956FDE3FEBDD81E105B7727
depends:        ipv6
vermagic:       2.6.18-238.1.1.el5 SMP mod_unload gcc-4.1

]# modinfo  be2net
filename:       /lib/modules/2.6.18-238.1.1.el5/kernel/drivers/net/benet/be2net.ko
license:        GPL
author:         ServerEngines Corporation
description:    ServerEngines BladeEngine 10Gbps NIC Driver 2.102.518r
version:        2.102.518r
srcversion:     76890C397EB8D93CCC6B539

]# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.4.0-1 (October 7, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: eth0 (primary_reselect always)
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Speed: 100 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d4:85:64:57:0b:08

Slave Interface: eth1
MII Status: up
Speed: 100 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d4:85:64:57:0b:0c

Slave Interface: eth2
MII Status: down
Speed: 100 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d4:85:64:57:0b:09

Interfaces eth2 through eth7 are all down, as configured.
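
To watch slave state across a link flap without reading the whole file, the per-slave fields can be condensed to one line each. This is only a sketch that assumes the `/proc/net/bonding/bond0` layout shown above. The function name `bond_slaves` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "name mii_status link_failure_count" for each
# slave in a /proc/net/bonding/<bond> style file. The bond-level
# "MII Status" line comes before the first slave and is ignored.
bond_slaves() {
    awk -F': ' '
        /^Slave Interface:/    { slave = $2 }
        /^MII Status:/         { if (slave != "") status = $2 }
        /^Link Failure Count:/ { if (slave != "") print slave, status, $2 }
    ' "${1:-/proc/net/bonding/bond0}"
}
```

Calling `bond_slaves` with no argument reads `/proc/net/bonding/bond0`. Another file can be passed as the first argument.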

Comment 1 Andre ten Bohmer 2011-02-01 09:23:21 UTC
Correction:
The link drops do not last a few seconds: all links are down for 8 to 10 minutes.

Extra info:
This is our first ProLiant BL460c G7 blade. Instead of the Broadcom chipset of the ProLiant BL460c G6 series (Ethernet controller: Broadcom Corporation NetXtreme II BCM57711E 10-Gigabit PCIe), it has an Emulex NIC chipset (Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01)).

Comment 3 Andy Gospodarek 2011-02-22 16:01:17 UTC
This is likely a duplicate of bug 671595.  That bug will be fixed in RHEL5.7 and in RHEL5.6 errata kernel version 2.6.18-238.4.1.el5.

Please test that kernel and reopen if it does not resolve the issue.

Thanks!
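
Before retesting, it may help to confirm that the running kernel is at least the errata version named above. The following is a rough sketch, not rpm's own version comparison: it drops the `.el5` style suffix and compares the remaining numeric fields left to right. The function name `kernel_at_least` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if kernel release $1 is at least
# release $2. Everything from ".el" onward is dropped, then the numeric
# fields are compared left to right; missing fields count as 0.
kernel_at_least() {
    awk -v a="${1%%.el*}" -v b="${2%%.el*}" 'BEGIN {
        na = split(a, x, /[^0-9]+/)
        nb = split(b, y, /[^0-9]+/)
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}
```

For example, `kernel_at_least "$(uname -r)" 2.6.18-238.4.1.el5 && echo "errata kernel or newer"`.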

*** This bug has been marked as a duplicate of bug 671595 ***

