Bug 494382 - tg3: Link is Down after Network Stress Testing
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All Linux
Priority: medium Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: John Feeney
QA Contact: Red Hat Kernel QE team
URL: http://rhts.redhat.com/cgi-bin/rhts/t...
Reported: 2009-04-06 12:47 EDT by CAI Qian
Modified: 2010-05-14 17:25 EDT
Doc Type: Bug Fix
Last Closed: 2010-05-14 17:25:58 EDT


Description CAI Qian 2009-04-06 12:47:44 EDT
Description of problem:
After creating ~4000 SSH connections to the server, the server's network link (eth0, tg3 driver) goes down. The kernel log shows:

mptscsih: ioc0: bus reset: SUCCESS (sc=e7100300)
NETDEV WATCHDOG: eth0: transmit timed out
tg3: eth0: transmit timed out, resetting
tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000000]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
tg3: eth0: Link is down.
mptscsih: ioc0: attempting task abort! (sc=e7100440)
sd 0:1:0:0: 
        command: Write(10): 2a 00 00 03 31 75 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=e7100440)
mptscsih: ioc0: attempting task abort! (sc=e7100580)
sd 0:1:0:0: 
        command: Write(10): 2a 00 08 3b 31 4d 00 00 18 00
mptscsih: ioc0: task abort: SUCCESS (sc=e7100580)
mptscsih: ioc0: attempting task abort! (sc=e71006c0)
sd 0:1:0:0: 
        command: Write(10): 2a 00 08 3b 31 75 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=e71006c0)
mptscsih: ioc0: attempting task abort! (sc=e7100800)
sd 0:1:0:0: 
        command: Write(10): 2a 00 08 3b 51 5d 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=e7100800)
mptscsih: ioc0: attempting task abort! (sc=e7100d00)
sd 0:1:0:0: 
        command: Write(10): 2a 00 08 3c f1 55 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=e7100d00)
mptscsih: ioc0: attempting task abort! (sc=e7100e40)
sd 0:1:0:0: 
        command: Write(10): 2a 00 13 0b 35 4d 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=e7100e40)
mptscsih: ioc0: attempting task abort! (sc=f79c16c0)
sd 0:1:0:0: 
        command: Write(10): 2a 00 13 0b 35 75 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=f79c16c0)
mptscsih: ioc0: attempting task abort! (sc=f79c1300)
sd 0:1:0:0: 
        command: Write(10): 2a 00 13 0b 3b 15 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=f79c1300)
mptscsih: ioc0: attempting task abort! (sc=f79c1bc0)
sd 0:1:0:0: 
        command: Write(10): 2a 00 19 df 31 5d 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=f79c1bc0)
mptscsih: ioc0: attempting task abort! (sc=f79c1080)
sd 0:1:0:0: 
        command: Write(10): 2a 00 19 df 31 6d 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=f79c1080)
mptscsih: ioc0: attempting task abort! (sc=f79c1440)
sd 0:1:0:0: 
        command: Write(10): 2a 00 1d af 31 65 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=f79c1440)
mptscsih: ioc0: attempting task abort! (sc=f79c1a80)
sd 0:1:0:0: 
        command: Write(10): 2a 00 1d af 32 d5 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=f79c1a80)
mptscsih: ioc0: attempting task abort! (sc=f79c1d00)
sd 0:1:0:0: 
        command: Write(10): 2a 00 20 eb 31 5d 00 00 10 00
mptscsih: ioc0: task abort: SUCCESS (sc=f79c1d00)
mptscsih: ioc0: attempting task abort! (sc=f79c1940)
sd 0:1:0:0: 
        command: Write(10): 2a 00 20 eb 32 95 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=f79c1940)
mptscsih: ioc0: attempting target reset! (sc=e7100a80)
sd 0:1:0:0: 
        command: Write(10): 2a 00 1e 77 04 55 00 00 08 00
mptscsih: ioc0: target reset: SUCCESS (sc=e7100a80)
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.
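
The log above shows a transmit timeout, then a link-down, then a later link-up on eth0. A minimal sketch for scanning a saved kernel log for that sequence follows. The function and pattern names are hypothetical, and the regular expressions assume only the message formats quoted in this report; other tg3 or kernel versions may word these messages differently.

```python
import re

# Patterns cover only the message formats quoted in this bug report (assumption).
EVENT_PATTERNS = {
    "tx_timeout": re.compile(r"^NETDEV WATCHDOG: (?P<dev>[^:\s]+): transmit timed out"),
    "link_down": re.compile(r"^tg3: (?P<dev>[^:\s]+): Link is down"),
    "link_up": re.compile(r"^tg3: (?P<dev>[^:\s]+): Link is up"),
}


def find_link_events(lines):
    """Return (line_number, event_name, device) for each matching log line."""
    events = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        for name, pattern in EVENT_PATTERNS.items():
            match = pattern.search(text)
            if match:
                events.append((number, name, match.group("dev")))
                break  # at most one event per line
    return events


# A few lines copied from the log in this report.
SAMPLE_LOG = [
    "NETDEV WATCHDOG: eth0: transmit timed out",
    "tg3: eth0: transmit timed out, resetting",
    "tg3: eth0: Link is down.",
    "mptscsih: ioc0: task abort: SUCCESS (sc=e7100440)",
    "tg3: eth0: Link is up at 1000 Mbps, full duplex.",
]

if __name__ == "__main__":
    for event in find_link_events(SAMPLE_LOG):
        print(event)
```

An empty result over a full test run would correspond to the "no link-down messages" outcome expected here.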


Version-Release number of selected component (if applicable):
kernel-2.6.18-128.el5

How reproducible:
unknown

Steps to Reproduce:
1. Set up an SSH server on dell-per300-01.rhts.bos.redhat.com (i386).
2. Set up SSH keys for both the client and the server.
3. Create ~4000 SSH connections from the client to the server and keep them open for a few hours:
   ssh -4 -f -N dell-per300-01.rhts.bos.redhat.com
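
The steps above can be sketched as a small driver script. This is only an illustration, not the script used in the original test: the helper names are hypothetical, the count of 4000 is the approximate figure from this report, and key-based login (step 2) is assumed so that no password prompt blocks the loop. By default it runs in dry-run mode and only returns the command lines.

```python
import shlex
import subprocess

HOST = "dell-per300-01.rhts.bos.redhat.com"  # server named in this report


def build_ssh_command(host):
    """Return the argv for one idle, backgrounded, IPv4-only SSH connection."""
    # -4: IPv4 only; -f: go to background after authentication; -N: no remote command
    return ["ssh", "-4", "-f", "-N", host]


def open_connections(host, count, dry_run=True):
    """Start `count` connections; with dry_run=True only return the command lines."""
    commands = []
    for _ in range(count):
        argv = build_ssh_command(host)
        commands.append(shlex.join(argv))
        if not dry_run:
            # ssh -f returns once the connection is established and backgrounded.
            subprocess.run(argv, check=False)
    return commands


if __name__ == "__main__":
    planned = open_connections(HOST, 4000, dry_run=True)
    print(len(planned), "connections planned; first:", planned[0])
```

Calling `open_connections(HOST, 4000, dry_run=False)` on the client would actually start the connections.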

Actual results:
tg3: eth0: Link is down.

Expected results:
No link down.

Additional info:
Server information:
********** System Information **********
Hostname                = dell-per300-01.rhts.bos.redhat.com
Kernel Version          = 2.6.18-128.el5PAE
Machine Hardware Name   = i686
Processor Type          = i686
uname -a output         = Linux dell-per300-01.rhts.bos.redhat.com 2.6.18-128.el5PAE #1 SMP Wed Dec 17 12:02:33 EST 2008 i686 i686 i386 GNU/Linux
Swap Size               = 5279 MB
Mem Size                = 4047 MB
Number of Processors    = 4
System Release          = Red Hat Enterprise Linux Server release 5.3 (Tikanga)
Command Line            = ro root=/dev/VolGroup00/LogVol00 console=ttyS1,57600
System NMI Interrupts   = NMI:          0          0          0          0 
********** LSPCI **********
00:00.0 Host bridge: Intel Corporation 5100 Chipset Memory Controller Hub (rev 90)
00:02.0 PCI bridge: Intel Corporation 5100 Chipset PCI Express x8 Port 2-3 (rev 90)
00:03.0 PCI bridge: Intel Corporation 5100 Chipset PCI Express x4 Port 3 (rev 90)
00:04.0 PCI bridge: Intel Corporation 5100 Chipset PCI Express x8 Port 4-5 (rev 90)
00:05.0 PCI bridge: Intel Corporation 5100 Chipset PCI Express x4 Port 5 (rev 90)
00:06.0 PCI bridge: Intel Corporation 5100 Chipset PCI Express x8 Port 6-7 (rev 90)
00:07.0 PCI bridge: Intel Corporation 5100 Chipset PCI Express x4 Port 7 (rev 90)
00:10.0 Host bridge: Intel Corporation 5100 Chipset FSB Registers (rev 90)
00:10.1 Host bridge: Intel Corporation 5100 Chipset FSB Registers (rev 90)
00:10.2 Host bridge: Intel Corporation 5100 Chipset FSB Registers (rev 90)
00:11.0 Host bridge: Intel Corporation 5100 Chipset Reserved Registers (rev 90)
00:13.0 Host bridge: Intel Corporation 5100 Chipset Reserved Registers (rev 90)
00:15.0 Host bridge: Intel Corporation 5100 Chipset DDR Channel 0 Registers (rev 90)
00:16.0 Host bridge: Intel Corporation 5100 Chipset DDR Channel 1 Registers (rev 90)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 02)
00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 02)
00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IR (ICH9R) LPC Interface Controller (rev 02)
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5722 Gigabit Ethernet PCI Express
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5722 Gigabit Ethernet PCI Express
05:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)
0a:07.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
********** Modprob **********
alias eth0 tg3
alias eth1 tg3
alias scsi_hostadapter mptbase
alias scsi_hostadapter1 mptsas
********** Module Information **********
Checking module information autofs4:
Checking module information hidp:
Bluetooth HIDP ver 1.1
1.1
Checking module information rfcomm:
Bluetooth RFCOMM ver 1.8
1.8
Checking module information l2cap:
Bluetooth L2CAP ver 2.8
2.8
Checking module information bluetooth:
Bluetooth Core ver 2.10
2.10
Checking module information sunrpc:
Checking module information ipv6:
IPv6 protocol stack for Linux
Checking module information xfrm_nalgo:
Checking module information crypto_api:
Cryptographic API (backported)
Checking module information cpufreq_ondemand:
'cpufreq_ondemand' - A dynamic cpufreq governor for Low Latency Frequency Transition capable processors
Checking module information acpi_cpufreq:
ACPI Processor P-States Driver
Checking module information dm_multipath:
device-mapper multipath target
Checking module information scsi_dh:
SCSI device handler
Checking module information video:
ACPI Video Driver
Checking module information hwmon:
hardware monitoring sysfs/class support
Checking module information backlight:
Backlight Lowlevel Control Abstraction
Checking module information sbs:
Smart Battery System ACPI interface driver
Checking module information i2c_ec:
ACPI EC SMBus driver
Checking module information i2c_core:
I2C-Bus main module
Checking module information button:
ACPI Button Driver
Checking module information battery:
ACPI Battery Driver
Checking module information asus_acpi:
Asus Laptop ACPI Extras Driver
Checking module information ac:
ACPI AC Adapter Driver
Checking module information parport_pc:
PC-style parallel port driver
Checking module information lp:
Checking module information parport:
Checking module information tg3:
Broadcom Tigon3 ethernet driver
3.93
Checking module information sg:
SCSI generic (sg) driver
3.5.34
Checking module information libphy:
PHY library
Checking module information serio_raw:
Raw serio driver
Checking module information pcspkr:
PC Speaker beeper driver
Checking module information dm_raid45:
device-mapper raid4/5 target
Checking module information dm_message:
device-mapper device-mapper target message parser
Checking module information dm_region_hash:
device-mapper region hash
Checking module information dm_mem_cache:
device-mapper dm memory cache
Checking module information dm_snapshot:
device-mapper snapshot target
Checking module information dm_zero:
device-mapper dummy target returning zeros
Checking module information dm_mirror:
device-mapper mirror target
Checking module information dm_log:
device-mapper dirty region log
Checking module information dm_mod:
device-mapper driver
Checking module information mptsas:
Fusion MPT SAS Host driver
3.04.07
Checking module information mptscsih:
Fusion MPT SCSI Host driver
3.04.07
Checking module information mptbase:
Fusion MPT base driver
3.04.07
Checking module information scsi_transport_sas:
SAS Transphy Attributes
Checking module information sd_mod:
SCSI disk (sd) driver
Checking module information scsi_mod:
SCSI core
Checking module information ext3:
Second Extended Filesystem with journaling extensions
Checking module information jbd:
Checking module information uhci_hcd:
USB Universal Host Controller Interface driver
Checking module information ohci_hcd:
2005 April 22 USB 1.1 'Open' Host Controller (OHCI) Driver
Checking module information ehci_hcd:
10 Dec 2004 USB 2.0 'Enhanced' Host Controller (EHCI) Driver
********** SELinux Status **********
SELinux status:                 enabled
SELinuxfs mount:                /selinux
Current mode:                   enforcing
Mode from config file:          enforcing
Policy version:                 21
Policy from config file:        targeted
********** SELinux Module list **********
amavis	1.1.0
ccs	1.0.0
clamav	1.1.0
dcc	1.1.0
dnsmasq	1.1.1
evolution	1.1.0
ipsec	1.4.0
iscsid	1.0.0
mozilla	1.1.0
mplayer	1.1.0
nagios	1.1.0
oddjob	1.0.1
pcscd	1.0.0
pki	1.0.0
prelude	1.0.0
pyzor	1.1.0
razor	1.1.0
ricci	1.0.0
smartmon	1.1.0
testPolicy	1.0.0
virt	1.0.0
zosremote	1.0.0
******** End System Information ********
Comment 1 John Feeney 2009-08-04 10:36:30 EDT
Would it be possible to test a kernel from my people page that may fix this problem? See people.redhat.com/jfeeney/.bz511918

The patch has been posted and SHOULD make the next kernel build, but more testing would be really good.

Thank you so much if this is possible.
  John
Comment 3 Lijian Xu 2010-05-06 21:41:05 EDT
I tested this in place of CAI Qian with the latest RHEL 5.5 kernel, 2.6.18-194.el5. The test ran for 20 hours with 4000 SSH connections and showed no problems and no link-down messages. I think this tg3 issue has already been fixed.

Thanks,
Lijian Xu
Comment 4 John Feeney 2010-05-10 15:27:55 EDT
Lijian,

Thanks for the update. I will close this as CURRENTRELEASE then.
  John
