Description of problem:
After creating ~4000 SSH connections to the server, the server's link seems down.

mptscsih: ioc0: bus reset: SUCCESS (sc=e7100300)
NETDEV WATCHDOG: eth0: transmit timed out
tg3: eth0: transmit timed out, resetting
tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000000]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
tg3: eth0: Link is down.
mptscsih: ioc0: attempting task abort! (sc=e7100440)
sd 0:1:0:0: command: Write(10): 2a 00 00 03 31 75 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=e7100440)
mptscsih: ioc0: attempting task abort! (sc=e7100580)
sd 0:1:0:0: command: Write(10): 2a 00 08 3b 31 4d 00 00 18 00
mptscsih: ioc0: task abort: SUCCESS (sc=e7100580)
mptscsih: ioc0: attempting task abort! (sc=e71006c0)
sd 0:1:0:0: command: Write(10): 2a 00 08 3b 31 75 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=e71006c0)
mptscsih: ioc0: attempting task abort! (sc=e7100800)
sd 0:1:0:0: command: Write(10): 2a 00 08 3b 51 5d 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=e7100800)
mptscsih: ioc0: attempting task abort! (sc=e7100d00)
sd 0:1:0:0: command: Write(10): 2a 00 08 3c f1 55 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=e7100d00)
mptscsih: ioc0: attempting task abort! (sc=e7100e40)
sd 0:1:0:0: command: Write(10): 2a 00 13 0b 35 4d 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=e7100e40)
mptscsih: ioc0: attempting task abort! (sc=f79c16c0)
sd 0:1:0:0: command: Write(10): 2a 00 13 0b 35 75 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=f79c16c0)
mptscsih: ioc0: attempting task abort! (sc=f79c1300)
sd 0:1:0:0: command: Write(10): 2a 00 13 0b 3b 15 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=f79c1300)
mptscsih: ioc0: attempting task abort! (sc=f79c1bc0)
sd 0:1:0:0: command: Write(10): 2a 00 19 df 31 5d 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=f79c1bc0)
mptscsih: ioc0: attempting task abort! (sc=f79c1080)
sd 0:1:0:0: command: Write(10): 2a 00 19 df 31 6d 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=f79c1080)
mptscsih: ioc0: attempting task abort! (sc=f79c1440)
sd 0:1:0:0: command: Write(10): 2a 00 1d af 31 65 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=f79c1440)
mptscsih: ioc0: attempting task abort! (sc=f79c1a80)
sd 0:1:0:0: command: Write(10): 2a 00 1d af 32 d5 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=f79c1a80)
mptscsih: ioc0: attempting task abort! (sc=f79c1d00)
sd 0:1:0:0: command: Write(10): 2a 00 20 eb 31 5d 00 00 10 00
mptscsih: ioc0: task abort: SUCCESS (sc=f79c1d00)
mptscsih: ioc0: attempting task abort! (sc=f79c1940)
sd 0:1:0:0: command: Write(10): 2a 00 20 eb 32 95 00 00 08 00
mptscsih: ioc0: task abort: SUCCESS (sc=f79c1940)
mptscsih: ioc0: attempting target reset! (sc=e7100a80)
sd 0:1:0:0: command: Write(10): 2a 00 1e 77 04 55 00 00 08 00
mptscsih: ioc0: target reset: SUCCESS (sc=e7100a80)
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.

Version-Release number of selected component (if applicable):
kernel-2.6.18-128.el5

How reproducible:
unknown

Steps to Reproduce:
1. setup SSH server on dell-per300-01.rhts.bos.redhat.com i386
2. setup SSH key for both the client and server.
3. create ~4000 SSH connections from the client to server for a few hours.
   ssh -4 -f -N dell-per300-01.rhts.bos.redhat.com

Actual results:
tg3: eth0: Link is down.

Expected results:
No link down.
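Step 3 gives the ssh command line but not how the ~4000 connections were launched. One possible way, as a minimal sketch: a plain loop around that command. The function name open_connections, its two arguments, and the DRY_RUN switch are hypothetical and are not taken from the report; only the ssh invocation itself is.

```shell
# Sketch of step 3: open COUNT idle SSH connections to HOST.
# Assumption: a simple sequential loop; the report does not say how the
# connections were actually created. DRY_RUN is a hypothetical switch.
open_connections() {
    host=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        if [ -n "${DRY_RUN:-}" ]; then
            # Print the command instead of running it.
            echo "ssh -4 -f -N $host"
        else
            # -4: IPv4 only, -f: go to background, -N: no remote command.
            ssh -4 -f -N "$host" || echo "connection $i failed" >&2
        fi
        i=$((i + 1))
    done
}
```

For the setup in this report, the call would presumably be `open_connections dell-per300-01.rhts.bos.redhat.com 4000`, run on the client after the key setup from step 2. Setting DRY_RUN=1 first only prints the commands.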
Additional info:
Some information for the server,

********** System Information **********
Hostname = dell-per300-01.rhts.bos.redhat.com
Kernel Version = 2.6.18-128.el5PAE
Machine Hardware Name = i686
Processor Type = i686
uname -a output = Linux dell-per300-01.rhts.bos.redhat.com 2.6.18-128.el5PAE #1 SMP Wed Dec 17 12:02:33 EST 2008 i686 i686 i386 GNU/Linux
Swap Size = 5279 MB
Mem Size = 4047 MB
Number of Processors = 4
System Release = Red Hat Enterprise Linux Server release 5.3 (Tikanga)
Command Line = ro root=/dev/VolGroup00/LogVol00 console=ttyS1,57600
System NMI Interrupts = NMI: 0 0 0 0

********** LSPCI **********
00:00.0 Host bridge: Intel Corporation 5100 Chipset Memory Controller Hub (rev 90)
00:02.0 PCI bridge: Intel Corporation 5100 Chipset PCI Express x8 Port 2-3 (rev 90)
00:03.0 PCI bridge: Intel Corporation 5100 Chipset PCI Express x4 Port 3 (rev 90)
00:04.0 PCI bridge: Intel Corporation 5100 Chipset PCI Express x8 Port 4-5 (rev 90)
00:05.0 PCI bridge: Intel Corporation 5100 Chipset PCI Express x4 Port 5 (rev 90)
00:06.0 PCI bridge: Intel Corporation 5100 Chipset PCI Express x8 Port 6-7 (rev 90)
00:07.0 PCI bridge: Intel Corporation 5100 Chipset PCI Express x4 Port 7 (rev 90)
00:10.0 Host bridge: Intel Corporation 5100 Chipset FSB Registers (rev 90)
00:10.1 Host bridge: Intel Corporation 5100 Chipset FSB Registers (rev 90)
00:10.2 Host bridge: Intel Corporation 5100 Chipset FSB Registers (rev 90)
00:11.0 Host bridge: Intel Corporation 5100 Chipset Reserved Registers (rev 90)
00:13.0 Host bridge: Intel Corporation 5100 Chipset Reserved Registers (rev 90)
00:15.0 Host bridge: Intel Corporation 5100 Chipset DDR Channel 0 Registers (rev 90)
00:16.0 Host bridge: Intel Corporation 5100 Chipset DDR Channel 1 Registers (rev 90)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 02)
00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 02)
00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IR (ICH9R) LPC Interface Controller (rev 02)
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5722 Gigabit Ethernet PCI Express
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5722 Gigabit Ethernet PCI Express
05:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)
0a:07.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)

********** Modprob **********
alias eth0 tg3
alias eth1 tg3
alias scsi_hostadapter mptbase
alias scsi_hostadapter1 mptsas

********** Module Information **********
Checking module information autofs4:
Checking module information hidp: Bluetooth HIDP ver 1.1 1.1
Checking module information rfcomm: Bluetooth RFCOMM ver 1.8 1.8
Checking module information l2cap: Bluetooth L2CAP ver 2.8 2.8
Checking module information bluetooth: Bluetooth Core ver 2.10 2.10
Checking module information sunrpc:
Checking module information ipv6: IPv6 protocol stack for Linux
Checking module information xfrm_nalgo:
Checking module information crypto_api: Cryptographic API (backported)
Checking module information cpufreq_ondemand: 'cpufreq_ondemand' - A dynamic cpufreq governor for Low Latency Frequency Transition capable processors
Checking module information acpi_cpufreq: ACPI Processor P-States Driver
Checking module information dm_multipath: device-mapper multipath target
Checking module information scsi_dh: SCSI device handler
Checking module information video: ACPI Video Driver
Checking module information hwmon: hardware monitoring sysfs/class support
Checking module information backlight: Backlight Lowlevel Control Abstraction
Checking module information sbs: Smart Battery System ACPI interface driver
Checking module information i2c_ec: ACPI EC SMBus driver
Checking module information i2c_core: I2C-Bus main module
Checking module information button: ACPI Button Driver
Checking module information battery: ACPI Battery Driver
Checking module information asus_acpi: Asus Laptop ACPI Extras Driver
Checking module information ac: ACPI AC Adapter Driver
Checking module information parport_pc: PC-style parallel port driver
Checking module information lp:
Checking module information parport:
Checking module information tg3: Broadcom Tigon3 ethernet driver 3.93
Checking module information sg: SCSI generic (sg) driver 3.5.34
Checking module information libphy: PHY library
Checking module information serio_raw: Raw serio driver
Checking module information pcspkr: PC Speaker beeper driver
Checking module information dm_raid45: device-mapper raid4/5 target
Checking module information dm_message: device-mapper device-mapper target message parser
Checking module information dm_region_hash: device-mapper region hash
Checking module information dm_mem_cache: device-mapper dm memory cache
Checking module information dm_snapshot: device-mapper snapshot target
Checking module information dm_zero: device-mapper dummy target returning zeros
Checking module information dm_mirror: device-mapper mirror target
Checking module information dm_log: device-mapper dirty region log
Checking module information dm_mod: device-mapper driver
Checking module information mptsas: Fusion MPT SAS Host driver 3.04.07
Checking module information mptscsih: Fusion MPT SCSI Host driver 3.04.07
Checking module information mptbase: Fusion MPT base driver 3.04.07
Checking module information scsi_transport_sas: SAS Transphy Attributes
Checking module information sd_mod: SCSI disk (sd) driver
Checking module information scsi_mod: SCSI core
Checking module information ext3: Second Extended Filesystem with journaling extensions
Checking module information jbd:
Checking module information uhci_hcd: USB Universal Host Controller Interface driver
Checking module information ohci_hcd: 2005 April 22 USB 1.1 'Open' Host Controller (OHCI) Driver
Checking module information ehci_hcd: 10 Dec 2004 USB 2.0 'Enhanced' Host Controller (EHCI) Driver

********** SELinux Status **********
SELinux status: enabled
SELinuxfs mount: /selinux
Current mode: enforcing
Mode from config file: enforcing
Policy version: 21
Policy from config file: targeted

********** SELinux Module list **********
amavis 1.1.0
ccs 1.0.0
clamav 1.1.0
dcc 1.1.0
dnsmasq 1.1.1
evolution 1.1.0
ipsec 1.4.0
iscsid 1.0.0
mozilla 1.1.0
mplayer 1.1.0
nagios 1.1.0
oddjob 1.0.1
pcscd 1.0.0
pki 1.0.0
prelude 1.0.0
pyzor 1.1.0
razor 1.1.0
ricci 1.0.0
smartmon 1.1.0
testPolicy 1.0.0
virt 1.0.0
zosremote 1.0.0

******** End System Information ********
Would it be possible to test a kernel on my people page that may fix this problem? See
people.redhat.com/jfeeney/.bz511918
The patch has been posted and SHOULD make the next kernel build, but more testing would be really good. Thank you so much if this is possible.

John
I tested this in place of CAI Qian, using the latest RHEL 5.5 kernel, 2.6.18-194.el5. The test ran for 20 hours with 4000 SSH connections and showed no problems and no link-down messages. I think this tg3 issue has already been fixed.

Thanks,
Lijian Xu
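For anyone repeating this check, a minimal sketch of how the absence of link-down messages could be confirmed from a saved kernel log. The function name count_link_down is hypothetical, and the log location is an assumption (on RHEL 5 kernel messages normally end up in /var/log/messages, but the test above does not say which log was inspected). The matched text follows the "tg3: eth0: Link is down." line from the original description.

```shell
# Sketch: count tg3 link-down events in a log file given as $1.
# Assumption: the log is a plain-text file of kernel messages, one per line.
count_link_down() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so the status is deliberately ignored.
    grep -c 'tg3: eth[0-9]*: Link is down' "$1" || true
}
```

A result of 0 from something like `count_link_down /var/log/messages` after the run would match the outcome reported here.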
Lijian,

Thanks for the update. So I guess I will close this as CURRENT RELEASE then.

John