Bug 308041
Summary: | tg3: watchdog timeout in BCM95700A6 rev 7104 | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Marcus Alves Grando <marcus> |
Component: | kernel | Assignee: | Andy Gospodarek <agospoda> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Martin Jenner <mjenner> |
Severity: | high | Priority: | high |
Version: | 4.5 | CC: | agospoda, benlu, davem, duck, fhirtz, jtorrice, mcarlson, mchan, pale, peterm, tao, zbuhman |
Hardware: | i686 | OS: | Linux |
Doc Type: | Bug Fix | Last Closed: | 2009-01-05 19:00:56 UTC |
Bug Blocks: | 461297 | | |
Description
Marcus Alves Grando
2007-09-26 20:50:22 UTC
Forget that:

# modinfo tg3
filename: /lib/modules/2.6.9-59.ELsmp/kernel/drivers/net/tg3.ko
author: David S. Miller (davem) and Jeff Garzik (jgarzik)
description: Broadcom Tigon3 ethernet driver
license: GPL
version: 3.77 71727D9639669384D3745EA
parm: tg3_debug:Tigon3 bitmapped debugging message enable value
vermagic: 2.6.9-59.ELsmp SMP 686 REGPARM 4KSTACKS gcc-3.4

Do you see this problem only when sending large amounts of traffic on the network?

(In reply to comment #2)
> Do you see this problem only when sending large amounts of traffic on the network?

Yes, it occurs only when this server has heavy network usage.

Regards

More info about that:

I rebuilt the jbarton kernel 2.6.9-62 + tg3 3.81 update and put it on this server. After that I see that TSO is disabled by default on this network model. Another point is that after some time I see the watchdog timeout again. See below:

NETDEV WATCHDOG: eth1: transmit timed out
tg3: eth1: transmit timed out, resetting
tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000000]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth1: Link is down.
tg3: eth1: Link is up at 1000 Mbps, full duplex.
tg3: eth1: Flow control is off for TX and off for RX.
NETDEV WATCHDOG: eth1: transmit timed out
tg3: eth1: transmit timed out, resetting
tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000000]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth1: Link is down.
tg3: eth1: Link is up at 1000 Mbps, full duplex.
tg3: eth1: Flow control is off for TX and off for RX.
NETDEV WATCHDOG: eth1: transmit timed out
tg3: eth1: transmit timed out, resetting
tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000000]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth1: Link is down.
tg3: eth1: Link is up at 1000 Mbps, full duplex.
tg3: eth1: Flow control is off for TX and off for RX.

At this time the network traffic on this card is about 30 Mbps. eth1 on this server is used only for NFS.

dmesg on boot:

tg3.c:v3.81 (September 5, 2007)
divert: allocating divert_blk for eth0
eth0: Tigon3 [partno(BCM95700A6) rev 7104 PHY(5411)] (PCI:66MHz:64-bit) 10/100/1000Base-T Ethernet 00:0d:56:70:df:4d
eth0: RXcsums[1] LinkChgREG[1] MIirq[1] ASF[0] WireSpeed[0] TSOcap[0]
eth0: dma_rwctrl[76ff000f] dma_mask[64-bit]
divert: allocating divert_blk for eth1
eth1: Tigon3 [partno(BCM95700A6) rev 7104 PHY(5411)] (PCI:66MHz:64-bit) 10/100/1000Base-T Ethernet 00:0d:56:70:df:4e
eth1: RXcsums[1] LinkChgREG[1] MIirq[1] ASF[0] WireSpeed[0] TSOcap[0]
eth1: dma_rwctrl[76ff000f] dma_mask[64-bit]
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.
tg3: eth1: Link is up at 1000 Mbps, full duplex.
tg3: eth1: Flow control is off for TX and off for RX.

# modinfo tg3
filename: /lib/modules/2.6.9-62.EL.mnag.2smp/kernel/drivers/net/tg3.ko
author: David S. Miller (davem) and Jeff Garzik (jgarzik)
description: Broadcom Tigon3 ethernet driver
license: GPL
version: 3.81 7761AE4F01E4189F0085C8E
parm: tg3_debug:Tigon3 bitmapped debugging message enable value
vermagic: 2.6.9-62.EL.mnag.2smp SMP 686 REGPARM 4KSTACKS gcc-3.4

Ok... I rebuilt the src.rpm kernel and ran tg3_dump_state() in tg3_tx_timeout().
I don't know if it can help, result below:

NETDEV WATCHDOG: eth1: transmit timed out
tg3: eth1: transmit timed out, resetting
DEBUG: PCI status [02b0] TG3PCI state[0000008e]
DEBUG: MAC_MODE[00e04c04] MAC_STATUS[01400003] MAC_EVENT[00000000] MAC_LED_CTRL[00000100]
DEBUG: MAC_TX_MODE[00000002] MAC_TX_STATUS[00000008] MAC_RX_MODE[00000402] MAC_RX_STATUS[00000000]
DEBUG: SNDDATAI_MODE[00000002] SNDDATAI_STATUS[00000000] SNDDATAI_STATSCTRL[00000003]
DEBUG: SNDDATAC_MODE[00000002]
DEBUG: SNDBDS_MODE[00000006] SNDBDS_STATUS[00000000]
DEBUG: SNDBDI_MODE[00000006] SNDBDI_STATUS[00000000]
DEBUG: SNDBDC_MODE[00000002]
DEBUG: RCVLPC_MODE[00000002] RCVLPC_STATUS[00000000] RCVLPC_STATSCTRL[00000001]
DEBUG: RCVDBDI_MODE[00000012] RCVDBDI_STATUS[00000000]
DEBUG: RCVDCC_MODE[00000006]
DEBUG: RCVBDI_MODE[00000006] RCVBDI_STATUS[00000000]
DEBUG: RCVCC_MODE[00000006] RCVCC_STATUS[00000000]
DEBUG: RCVLSC_MODE[00000006] RCVLSC_STATUS[00000000]
DEBUG: MBFREE_MODE[00000002] MBFREE_STATUS[00000000]
DEBUG: HOSTCC_MODE[00000002] HOSTCC_STATUS[00000000]
DEBUG: HOSTCC_STATS_BLK_HOST_ADDR[000000000d568000]
DEBUG: HOSTCC_STATUS_BLK_HOST_ADDR[0000000024870000]
DEBUG: HOSTCC_STATS_BLK_NIC_ADDR[00000300]
DEBUG: HOSTCC_STATUS_BLK_NIC_ADDR[00000b00]
DEBUG: MEMARB_MODE[00000002] MEMARB_STATUS[00000000]
DEBUG: BUFMGR_MODE[00000006] BUFMGR_STATUS[00000010]
DEBUG: BUFMGR_MB_POOL_ADDR[00008000] BUFMGR_MB_POOL_SIZE[00018000]
DEBUG: BUFMGR_DMA_DESC_POOL_ADDR[00002000] BUFMGR_DMA_DESC_POOL_SIZE[00002000]
DEBUG: RDMAC_MODE[000003fe] RDMAC_STATUS[00000000]
DEBUG: WDMAC_MODE[000003fe] WDMAC_STATUS[00000000]
DEBUG: DMAC_MODE[00000002]
DEBUG: GRC_MODE[04130034] GRC_MISC_CFG[0001f082]
DEBUG: GRC_LOCAL_CTRL[01009709]
DEBUG: RCVDBDI_JUMBO_BD[0000000000000000:00000002:00000000]
DEBUG: RCVDBDI_STD_BD[00000000229a8000:06000000:00006000]
DEBUG: RCVDBDI_MINI_BD[0000000000000000:00000002:00000000]
DEBUG: SRAM_SEND_RCB_0[000000001e838000:02000000:00004000]
DEBUG: SRAM_RCV_RET_RCB_0[0000000036a68000:04000000:00000000]
DEBUG: SRAM_STATUS_BLK[00000001:00000000:01990000:00000000:003a0399]
DEBUG: Host status block [00000000:00000000:(0000:0199:0000):(0399:003a)]
DEBUG: Host statistics block [00000000:00000000:00000000:00000000]
DEBUG: SNDHOST_PROD[0000000000000026] SNDNIC_PROD[000000000000007e]
DEBUG: NIC TXD(0)[00000000:0304ba02:002a0004:00000000]
DEBUG: NIC TXD(1)[00000000:0332f602:002a0004:00000000]
DEBUG: NIC TXD(2)[00000000:37d7da02:002a0004:00000000]
DEBUG: NIC TXD(3)[00000000:37e35802:002a0004:00000000]
DEBUG: NIC TXD(4)[00000000:37d7d202:002a0004:00000000]
DEBUG: NIC TXD(5)[00000000:37fc6602:002a0004:00000000]
DEBUG: NIC RXD_STD(0)[0][00000000:1c2bb812:00000040:00000004]
DEBUG: NIC RXD_STD(0)[1][00006bb4:00000000:00000000:00010180]
DEBUG: NIC RXD_STD(1)[0][00000000:0d832012:00000040:00000004]
DEBUG: NIC RXD_STD(1)[1][00006bb8:00000000:00000000:00010181]
DEBUG: NIC RXD_STD(2)[0][00000000:18ad8812:00000040:00000004]
DEBUG: NIC RXD_STD(2)[1][00006c4c:00000000:00000000:00010182]
DEBUG: NIC RXD_STD(3)[0][00000000:090d4012:00000040:00000004]
DEBUG: NIC RXD_STD(3)[1][00004594:00000000:00000000:00010183]
DEBUG: NIC RXD_STD(4)[0][00000000:2e740812:00000107:00003004]
DEBUG: NIC RXD_STD(4)[1][ffffffff:00000000:00000000:00010184]
DEBUG: NIC RXD_STD(5)[0][00000000:0a49f012:00000040:00000004]
DEBUG: NIC RXD_STD(5)[1][00004594:00000000:00000000:00010185]
DEBUG: NIC RXD_JUMBO(0)[0][0a0e1854:3c15d171:3db6bd4c:d7d77251]
DEBUG: NIC RXD_JUMBO(0)[1][84e41061:52a7221d:bc2576b2:5ec87a1b]
DEBUG: NIC RXD_JUMBO(1)[0][349c8333:c3612bd9:7fdb1ef7:267e9c33]
DEBUG: NIC RXD_JUMBO(1)[1][c2272f85:1ee96985:071fa587:5a526f09]
DEBUG: NIC RXD_JUMBO(2)[0][81f926a3:f08b860d:787552bc:66df5b7e]
DEBUG: NIC RXD_JUMBO(2)[1][c5640bb1:6ec792e3:8388ad34:e18df4a5]
DEBUG: NIC RXD_JUMBO(3)[0][be6a696e:0a440ca9:53668abf:ebb2da3b]
DEBUG: NIC RXD_JUMBO(3)[1][dae40140:1814eccf:f7e855ba:5539f469]
DEBUG: NIC RXD_JUMBO(4)[0][c2a23bf3:73bc7680:bf48b1a7:7bc5a4fc]
DEBUG: NIC RXD_JUMBO(4)[1][c4f751cd:e5e2096d:83ff78e6:77dc9d3b]
DEBUG: NIC RXD_JUMBO(5)[0][16dc7de5:5cd1657f:f2f9d5fb:9fedc6cc]
DEBUG: NIC RXD_JUMBO(5)[1][5a89287e:78413466:a6ba56fb:bff73134]
tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth1: Link is down.
tg3: eth1: Link is up at 100 Mbps, full duplex.
tg3: eth1: Flow control is off for TX and off for RX.

TSO is not supported by 5700 at all and should not be enabled by any version of the driver. I don't see anything obvious in the register dump.

Which version of tg3 before 3.77 did not show this tx timeout problem?

(In reply to comment #6)
> TSO is not supported by 5700 at all and should not be enabled by any version
> of the driver. I don't see anything obvious in the register dump.

Now I know that. ;)

>
> Which version of tg3 before 3.77 did not show this tx timeout problem?

I really don't know, because this server has always had this problem.

I rebuilt the latest Red Hat kernel again with the 3.81 driver update and enabled debug on tg3 again. See below:

# uname -r
2.6.9-62.EL.smp

# modinfo tg3
filename: /lib/modules/2.6.9-62.EL.smp/kernel/drivers/net/tg3.ko
author: David S. Miller (davem) and Jeff Garzik (jgarzik)
description: Broadcom Tigon3 ethernet driver
license: GPL
version: 3.81 367C507F175A91EAAFC7F7D
parm: tg3_debug:Tigon3 bitmapped debugging message enable value
vermagic: 2.6.9-62.EL.smp SMP 686 REGPARM 4KSTACKS gcc-3.4

# dmesg | egrep "(tg3|eth)"
divert: not allocating divert_blk for non-ethernet device lo
tg3.c:v3.81 (September 5, 2007)
divert: allocating divert_blk for eth0
eth0: Tigon3 [partno(BCM95700A6) rev 7104 PHY(5411)] (PCI:66MHz:64-bit) 10/100/1000Base-T Ethernet 00:0d:56:70:df:71
eth0: RXcsums[1] LinkChgREG[1] MIirq[1] ASF[0] WireSpeed[0] TSOcap[0]
eth0: dma_rwctrl[76ff000f] dma_mask[64-bit]
divert: allocating divert_blk for eth1
eth1: Tigon3 [partno(BCM95700A6) rev 7104 PHY(5411)] (PCI:66MHz:64-bit) 10/100/1000Base-T Ethernet 00:0d:56:70:df:72
eth1: RXcsums[1] LinkChgREG[1] MIirq[1] ASF[0] WireSpeed[0] TSOcap[0]
eth1: dma_rwctrl[76ff000f] dma_mask[64-bit]
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.
tg3: eth1: Link is up at 100 Mbps, full duplex.
tg3: eth1: Flow control is off for TX and off for RX.
NETDEV WATCHDOG: eth1: transmit timed out
tg3: eth1: transmit timed out, resetting
tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000000]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
tg3: DEBUG: PCI status [02b0] TG3PCI state[0000008e]
tg3: DEBUG: MAC_MODE[00e04c04] MAC_STATUS[05401013]
tg3: DEBUG: MAC_TX_MODE[00000002] MAC_TX_STATUS[00000008]
tg3: DEBUG: SNDDATAI_MODE[00000002] SNDDATAI_STATUS[00000000]
tg3: DEBUG: SNDDATAC_MODE[00000002]
tg3: DEBUG: SNDBDS_MODE[00000006] SNDBDS_STATUS[00000000]
tg3: DEBUG: SNDBDI_MODE[00000006] SNDBDI_STATUS[00000000]
tg3: DEBUG: SNDBDC_MODE[00000002]
tg3: DEBUG: RCVLPC_MODE[00000002] RCVLPC_STATUS[00000000]
tg3: DEBUG: RCVDBDI_MODE[00000012] RCVDBDI_STATUS[00000000]
tg3: DEBUG: RCVDCC_MODE[00000006]
tg3: DEBUG: RCVBDI_MODE[00000006] RCVBDI_STATUS[00000000]
tg3: DEBUG: RCVCC_MODE[00000006] RCVCC_STATUS[00000000]
tg3: DEBUG: RCVLSC_MODE[00000006] RCVLSC_STATUS[00000000]
tg3: DEBUG: MBFREE_MODE[00000002] MBFREE_STATUS[00000000]
tg3: DEBUG: HOSTCC_MODE[00000002] HOSTCC_STATUS[00000000]
tg3: DEBUG: HOSTCC_STATS_BLK_HOST_ADDR[0000000036529000]
tg3: DEBUG: HOSTCC_STATUS_BLK_HOST_ADDR[0000000036574000]
tg3: DEBUG: HOSTCC_STATS_BLK_NIC_ADDR[00000300]
tg3: DEBUG: HOSTCC_STATUS_BLK_NIC_ADDR[00000b00]
tg3: DEBUG: MEMARB_MODE[00000002] MEMARB_STATUS[00000000]
tg3: DEBUG: BUFMGR_MODE[00000006] BUFMGR_STATUS[00000010]
tg3: DEBUG: BUFMGR_MB_POOL_ADDR[00008000] BUFMGR_MB_POOL_SIZE[00018000]
tg3: DEBUG: BUFMGR_DMA_DESC_POOL_ADDR[00002000] BUFMGR_DMA_DESC_POOL_SIZE[00002000]
tg3: DEBUG: RDMAC_MODE[000003fe] RDMAC_STATUS[00000000]
tg3: DEBUG: WDMAC_MODE[000003fe] WDMAC_STATUS[00000000]
tg3: DEBUG: DMAC_MODE[00000002]
tg3: DEBUG: GRC_MODE[04130034] GRC_MISC_CFG[0001f082]
tg3: DEBUG: GRC_LOCAL_CTRL[01009709]
tg3: DEBUG: RCVDBDI_JUMBO_BD[0000000000000000:00000002:00000000]
tg3: DEBUG: RCVDBDI_STD_BD[0000000035ed4000:06000000:00006000]
tg3: DEBUG: RCVDBDI_MINI_BD[0000000000000000:00000002:00000000]
tg3: DEBUG: SRAM_SEND_RCB_0[0000000035ee0000:02000000:00004000]
tg3: DEBUG: SRAM_RCV_RET_RCB_0[0000000035ed8000:04000000:00000000]
tg3: DEBUG: SRAM_STATUS_BLK[00000001:00000000:01fd0000:00000000:01d203fd]
tg3: DEBUG: Host status block [00000000:00000000:(0000:01fd:0000):(03fd:01d2)]
tg3: DEBUG: Host statistics block [00000000:00000000:00000000:00000000]
tg3: DEBUG: SNDHOST_PROD[00000000000001be] SNDNIC_PROD[0000000000000016]
tg3: DEBUG: NIC TXD(0)[00000000:030bba02:002a0004:00000000]
tg3: DEBUG: NIC TXD(1)[00000000:030fa202:002a0004:00000000]
tg3: DEBUG: NIC TXD(2)[00000000:030ffe02:002a0004:00000000]
tg3: DEBUG: NIC TXD(3)[00000000:030be402:005a0005:00000000]
tg3: DEBUG: NIC TXD(4)[00000000:03090602:002a0004:00000000]
tg3: DEBUG: NIC TXD(5)[00000000:0309a602:002a0004:00000000]
tg3: DEBUG: NIC RXD_STD(0)[0][00000000:35ee7012:00000108:00003004]
tg3: DEBUG: NIC RXD_STD(0)[1][ffffffff:00000000:00000000:00010180]
tg3: DEBUG: NIC RXD_STD(1)[0][00000000:35a7e812:00000040:00000004]
tg3: DEBUG: NIC RXD_STD(1)[1][00006bb9:00000000:00000000:00010181]
tg3: DEBUG: NIC RXD_STD(2)[0][00000000:35a46012:00000040:00000004]
tg3: DEBUG: NIC RXD_STD(2)[1][0000d11e:00000000:00000000:00010182]
tg3: DEBUG: NIC RXD_STD(3)[0][00000000:35d7f812:00000040:00000004]
tg3: DEBUG: NIC RXD_STD(3)[1][00006c4b:00000000:00000000:00010183]
tg3: DEBUG: NIC RXD_STD(4)[0][00000000:35ee5012:00000040:00000004]
tg3: DEBUG: NIC RXD_STD(4)[1][00006b9f:00000000:00000000:00010184]
tg3: DEBUG: NIC RXD_STD(5)[0][00000000:35a82812:00000040:00000004]
tg3: DEBUG: NIC RXD_STD(5)[1][00006ba3:00000000:00000000:00010185]
tg3: DEBUG: NIC RXD_JUMBO(0)[0][020e9254:2c11d913:3db69d7c:d7c77250]
tg3: DEBUG: NIC RXD_JUMBO(0)[1][84c41071:12a7229c:fe2576b2:7ac93a1b]
tg3: DEBUG: NIC RXD_JUMBO(1)[0][34bc8b33:83612bd9:7fdb1ef7:267e1c33]
tg3: DEBUG: NIC RXD_JUMBO(1)[1][e2232f85:9e096984:471ba587:9a5a3f99]
tg3: DEBUG: NIC RXD_JUMBO(2)[0][81f922a3:e08b8e8c:587552bc:66ff5b7e]
tg3: DEBUG: NIC RXD_JUMBO(2)[1][c56403b1:6ec793e3:8394ad34:e18df0a0]
tg3: DEBUG: NIC RXD_JUMBO(3)[0][be6a412e:0bc61ca9:d36e8abf:e1b2da3b]
tg3: DEBUG: NIC RXD_JUMBO(3)[1][f8e40040:3810eccf:f7ead13e:d539f469]
tg3: DEBUG: NIC RXD_JUMBO(4)[0][c2a238f3:73ac7680:bb68b1a7:7bc5a4fc]
tg3: DEBUG: NIC RXD_JUMBO(4)[1][d1f751cd:e56a1d69:83f778e6:77cc9d1b]
tg3: DEBUG: NIC RXD_JUMBO(5)[0][17dc7de7:5cd0655e:f3f9ddfa:9fed46cc]
tg3: DEBUG: NIC RXD_JUMBO(5)[1][5a89287e:f84124e6:a6ba56f3:bff73534]
tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth1: Link is down.
tg3: eth1: Link is up at 100 Mbps, full duplex.
tg3: eth1: Flow control is off for TX and off for RX.

# ethtool eth1
Settings for eth1:
    Supported ports: [ MII ]
    Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Half 1000baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Half 1000baseT/Full
    Advertised auto-negotiation: Yes
    Speed: 100Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: g
    Wake-on: d
    Current message level: 0x000000ff (255)
    Link detected: yes

# ethtool -k eth1
Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off

Maybe disabling "tx-checksumming: on" can help? I see support for RXcsums[1] in dmesg but nothing about TX checksum. Does that make any sense?

Regards

(In reply to comment #8)
> Maybe disable "tx-checksumming: on" can help? I see in dmesg support for
> RXcsums[1] but don't see anything about TX checksum? Make any sense?
>
> Regards

You could try it, but I wouldn't guess that it will save much. By the way, was the 3.81 update from here?

http://people.redhat.com/agospoda/#rhel4

I'm hoping that it is. Thanks!
:-)

I don't think checksum will have any effect on tx timeout, but you can try turning it off.

You have 2 5700 devices eth0 and eth1. Do you see tx timeout only on eth1?

I just tested a 5700 NIC card with the same rev of chip using netperf and it ran fine. What kind of traffic do you have on eth1?

(In reply to comment #9)
> (In reply to comment #8)
> > Maybe disable "tx-checksumming: on" can help? I see in dmesg support for
> > RXcsums[1] but don't see anything about TX checksum? Make any sense?
> >
> > Regards
>
> You could try it, but I wouldn't guess that it will save much. By the way, was
> the 3.81 update from here?
>
> http://people.redhat.com/agospoda/#rhel4
>
> I'm hoping that it is. Thanks! :-)
>

Yes. I took 3.81 from your repo.

(In reply to comment #10)
> I don't think checksum will have any effect on tx timeout, but you can try
> turning it off.
>
> You have 2 5700 devices eth0 and eth1. Do you see tx timeout only on eth1?

Yes. I see that only on eth1.

>
> I just tested a 5700 NIC card with the same rev of chip using netperf and it
> ran fine. What kind of traffic do you have on eth1?

It's very strange... because sometimes it occurs all the time, and then more than 2 days pass without it. This NIC is used for NFS traffic. More precisely, email traffic via NFS.

I'll try turning off tx checksum to see what happens.

Similar traffic goes through eth0 and eth1, but you only saw timeout on eth1? The 2 devices are on the same bus (08) and so if there are any issues on the bus, both devices should be affected.

Marcus,

Is your MTU 1500 for these interfaces or larger? If larger can you reproduce this issue with an MTU of 1500?

Thanks!

(In reply to comment #12)
> Similar traffic goes through eth0 and eth1, but you only saw timeout on eth1?

More or less. For example, right now eth0 receives 0.8Mb and sends 8.51Mb, and eth1 sends 2.02Mb and receives 6.89Mb.

> The 2 devices are on the same bus (08) and so if there are any issues on the
> bus, both devices should be affected.
Hmmm... actually eth0 is plugged into a Cisco Catalyst 297024 (IOS 12.2(25)SEB4) and eth1 is plugged into a Cisco Catalyst 2924XLv (IOS 12.0(5)WC9a). But I don't think that can cause the eth1 watchdog timeout.

(In reply to comment #13)
> Marcus,
>
> Is your MTU 1500 for these interfaces or larger? If larger can you reproduce
> this issue with an MTU of 1500?
>
> Thanks!

All servers where this occurs use an MTU of 1500. I don't use an MTU greater than 1500.

Regards

>> The 2 devices are on the same bus (08) and so if there are any issues on
>> the bus, both devices should be affected.
>
> Hmmm... actually eth0 is plugged in one cisco catalyst 297024 (IOS
> 12.2(25)SEB4) and eth1 are plugged in one cisco catalyst 2924XLv (IOS
> 12.0(5)WC9a)

I was referring to the PCI bus. Both devices are on bus 8 based on your lspci output.

(In reply to comment #15)
> >> The 2 devices are on the same bus (08) and so if there are any issues on
> >> the bus, both devices should be affected.
> >
> > Hmmm... actually eth0 is plugged in one cisco catalyst 297024 (IOS
> > 12.2(25)SEB4) and eth1 are plugged in one cisco catalyst 2924XLv (IOS
> > 12.0(5)WC9a)
>
> I was referring to the PCI bus. Both devices are on bus 8 based on your lspci
> output.

Yes, both devices are on bus 08.

08:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5700 Gigabit Ethernet (rev 14)
08:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5700 Gigabit Ethernet (rev 14)

People, any news about this?

Regards

Because you always see the problem on eth0 only no matter what version of the driver you use, I think it is possible that you have a bad chip in eth0. Do you have different machines exhibiting the same problem?

Yes. I have many servers with this problem. Last Friday I updated the driver to 3.84 and I see this problem there too. All servers that I maintain have this problem.

Any other idea for debugging this problem?

Regards

OK, I'll ask our QA lab to see if they have a Dell PE6650 to reproduce the problem.
Can you find a simple traffic pattern (such as netperf, iperf) that will easily trigger the problem? This will make it easier for us to reproduce the problem.

(In reply to comment #20)
> OK, I'll ask our QA lab to see if they have a Dell PE6650 to reproduce the
> problem. Can you find a simple traffic pattern (such as netperf, iperf) that
> will easily trigger the problem? This will make it easier for us to reproduce
> the problem.

I'll try to use netperf/iperf to reproduce it.

One interesting point: after disabling tx checksum offload with ethtool, my servers work normally. I made that change one week ago and it has worked fine since.

# ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: off
scatter-gather: on
tcp segmentation offload: off

Maybe it's related to tx checksum?

Regards

Hmmm, tg3_get_invariants definitely sets checksumming off for what I would guess is an older version of 5700. Any chance this needs to be extended to other versions too?

    /* 5700 B0 chips do not support checksumming correctly due
     * to hardware bugs.
     */
    if (tp->pci_chip_rev_id == CHIPREV_ID_5700_B0)
        tp->tg3_flags |= TG3_FLAG_BROKEN_CHECKSUMS;

He is using B4 which shouldn't have the problem any more. But in any case, the checksum problem was algorithmic, meaning that it would generate the wrong checksum on B0 chips. I don't understand how tx checksum can cause tx timeout on eth0 only and not on eth1.

(In reply to comment #23)
> He is using B4 which shouldn't have the problem any more. But in any case,
> the checksum problem was algorithmic, meaning that it would generate the wrong
> checksum on B0 chips. I don't understand how tx checksum can cause tx timeout
> on eth0 only and not on eth1.

It would seem odd to me as well since these are on 2 different cards, right? The 5700 isn't a dual-port card (that is just the 5704 iirc), is it?
I could understand if it was a 5704 where two ports are sharing one chip, so it would seem that congestion on one port could cause problems on another port, but I cannot guarantee that would even be a problem since I know little about the hardware design itself.

Marcus,

Is this still a problem? We can try to clean up the tg3 driver and disable tx checksumming on rev B4 5700 chips, but Michael doesn't seem to think that is needed so I'm reluctant to do that (and he knows the hardware well enough to know what is needed).

I'll take a look at the patch for tg3 that was added to 2.6.9-59 if you feel that was the first kernel that you noticed having problems.

Andy,

Well, my test server that had the problem has now been running for 43 days without a problem. Currently all my servers with a tg3 NIC have either an old driver or the 3.84 driver. I'll update to 2.6.9-69 (tg3 3.86) and see what happens.

I don't know how to reproduce it; if I have more info I'll add it here.

Regards

Thanks for the update, Marcus. I am glad your servers are running well, but concerned that the problem has gone away. Did you do anything else to the servers (like update the BIOS) recently?

Ok, more news...

We have now moved some servers with this chip to AS5. Now the watchdog timeout occurs all the time.
# uname -r
2.6.18-89.el5.gtest.46PAE

# cat /var/log/dmesg | egrep "(eth|tg3)"
tg3.c:v3.86-rh (November 9, 2007)
eth0: Tigon3 [partno(BCM95700A6) rev 7104 PHY(5411)] (PCI:66MHz:64-bit) 10/100/1000Base-T Ethernet 00:11:43:32:41:58
eth0: RXcsums[1] LinkChgREG[1] MIirq[1] ASF[0] WireSpeed[0] TSOcap[0]
eth0: dma_rwctrl[76ff000f] dma_mask[64-bit]
eth1: Tigon3 [partno(BCM95700A6) rev 7104 PHY(5411)] (PCI:66MHz:64-bit) 10/100/1000Base-T Ethernet 00:11:43:32:41:59
eth1: RXcsums[1] LinkChgREG[1] MIirq[1] ASF[0] WireSpeed[0] TSOcap[0]
eth1: dma_rwctrl[76ff000f] dma_mask[64-bit]

# dmesg | egrep "(eth|tg3)"
tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth0: Link is down.
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.
NETDEV WATCHDOG: eth0: transmit timed out
tg3: eth0: transmit timed out, resetting
tg3: DEBUG: MAC_TX_STATUS[00000018] MAC_RX_STATUS[00000008]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth0: Link is down.
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.
NETDEV WATCHDOG: eth0: transmit timed out
tg3: eth0: transmit timed out, resetting
tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000008]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth0: Link is down.
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.

# ethtool -i eth0
driver: tg3
version: 3.86-rh
firmware-version:
bus-info: 0000:08:01.0

# ethtool -i eth1
driver: tg3
version: 3.86-rh
firmware-version:
bus-info: 0000:08:02.0

I have ethtool -d output too... but I don't know if it helps.
So, Michael, I have the HW to test; maybe you can prepare a patch to identify this?

Regards

Andy,

Maybe you can replicate this bug to AS5? It's critical since tg3 3.86 is already committed to the AS5 kernel and AS5 is in beta stage.

Regards

Is the tg3 using MSI interrupts? Problems like these seem to happen when network cards don't operate with some bridge chips.

If your system uses MSI, can you boot with pci=nomsi on the kernel command line and let me know how well that works?

Also an lspci -vvv from the system would be helpful.

Thanks.

Andy, this old chip does not support MSI. lspci will show that MSI is supported but tg3 will not use MSI.

Joe@broadcom, can you see if you can reproduce this problem using the AS5 kernel? We can also send a debug patch to Marcus to dump all registers during watchdog. ethtool -d won't help because by then the chip has been reset already.

Thanks for chiming in, Michael. I get so many of these watchdog timeouts on various drivers and many of them seem to come from irqs not working well and servicing the tx ring buffers. Most of these come from a lack of interaction between MSI bridges and NICs.

(In reply to comment #30)
> Is the tg3 using MSI interrupts? Problems like these seem to happen when
> network cards don't operate with some bridge chips.
# lspci -vvv
08:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5700 Gigabit Ethernet (rev 14)
    Subsystem: Dell Broadcom BCM5700 1000Base-T
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
    Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 32 (16000ns min), Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 193
    Region 0: Memory at fcd10000 (64-bit, non-prefetchable) [size=64K]
    Capabilities: [40] PCI-X non-bridge device
        Command: DPERE- ERO- RBC=512 OST=1
        Status: Dev=ff:1f.1 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=512 DMOST=1 DMCRS=8 RSCEM- 266MHz- 533MHz-
    Capabilities: [48] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
        Status: D0 PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [50] Vital Product Data
    Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
        Address: 6451b204961402c0 Data: c620

08:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5700 Gigabit Ethernet (rev 14)
    Subsystem: Dell Broadcom BCM5700 1000Base-T
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
    Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 32 (16000ns min), Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 201
    Region 0: Memory at fcd00000 (64-bit, non-prefetchable) [size=64K]
    Capabilities: [40] PCI-X non-bridge device
        Command: DPERE- ERO- RBC=512 OST=1
        Status: Dev=ff:1f.1 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=512 DMOST=1 DMCRS=8 RSCEM+ 266MHz- 533MHz-
    Capabilities: [48] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
        Status: D0 PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [50] Vital Product Data
    Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
        Address: 43a2947549442a08 Data: 1983

# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:  686828617          0          0          0          0          0          0          0  IO-APIC-edge   timer
  1:         22        426         35          0         12          0          0          0  IO-APIC-edge   i8042
  6:          3          0          0          0          0          0          0          0  IO-APIC-edge   floppy
  8:          1          0          0          0          0          0          0          0  IO-APIC-edge   rtc
  9:          1          0          0          0          0          0          0          0  IO-APIC-level  acpi
 10:          0          0          0          0          0          0          0          0  IO-APIC-level  ohci_hcd:usb1
 12:        106          7          0          0          0          0          0          0  IO-APIC-edge   i8042
 14:         33        930      10175         66       2568          0          0          0  IO-APIC-edge   ide0
177:       5895     194684      14801    1586214          0    5248290          0          0  IO-APIC-level  megaraid
185:         15          0          0          0          0          0          0          0  IO-APIC-level  aic7xxx
193:        451   67297841          0       2085          0          0          0 3462533099  IO-APIC-level  eth0
201:        385    5131688       2311 1197221598          0  331115277          0          0  IO-APIC-level  eth1
NMI:          0          0          0          0          0          0          0          0
LOC:  686919525  686911835  686893240  686913213  686919644  686919498  686919687  686919644
ERR:          0
MIS:          0

So, as Michael said about MSI: lspci shows it, but it does not appear in /proc/interrupts.

Michael, feel free to send me a patch to test; a watchdog timeout appears every ~5 minutes.

Thanks all.

Michael, any news?

Marcus,

A couple of things you could try:

1) Can you add the following just above 'schedule_work(&tp->reset_task);' in tg3_tx_timeout:

    printk(KERN_NOTICE "MAILBOX_INTERRUPT_0 = 0x%x, tp->irq_sync = %d\n",
           tr32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW),
           tp->irq_sync);

I want to make sure interrupts are still enabled.

2) I noticed that the link reports tx and rx flow control is off. Is it possible to reproduce the problem if you connect to a switch that supports flow control?

3) I also noticed that PHY autopolling is turned on. I really don't think it would have any effect, but could you comment out the following block of code in tg3_setup_copper_phy:

    #if 0
        /* ??? Without this setting Netgear GA302T PHY does not
         * ??? send/receive packets...
         */
        if ((tp->phy_id & PHY_ID_MASK) == PHY_ID_BCM5411 &&
            tp->pci_chip_rev_id == CHIPREV_ID_5700_ALTIMA) {
            tp->mi_mode |= MAC_MI_MODE_AUTO_POLL;
            tw32_f(MAC_MI_MODE, tp->mi_mode);
            udelay(80);
        }
    #endif

Well, I've added the printk for the first item to the running kernel now. Let's wait for a new watchdog now.
About the second item, I need to find a switch to do that, but I'll try.

I've added a similar patch for the third item. When the if is true I print a message to see when this code is executed, and the rest is commented out. When I booted the server with the new kernel I already saw this printk; let's wait for a new watchdog timeout.

--
if ((tp->phy_id & PHY_ID_MASK) == PHY_ID_BCM5411 && tp->pci_chip_rev_id == CHIPREV_ID_5700_ALTIMA) {
if ((tp->phy_id & PHY_ID_MASK) == PHY_ID_BCM5411 && tp->pci_chip_rev_id == CHIPREV_ID_5700_ALTIMA) {
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.
if ((tp->phy_id & PHY_ID_MASK) == PHY_ID_BCM5411 && tp->pci_chip_rev_id == CHIPREV_ID_5700_ALTIMA) {
if ((tp->phy_id & PHY_ID_MASK) == PHY_ID_BCM5411 && tp->pci_chip_rev_id == CHIPREV_ID_5700_ALTIMA) {
tg3: eth1: Link is up at 100 Mbps, full duplex.
tg3: eth1: Flow control is off for TX and off for RX.
--

mcarlson,

Now it has happened again...

--dmesg--
NETDEV WATCHDOG: eth0: transmit timed out
tg3: eth0: transmit timed out, resetting
tg3: DEBUG: MAC_TX_STATUS[00000018] MAC_RX_STATUS[00000008]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth0: Link is down.
if ((tp->phy_id & PHY_ID_MASK) == PHY_ID_BCM5411 && tp->pci_chip_rev_id == CHIPREV_ID_5700_ALTIMA) {
if ((tp->phy_id & PHY_ID_MASK) == PHY_ID_BCM5411 && tp->pci_chip_rev_id == CHIPREV_ID_5700_ALTIMA) {
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.
NETDEV WATCHDOG: eth0: transmit timed out
tg3: eth0: transmit timed out, resetting
tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000000]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth0: Link is down.
if ((tp->phy_id & PHY_ID_MASK) == PHY_ID_BCM5411 && tp->pci_chip_rev_id == CHIPREV_ID_5700_ALTIMA) {
if ((tp->phy_id & PHY_ID_MASK) == PHY_ID_BCM5411 && tp->pci_chip_rev_id == CHIPREV_ID_5700_ALTIMA) {
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.
NETDEV WATCHDOG: eth0: transmit timed out
tg3: eth0: transmit timed out, resetting
tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000008]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth0: Link is down.
--

I'll test turning on flow control today.

Regards

I forgot to include an important part... sorry.

--
NETDEV WATCHDOG: eth0: transmit timed out
tg3: eth0: transmit timed out, resetting
tg3: DEBUG: MAC_TX_STATUS[00000018] MAC_RX_STATUS[00000008]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
MAILBOX_INTERRUPT_0 = 0x0, tp->irq_sync = 0
tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth0: Link is down.
if ((tp->phy_id & PHY_ID_MASK) == PHY_ID_BCM5411 && tp->pci_chip_rev_id == CHIPREV_ID_5700_ALTIMA) {
if ((tp->phy_id & PHY_ID_MASK) == PHY_ID_BCM5411 && tp->pci_chip_rev_id == CHIPREV_ID_5700_ALTIMA) {
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.
NETDEV WATCHDOG: eth0: transmit timed out
tg3: eth0: transmit timed out, resetting
tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000000]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
MAILBOX_INTERRUPT_0 = 0x0, tp->irq_sync = 0
tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth0: Link is down.
--
NETDEV WATCHDOG: eth0: transmit timed out
tg3: eth0: transmit timed out, resetting
tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000008]
tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
MAILBOX_INTERRUPT_0 = 0x0, tp->irq_sync = 0
tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth0: Link is down.
if ((tp->phy_id & PHY_ID_MASK) == PHY_ID_BCM5411 && tp->pci_chip_rev_id == CHIPREV_ID_5700_ALTIMA) {
if ((tp->phy_id & PHY_ID_MASK) == PHY_ID_BCM5411 && tp->pci_chip_rev_id == CHIPREV_ID_5700_ALTIMA) {
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.

Again... any news?

I really am interested to see if flow control has any effect on the problem. The "tg3_stop_block timed out" messages appear to be telling us the internal state machines are hung. I'm thinking flow control will help.

Could you give me the output of 'ethtool -e <eth> offset 0x94 length 4'?

If you were to put a print message in tg3_reset_task() to tell us when it is called, do you see it happen near these transmit timeouts?

Matt, were you suggesting that the customer enable flow control or ensure it is disabled? From your past comments, I'm guessing you would like to see it enabled.

Actually, I did mean to turn flow control off. It isn't enough to just turn off the autonegotiation field though.
You have to turn off rx and tx flow control too. Otherwise, the driver interprets the settings to mean the administrator wanted to force flow control on.

Thanks, Matt! Marcus, if you could test with flow control completely disabled, that would be great.

Matt is correct that we really need to split this out into two separate bugs. For now, I'd like to find out how things are progressing with Marcus. In comment #46 Matt suggested disabling flow control completely. Since flow control is a parameter that can be autonegotiated between the host and the switch, you will need to disable it in all three spots, so that your output from 'ethtool -a' looks like this:

# ethtool -a eth2
Pause parameters for eth2:
Autonegotiate:  off
RX:             off
TX:             off

Would you be willing to try that, Marcus?

I created bug 468420 to address the 5704 issues, so I'm going to make all those comments private and we should just focus on the 5700 in this bug.

Marcus, I'm trying to get these RHEL4 bugzillas completed since we are doing another update soon. I realize this has been around for a while, but I wonder if you have had a chance to try some of the flow-control changes from comment #58 and how your systems are now doing. Thanks!

(In reply to comment #60)
> Marcus, I'm trying to get these RHEL4 bugzillas completed since we are doing
> another update soon. I realize this has been around for a while, but I wonder
> if you have had a chance to try some of the flow-control changes from comment
> #58 and how your systems are now doing. Thanks!

Guys,

I can't reproduce this anymore, since we replaced the affected servers. I tried installing again, but without real usage the NICs work fine.

Regards

Thanks, Marcus. I hate to close this issue without any resolution, but it seems like you were the only person who could reproduce this issue and with your servers now being out of production, I'm not sure there is much we can do.
Please re-open this bug if you are able to reproduce this or if you or anyone else are experiencing problems.

Hello. I can reproduce this bug on similar hardware (in this case a PE 2550) using kernel 2.6.39. I've been able to consistently reproduce this error; it happens every time my network administrator feels like resetting the switch that the machine is connected to while there is substantial network load. I have also been able to reproduce this on kernels 2.6.32 and 2.6.38 (those are the only other two I've tested). I would be happy to produce any additional information.

root@server1:~# dmesg | tail -n 44
[ 3538.016036] ------------[ cut here ]------------
[ 3538.021040] WARNING: at /build/buildd-linux-2.6_2.6.39-3-i386-0YkQQW/linux-2.6-2.6.39/debian/build/source_i386_none/net/sched/sch_generic.c:256 dev_watchdog+0xc9/0x15d()
[ 3538.031611] Hardware name: PowerEdge 2550
[ 3538.037070] NETDEV WATCHDOG: eth1 (tg3): transmit queue 0 timed out
[ 3538.042569] Modules linked in: decnet loop snd_pcm snd_timer snd soundcore snd_page_alloc evdev pcspkr i2c_piix4 i2c_core psmouse serio_raw dcdbas shpchp pci_hotplug parport_pc parport processor thermal_sys button ext4 mbcache jbd2 crc16 sr_mod cdrom ata_generic sg sd_mod crc_t10dif pata_serverworks libata ohci_hcd aacraid ehci_hcd floppy tg3 usbcore scsi_mod e100 libphy mii [last unloaded: scsi_wait_scan]
[ 3538.073246] Pid: 0, comm: swapper Not tainted 2.6.39-2-686-pae #1
[ 3538.079758] Call Trace:
[ 3538.086161] [<c1036b45>] ? warn_slowpath_common+0x6a/0x7b
[ 3538.092660] [<c1225fc4>] ? dev_watchdog+0xc9/0x15d
[ 3538.099123] [<c1036bbc>] ? warn_slowpath_fmt+0x28/0x2c
[ 3538.105595] [<c1225fc4>] ? dev_watchdog+0xc9/0x15d
[ 3538.112130] [<c102ba83>] ? get_nohz_timer_target+0x3f/0x5c
[ 3538.118680] [<c1041b1a>] ? __mod_timer+0x10c/0x116
[ 3538.125188] [<c103bc50>] ? irq_enter+0x49/0x49
[ 3538.131626] [<c1041be5>] ? mod_timer+0x67/0x6c
[ 3538.138048] [<c103bc50>] ? irq_enter+0x49/0x49
[ 3538.144364] [<c1041485>] ? run_timer_softirq+0x167/0x20b
[ 3538.150653] [<c1225efb>] ? netif_tx_lock+0x4f/0x4f
[ 3538.156888] [<c103bc50>] ? irq_enter+0x49/0x49
[ 3538.163064] [<c103bceb>] ? __do_softirq+0x9b/0x14e
[ 3538.169106] [<c103bc50>] ? irq_enter+0x49/0x49
[ 3538.175219] <IRQ> [<c103bb4b>] ? irq_exit+0x2f/0x79
[ 3538.181309] [<c101bc1b>] ? smp_apic_timer_interrupt+0x6b/0x75
[ 3538.187449] [<c12b2711>] ? apic_timer_interrupt+0x31/0x38
[ 3538.193526] [<c1021c68>] ? native_safe_halt+0x2/0x3
[ 3538.199534] [<c100dc94>] ? default_idle+0x50/0x87
[ 3538.205460] [<c1007f9f>] ? cpu_idle+0x95/0xb2
[ 3538.211416] [<c14207d4>] ? start_kernel+0x337/0x33c
[ 3538.217465] ---[ end trace f74ed1aa79d1afe1 ]---
[ 3538.223529] tg3 0000:01:08.0: eth1: transmit timed out, resetting
[ 3538.229698] tg3 0000:01:08.0: eth1: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000008]
[ 3538.236001] tg3 0000:01:08.0: eth1: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
[ 3538.343602] tg3 0000:01:08.0: tg3_stop_block timed out, ofs=1800 enable_bit=2
[ 3538.449609] tg3 0000:01:08.0: tg3_stop_block timed out, ofs=c00 enable_bit=2
[ 3538.555402] tg3 0000:01:08.0: tg3_stop_block timed out, ofs=4800 enable_bit=2
[ 3538.692079] tg3 0000:01:08.0: eth1: Link is down
[ 3542.708988] tg3 0000:01:08.0: eth1: Link is up at 1000 Mbps, full duplex
[ 3542.714535] tg3 0000:01:08.0: eth1: Flow control is off for TX and off for RX
[ 6729.236040] tg3 0000:01:08.0: BAR 0: set to [mem 0xfeb00000-0xfeb0ffff 64bit] (PCI address [0xfeb00000-0xfeb0ffff])
[ 6729.462135] ADDRCONF(NETDEV_UP): eth1: link is not ready
[ 6732.460994] tg3 0000:01:08.0: eth1: Link is up at 1000 Mbps, full duplex
[ 6732.466846] tg3 0000:01:08.0: eth1: Flow control is off for TX and off for RX
[ 6732.473256] ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[ 6742.968016] eth1: no IPv6 routers present

root@server1:~# lspci -vvv
00:00.0 Host bridge: Broadcom CNB20HE Host Bridge (rev 23)
Control: I/O- Mem-
BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- 00:00.1 Host bridge: Broadcom CNB20HE Host Bridge (rev 01) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx- Latency: 32, Cache Line Size: 32 bytes 00:00.2 Host bridge: Broadcom CNB20HE Host Bridge (rev 01) Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx- 00:00.3 Host bridge: Broadcom CNB20HE Host Bridge (rev 01) Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx- 00:0e.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA controller]) Subsystem: Dell PowerEdge 2550 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 32 (2000ns min), Cache Line Size: 32 bytes Region 0: Memory at fd000000 (32-bit, non-prefetchable) [size=16M] Region 1: I/O ports at ec00 [size=256] Region 2: Memory at fe101000 (32-bit, non-prefetchable) [size=4K] [virtual] Expansion ROM at 80000000 [disabled] [size=128K] Capabilities: [5c] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- 00:0f.0 ISA bridge: Broadcom OSB4 South Bridge (rev 50) Subsystem: Broadcom OSB4 South Bridge Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- 
DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Kernel driver in use: piix4_smbus 00:0f.1 IDE interface: Broadcom OSB4 IDE Controller (prog-if 8a [Master SecP PriP]) Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 64 Region 0: [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8] Region 1: [virtual] Memory at 000003f0 (type 3, non-prefetchable) [size=1] Region 2: [virtual] Memory at 00000170 (32-bit, non-prefetchable) [size=8] Region 3: [virtual] Memory at 00000370 (type 3, non-prefetchable) [size=1] Region 4: I/O ports at 08b0 [size=16] Kernel driver in use: pata_serverworks 00:0f.2 USB Controller: Broadcom OSB4/CSB5 OHCI USB Controller (rev 04) (prog-if 10 [OHCI]) Subsystem: Broadcom OSB4/CSB5 OHCI USB Controller Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 32 (20000ns max), Cache Line Size: 32 bytes Interrupt: pin A routed to IRQ 11 Region 0: Memory at fe100000 (32-bit, non-prefetchable) [size=4K] Kernel driver in use: ohci_hcd 01:08.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5700 Gigabit Ethernet (rev 12) Subsystem: Dell Broadcom BCM5700 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 32 (16000ns min), Cache Line Size: 32 bytes Interrupt: pin A routed to IRQ 17 Region 0: Memory at feb00000 (64-bit, non-prefetchable) [size=64K] Capabilities: [40] PCI-X non-bridge device Command: DPERE- ERO+ RBC=512 OST=1 Status: Dev=ff:1f.1 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=512 
DMOST=1 DMCRS=8 RSCEM- 266MHz- 533MHz- Capabilities: [48] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME- Capabilities: [50] Vital Product Data Unknown large resource type 00, will not decode more. Capabilities: [58] MSI: Enable- Count=1/8 Maskable- 64bit+ Address: 604ac44e9ead9ed0 Data: 8db0 Kernel driver in use: tg3 02:02.0 PCI bridge: Intel Corporation 80960RM (i960RM) Bridge (rev 01) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 32, Cache Line Size: 64 bytes Bus: primary=02, secondary=03, subordinate=03, sec-latency=32 I/O behind bridge: 0000f000-00000fff Memory behind bridge: fff00000-000fffff Prefetchable memory behind bridge: fff00000-000fffff Secondary status: 66MHz- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- 02:02.1 RAID bus controller: Dell PowerEdge Expandable RAID Controller 3/Di (rev 01) Subsystem: Dell PERC 3/DiV [Viper] Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 32, Cache Line Size: 32 bytes Interrupt: pin A routed to IRQ 31 Region 0: Memory at f0000000 (32-bit, prefetchable) [size=128M] Expansion ROM at fe800000 [disabled] [size=64K] Kernel driver in use: aacraid 02:04.0 Ethernet controller: Intel Corporation 82557/8/9/0/1 Ethernet Pro 100 (rev 08) Subsystem: Dell 10/100 Ethernet Server Adapter Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- 
DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 32 (2000ns min, 14000ns max), Cache Line Size: 32 bytes Interrupt: pin A routed to IRQ 16 Region 0: Memory at fe900000 (32-bit, non-prefetchable) [size=4K] Region 1: I/O ports at ccc0 [size=64] Region 2: Memory at fe700000 (32-bit, non-prefetchable) [size=1M] Expansion ROM at 80100000 [disabled] [size=1M] Capabilities: [dc] Power Management version 2 Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=2 PME- Kernel driver in use: e100 root@server1:~# cat /proc/interrupts CPU0 CPU1 0: 42 0 IO-APIC-edge timer 1: 545 512 IO-APIC-edge i8042 6: 0 3 IO-APIC-edge floppy 7: 0 0 IO-APIC-edge parport0 8: 3 0 IO-APIC-edge rtc0 9: 0 0 IO-APIC-fasteoi acpi 11: 0 0 IO-APIC-fasteoi ohci_hcd:usb1 12: 400 345 IO-APIC-edge i8042 14: 75 29 IO-APIC-edge pata_serverworks 15: 0 0 IO-APIC-edge pata_serverworks 16: 6 10 IO-APIC-fasteoi 17: 709399 709871 IO-APIC-fasteoi eth1 31: 23738 23477 IO-APIC-fasteoi aacraid NMI: 5593 5593 Non-maskable interrupts LOC: 2479111 1979429 Local timer interrupts SPU: 0 0 Spurious interrupts PMI: 5593 5593 Performance monitoring interrupts IWI: 0 0 IRQ work interrupts RES: 43696 43299 Rescheduling interrupts CAL: 14332 12977 Function call interrupts TLB: 4580 4019 TLB shootdowns TRM: 0 0 Thermal event interrupts THR: 0 0 Threshold APIC interrupts MCE: 0 0 Machine check exceptions MCP: 32 32 Machine check polls ERR: 0 MIS: 0 I can dig deeper into my kernel logs for .38 errors. root@server1:/var/log# cat messages.3 | grep kernel | tail -n 35 Jun 27 14:40:46 server1 kernel: imklog 5.8.1, log source = /proc/kmsg started. 
Jun 28 02:42:05 server1 kernel: [51106.000021] ------------[ cut here ]------------ Jun 28 02:42:05 server1 kernel: [51106.005564] WARNING: at /build/buildd-linux-2.6_2.6.38-5-i386-gvX4XH/linux-2.6-2.6.38/debian/build/source_i386_none/net/sched/sch_generic.c:256 dev_watchdog+0xc9/0x15d() Jun 28 02:42:05 server1 kernel: [51106.017080] Hardware name: PowerEdge 2550 Jun 28 02:42:05 server1 kernel: [51106.022989] NETDEV WATCHDOG: eth1 (tg3): transmit queue 0 timed out Jun 28 02:42:05 server1 kernel: [51106.029031] Modules linked in: fuse btrfs zlib_deflate crc32c libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs exportfs reiserfs ext3 jbd ext2 dm_mod decnet loop snd_pcm snd_timer snd soundcore snd_page_alloc psmouse tpm_tis shpchp dcdbas evdev pcspkr parport_pc processor i2c_piix4 tpm tpm_bios serio_raw i2c_core pci_hotplug parport thermal_sys button ext4 mbcache jbd2 crc16 sr_mod cdrom ata_generic sg sd_mod crc_t10dif pata_serverworks libata aacraid ohci_hcd ehci_hcd tg3 usbcore scsi_mod libphy e100 floppy mii nls_base [last unloaded: scsi_wait_scan] Jun 28 02:42:05 server1 kernel: [51106.068981] Pid: 0, comm: kworker/0:0 Not tainted 2.6.38-2-686 #1 Jun 28 02:42:05 server1 kernel: [51106.075786] Call Trace: Jun 28 02:42:05 server1 kernel: [51106.082625] [<c102fa29>] ? warn_slowpath_common+0x6a/0x7b Jun 28 02:42:05 server1 kernel: [51106.089485] [<c12080c3>] ? dev_watchdog+0xc9/0x15d Jun 28 02:42:05 server1 kernel: [51106.096251] [<c102faa0>] ? warn_slowpath_fmt+0x28/0x2c Jun 28 02:42:05 server1 kernel: [51106.102908] [<c12080c3>] ? dev_watchdog+0xc9/0x15d Jun 28 02:42:05 server1 kernel: [51106.109462] [<c10349b1>] ? __do_softirq+0x0/0x14f Jun 28 02:42:05 server1 kernel: [51106.116085] [<c1041068>] ? __queue_work+0x2a9/0x2c3 Jun 28 02:42:05 server1 kernel: [51106.122758] [<c10349b1>] ? __do_softirq+0x0/0x14f Jun 28 02:42:05 server1 kernel: [51106.129506] [<c1039b9d>] ? 
run_timer_softirq+0x167/0x20b Jun 28 02:42:05 server1 kernel: [51106.136217] [<c1207ffa>] ? dev_watchdog+0x0/0x15d Jun 28 02:42:05 server1 kernel: [51106.142906] [<c10349b1>] ? __do_softirq+0x0/0x14f Jun 28 02:42:05 server1 kernel: [51106.149909] [<c1034a4c>] ? __do_softirq+0x9b/0x14f Jun 28 02:42:05 server1 kernel: [51106.156533] [<c10349b1>] ? __do_softirq+0x0/0x14f Jun 28 02:42:05 server1 kernel: [51106.163165] <IRQ> [<c1034935>] ? irq_exit+0x26/0x59 Jun 28 02:42:05 server1 kernel: [51106.169828] [<c1015e30>] ? smp_apic_timer_interrupt+0x6b/0x75 Jun 28 02:42:05 server1 kernel: [51106.176521] [<c1290651>] ? apic_timer_interrupt+0x31/0x38 Jun 28 02:42:05 server1 kernel: [51106.183122] [<c101bb94>] ? native_safe_halt+0x2/0x3 Jun 28 02:42:05 server1 kernel: [51106.189536] [<c10085c9>] ? default_idle+0x50/0x87 Jun 28 02:42:05 server1 kernel: [51106.195768] [<c1002201>] ? cpu_idle+0x95/0xb0 Jun 28 02:42:05 server1 kernel: [51106.201959] [<c128c207>] ? start_secondary+0x1b8/0x1bd Jun 28 02:42:05 server1 kernel: [51106.208157] ---[ end trace d2c7eb5d333ff5a9 ]--- Jun 28 02:42:05 server1 kernel: [51106.577902] tg3 0000:01:08.0: eth1: Link is down Jun 28 02:42:09 server1 kernel: [51109.997016] tg3 0000:01:08.0: eth1: Link is up at 1000 Mbps, full duplex Jun 28 02:42:09 server1 kernel: [51110.002972] tg3 0000:01:08.0: eth1: Flow control is off for TX and off for RX Jun 28 16:51:07 server1 kernel: [102048.068257] ices2[32063]: segfault at 0 ip 0805079f sp bfc17180 error 4 in ices2[8048000+13000] Jun 29 15:20:13 server1 kernel: [182994.015710] ip_tables: (C) 2000-2006 Netfilter Core Team Jul 2 14:32:40 server1 kernel: Kernel logging (proc) stopped. Jul 2 14:32:40 server1 kernel: imklog 5.8.2, log source = /proc/kmsg started. This bug seems to be still shown in RHEL 6.4 x64. 
2.6.32-358.14.1.el6.x86_64 #1 SMP Mon Jun 17 15:54:20 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted) Hardware name: ProLiant DL360p Gen8 NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl autofs4 sunrpc cpufreq_ondemand freq_table pcc_cpufreq bonding 8021q garp stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ext3 jbd dm_round_robin hpilo hpwdt tg3 microcode sg ses enclosure serio_raw iTCO_wdt iTCO_vendor_support ioatdma dca power_meter shpchp ext4 mbcache jbd2 sd_mod crc_t10dif hpsa qla2xxx scsi_transport_fc scsi_tgt pata_acpi ata_generic ata_piix dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] Pid: 0, comm: swapper Not tainted 2.6.32-358.14.1.el6.x86_64 #1 Call Trace: <IRQ> [<ffffffff8106e307>] ? warn_slowpath_common+0x87/0xc0 [<ffffffff8106e3f6>] ? warn_slowpath_fmt+0x46/0x50 [<ffffffff81467d3d>] ? dev_watchdog+0x26d/0x280 [<ffffffff81090dad>] ? insert_work+0x6d/0xb0 [<ffffffff81012bf9>] ? sched_clock+0x9/0x10 [<ffffffff81467ad0>] ? dev_watchdog+0x0/0x280 [<ffffffff81081857>] ? run_timer_softirq+0x197/0x340 [<ffffffff810a7f80>] ? tick_sched_timer+0x0/0xc0 [<ffffffff8102ea2d>] ? lapic_next_event+0x1d/0x30 [<ffffffff81076fd1>] ? __do_softirq+0xc1/0x1e0 [<ffffffff8109b79b>] ? hrtimer_interrupt+0x14b/0x260 [<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30 [<ffffffff8100de05>] ? do_softirq+0x65/0xa0 [<ffffffff81076db5>] ? irq_exit+0x85/0x90 [<ffffffff81517420>] ? smp_apic_timer_interrupt+0x70/0x9b [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20 <EOI> [<ffffffff812d3a9e>] ? intel_idle+0xde/0x170 [<ffffffff812d3a81>] ? intel_idle+0xc1/0x170 [<ffffffff814153a7>] ? cpuidle_idle_call+0xa7/0x140 [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110 [<ffffffff814f35ca>] ? rest_init+0x7a/0x80 [<ffffffff81c27f7b>] ? 
start_kernel+0x424/0x430
[<ffffffff81c2733a>] ? x86_64_start_reservations+0x125/0x129
[<ffffffff81c27438>] ? x86_64_start_kernel+0xfa/0x109
---[ end trace e5884b70674dc1df ]---

I had to ifdown/ifup the interface via ILOM. Is there a workable workaround? Our switches are autoneg 1000 Mbps/FD and there is no way I can get this changed.

This is a different (newer) model though.

3:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)

Patrick, this was a RHEL4 bug, so adding a report about RHEL6 isn't going to get much attention. Please open a new bug on the product 'Red Hat Enterprise Linux 6' with the information you have. You can assign this bug directly to ivecera.
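
For anyone repeating the flow-control test requested earlier in this bug: the three pause settings can normally be changed with 'ethtool -A <eth> autoneg off rx off tx off' and then confirmed with 'ethtool -a <eth>'. The following is only a minimal sketch, not something posted in the original thread, of checking saved 'ethtool -a' output for the "all three off" state described above. It is written in Python; the names parse_pause_params, flow_control_fully_off and SAMPLE are hypothetical, and the sample text simply mirrors the eth2 output quoted in this bug.

```python
def parse_pause_params(text):
    """Parse text in the style of `ethtool -a <iface>` into a dict.

    Assumption: every setting is printed as "Name: value" on its own line,
    as in the eth2 output quoted in this bug.
    """
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or not value:
            # Skips blank lines and the "Pause parameters for ethX:" header,
            # which has nothing after the colon.
            continue
        params[key] = value
    return params


def flow_control_fully_off(text):
    """Return True only if autonegotiation, RX and TX pause are all off."""
    params = parse_pause_params(text)
    return all(params.get(k) == "off" for k in ("Autonegotiate", "RX", "TX"))


# Hypothetical sample, mirroring the output shown in this bug.
SAMPLE = """Pause parameters for eth2:
Autonegotiate:  off
RX:             off
TX:             off
"""

if __name__ == "__main__":
    print(flow_control_fully_off(SAMPLE))
```

A result of False with "Autonegotiate: off" but RX or TX still on would correspond to the case Matt describes, where the driver treats the settings as forcing flow control on.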