Bug 116704
Summary: | (NET TG3) Broadcom Ethernet Adapter Locks Up | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 3 | Reporter: | Mark Donner <mark> |
Component: | kernel | Assignee: | David Miller <davem> |
Status: | CLOSED WONTFIX | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 3.0 | CC: | bammi, davem, jgarzik, john, mvoelker, petrides, tofranci |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2007-10-19 19:29:52 UTC | Type: | --- |
Description
Mark Donner
2004-02-24 15:31:24 UTC
Most likely a kernel driver problem, reassigning to correct component. Read ya, Phil

We have seen this same issue with tg3 on a Tyan S2882 motherboard, which also uses the Broadcom BCM5704. We saw the box completely freeze up (Magic SysRq wasn't able to do anything) at apparently random intervals for no apparent reason. No error messages on screen or in /var/log/messages or dmesg or anywhere else that we could see. Usually we were able to crash the box within an hour or two by SSHing from the afflicted machine to another server and running some commands that produce a lot of output. The network load was fairly low on the BCM5704 though, so I'm not sure why this would trigger the freeze. After switching from the tg3 driver to the bcm5700 driver from Broadcom the crashes went away. We understand, of course, that Red Hat does not support the bcm5700 driver and would rather put more effort into fixing the tg3 driver than have to support a vendor-built driver. But until this bug is fixed it's the only way we can keep the server alive.

I believe we've now run into this issue as well, in addition to the autonegotiation problems with the tg3 drivers / Cisco switches mentioned in other reports. Wondering if the bcm5700 drivers are still the best solution, and if anyone has had problems with them under RHEL3 U2...

Tymm--we tried U2 and still had the same problems, so we're still using the bcm5700 drivers for now.

I have a system with a Tyan K8W connected to a 100Mb network. Under heavy network traffic the system hangs and requires a power cycle to recover. I've tried the tg3 and Broadcom's drivers and various Linux kernel versions. It hangs repeatably under very heavy network load, for instance when running bonnie on an NFS-mounted directory. Otherwise the system seems OK. However, the system also hangs under high network load when running the Windows 2003 64-bit beta. I'm planning to RMA the motherboard.
We're running quite a few HP DL140-based servers that all have the Broadcom Gigabit chipset in them. These servers are running as PPTP concentrators using PoPToP. All of them were mysteriously locking up randomly and I couldn't figure it out... someone mentioned a problem with Broadcom NICs and recommended either replacing the NICs with Intel ones (not a great option in 1U servers) or disabling APIC. Our problem seems to go away when we boot with nosmp noapic ... this is using the bcm5700 driver which I grabbed from HP's website. Using 2.4.21-20ELsmp (w/ MPPE patch from PPP 2.4.3).

Is this still happening with the tg3 driver, in RHEL3?

Is this still a problem with RHEL4, or has the bug been fixed for RHEL3? We are just installing machines with the Tyan S2882 motherboard and two Broadcom gigabit adapters. The tg3 driver appears to work in normal use, but when we load the Ethernet port the machine hangs without any error messages, and we wondered if we were seeing this bug.

We're experiencing identical problems here with HP DL-350s and RHEL4 (latest kernel/patches). Any update on this issue?

I am seeing the same problem on a Shuttle box with the same Broadcom adapter onboard, running Fedora Core 5 Test2 (I never hit it in Fedora Core 4; FC4 had the other Broadcom problem, when the buffer was allocated above the 1Gig address range).

I am seeing a similar problem on FC4 kernel 2.6.17-1.2142_FC4smp. This machine has a Gigabyte motherboard with an on-board Broadcom 57xx GigE NIC. I only get the problem when using BitTorrent. I have tried different BT clients and they all exhibit the same behaviour. The eth0 port just locks up every few minutes, then resets itself. The machine is cabled into a Cisco switch. I have changed the switch port and cables and tried 100baseT; none of it makes any difference. Broadcom sent me an ISO of a boot CD with FreeDOS and some diagnostics, which I ran, and everything passed OK. They have suggested it may be an interrupt conflict with the PCI RAID card.
Sample log:

Oct 2 10:02:15 localhost kernel: NETDEV WATCHDOG: eth0: transmit timed out
Oct 2 10:02:15 localhost kernel: tg3: eth0: transmit timed out, resetting
Oct 2 10:02:15 localhost kernel: tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
Oct 2 10:02:15 localhost kernel: tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
Oct 2 10:02:15 localhost kernel: tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
Oct 2 10:02:15 localhost kernel: tg3: eth0: Link is down.
Oct 2 10:02:19 localhost kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
Oct 2 10:02:19 localhost kernel: tg3: eth0: Flow control is off for TX and off for RX.

This bug is filed against RHEL 3, which is in maintenance phase. During the maintenance phase, only security errata and select mission critical bug fixes will be released for enterprise products. Since this bug does not meet those criteria, it is now being closed. For more information on the RHEL errata support policy, please visit: http://www.redhat.com/security/updates/errata/

If you feel this bug is indeed mission critical, please contact your support representative. You may be asked to provide detailed information on how this bug is affecting you.
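
For anyone trying to judge how often an interface resets itself, as in the sample log above, the following is a minimal sketch that scans syslog-style lines for the tg3 "transmit timed out, resetting" message and groups the timestamps by interface. The function name `find_tg3_resets` and the regular expression are illustrative assumptions based only on the message format shown in that log; other kernel versions may word the message differently.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the sample log in this report:
#   "Oct 2 10:02:15 localhost kernel: tg3: eth0: transmit timed out, resetting"
RESET_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"tg3:\s+(\w+):\s+transmit timed out, resetting"
)


def find_tg3_resets(lines):
    """Return {interface: [timestamp, ...]} for each tg3 transmit-timeout reset."""
    resets = defaultdict(list)
    for line in lines:
        match = RESET_RE.match(line)
        if match:
            stamp, iface = match.groups()
            resets[iface].append(stamp)
    return dict(resets)


# Lines copied from the sample log in this report.
SAMPLE = [
    "Oct 2 10:02:15 localhost kernel: NETDEV WATCHDOG: eth0: transmit timed out",
    "Oct 2 10:02:15 localhost kernel: tg3: eth0: transmit timed out, resetting",
    "Oct 2 10:02:15 localhost kernel: tg3: eth0: Link is down.",
    "Oct 2 10:02:19 localhost kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.",
]

if __name__ == "__main__":
    for iface, stamps in find_tg3_resets(SAMPLE).items():
        print(f"{iface}: {len(stamps)} reset(s) at {', '.join(stamps)}")
```

To run it against a real log, pass an open file object (for example `open("/var/log/messages")`) instead of `SAMPLE`. Note that it only helps with the reset-and-recover behaviour; the hard freezes reported earlier in this bug left no log messages to count.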