Bug 102792
Summary: | Kernel 2.4.20-20.7 dies under very heavy network traffic | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | abs01 |
Component: | kernel | Assignee: | Dave Jones <davej> |
Status: | CLOSED WONTFIX | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 7.3 | CC: | gary.mansell, pfrields, riel, snielsen |
Hardware: | All | ||
OS: | Linux | ||
Doc Type: | Bug Fix | ||
Last Closed: | 2004-09-30 15:41:27 UTC | Type: | --- |
Description
abs01
2003-08-21 05:14:51 UTC
I see this bug too. I have a Dell PE2650 running Red Hat Linux 7.3. The machine is an NFS server that has been running quite happily for at least 6 months on kernel 2.4.18-27.7.xsmp. I tested the new 2.4.20-20.7smp kernel on my test machine for a week with no problems (admittedly I did not hammer it with NFS traffic) and presumed it was OK. When I then up2dated my production server to this kernel, the machine crashed within one hour. I rebooted and it crashed again after about 4 hours. Since rebooting into the old 2.4.18-27.7.xsmp kernel, the machine has been fine again. The symptoms of the crash were that the server just hung: it would not respond to pings, and there were no messages in the log files. Is anything being done about this?

I too have been experiencing crashes under heavy network/disk I/O load. I am (was) using kernel 2.4.20-20.7smp on a Dell 2450 with a 10/100 Ethernet adapter. When I get a crash the box is completely dead to the world, and no log messages related to the crash are written to syslog or the console. Is there a way to increase verbosity so something will get logged? Anyway, to work around the crashing issue I now boot the kernel that was stable for me before the update (2.4.18-10smp). I am not using any binary-only modules. I am taint free. If you need more info from me please let me know.

Red Hat isn't going to do anything about it. After all, they can't wait to drop support for 7.3 on December 31st and try to push you to Enterprise or something else. I'm moving away from Red Hat and compiling my own kernel from sources at kernel.org. I can't wait to get away from all the rpm crap anyway.

tg3 is not the problem, it's the *aacraid* driver.

If you think it's the *aacraid* driver then I guess you can download the most recent one at: http://domsch.com/linux/ There's quite a bit of discussion here and, to be honest with you, I'm not sure what to do or how to fix my current box. Currently I'm running kernel 2.4.20-24.7smp without a crash, but CPU is being eaten by kscand, which is the #1 CPU user on the box, running almost constantly at 5% on a dual Xeon(TM) 1.80GHz CPU. If anyone has any suggestions on calming this beast down without breaking the box, let me know.

Xose, why do you say it's the aacraid driver? Are there settings where I can have the kernel print something out when things crash? Currently I am not getting anything written to the console or to syslog. Thanks, Steve

abs01: the kscand bug is a separate issue, and that bug is already open in Bugzilla. Dave Jones is going to release a new kernel erratum very soon; try the *beta* release: http://people.redhat.com/davej/rhl-errata/2.4.20-27.7/

Steve Nielsen: a colleague of mine has some Dell 2650s. She updated to the latest BIOS, backplane firmware, RAID firmware and RHL kernel (2.4.20-20.x) and the systems are stable. Other people on the Dell mailing list have problems with the aacraid driver, but the tg3 driver is usually stable. If you have any doubts, try bcm-5700 instead of tg3 (danger!! unsupported by Red Hat): http://www.broadcom.com/drivers/downloaddrivers.php But I am sure that the problem is aacraid. To catch the bug, see /usr/src/linux-2.4/Documentation/nmi_watchdog.txt

I tried using the beta errata kernel from Dave Jones and my system crashed again after a couple of days. Same symptoms (no syslog, no text printed to the console). I am using RH 7.3, a Dell 2450, 10/100 Ethernet cards, no hardware RAID, only software RAID.

Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/
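
On the nmi_watchdog.txt pointer in the thread: that kernel document describes an `nmi_watchdog=` boot parameter and says the NMI count in /proc/interrupts shows whether the watchdog is ticking. Below is a minimal sketch, not anything posted in this bug, of how one might check both on a running box. The function names are hypothetical, and it assumes the usual `NMI:` row layout in /proc/interrupts.

```python
import re


def nmi_watchdog_mode(cmdline):
    """Return the value of nmi_watchdog=N from a kernel command line, or None."""
    match = re.search(r"(?:^|\s)nmi_watchdog=(\S+)", cmdline)
    return match.group(1) if match else None


def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters parsed from /proc/interrupts text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            # Newer kernels append a text description; keep only the numbers.
            return [int(f) for f in fields[1:] if f.isdigit()]
    return []


def read_text(path):
    """Read a file, returning an empty string if it is missing or unreadable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    mode = nmi_watchdog_mode(read_text("/proc/cmdline"))
    counts = nmi_counts(read_text("/proc/interrupts"))
    print("nmi_watchdog boot parameter:", mode or "not set")
    print("NMI counts per CPU:", counts or "no NMI line found")
    if mode and counts and all(c == 0 for c in counts):
        print("Parameter is set but the NMI counters are zero; "
              "the watchdog may not be active.")
```

If the parameter is set and the counters keep increasing, a hard lockup should at least leave an oops on the console, which is the output the reporters here were missing.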