From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2) Gecko/20021127

Description of problem:
Kernel 2.4.18-17+ causes the machine to lock after a few hours of running. I'm running a Dell PowerEdge 2650 with dual P4 Xeon 2.2 GHz processors, 4 GB of RAM, PERC configured as RAID-5, and dual tg3 ethernet controllers. When I upgraded the kernel to the latest set of errata packages, the problem started happening. I tried kernel-bigmem as well as the smp and non-smp kernels of the last few updates (kernel-2.4.18-17.7.x, kernel-2.4.18-18.7.x, and kernel-2.4.18-19.7.x). The problem doesn't seem to be SMP-related. Since the processors are HT-enabled, cat /proc/cpuinfo shows 4 of them. I am also using CIPE, in case this is the problem.

Nothing is shown in syslog. There is normal operation, then the messages of the next boot.

This is the output of lsmod:

Module                  Size  Used by    Not tainted
ip_conntrack_irc        3840   0  (unused)
ip_conntrack_ftp        5056   0  (unused)
ip_nat_irc              3680   0  (unused)
ip_nat_ftp              4320   0  (unused)
cipcb                  33600   1  (autoclean)
tg3                    44128   2
ipt_REJECT              4096   2  (autoclean)
ipt_MASQUERADE          2464   1  (autoclean)
iptable_nat            21012   3  (autoclean) [ip_nat_irc ip_nat_ftp ipt_MASQUERADE]
ip_conntrack           21164   3  (autoclean) [ip_conntrack_irc ip_conntrack_ftp ip_nat_irc ip_nat_ftp ipt_MASQUERADE iptable_nat]
iptable_filter          2752   1  (autoclean)
ip_tables              13984   6  [ipt_REJECT ipt_MASQUERADE iptable_nat iptable_filter]
usb-ohci               20768   0  (unused)
usbcore                73152   1  [usb-ohci]
ext3                   67136   4
jbd                    49400   4  [ext3]
aacraid                27380   6
sd_mod                 12864  12
scsi_mod              108576   2  [aacraid sd_mod]

I will be happy to provide any additional information. This is a production machine, so the testing I can do is limited, but I will try to set up a spare similarly-configured machine as well.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Install RH 7.3 and all updates.
2. Boot into the updated kernel.
3. Configure services.

Actual Results: The machine locks after an average of 4-8 hours.

Expected Results: Normal, crash-free operation.

Additional info: shown in the description.
CORRECTION: this occurs on the original RH 7.3 kernel as well. I cannot tell what is causing it. -TF
Running kernel 2.4.18-10smp works fine without lockups. Later kernels seem to lock the machine.
A friend of mine has the same problem: Dell 2650 + RHL 7.3 + all errata + all firmware (BIOS, backplane, PERC3) at the latest level, and the system hangs for no apparent reason. Hardware tests such as memtest86 and the Dell tests pass without problems.

My advice: use the 2.4.9-12.e kernel from AS-2.1 until this is solved in 7.3, because there are a lot of errors with the previous 2.4.18 kernels:
https://rhn.redhat.com/errata/RHSA-2002-206.html
https://rhn.redhat.com/errata/RHSA-2002-262.html
https://rhn.redhat.com/errata/RHBA-2002-292.html
https://rhn.redhat.com/errata/RHSA-2003-025.html
Alternatively, try a beta/rawhide kernel (danger!!!).
I'm experiencing the same issue with a Dell PE-2650 with 6 GB of RAM. I'm using e1000 only; the Broadcoms are disabled. All kernels up to and including 2.4.18-27 bigmem have locked up after anywhere from 1 day to 3 weeks. I will try the noapic option tonight (3/27/03) and cross my fingers while we await a bug fix.
It's 4/10/03, and the box hung under moderate load while I was at the console, running with the noapic option. No dump, no log, just a complete freeze while I was in Gnome setting up a new virtual machine using VMware GSX. I'm dumping Red Hat 7.3 for VMware's ESX Server ASAP.
Dell has a server install CD which creates a utility partition on your drive. There are actually RPMs there which, when installed on your system, greatly enhance its stability. Without those RPMs I used to have 2650s that would lock up every four hours, but with them that's not an issue.
Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/