Red Hat Bugzilla – Bug 82664
random lockups on Dell PowerEdge 2650
Last modified: 2007-04-18 12:50:18 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2) Gecko/20021127
Description of problem:
Kernel 2.4.18-17 and later cause the machine to lock up after a few hours of
running. I'm running a Dell PowerEdge 2650 with dual P4 Xeon 2.2 GHz processors,
4 GB of RAM, a PERC configured as RAID-5, and dual tg3 Ethernet controllers. The
problem started when I upgraded the kernel to the latest set of errata packages.
I tried kernel-bigmem as well as the SMP and non-SMP kernels of the last few
updates (kernel-2.4.18-17.7.x, kernel-2.4.18-18.7.x, and kernel-2.4.18-19.7.x).
The problem doesn't seem to be SMP-related. Since the processors are HT-enabled,
cat /proc/cpuinfo shows 4 of them. I am also using CIPE, in case this is the
problem. Nothing is shown in syslog: it records normal operation, followed
directly by the messages of the next boot.
This is the output of lsmod:
Module               Size  Used by    Not tainted
ip_conntrack_irc     3840   0  (unused)
ip_conntrack_ftp     5056   0  (unused)
ip_nat_irc           3680   0  (unused)
ip_nat_ftp           4320   0  (unused)
cipcb               33600   1  (autoclean)
tg3                 44128   2
ipt_REJECT           4096   2  (autoclean)
ipt_MASQUERADE       2464   1  (autoclean)
iptable_nat         21012   3  (autoclean) [ip_nat_irc ip_nat_ftp ipt_MASQUERADE]
ip_conntrack        21164   3  (autoclean) [ip_conntrack_irc ip_conntrack_ftp ip_nat_irc ip_nat_ftp ipt_MASQUERADE iptable_nat]
iptable_filter       2752   1  (autoclean)
ip_tables           13984   6  [ipt_REJECT ipt_MASQUERADE iptable_nat
usb-ohci            20768   0  (unused)
usbcore             73152   1  [usb-ohci]
ext3                67136   4
jbd                 49400   4  [ext3]
aacraid             27380   6
sd_mod              12864  12
scsi_mod           108576   2  [aacraid sd_mod]
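For anyone comparing module listings like the one above between an affected and an unaffected machine, here is a minimal sketch of how the in-use modules could be pulled out of the text. It assumes the 2.4-style lsmod layout seen here (module name, size, use count, then optional flags and dependents). The function names parse_lsmod and modules_in_use are hypothetical, and the sample holds only a few rows copied from the listing.

```python
def parse_lsmod(text):
    """Parse 2.4-style lsmod output into {module: (size, use_count)}."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        # A module row has integer size and use-count columns; this skips
        # the header line and any wrapped continuation of a dependents list.
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        modules[fields[0]] = (int(fields[1]), int(fields[2]))
    return modules


def modules_in_use(text):
    """Return the sorted names of modules whose use count is non-zero."""
    parsed = parse_lsmod(text)
    return sorted(name for name, (_size, used) in parsed.items() if used > 0)


# A few rows copied from the listing in this report (assumed layout).
sample = """\
Module Size Used by Not tainted
ip_conntrack_irc 3840 0 (unused)
cipcb 33600 1 (autoclean)
tg3 44128 2
aacraid 27380 6
"""

print(modules_in_use(sample))  # ['aacraid', 'cipcb', 'tg3']
```

Running the same function over the listings from two machines and comparing the results would show whether a driver such as tg3, aacraid, or cipcb is in use on only one of them.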
I will be happy to provide any additional information. This is a production
machine, so the testing I can do is limited, but I will try to set up a spare
similarly-configured machine as well.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Install RH 7.3 and all updates.
2. Boot into the updated kernel.
3. Configure services.
Actual Results: The machine locks after an average of 4-8 hours
Expected Results: Normal, crash-free operation
Additional info: shown in the description
CORRECTION: this occurs on the original RH 7.3 kernel as well. I cannot tell
what is causing it.
Running kernel 2.4.18-10smp works fine without lockups. Later kernels seem to
lock the machine.
A friend of mine has the same problem: a Dell 2650 with RHL 7.3, all errata, and
all firmware (BIOS, backplane, PERC3) at the latest level. The system hangs for
no apparent reason. Hardware tests such as memtest86 and dell_tests pass without
problems!
My advice: use the 2.4.9-12.e kernel from AS-2.1 until this is solved in 7.3,
because there are a lot of errors with the previous 2.4.18 kernels,
or try a beta/rawhide kernel (danger!!!).
I'm experiencing the same issue with a Dell PE-2650 with 6 GB of RAM. I am using
e1000 only; the Broadcom NICs are disabled. All kernels up to and including
2.4.18-27 bigmem have locked up after anywhere from 1 day to 3 weeks. I will try
the noapic option tonight (3/27/03) and cross my fingers while we await a bug fix.
It's 4/10/03, and the box hung under moderate load with the noapic option while
I was at the console. No dump, no log, just a complete freeze while I was in
GNOME setting up a new virtual machine using VMware GSX.
I'm dumping Red Hat 7.3 for VMware's ESX Server ASAP.
Dell has a server install CD which creates a utility partition on your
drive. There are actually RPMs there which, when installed on your
system, greatly enhance its stability. Without those RPMs I used to
have 2650s that would lock up every four hours, but with them that's
not an issue.
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.
The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases,
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/