Bug 82664 - random lockups on Dell PowerEdge 2650
Summary: random lockups on Dell PowerEdge 2650
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2003-01-24 17:56 UTC by Tim Fournet
Modified: 2007-04-18 16:50 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-09-30 15:40:27 UTC
Embargoed:


Description Tim Fournet 2003-01-24 17:56:40 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2) Gecko/20021127

Description of problem:
Kernel 2.4.18-17 and later cause the machine to lock up after a few hours of
running. I'm running a Dell PowerEdge 2650 with dual P4 Xeon 2.2 GHz processors,
4 GB of RAM, a PERC configured as RAID-5, and dual tg3 Ethernet controllers. The
problem started when I upgraded the kernel to the latest set of errata packages.
I tried kernel-bigmem as well as the smp and non-smp kernels of the last few
updates (kernel-2.4.18-17.7.x, kernel-2.4.18-18.7.x, and kernel-2.4.18-19.7.x).
The problem doesn't seem to be SMP-related. Since the processors are HT-enabled,
cat /proc/cpuinfo shows 4 of them. I am also using CIPE, in case this is the
problem. Nothing is shown in syslog: there is normal operation, then the
messages of the next boot.


This is the output of lsmod:
Module                  Size  Used by    Not tainted
ip_conntrack_irc        3840   0  (unused)
ip_conntrack_ftp        5056   0  (unused)
ip_nat_irc              3680   0  (unused)
ip_nat_ftp              4320   0  (unused)
cipcb                  33600   1  (autoclean)
tg3                    44128   2
ipt_REJECT              4096   2  (autoclean)
ipt_MASQUERADE          2464   1  (autoclean)
iptable_nat            21012   3  (autoclean) [ip_nat_irc ip_nat_ftp ipt_MASQUERADE]
ip_conntrack           21164   3  (autoclean) [ip_conntrack_irc ip_conntrack_ftp ip_nat_irc ip_nat_ftp ipt_MASQUERADE iptable_nat]
iptable_filter          2752   1  (autoclean)
ip_tables              13984   6  [ipt_REJECT ipt_MASQUERADE iptable_nat iptable_filter]
usb-ohci               20768   0  (unused)
usbcore                73152   1  [usb-ohci]
ext3                   67136   4
jbd                    49400   4  [ext3]
aacraid                27380   6
sd_mod                 12864  12
scsi_mod              108576   2  [aacraid sd_mod]
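
To compare module lists between affected machines, lsmod output like the above can be turned into a structured form. The following is only a minimal sketch in present-day Python, assuming the 2.4-style column layout shown above (name, size, use count, optional flags in parentheses, optional users in brackets). The function name parse_lsmod and the shortened sample are hypothetical and not part of any Red Hat tool.

```python
import re

def parse_lsmod(text):
    """Parse 2.4-style lsmod output into a dict keyed by module name."""
    # A record line has integer size and use-count columns. Any other
    # non-header line is treated as the wrapped tail of the previous record.
    records = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Module":
            continue  # skip blank lines and the header row
        is_record = (len(fields) >= 3
                     and fields[1].isdigit() and fields[2].isdigit())
        if is_record:
            records.append(line.strip())
        elif records:
            records[-1] += " " + line.strip()

    modules = {}
    for rec in records:
        name, size, use_count = rec.split()[:3]
        users = re.search(r"\[(.*?)\]", rec)
        modules[name] = {
            "size": int(size),
            "use_count": int(use_count),
            "flags": re.findall(r"\((\w+)\)", rec),   # e.g. autoclean, unused
            "used_by": users.group(1).split() if users else [],
        }
    return modules

# Shortened sample taken from the listing above, including one wrapped line.
sample = """\
Module                  Size  Used by    Not tainted
tg3                    44128   2
ip_tables              13984   6  [ipt_REJECT ipt_MASQUERADE iptable_nat
iptable_filter]
usb-ohci               20768   0  (unused)
"""

if __name__ == "__main__":
    for name, info in sorted(parse_lsmod(sample).items()):
        print(name, info["size"], info["use_count"], info["used_by"])
```

Two such dictionaries, one from a machine that locks up and one from a stable machine, could then be compared key by key to spot drivers present on only one of them.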

I will be happy to provide any additional information. This is a production
machine, so the testing I can do is limited, but I will try to set up a spare
similarly-configured machine as well.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Install RH 7.3 and all updates.
2. Boot into the updated kernel.
3. Configure services.
    

Actual Results:  The machine locks after an average of 4-8 hours

Expected Results:  Normal, crash-free operation

Additional info:

shown in the description

Comment 1 Tim Fournet 2003-01-24 19:17:00 UTC
CORRECTION: this occurs on the original RH 7.3 kernel as well.  I cannot tell
what is causing it.

-TF



Comment 2 Tim Fournet 2003-01-29 18:45:54 UTC
Running kernel 2.4.18-10smp works fine without lockups. Later kernels seem to 
lock the machine. 


Comment 3 acount closed by user 2003-03-01 06:03:06 UTC
A friend of mine has the same problem: Dell 2650 + RHL 7.3 + all errata + all
firmware (BIOS, backplane, PERC3) at the latest level, and the system hangs for
no apparent reason. Hardware tests such as memtest86 and the Dell tests pass
without problems!

My advice: use the 2.4.9-12.e kernel of AS-2.1 until this is solved in 7.3,
because there are a lot of errors with the previous 2.4.18 kernels:
https://rhn.redhat.com/errata/RHSA-2002-206.html
https://rhn.redhat.com/errata/RHSA-2002-262.html
https://rhn.redhat.com/errata/RHBA-2002-292.html
https://rhn.redhat.com/errata/RHSA-2003-025.html

Or try a beta/rawhide kernel (danger!!!).


Comment 4 Henry Cross 2003-03-28 00:53:03 UTC
I'm experiencing the same issue with a Dell PE-2650 w/6GB. Using e1000 only;
the Broadcoms are disabled. All kernels up to and including 2.4.18-27 bigmem
have locked up after anywhere from 1 day to 3 weeks. I will try the noapic
option tonight (3/27/03) and cross my fingers while we await a bug fix.
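
When testing a boot option such as noapic, it helps to confirm after the reboot that the option actually reached the kernel, which Linux reports in /proc/cmdline. The following is only a minimal sketch in present-day Python. The helper names has_boot_option and read_cmdline and the sample command line (including its root device) are hypothetical.

```python
def has_boot_option(cmdline, option):
    """Return True if `option` appears as a whole word on the command line."""
    # Splitting on whitespace avoids false matches inside longer options.
    return option in cmdline.split()

def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()

# Hypothetical command line, used so the sketch runs on any machine.
sample_cmdline = "ro root=/dev/sda2 noapic"

if __name__ == "__main__":
    print("noapic present:", has_boot_option(sample_cmdline, "noapic"))
```

On the affected machine, has_boot_option(read_cmdline(), "noapic") would perform the check against the running kernel instead of the sample string.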

Comment 5 Henry Cross 2003-04-11 01:10:50 UTC
It's 4/10/03 and, under moderate load while I was at the console, the box hung
while running with the noapic option. No dump, no log, just a complete freeze
while I was in GNOME setting up a new virtual machine using VMware GSX.
I'm dumping Red Hat 7.3 for VMware's ESX Server ASAP.

Comment 6 John I Wang 2004-03-24 22:32:08 UTC
Dell has a server install CD which creates a utility partition on your
drive. There are actually RPMs there which, when installed on your
system, greatly enhance its stability. Without those RPMs I used to
have 2650s that would lock up every four hours, but with them that's
not an issue.

Comment 7 Bugzilla owner 2004-09-30 15:40:27 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/


