Bug 77694 - RH8 on Compaq DL386 G2 hangs after one Hour
Status: CLOSED WORKSFORME
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 8.0
Hardware: i686 Linux
Priority: medium Severity: high
Assigned To: Arjan van de Ven
Brian Brock
Reported: 2002-11-12 02:59 EST by Need Real Name
Modified: 2007-04-18 12:48 EDT
Last Closed: 2002-12-05 14:48:37 EST
Description Need Real Name 2002-11-12 02:59:05 EST
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)

Description of problem:
RH 8 is installed on a Compaq DL386 G2 with 2 processors and 2 GB of memory,
plus an 18 GB RAID 1 array on the onboard 5i controller.
The installation works fine with no problems, and we also installed all
bugfixes. But after one hour the system hangs and nothing works. Only
powering down gets it out of that state. We tried another machine of the same
model and saw the same behavior.
We don't think it is a hardware problem, because we tried other Linux
distributions on it and did not have this problem.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Install the system, then install all fixes
2. Configure Apache 2, PHP and MySQL
3. After one hour the system hangs
	

Actual Results:  No mouse, no connections, nothing works. The HDD activity
light stays on permanently. Maybe a controller problem?

Additional info:
Comment 1 Need Real Name 2002-11-12 11:57:18 EST
I have one Compaq DL360 G2 with RH8 and two Compaq DL360 G2s with RH7.3 that 
are having the same problem.  They have 2 processors, two Hard Disks, and Smart 
Array 5i Controller.  They don't always lock up after one hour, but they do 
lock up once a day, with the only recourse being a hard reboot.  I have run the 
Compaq Smart Start for Servers diagnostics, and the Compaq Server diagnostics, 
and found no errors in the diagnostics reports.  I have installed the 
latest P26 BIOS on the machines, and that did not solve the problem.  I have 
just installed the latest Smart Array 5i Controllers, and will let you know if 
that makes a difference.  I got the machines in mid August.
Comment 2 Arjan van de Ven 2002-11-12 12:00:05 EST
since I don't know what exact hardware a DL386 is, can you post the output of the
lspci
and 
lsmod
commands ?
Comment 3 Need Real Name 2002-11-12 12:10:19 EST
The lspci command on my Compaq DL360 G2 with RH8 yields:

00:00.0 Host bridge: ServerWorks CNB20HE Host Bridge (rev 23)
00:00.1 Host bridge: ServerWorks CNB20HE Host Bridge (rev 01)
00:00.2 Host bridge: ServerWorks: Unknown device 0006 (rev 01)
00:00.3 Host bridge: ServerWorks: Unknown device 0006 (rev 01)
00:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:05.0 System peripheral: Compaq Computer Corporation: Unknown device b203 (rev 01)
00:05.2 System peripheral: Compaq Computer Corporation: Unknown device b204 (rev 01)
00:0f.0 ISA bridge: ServerWorks CSB5 South Bridge (rev 92)
00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 92)
00:0f.2 USB Controller: ServerWorks OSB4/CSB5 USB Controller (rev 05)
00:0f.3 Host bridge: ServerWorks: Unknown device 0230
01:04.0 RAID bus controller: Compaq Computer Corporation Smart Array 5i/532 (rev 01)
01:05.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15)
01:06.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15)


The lsmod command on my Compaq DL360 G2 with RH8 yields:

Module                  Size  Used by    Not tainted
autofs                 13700   0  (autoclean) (unused)
pcmcia_core            59904   0
tg3                    52072   1
iptable_filter          2412   0  (autoclean) (unused)
ip_tables              15640   1  [iptable_filter]
mousedev                5688   0  (unused)
keybdev                 2976   0  (unused)
hid                    22404   0  (unused)
input                   6240   0  [mousedev keybdev hid]
usb-ohci               22088   0  (unused)
usbcore                80512   1  [hid usb-ohci]
ext3                   73024   1
jbd                    56752   1  [ext3]
cciss                  41732   2
sd_mod                 13552   0  (unused)
scsi_mod              110408   1  [cciss sd_mod]
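
For comparing module lists between affected machines, the lsmod output above can be turned into structured data. This is only a minimal sketch: `parse_lsmod` is a hypothetical helper, and it assumes the 2.4-style layout shown here (a header line carrying "Not tainted" or "Tainted", then name, size and use count, with dependent modules in square brackets).

```python
import re


def parse_lsmod(text):
    """Parse 2.4-style lsmod output into (tainted, modules).

    Assumption: the first line is the header and carries either
    "Not tainted" or "Tainted"; tainted is None if neither appears.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    tainted = None
    if lines and lines[0].lstrip().startswith("Module"):
        header = lines.pop(0)
        if "Not tainted" in header:
            tainted = False
        elif "Tainted" in header:
            tainted = True

    modules = []
    for row in lines:
        parts = row.split()
        # Expect at least: name, size, use count.
        if len(parts) < 3 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue
        # Dependent modules, if any, are listed inside [ ... ].
        users = re.search(r"\[([^\]]*)\]", row)
        modules.append({
            "name": parts[0],
            "size": int(parts[1]),
            "use_count": int(parts[2]),
            "used_by": users.group(1).split() if users else [],
        })
    return tainted, modules
```

Feeding it the text of the listing above gives one dictionary per module, so two machines can be compared by their sets of module names.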
Comment 4 david 2002-11-26 08:00:09 EST
I may have additional info: it seems that ASR on my Compaq ProLiant DL360
confuses the kernel somehow. I'm running Red Hat Linux 8.0 with the latest kernel
patches and also the latest cpqhealth driver from Compaq. That server reboots for
no reason a couple of times a day and had no error messages in the logs. I started
logging kernel messages over a serial console and caught this:

casm:  NMI Handler has been called on processor 0!
WARNING: casm: NMI - Automatic Server Recovery timer expiration - Hour 11 - 11/26/2002

WARNING: casm: Attempting to shutdown due to ASR timer expiration!

Uhhuh. NMI received for unknown reason 31.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?

I will disable ASR in the evening. I don't know yet whether the ASR feature is
the culprit here.
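
If the serial console output is being saved to a file, the messages quoted above can be picked out automatically. The following is a rough sketch, not a tested tool: `scan_console_log` and the pattern labels are hypothetical names, and the patterns are taken only from the lines quoted in this comment, so other casm or NMI messages would be missed.

```python
import re

# Patterns derived solely from the console messages quoted in this comment.
PATTERNS = {
    "nmi_handler": re.compile(r"casm:\s+NMI Handler has been called on processor \d+"),
    "asr_timer": re.compile(r"casm:.*Automatic Server Recovery timer expiration"),
    "asr_shutdown": re.compile(r"casm:.*shutdown due to ASR timer expiration"),
    "unknown_nmi": re.compile(r"NMI received for unknown reason [0-9a-fA-F]+"),
}


def scan_console_log(lines):
    """Return (line_number, label, text) for each line matching a pattern."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label, line.strip()))
    return hits
```

It accepts any iterable of lines, so an open file object for the captured log can be passed in directly.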
Comment 5 Arjan van de Ven 2002-11-26 08:28:59 EST
david@apollos.ttu.ee: you are using binary-only kernel modules. That means you
run an unsupported configuration.
Comment 6 david 2002-11-26 09:49:42 EST
No, I'm not using binary-only kernel modules :) The source is available and the
cpqhealth package recompiles the kernel modules during installation.
Comment 7 david 2002-12-05 14:48:31 EST
More info: it was a faulty motherboard. The interrupt controller was broken. It may
very well be a typical problem in DL360 G2 servers. Everything has been OK since the
motherboard was replaced. You should try testing the server with the diagnostics
downloadable from the Compaq website.
