Red Hat Bugzilla – Bug 77694
RH8 on Compaq DL386 G2 hangs after one Hour
Last modified: 2007-04-18 12:48:22 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR
Description of problem:
RH 8 is installed on a Compaq DL386 G2 with 2 processors and 2 GB of memory,
plus an 18 GB RAID 1 on the onboard 5i controller.
The installation works fine, with no problems. We also installed all
bugfixes. But after one hour the system hangs. Nothing works; only
powering down solves the problem. We tried another machine of the same model,
with the same result.
We don't think it is a hardware problem, because we tried other Linux
distributions and did not have this problem.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Install the system, then install all fixes
2. Configure Apache 2, PHP and MySQL
3. After one hour the system hangs
Actual Results: No mouse, no connections, nothing works. The HDD activity
light stays on permanently. Maybe a controller problem?
I have one Compaq DL360 G2 with RH8 and two Compaq DL360 G2s with RH7.3 that
are having the same problem. They have 2 processors, two Hard Disks, and Smart
Array 5i Controller. They don't always lock up after one hour, but they do
lock up once a day, with the only recourse being a hard reboot. I have run the
Compaq Smart Start for Servers diagnostics, and the Compaq Server diagnostics,
and found no errors in the diagnostics reports. I have installed the
latest P26 BIOS on the machines, and that did not solve the problem. I have
just installed the latest Smart Array 5i Controllers, and will let you know if
that makes a difference. I got the machines in mid August.
since I don't know what exact hardware a DL386 is, can you post the output of the
The lspci command on my Compaq DL360 G2 with RH8 yields:
00:00.0 Host bridge: ServerWorks CNB20HE Host Bridge (rev 23)
00:00.1 Host bridge: ServerWorks CNB20HE Host Bridge (rev 01)
00:00.2 Host bridge: ServerWorks: Unknown device 0006 (rev 01)
00:00.3 Host bridge: ServerWorks: Unknown device 0006 (rev 01)
00:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:05.0 System peripheral: Compaq Computer Corporation: Unknown device b203
00:05.2 System peripheral: Compaq Computer Corporation: Unknown device b204
00:0f.0 ISA bridge: ServerWorks CSB5 South Bridge (rev 92)
00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 92)
00:0f.2 USB Controller: ServerWorks OSB4/CSB5 USB Controller (rev 05)
00:0f.3 Host bridge: ServerWorks: Unknown device 0230
01:04.0 RAID bus controller: Compaq Computer Corporation Smart Array 5i/532
01:05.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15)
01:06.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15)
The lsmod command on my Compaq DL360 G2 with RH8 yields:
Module                  Size  Used by    Not tainted
autofs                 13700   0  (autoclean) (unused)
pcmcia_core            59904   0
tg3                    52072   1
iptable_filter          2412   0  (autoclean) (unused)
ip_tables              15640   1  [iptable_filter]
mousedev                5688   0  (unused)
keybdev                 2976   0  (unused)
hid                    22404   0  (unused)
input                   6240   0  [mousedev keybdev hid]
usb-ohci               22088   0  (unused)
usbcore                80512   1  [hid usb-ohci]
ext3                   73024   1
jbd                    56752   1  [ext3]
cciss                  41732   2
sd_mod                 13552   0  (unused)
scsi_mod              110408   1  [cciss sd_mod]
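For anyone comparing module lists across the affected machines, here is a minimal Python sketch that parses lsmod output of the shape shown above. The function name parse_lsmod and the dictionary keys are made up for illustration, and the sketch assumes the 2.4-style layout in which the taint status follows "Used by" on the header line. It is not part of any Red Hat or Compaq tooling.

```python
import re

def parse_lsmod(text):
    """Parse 2.4-style lsmod output into (taint_status, modules).

    Assumes the header line ends with the taint status after "Used by",
    as in the listing above. parse_lsmod is a hypothetical helper.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    # Whatever follows "Used by" on the header line is the taint status.
    match = re.search(r"Used by\s+(.*)$", header)
    taint = match.group(1).strip() if match else ""
    modules = []
    for row in rows:
        # Columns: name, size, use count, then optional flags/dependents.
        parts = row.split(None, 3)
        modules.append({
            "name": parts[0],
            "size": int(parts[1]),
            "used": int(parts[2]),
            "extra": parts[3] if len(parts) > 3 else "",
        })
    return taint, modules

# A few rows copied from the listing above.
SAMPLE = """\
Module Size Used by Not tainted
tg3 52072 1
cciss 41732 2
scsi_mod 110408 1 [cciss sd_mod]
"""

taint, modules = parse_lsmod(SAMPLE)
print(taint)
print([m["name"] for m in modules])
```

The taint field on the header line is what shows at a glance whether the kernel considers itself tainted by a loaded module.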
I may have additional info: it seems that ASR on my Compaq ProLiant DL360
confuses the kernel somehow. I'm running Red Hat Linux 8.0 with the latest kernel
patches and also the latest cpqhealth driver from Compaq. That server reboots for
no reason a couple of times a day and leaves no error messages in the logs. I
started logging kernel messages over a serial console and caught this:
casm: NMI Handler has been called on processor 0!
WARNING: casm: NMI - Automatic Server Recovery timer expiration - Hour 11 - 11/2
WARNING: casm: Attempting to shutdown due to ASR timer expiration!
Uhhuh. NMI received for unknown reason 31.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
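For anyone who has captured a similar serial console log and wants to pull out just these lines, here is a minimal Python sketch. The patterns are copied from the excerpt above. The function name find_nmi_lines is made up for illustration, and the sketch assumes the log is available as plain text.

```python
import re

# Patterns taken from the console excerpt above; extend as needed.
PATTERNS = [
    re.compile(r"casm:"),
    re.compile(r"NMI received for unknown reason"),
    re.compile(r"Dazed and confused"),
]

def find_nmi_lines(log_text):
    """Return (line_number, line) pairs for lines matching any pattern."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(p.search(line) for p in PATTERNS):
            hits.append((number, line.strip()))
    return hits

# Lines copied from the console excerpt above.
SAMPLE = """\
casm: NMI Handler has been called on processor 0!
WARNING: casm: Attempting to shutdown due to ASR timer expiration!
Uhhuh. NMI received for unknown reason 31.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
"""

for number, line in find_nmi_lines(SAMPLE):
    print(number, line)
```

Run against a full capture, this gives the line numbers of each NMI or ASR event, which makes it easier to see how often they occur.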
I will disable ASR in the evening. I don't know yet, if that ASR feature is the
firstname.lastname@example.org: you are using binary-only kernel modules. That means you
are running an unsupported configuration.
No, I'm not using binary-only kernel modules :) The source is available and the
cpqhealth package recompiles the kernel modules during installation.
More info: it was a faulty motherboard. The interrupt controller was broken. It
may very well be a typical problem in DL360 G2 servers. Everything has been OK
since the motherboard was changed. You should try testing the server with the
diagnostics downloadable from the Compaq website.