Bug 77694
Summary: | RH8 on Compaq DL386 G2 hangs after one Hour | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Need Real Name <andreas.steiner> |
Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
Status: | CLOSED WORKSFORME | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 8.0 | CC: | dan, david |
Hardware: | i686 | ||
OS: | Linux | ||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2002-12-05 19:48:37 UTC | Type: | --- |
Description
Need Real Name
2002-11-12 07:59:05 UTC
I have one Compaq DL360 G2 with RH8 and two Compaq DL360 G2s with RH7.3 that are having the same problem. Each has two processors, two hard disks, and a Smart Array 5i Controller. They don't always lock up after one hour, but they do lock up once a day, and the only recourse is a hard reboot. I have run the Compaq SmartStart for Servers diagnostics and the Compaq Server diagnostics, and found no errors in the diagnostics reports. I have installed the latest P26 BIOS on the machines, and that did not solve the problem. I have just installed the latest Smart Array 5i Controllers, and will let you know if that makes a difference. I got the machines in mid-August.

Since I don't know what exact hardware a DL386 is, can you post the output of the lspci and lsmod commands?

The lspci command on my Compaq DL360 G2 with RH8 yields:

    00:00.0 Host bridge: ServerWorks CNB20HE Host Bridge (rev 23)
    00:00.1 Host bridge: ServerWorks CNB20HE Host Bridge (rev 01)
    00:00.2 Host bridge: ServerWorks: Unknown device 0006 (rev 01)
    00:00.3 Host bridge: ServerWorks: Unknown device 0006 (rev 01)
    00:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
    00:05.0 System peripheral: Compaq Computer Corporation: Unknown device b203 (rev 01)
    00:05.2 System peripheral: Compaq Computer Corporation: Unknown device b204 (rev 01)
    00:0f.0 ISA bridge: ServerWorks CSB5 South Bridge (rev 92)
    00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 92)
    00:0f.2 USB Controller: ServerWorks OSB4/CSB5 USB Controller (rev 05)
    00:0f.3 Host bridge: ServerWorks: Unknown device 0230
    01:04.0 RAID bus controller: Compaq Computer Corporation Smart Array 5i/532 (rev 01)
    01:05.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15)
    01:06.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15)

The lsmod command on my Compaq DL360 G2 with RH8 yields:

    Module Size Used by Not tainted
    autofs 13700 0 (autoclean) (unused)
    pcmcia_core 59904 0
    tg3 52072 1
    iptable_filter 2412 0 (autoclean) (unused)
    ip_tables 15640 1 [iptable_filter]
    mousedev 5688 0 (unused)
    keybdev 2976 0 (unused)
    hid 22404 0 (unused)
    input 6240 0 [mousedev keybdev hid]
    usb-ohci 22088 0 (unused)
    usbcore 80512 1 [hid usb-ohci]
    ext3 73024 1
    jbd 56752 1 [ext3]
    cciss 41732 2
    sd_mod 13552 0 (unused)
    scsi_mod 110408 1 [cciss sd_mod]

I may have additional info: it seems that ASR on my Compaq ProLiant DL360 confuses the kernel somehow. I'm running Red Hat Linux 8.0 with the latest kernel patches and the latest cpqhealth driver from Compaq. The server reboots for no reason a couple of times a day and leaves no error messages in the logs. I started logging kernel messages over a serial console and caught this:

    casm: NMI Handler has been called on processor 0!
    WARNING: casm: NMI - Automatic Server Recovery timer expiration - Hour 11 - 11/26/2002
    WARNING: casm: Attempting to shutdown due to ASR timer expiration!
    Uhhuh. NMI received for unknown reason 31.
    Dazed and confused, but trying to continue
    Do you have a strange power saving mode enabled?

I will disable ASR in the evening. I don't know yet whether the ASR feature is the culprit here.

david.ee: you are using binary-only kernel modules. That means you are running an unsupported configuration.

No, I'm not using binary-only kernel modules :) The source is available, and the cpqhealth package recompiles the kernel modules during installation.

More info: it was a faulty motherboard. The interrupt controller was broken. It may very well be a typical problem in DL360 G2 servers. Everything has been OK since the motherboard was replaced. You should try testing the server with the diagnostics downloadable from the Compaq website.
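
For anyone else capturing kernel messages over a serial console as described in this report, a minimal sketch for picking the NMI and ASR lines out of a saved log might look like the following. The function name `find_nmi_events` and the pattern list are assumptions made for illustration; the patterns are based only on the messages quoted in this report, and other kernel or cpqhealth versions may word them differently.

```python
import re

# Patterns derived from the console messages quoted in this bug report.
# Assumption: other kernels or cpqhealth releases may use different wording.
NMI_PATTERNS = [
    re.compile(r"casm: NMI"),
    re.compile(r"ASR timer expiration"),
    re.compile(r"NMI received for unknown reason"),
]


def find_nmi_events(lines):
    """Return (line_number, text) pairs for lines matching any pattern.

    `lines` can be any iterable of strings, such as an open log file.
    Line numbers start at 1.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if any(pattern.search(text) for pattern in NMI_PATTERNS):
            hits.append((number, text))
    return hits
```

Passing an open file, for example `find_nmi_events(open("console.log", errors="replace"))` with a hypothetical file name, yields the matching lines with their positions in the log, which makes it easier to line them up against the times of the hangs or reboots.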