Bug 16748

Summary: CPQ Proliant 1600 freezes after ~7 days
Product: [Retired] Red Hat Linux Reporter: James Ringland <jdr>
Component: kernel Assignee: Alan Cox <alan>
Status: CLOSED NOTABUG QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: 6.2 CC: ekanter
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2002-12-15 01:20:11 UTC Type: ---

Description James Ringland 2000-08-22 20:21:01 UTC
The CPQ1600R freezes completely after several days of operation. System
temperature is always within norms, and the internal rack temperature is
maintained at 70 deg. Two separate systems have been tried, both 1600Rs but
of two different varieties: the new 1" UltraWide model and the older 1.6
Ultra2 model. The system console goes black and registers as "Powered" on
the KVM, but the machine must be restarted by a power cycle. The RT clock
also does not update during these outages, and no errors are logged in the
EISA logs.

Comment 1 Michael K. Johnson 2000-08-22 20:30:17 UTC
Are you running the 2.2.16-3 errata kernel?

Comment 2 James Ringland 2000-09-11 14:02:22 UTC
Yes I am. I finally was able to trap the error using the playback on the Remote 
Insight server management board. It reads as follows: 

       Uhhuh. NMI received for unknown reason 20
       Dazed and confused, but trying to continue
       Do you have a strange power saving mode enabled?


Comment 3 Alan Cox 2000-09-15 18:01:00 UTC
 Uhhuh. NMI received for unknown reason 20

An NMI is normally issued for things like ECC memory errors or bus errors. 20 is
a Compaq-specific error code, so I don't know what it means. It certainly looks
to me like the hardware waved the white flag and surrendered, rather than a
Linux crash.

If you can find out from Compaq what NMI error code 20 is on these boxes I'd
love to know and can then try and help further.
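
For anyone trying to read the number in that message, here is a minimal sketch
of how the reason byte could be broken into bits. It rests on assumptions not
confirmed in this report: that the value is printed in hex, that it is the
byte read from I/O port 0x61, and that the port follows the conventional PC/AT
bit layout. The names PORT_61_BITS, decode_nmi_reason and has_known_source are
hypothetical helpers, not kernel or Compaq APIs, and whatever meaning Compaq
attaches to these values is not captured here.

```python
# Hypothetical decoder for the "reason" byte in the NMI message.
# Assumption: the byte is I/O port 0x61 with the conventional PC/AT layout.
PORT_61_BITS = {
    0x80: "memory parity / system error",
    0x40: "I/O channel check",
    0x20: "timer 2 output",
    0x10: "refresh toggle",
    0x08: "I/O channel check control",
    0x04: "parity check control",
    0x02: "speaker data enable",
    0x01: "timer 2 gate",
}


def decode_nmi_reason(reason):
    """Return the names of the bits set in the reason byte, high bit first."""
    return [name for bit, name in PORT_61_BITS.items() if reason & bit]


def has_known_source(reason):
    """True if either conventional NMI source bit (0x80 or 0x40) is set."""
    return bool(reason & 0xC0)


if __name__ == "__main__":
    for value in (0x20, 0x21):
        print(hex(value), decode_nmi_reason(value), has_known_source(value))
```

Under those assumptions, neither 20 nor 21 has one of the two conventional
source bits set, which would be consistent with the kernel reporting the
reason as unknown.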


Comment 4 James Ringland 2000-09-15 18:15:35 UTC
Thanks. I have placed a call to technical support. Also, I had another 
<SARCASM>Graceful Shutdown</SARCASM> with an NMI 21 this morning.

Comment 5 Eugene Kanter 2005-05-30 05:44:01 UTC
(In reply to comment #4)
> Thanks. I have placed a call to technical support. Also, I had another 
> <SARCASM>Graceful Shutdown</SARCASM> with an NMI 21 this morning.

James,

Just wondering if you figured out what the Compaq NMI errors mean. I have seen
similar issues.

Comment 6 James Ringland 2005-05-31 20:36:18 UTC
That system is now out of service, but IIRC, the NMIs stopped for no apparent
reason. We never found a definitive cause for the problem.