The CPQ1600R freezes completely after several days of operation. System temperature is always within norms, and the internal rack temperature is maintained at 70 deg. Two separate systems have been tried, both 1600Rs but of two different varieties: the new 1" UltraWide model and the older 1.6 Ultra2 model. The system console goes black and registers as "Powered" on the KVM, but the machine must be restarted by a power cycle. The RT clock also does not update during these outages, and no errors are logged in the EISA logs.
Are you running the 2.2.16-3 errata kernel?
Yes, I am. I was finally able to trap the error using the playback on the Remote Insight server management board. It reads as follows:
Uhhuh. NMI received for unknown reason 20
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
> Uhhuh. NMI received for unknown reason 20
An NMI is normally issued for things like ECC memory errors or bus errors. 20 is a Compaq-specific error code, so I don't know what it means. It certainly looks to me like the hardware waved the white flag and surrendered, rather than a Linux crash. If you can find out from Compaq what NMI error code 20 is on these boxes, I'd love to know and can then try to help further.
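For anyone else trying to read these numbers, here is a rough sketch of how the reason value could be picked apart. It assumes the stock 2.2 i386 behaviour, where the NMI handler reads one byte from I/O port 0x61, prints it in hex, and treats it as "unknown" when neither of the top two bits is set. The bit names follow the conventional PC/AT layout of that port, and the helper names are made up for illustration. Nothing here says which device on a Compaq board actually raised the NMI.

```python
# Hypothetical helpers for reading the hex value printed in
# "NMI received for unknown reason XX".
# Assumption: XX is the byte read from I/O port 0x61, and the bit
# names below follow the conventional PC/AT layout of that port.
PORT_61_BITS = {
    0x80: "memory parity / SERR",
    0x40: "I/O channel check (IOCHK)",
    0x20: "timer 2 output",
    0x10: "refresh toggle",
    0x08: "IOCHK reporting disabled",
    0x04: "parity check reporting disabled",
    0x02: "speaker data enabled",
    0x01: "timer 2 gate",
}


def decode_nmi_reason(reason):
    """Return the names of the bits set in the reason byte, high bit first."""
    return [name for mask, name in PORT_61_BITS.items() if reason & mask]


def is_unknown_reason(reason):
    """True when neither the parity/SERR bit nor the IOCHK bit is set."""
    return (reason & 0xC0) == 0


if __name__ == "__main__":
    # The two values reported in this thread, read as hexadecimal.
    for text in ("20", "21"):
        value = int(text, 16)
        print(text, is_unknown_reason(value), decode_nmi_reason(value))
```

Under those assumptions, 20 and 21 both fall into the "unknown" case because the two bits a generic handler knows how to report are clear. The remaining set bits are ordinary timer status and do not identify a fault by themselves.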
Thanks. I have placed a call to technical support. Also, I had another <SARCASM>Graceful Shutdown</SARCASM> with an NMI 21 this morning.
(In reply to comment #4)
> Thanks. I have placed a call to technical support. Also, I had another
> <SARCASM>Graceful Shutdown</SARCASM> with an NMI 21 this morning.
James, just wondering if you figured out what the Compaq NMI errors mean. I have seen similar issues.
That system is now out of service, but IIRC, the NMIs stopped for no apparent reason. We never found a definitive cause for the problem.