From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

Description of problem:
On random occasions (no pattern has been established yet), the kernel first oopses, then services stop responding, until finally the machine panics and has to be rebooted.

The box had similar problems two weeks ago. Suspecting a hardware problem (we have 2.4.18-24.7 and -27.7 running on other boxen without problems), I swapped the whole box for a spare. Yesterday the machine hung twice, and today it hung again. The relevant oopses are attached, as is the boot log from the last boot, for hardware information.

I was running -24 on this box because I rely on /proc/cmdline (see #88047), and I do not have any users on the box, so ptrace was not an issue for me. As of today I am trying -27, although I do not expect it to help.

Please note that in the latest dump the lm_sensors modules were loaded: I monitored temperature and fan RPM to rule out temperature-related problems, and all values were fine. Also, the RAM is ECC, so a memory failure, while always possible, is unlikely.

Version-Release number of selected component (if applicable):
2.4.18-24.7.x

How reproducible:
Sometimes

Steps to Reproduce:
1. Reboot the box to get it working.
2. Wait.

Actual Results:
Sometimes the kernel oopses, as detailed above. The automatic service monitor pages me in the middle of the night.

Expected Results:
Flawless performance without oopses; a good night of uninterrupted sleep.

Additional info:
Created attachment 91185: Several kernel oopses from /var/log/messages
As a side note: I wanted to look at the 2.4.18-24.7.x source RPM again, only to find that all the mirrors have already deleted it. Is there a publicly accessible FTP site that keeps all the old updates? (Google is no help: it finds all the mirrors, but the files are gone by now.)
Closing due to a proven hardware defect. The mainboard in question finally died because of faulty electrolytic capacitors. The other one had RAM problems. I guess PC hardware is currently in such a sorry state that failover needs to become standard for every application...