From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312461; .NET CLR 1.0.3705; .NET CLR 1.1.4322)

Description of problem:
We have a large 1.7 TB disk array (hardware RAID 5) with a single partition on it, formatted with ext3. See the attached log file for the kernel error that appeared. After this error appeared, the filesystem was no longer accessible: every process trying to access it froze.

Version-Release number of selected component (if applicable):
kernel-smp-2.4.20-18.7.i686.rpm

How reproducible:
Couldn't reproduce

Steps to Reproduce:
Don't know how to reproduce.

Additional info:
Please provide the kernel log showing the assertion failures.
Created attachment 93045: kernel log showing the assertion failure. I am sorry, I obviously forgot to attach the most important part of the bug report. Your Bugzilla's "new bug wizard" scared me a lot. :-)
This is a known problem. There is a debug check in ext3 that triggers when we use a page marked as uninitialised. Unfortunately, the same condition can also be triggered by I/O failures. Please check your logs; I suspect you'll find I/O failures prior to the panic. A recent upstream kernel patch relaxes this check from a panic to a warning, so we won't do the impolite kernel oops in this case, and future releases will use that new behaviour.
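To look for the I/O failures suggested above, one can scan the kernel log for controller errors that precede the ext3 assertion. The following is a minimal Python sketch, not a tool from this report. It assumes syslog-style lines (`Mon DD HH:MM:SS host kernel: message`), the cciss message format quoted in this report, and that the assertion message contains the text `Assertion failure in`. The function name `find_io_errors_before_assertion` is hypothetical. Because syslog timestamps omit the year, the caller has to supply it.

```python
import re
from datetime import datetime

# Assumed syslog layout: "Mon DD HH:MM:SS host kernel: message".
SYSLOG_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+(.*)$"
)
# cciss controller error in the format quoted in this report.
# In SCSI, sense key 0x3 means MEDIUM ERROR.
CCISS_ERR_RE = re.compile(
    r"cciss: cmd \w+ has CHECK CONDITION, sense key = (0x[0-9a-fA-F]+)"
)
# Assumed marker text of the ext3/journal assertion message.
ASSERT_MARKER = "Assertion failure in"


def find_io_errors_before_assertion(lines, year):
    """Return (errors, assertion_time).

    errors lists (timestamp, sense_key) for every cciss CHECK CONDITION
    seen before the first assertion; assertion_time is None if no
    assertion line is found.
    """
    errors = []
    for line in lines:
        m = SYSLOG_RE.match(line)
        if not m:
            continue  # not a kernel syslog line
        # syslog omits the year, so the caller supplies it.
        stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        msg = m.group(2)
        if ASSERT_MARKER in msg:
            return errors, stamp
        err = CCISS_ERR_RE.search(msg)
        if err:
            errors.append((stamp, err.group(1)))
    return errors, None


if __name__ == "__main__":
    # Hypothetical sample lines; timestamps and host name are made up.
    sample = [
        "Aug  4 12:00:00 srv kernel: cciss: cmd c2de6078 has CHECK CONDITION, sense key = 0x3",
        "Aug  5 10:00:00 srv kernel: Assertion failure in ...",
    ]
    errors, when = find_io_errors_before_assertion(sample, year=2003)
    for stamp, key in errors:
        hours = (when - stamp).total_seconds() / 3600
        print(f"{stamp}: sense key {key}, {hours:.0f} h before the assertion")
```

Errors logged after the first assertion are ignored, since the point is to see what preceded it; reporting the gap in hours makes it easy to spot failures that started long before the panic.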
Yes, you are right: I found a lot of messages like "kernel: cciss: cmd c2de6078 has CHECK CONDITION, sense key = 0x3" prior to the panic, which I had not noticed before (cciss is the driver for the RAID controller, a Compaq Smart Array). The errors appeared in the log 22 hours before the ext3 panic. I am going to check how to get the hardware behaving properly. BTW, is that relaxed ext3 patch already included in 2.4.20-19.7?
*** This bug has been marked as a duplicate of 86035 ***
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.