Red Hat Bugzilla – Bug 57592
disk corruption on 2.4.9-13smp w/adaptec SCSI
Last modified: 2007-03-26 23:50:26 EDT
Description of Problem:
I've been getting corruption on my SCSI drive from time to time and errors in
the system log since upgrading from 7.1. I have a 2-way Pentium II system
with 512 MB RAM, and Adaptec SCSI.
Version-Release number of selected component (if applicable):
Randomly, I get oopses and other strange ext3 errors in my logs.
Here is the ksymoops info from the latest problem:
Attached is the latest oops decoded with ksymoops.
Created attachment 40787
The BUG() is coming from
which is somewhere deep in the journaling layer getting upset about the fact
that there have been IO failures elsewhere. That shouldn't happen, but the
failure is in an inode table, which is a bit hard to recover from if the block
goes bad after you've already started using it.
Part of the underlying problem here is the stupid block device layer, which only
has one bit of error state and which heavy-handedly marks blocks as being
non-uptodate if a write error occurs. That should be fixed for 2.5, but all
filesystems will have the problem in 2.4 that they cannot reliably tell what
blocks are actually uptodate in the presence of write errors. So the ext3
assert fail should probably be relaxed: I'll reproduce this and fix.
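As a rough illustration of the single-bit problem described above, here is a toy C model. It is not kernel code: toy_buffer, toy_end_read, toy_end_write and toy_states_indistinguishable are hypothetical names made up for this sketch, and the only behaviour it assumes is the one stated above (a failed write marks the block non-uptodate).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the 2.4 kernel's structures. */
struct toy_buffer {
    bool uptodate;  /* the single bit of state a filesystem gets to see */
};

/* A buffer that has never been read from disk. */
static struct toy_buffer toy_new_buffer(void)
{
    struct toy_buffer b = { .uptodate = false };
    return b;
}

/* Read completion: the flag mirrors whether the I/O succeeded. */
static void toy_end_read(struct toy_buffer *b, bool io_ok)
{
    b->uptodate = io_ok;
}

/*
 * Write completion, modelled on the behaviour described above: a failed
 * write clears the flag even though the in-memory copy is still valid.
 */
static void toy_end_write(struct toy_buffer *b, bool io_ok)
{
    if (!io_ok)
        b->uptodate = false;
}

/*
 * Returns 1 if a never-read buffer and a buffer whose write failed
 * look identical to a caller that can only inspect the one flag.
 */
int toy_states_indistinguishable(void)
{
    struct toy_buffer never_read = toy_new_buffer();
    struct toy_buffer write_failed = toy_new_buffer();

    toy_end_read(&write_failed, true);    /* contents are now valid */
    assert(write_failed.uptodate);
    toy_end_write(&write_failed, false);  /* write error clears the flag */

    return never_read.uptodate == write_failed.uptodate;
}
```

In this model the function returns 1: with only one flag, "never read" and "valid in memory but the write failed" collapse into the same state, which is why an assertion that depends on that flag can fire after an unrelated I/O error.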
Just so that I can decode the trace a little more accurately, can you tell me
which kernel version this is? We have 3 different 2.4.9-13smp kernels: one each
for i586, i686 and athlon.
I don't know if this answers your problem: it's not clear from the report
whether you want the SCSI I/O errors or the filesystem fixed.
This is with the i686 -13smp kernel.
I hadn't had any I/O errors for several days leading up to this crash, and the
disk had been fsck'd, so I was assuming this might not be due to those
previous SCSI I/O problems. I will update the report if anything else happens.
I'm seeing basically the exact same problem. IBM xSeries 350 with integrated
Adaptec U160 SCSI, with a secondary /data filesystem on an external RAID using
the Adaptec SCSI. The primary boot filesystems are on an IBM RaidServ 4LX RAID
controller and are fine. The filesystem on the Adaptec is getting I/O errors,
hangs, load average goes to 9+, etc. Running the 2.4.9-13smp Red Hat kernel,
2 x 700 MHz P3 Xeon procs, 1.5 GB RAM. I am planning on booting back to the
stock 2.4.7 kernel to see if it's more stable.
The server I described has an _exact_ twin sitting right beside it as a load-
balanced fault-tolerant redundant server and it's experiencing the exact same
problem. I'm fairly certain it's not faulty hardware.
Wendell, what sort of errors are being reported in your logs? If there are no
scsi errors being reported then this may be a new fs bug; otherwise it's likely
to be the fs getting confused by errors coming back from the scsi layer, and
that may indicate a driver or controller fault.
No reply in over a year - closing