Bug 159499

Summary: MD RAID5 fails on sync w/ Adaptec AAR-1210SA
Product: [Fedora] Fedora
Reporter: Mike Perry <mikepery>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 4
CC: davej, pfrields, wtogami
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2006-05-04 13:49:58 UTC

Description Mike Perry 2005-06-03 01:47:07 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4

Description of problem:
When resyncing my RAID5 array after a disk failure, I get a flood of SATA errors
of the following form in dmesg:


ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: error=0x01 { AddrMarkNotFound }
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: error=0x01 { AddrMarkNotFound }
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: error=0x01 { AddrMarkNotFound }
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: error=0x01 { AddrMarkNotFound }
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: error=0x01 { AddrMarkNotFound }
SCSI error : <1 0 0 0> return code = 0x8000002
sdb: Current: sense key: Medium Error
    Additional sense: Address mark not found for data field
end_request: I/O error, dev sdb, sector 469511743
raid5: Disk failure on sdb1, disabling device. Operation continuing on 2 devices
md: md1: sync done.
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: error=0x01 { AddrMarkNotFound }
SCSI error : <1 0 0 0> return code = 0x8000002
sdb: Current: sense key: Medium Error
    Additional sense: Address mark not found for data field
end_request: I/O error, dev sdb, sector 469511751
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: error=0x01 { AddrMarkNotFound }
SCSI error : <1 0 0 0> return code = 0x8000002
sdb: Current: sense key: Medium Error
    Additional sense: Address mark not found for data field
end_request: I/O error, dev sdb, sector 469511759
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: error=0x01 { AddrMarkNotFound }
SCSI error : <1 0 0 0> return code = 0x8000002
sdb: Current: sense key: Medium Error
    Additional sense: Address mark not found for data field

and so on until sector 469512279, at which point MD declares the device failed.
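To summarize the failing range from a saved dmesg capture, a small script can pull the sector numbers out of the end_request lines. This is only a sketch: it assumes the log has been saved as text and that every failed request appears as "end_request: I/O error, dev <name>, sector <n>", as in the excerpt above. The function name failed_sectors is hypothetical.

```python
import re

# Matches lines like "end_request: I/O error, dev sdb, sector 469511743".
# Assumption: the kernel log uses exactly this format, as in the excerpt above.
END_REQUEST = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def failed_sectors(log_text: str, device: str) -> list[int]:
    """Return the sorted, de-duplicated sectors that failed on `device`."""
    sectors = set()
    # finditer scans the whole text, so it still works when two log
    # lines have been fused onto one physical line.
    for match in END_REQUEST.finditer(log_text):
        if match.group(1) == device:
            sectors.add(int(match.group(2)))
    return sorted(sectors)


if __name__ == "__main__":
    sample = (
        "end_request: I/O error, dev sdb, sector 469511743\n"
        "end_request: I/O error, dev sdb, sector 469511751\n"
        "end_request: I/O error, dev sdb, sector 469511759\n"
    )
    found = failed_sectors(sample, "sdb")
    print(f"{len(found)} failed requests, sectors {found[0]}..{found[-1]}")
```

In this report the failing sectors advance in steps of 8 (4 KiB), and the first and last reported sectors, 469511743 and 469512279, are 536 sectors (268 KiB) apart.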

My RAID5 array consisted of the following disks:
sda1 - 293049666 1k blocks
sdb1 - 293049666 1k blocks
hda2 - same
hdc2 - same (attempting to resync)

The resync of hdc2 reaches about the 70% mark, then starts producing those errors until it gives up at sector 469512279.

I run cryptofs on top of the md device, and then ext3 on top of that.

My IDE controller is a VIA VT82C586A (or some variety that lspci lumps in there with it), and the SATA controller is the Adaptec AAR-1210SA running in JBOD mode.

Neither the Adaptec BIOS disk scan diagnostic nor 'badblocks -ws /dev/sdb' reports any errors, so I don't think it's a disk issue. Note that I ran the badblocks check from a Knoppix disc running its patched 2.6.9 kernel.



Version-Release number of selected component (if applicable):
kernel-2.6.11-1.1341_FC4

How reproducible:
Always

Steps to Reproduce:
1. Create the array as described above
2. Fail HDC, replace it
3. Allow sync to run. 

I did this several times (if I use mdadm -f, I can force sdb1 back into the array and try to sync hdc2 again), once even in single-user mode with the filesystem mounted read-only. (It wouldn't let me unmount it for some reason: it said the mountpoint was busy, but lsof didn't report any processes.)

Actual Results:  ata2 errors listed above

Expected Results:  Proper sync.

Additional info:

Comment 1 Dave Jones 2005-06-27 23:28:50 UTC
Mass update of -test bugs to update version to fc4.
(Please retest on final release, and report results if you have not already done
so).

Thanks.

Comment 2 Dave Jones 2005-09-30 07:23:10 UTC
Mass update to all FC4 bugs:

An update has been released (2.6.13-1.1526_FC4) which rebases to a new upstream
kernel (2.6.13.2). As there were ~3500 changes upstream between this and the
previous kernel, it's possible your bug has been fixed already.

Please retest with this update, and update this bug if necessary.

Thanks.


Comment 3 Dave Jones 2005-11-10 21:58:52 UTC
Mass update to all FC4 bugs:

An update has been released (2.6.14-1.1637_FC4) which rebases to a new upstream
kernel (2.6.14). As there were ~3500 changes upstream between this and the
previous kernel, it's possible your bug has been fixed already.

Please retest with this update, and update this bug if necessary.

Thanks.



Comment 4 Dave Jones 2006-02-03 07:31:25 UTC
This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in Bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

Thank you.


Comment 5 John Thacker 2006-05-04 13:49:58 UTC
Closing per previous comment.