Bug 171354 - Data/FS corruption caused by FS activity during RAID6 resync
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i386 Linux
Priority: medium
Severity: high
Assigned To: Alasdair Kergon
QA Contact: Brian Brock
Reported: 2005-10-20 20:15 EDT by Joshua Baker-LePain
Modified: 2011-02-09 20:15 EST (History)
CC: 11 users

Doc Type: Bug Fix
Last Closed: 2011-02-09 20:15:20 EST


Attachments
Log snippet showing FS errors (hostname removed). (3.76 KB, text/plain)
2005-10-20 20:20 EDT, Joshua Baker-LePain

Description Joshua Baker-LePain 2005-10-20 20:15:58 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050811 CentOS/1.0.6-1.4.1.centos3 Firefox/1.0.6

Description of problem:
I'm testing a 15-drive RAID6 array using the system tester at <http://people.redhat.com/dledford/memtest.html>. Running this test on a clean array worked fine. I then failed a drive in the array while the script was running, and the resync started as it should. Soon, though, the script started returning errors, and then I saw several EXT3-fs errors in the logs (I'll put examples in an attachment).

Version-Release number of selected component (if applicable):
kernel-2.6.9-22.ELsmp

How reproducible:
Always

Steps to Reproduce:
1. Create and format RAID6 array:
mdadm -C /dev/md0 -c 128 -l 6 -n 15 -x 1 /dev/sd[a-e]2 /dev/sd[f-p]1
mke2fs -b 4096 -j -m 0 -R stride=32 -T largefile /dev/md0
2. Mount array and run memtest.sh found at the web address above.
3. Fail a drive:
mdadm /dev/md0 -f /dev/sdi1
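As an illustration of watching the rebuild after step 3, resync progress can be read from /proc/mdstat. The snippet below is a minimal sketch that parses a sample mdstat excerpt; the device names and block counts in the sample are hypothetical, not taken from this report, and on a real system you would read /proc/mdstat directly.

```shell
# Sample text mimicking the /proc/mdstat format during a RAID6 recovery.
# (Hypothetical values for illustration only.)
mdstat_sample='md0 : active raid6 sdp1[15] sdi1[8](F) sda2[0]
      1757813760 blocks level 6, 128k chunk, algorithm 2 [15/14] [UUUUUUUU_UUUUUU]
      [==>..................]  recovery = 12.5% (21972672/175781376) finish=42.0min speed=61000K/sec'

# Extract the recovery percentage; on a live system replace the printf
# with: cat /proc/mdstat
progress=$(printf '%s\n' "$mdstat_sample" \
  | grep -o 'recovery = [0-9.]*%' \
  | grep -o '[0-9.]*%')

echo "resync progress: $progress"
```

Polling this in a loop (e.g. with `watch cat /proc/mdstat`) alongside the memtest script makes it easy to correlate the first I/O errors with how far the resync had progressed.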
  

Actual Results:  Data, and eventually FS, corruption.

Expected Results:  The array should rebuild without corrupting anything.

Additional info:

I've tried this on two similar servers. Both have two 3ware 7500-8 controllers in JBOD mode. One server has a Supermicro P4DPE-G2 motherboard, 4GB RAM, dual 2.2GHz Xeons, and 16 Maxtor 160GB drives. The other has a Supermicro X5DPE-G2 board, 2GB RAM, dual 2.4GHz Xeons, and 16 IBM 180GB drives.
Comment 1 Joshua Baker-LePain 2005-10-20 20:20:03 EDT
Created attachment 120227 [details]
Log snippet showing FS errors (hostname removed).
Comment 2 Philippe Troin 2007-05-23 21:10:52 EDT
I'm seeing the same problem with RHEL4.4 as well.
Actually, the filesystem on the RAID array is corrupted right out of the install.

This patch might fix the problem:
http://linux.bkbits.net:8080/linux-2.6/?PAGE=gnupatch&REV=1.1938.340.65

Phil.
Comment 3 Philippe Troin 2007-05-25 21:26:40 EDT
Fixed in 4.5.
Thanks.
Phil.
Comment 4 Philippe Troin 2007-05-30 16:49:11 EDT
Actually, no, I am still witnessing corruption with 2.6.9-55.ELsmp.
RAID-6 is a no-go.
Phil.
Comment 5 Alasdair Kergon 2011-02-09 20:15:20 EST
This got routed to the wrong place, I'm afraid - it's not my area.  If you still need support please use https://www.redhat.com/support/ referencing this bugzilla to obtain more personal attention.
