Red Hat Bugzilla – Bug 171354
Data/FS corruption caused by FS activity during RAID6 resync
Last modified: 2011-02-09 20:15:20 EST
Description of problem:
I'm testing a 15-drive RAID6 array using the system tester at <http://people.redhat.com/dledford/memtest.html>. Running this test on a clean array worked fine. I then failed a drive in the array with the script still running. The resync started as it should, but soon the script began returning errors, and then several EXT3-fs errors appeared in the logs (examples in an attachment).
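For reference, a minimal sketch of the kind of write-then-verify stress loop the test performs (the actual memtest.sh at the URL above is not reproduced here, so the function name and sizes below are illustrative assumptions): it writes pseudo-random files into a directory on the mounted array, then re-reads and checksums them, flagging any mismatch as corruption.

```shell
# Hypothetical sketch of a write-then-verify stress loop, similar in
# spirit to the memtest.sh script linked above (exact contents of that
# script are not reproduced here).  "stress_verify <dir> <passes>"
# writes pseudo-random files into <dir> -- assumed to be on the mounted
# array, e.g. /mnt/md0 -- and re-reads them, failing on any mismatch.
stress_verify() {
    dir=$1
    passes=$2
    i=0
    while [ "$i" -lt "$passes" ]; do
        # Write 4 MiB of pseudo-random data and record its checksum.
        dd if=/dev/urandom of="$dir/pass$i.dat" bs=1M count=4 2>/dev/null
        md5sum "$dir/pass$i.dat" > "$dir/pass$i.md5"
        sync
        # Re-read and verify; a mismatch means on-disk corruption.
        md5sum -c "$dir/pass$i.md5" >/dev/null || {
            echo "CORRUPTION on pass $i" >&2
            return 1
        }
        i=$((i + 1))
    done
    echo "all $passes passes verified"
}
```

On the failing arrays, a loop like this starts reporting mismatches shortly after the resync begins.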
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create and format RAID6 array:
mdadm -C /dev/md0 -c 128 -l 6 -n 15 -x 1 /dev/sd[a-e]2 /dev/sd[f-p]1
mke2fs -b 4096 -j -m 0 -R stride=32 -T largefile /dev/md0
2. Mount array and run memtest.sh found at the web address above.
3. Fail a drive:
mdadm /dev/md0 -f /dev/sdi1
Actual Results: Data corruption, and eventually filesystem corruption.
Expected Results: The array should rebuild without corrupting anything.
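While step 3's rebuild runs, progress can be watched via /proc/mdstat. A small illustrative helper for pulling the resync/recovery percentage out of mdstat-style text follows; the function name and the sample line in the test are assumptions matching the kernel's usual mdstat format, not output captured from the affected machines.

```shell
# Illustrative helper: extract the resync/recovery percentage from
# /proc/mdstat-style text on stdin.  The "recovery = N.N%" wording
# matches the kernel's mdstat progress line.
resync_progress() {
    awk '/recovery|resync/ && /%/ {
        for (f = 1; f <= NF; f++)
            if ($f ~ /%$/) { print $f; exit }
    }'
}

# Typical use while the stress test runs:
#   resync_progress < /proc/mdstat
# or simply:
#   watch -n5 cat /proc/mdstat
```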
I've tried this on two similar servers. Both have two 3ware 7500-8 controllers in JBOD mode. One server has a Supermicro P4DPE-G2 motherboard, 4GB RAM, dual 2.2GHz Xeons, and 16 Maxtor 160GB drives. The other has a Supermicro X5DPE-G2 board, 2GB RAM, dual 2.4GHz Xeons, and 16 IBM 180GB drives.
Created attachment 120227 [details]
Log snippet showing FS errors (hostname removed).
I'm seeing the same problem with RHEL4.4 as well.
Actually, the filesystem on the RAID is corrupted right out of the install.
This patch might fix the problem:
Fixed in 4.5.
Actually, no, I am still witnessing corruption with 2.6.9-55.ELsmp.
RAID-6 is a no-go.
This got routed to the wrong place, I'm afraid - it's not my area. If you still need support please use https://www.redhat.com/support/ referencing this bugzilla to obtain more personal attention.