Bug 159590

Summary: corrupted data using software-raid (md)
Product: Fedora
Component: kernel
Version: 3
Hardware: i386
OS: Linux
Status: CLOSED NOTABUG
Severity: high
Priority: medium
Reporter: Arian Prins <hgaprins>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Brian Brock <bbrock>
CC: wtogami
Doc Type: Bug Fix
Last Closed: 2005-06-05 13:47:50 UTC

Description Arian Prins 2005-06-05 13:08:32 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; nl-NL; rv:1.7.5) Gecko/20041202 Firefox/1.0

Description of problem:
My system was an updated FC3 installation. It had three 180 GB (P-ATA) drives that I combined into a software RAID level 5 array (/dev/md0), created at install time with the default partitioning tools. On top of that I used LVM, and on top of that a few ext3 filesystems, including the root filesystem (booting from a separate hard disk that was not part of the RAID set).
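
For reference, a stack like the one described can be built roughly as follows. This is a minimal sketch; the device names, partition numbers and sizes are assumptions, not taken from the report:

  # RAID-5 array from the three data partitions (assumed names)
  mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/hde2 /dev/hdg2 /dev/hdh2
  # LVM on top of the array
  pvcreate /dev/md0
  vgcreate vg0 /dev/md0
  lvcreate -L 100G -n data vg0
  # ext3 on top of the logical volume
  mkfs.ext3 /dev/vg0/data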

After a few months of use all the filesystems suddenly became completely corrupted. The system could no longer boot (it couldn't find init), and when I tried to mount the partitions from a rescue CD, a live CD, or a completely new install on a separate drive, I could not get /dev/md0 mounted.

I tried reinstalling everything, and now at install time, while the installer is formatting the partitions, it fails and reboots (after giving a message that something serious happened).

I have now reinstalled FC3 on the separate hard disk (not part of the three 180 GB drives) without creating the RAID array at all. When I create a RAID-5 set after the install using mdadm, the newly created /dev/md0 filesystem corrupts after a few hours of use; after unmounting, it won't remount. To rule out drive (or controller) failure, I fdisk'ed the individual drives and put an ext3 partition directly on each of them. I filled the three drives up with 1 GB files: no problem. Reading a few of them back (e.g. cat < 1gbfile > /dev/null) gives no problem either.
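
The per-drive test amounts to something like the following (a sketch; the device name and mount point are assumptions):

  # one ext3 partition straight on the disk, no md or LVM in between
  mkfs.ext3 /dev/hdg1
  mount /dev/hdg1 /mnt/test
  # fill the disk with 1 GB files, then read one back
  dd if=/dev/zero of=/mnt/test/1gbfile bs=1M count=1024
  cat < /mnt/test/1gbfile > /dev/null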

This means I have tried the following "chains":
direct partitions on the drives: no problems
combining the drives using RAID-5: corruption
combining the drives using RAID-5 and then LVM on top of that: corruption

This problem may be related to bug 152162, but I'm not sure.


Version-Release number of selected component (if applicable):
kernel-2.6.9-1.667 (but probably updated kernels as well)

How reproducible:
Always

Steps to Reproduce:
Scenario 1:
1. Start installation of FC3.
2. Create a RAID-5 set using three 180 GB disks (on each disk: partition 1: 256 MB swap; partition 2: remainder of the disk for software RAID).
3. Continue the installation process.
4. At formatting time, just before formatting finishes, the installer gives an error message indicating something serious went wrong and reboots.

Scenario 2:
1. Install FC3 on a 40 GB hard drive; leave the 180 GB disks empty (no partitions).
2. Once the system is running, create one partition on each 180 GB drive (type 0xfd).
3. Use mdadm to create a RAID-5 set from the 3 partitions (see the command sketch after these steps).
4. Mount /dev/md0 at a directory.
5. Start adding random data.
6. After a few GB, the data corrupts the filesystem (ls displays irregularities).
7. Unmount /dev/md0.
8. Reboot.
9. mount /dev/md0 gives an error message.
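
A minimal sketch of steps 2-5, assuming the three drives are /dev/hde, /dev/hdg and /dev/hdh (the device names are assumptions; only hdg and hdh appear in the SMART mails below):

  # after creating one 0xfd (Linux raid autodetect) partition per drive:
  mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/hde1 /dev/hdg1 /dev/hdh1
  mkfs.ext3 /dev/md0
  mount /dev/md0 /mnt/raid
  # write random data until the corruption shows up
  dd if=/dev/urandom of=/mnt/raid/testfile bs=1M count=4096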

Actual Results:  see steps

Expected Results:  the installer should have finished formatting / no corruption 

Additional info:

I get emails from smartd with subject:
SMART error (CurrentPendingSector) detected on host: bio.lan
.........
Device: /dev/hdh, 11 Currently unreadable (pending) sectors

and in another mail:
Device: /dev/hdg, 2 Currently unreadable (pending) sectors
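
Pending-sector counts like these can be inspected directly with smartctl (a sketch; the device name follows the mail above):

  # dump full SMART data, including Current_Pending_Sector (attribute 197)
  smartctl -a /dev/hdh
  # or run an extended surface self-test
  smartctl -t long /dev/hdh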

Comment 1 Arian Prins 2005-06-05 13:47:50 UTC
After more investigation it seems that the hardware was faulty after all
(filling the hard disks up with data didn't give any problems, but I have now
dumped all data to /dev/null and did get errors). Apologies.
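
A whole-disk read test of that kind can be done with dd (a sketch; substitute the appropriate device name). Unlike the earlier write test, it reads every sector and so trips over the pending sectors smartd reported:

  # read the entire disk; a read error on a pending sector aborts with an I/O error
  dd if=/dev/hdg of=/dev/null bs=1M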