Red Hat Bugzilla – Bug 172711
mdadm RAID 5 bug(s) when failing scsi device HDDs
Last modified: 2007-11-30 17:07:08 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.3) Gecko/20040924
Description of problem:
This problem occurs only on the 2.4 kernel (AS3). The same setup has also been tested on the 2.6 kernel (AS4) with no problems.
NOTE: RAID 0 and RAID 1 arrays work fine.
Essentially, when you create a RAID 5 array with the mdadm utility, several bugs appear. First, let me explain my setup. I have a RAID head populated with 12 SATA drives, direct-connected to the host through a QLogic HBA (qla2340). Using the RAID-head utilities, I create 12 NRAID arrays (or a single RAID 5 array partitioned into 4 or more separate partitions) and map all the logical drives/LUNs created from those arrays to the HBA on the host. When I re-initialize the driver with modprobe, the OS picks up all the /dev/sd(x) devices with no problems. I then quickly use the sfdisk utility to partition at least 4 of the LUNs as Linux RAID partitions, again with no problems. Finally, I create the array with mdadm:
mdadm --create /dev/md0 --level=5 --raid-devices=3 --spare-devices=1 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm starts building the array (3 active devices and one hot spare), and its progress can be monitored through /proc/mdstat.
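The sizes reported later can be checked with simple arithmetic. In a RAID 5 array, one member's worth of space holds parity, and a hot spare holds no data until a rebuild. Usable capacity is therefore (raid devices - 1) x member size. The sketch below is illustrative only. `raid5_capacity` is a hypothetical helper, and sizes are assumed to be in the 1 KiB blocks that /proc/mdstat and mdadm --detail report.

```python
# Hypothetical helper for the RAID 5 capacity arithmetic.
# Sizes are in 1 KiB blocks, as reported by /proc/mdstat and mdadm --detail.


def raid5_capacity(device_kib: int, raid_devices: int) -> int:
    """Usable RAID 5 capacity in KiB.

    One member's worth of space goes to parity; hot spares are not
    counted because they hold no data until a rebuild. This sketch
    assumes the conventional RAID 5 minimum of 3 members.
    """
    if raid_devices < 3:
        raise ValueError("this sketch assumes at least 3 raid devices")
    return (raid_devices - 1) * device_kib


if __name__ == "__main__":
    # Device Size from the mdadm --detail output in this report.
    usable = raid5_capacity(38098048, raid_devices=3)
    print(usable)                         # 76196096
    print(f"{usable / 2**20:.2f} GiB")    # 72.67 GiB
```

The result matches the Array Size of 76196096 blocks (72.67 GiB) that mdadm --detail reports for md0. The spare (sdd1) adds nothing to it.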
Whether or not you wait for the array to finish initializing, the results are ALMOST the same. I format and mount md0, then run 8 I/O processes against the mounted device. As routine practice, I then fail a drive in the array (by physically pulling it out of the RAID-head enclosure or by removing its LUN mapping) to verify the array's basic functionality.
If the array is NOT 100% initialized, all 8 I/O processes are killed and the hot spare does not take over. The array has failed and is not being rebuilt.
If the array IS 100% initialized, all 8 I/O processes stay active, but /proc/mdstat does not update properly and still reports all three members as up:
[root@rochester root]# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath]
read_ahead 1024 sectors
md0 : active raid5 sdc1 sdd1 sdb1 sda1
76196096 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
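The status block at the end of the md0 entry is what should change when a member fails. [3/3] means 3 raid devices with 3 working. Each U in [UUU] marks a slot that is up, and a _ would mark one that is down. After a drive is pulled, one would expect something like [3/2] [U_U] rather than the unchanged [3/3] [UUU] shown above. Below is a minimal parsing sketch. It assumes the status block keeps this [n/m] [U_] form, and the function name `parse_md_status` is hypothetical.

```python
import re

# Status block at the end of an md array entry in /proc/mdstat, e.g.
# "[3/3] [UUU]": [raid devices/working devices], then one character per
# slot, "U" for up and "_" for down.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def parse_md_status(line):
    """Return (raid_devices, working, down_slots), or None if no status block."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    raid_devices = int(match.group(1))
    working = int(match.group(2))
    down_slots = [i for i, c in enumerate(match.group(3)) if c == "_"]
    return raid_devices, working, down_slots


if __name__ == "__main__":
    # Line copied from the /proc/mdstat output in this report.
    print(parse_md_status("76196096 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]"))
    # -> (3, 3, [])
    # Hypothetical degraded array with slot 1 missing.
    print(parse_md_status("76196096 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]"))
    # -> (3, 2, [1])
```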
When checking the details with the mdadm utility, I ALSO noticed:
[root@rochester root]# mdadm --detail /dev/md0
Version : 00.90.00
Creation Time : Tue Nov 8 08:41:33 2005
Raid Level : raid5
Array Size : 76196096 (72.67 GiB 78.02 GB)
Device Size : 38098048 (36.33 GiB 39.01 GB)
Raid Devices : 3
Total Devices : 5
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Tue Nov 8 09:45:05 2005
State : dirty, no-errors
Active Devices : 3
Working Devices : 4
Failed Devices : 1
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
2 8 33 2 active sync /dev/sdc1
3 8 49 3 spare /dev/sdd1
UUID : ddd00a22:b39a6436:16400bc5:8be85b74
Events : 0.2
These results are unexpected. Even though I created the array with 4 devices (raid-devices=3 and spare-devices=1), mdadm --detail shows a pseudo-device, for a total of 5 devices, and automatically counts the fifth as failed.
There is STILL no indication of the failed device, and the hot spare has not taken over. The OS is not picking up the failure.
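The device counters in the mdadm --detail output can be checked against each other and against the 4 devices passed to --create. This is a hedged sketch. `parse_detail` and `count_problems` are hypothetical helpers, and the sample holds only the counter lines copied from the output above.

```python
# Hypothetical consistency check over the "Key : Value" lines of mdadm --detail.

SAMPLE = """\
Raid Devices : 3
Total Devices : 5
Active Devices : 3
Working Devices : 4
Failed Devices : 1
Spare Devices : 1
"""


def parse_detail(text):
    """Map each "Key : Value" line to a dict entry; other lines are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def count_problems(fields, devices_given):
    """List inconsistencies between the counters and the --create arguments."""
    raid = int(fields["Raid Devices"])
    total = int(fields["Total Devices"])
    active = int(fields["Active Devices"])
    working = int(fields["Working Devices"])
    failed = int(fields["Failed Devices"])
    spare = int(fields["Spare Devices"])
    problems = []
    if total != devices_given:
        problems.append(f"Total Devices is {total}, but {devices_given} were given to --create")
    if working != active + spare:
        problems.append(f"Working Devices is {working}, expected active + spare = {active + spare}")
    if failed and active == raid:
        problems.append(f"{failed} failed device(s) reported while all {raid} raid slots are active")
    return problems


if __name__ == "__main__":
    # 3 raid devices + 1 spare were passed to mdadm --create in this report.
    for problem in count_problems(parse_detail(SAMPLE), devices_given=4):
        print(problem)
```

With the counters from this report, two inconsistencies are flagged: the extra fifth device, and a failed device counted while all three raid slots are still active.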
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create 4 or more NRAID arrays, or a single RAID 5 array partitioned into 4 or more separate partitions. Map the LUNs to the host's HBA and reload the driver with modprobe to pick up the device changes.
2. Partition the LUNs, create the RAID 5 array with the mdadm utility, then format and mount it.
3. Fail a drive.
Actual Results: See the description above.
Expected Results: The hot spare should have taken over and the RAID 5 array should have started rebuilding.
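The expected result can be observed in /proc/mdstat. While md rebuilds onto a spare, the array entry normally gains a progress line containing "recovery = NN.N%" (or "resync" during an initial sync). This minimal sketch only searches for those two keywords. `rebuild_progress` is a hypothetical name, and the exact line layout varies between kernel versions.

```python
import re

# Progress line that /proc/mdstat normally shows while md rebuilds or
# resyncs, e.g. "[==>....]  recovery = 12.6% (...)". Only these two
# keywords are handled; layout details differ between kernel versions.
PROGRESS_RE = re.compile(r"\b(recovery|resync)\s*=\s*(\d+(?:\.\d+)?)%")


def rebuild_progress(mdstat_text):
    """Return (kind, percent) for the first progress line, or None."""
    match = PROGRESS_RE.search(mdstat_text)
    if match is None:
        return None
    return match.group(1), float(match.group(2))


if __name__ == "__main__":
    # /proc/mdstat entry copied from this report: no rebuild in progress.
    reported = (
        "md0 : active raid5 sdc1 sdd1 sdb1 sda1\n"
        "76196096 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]\n"
    )
    print(rebuild_progress(reported))  # None
    # Hypothetical entry for an array rebuilding onto the spare.
    rebuilding = (
        "md0 : active raid5 sdd1 sdc1 sda1\n"
        "76196096 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]\n"
        "[==>..................]  recovery = 12.6% (4800000/38098048)\n"
    )
    print(rebuild_progress(rebuilding))  # ('recovery', 12.6)
```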
I forgot to add that this issue occurs with mdadm v1.5.0 - 22 Jan 2004;
versions v1.0.1 - 20 May 2002 and v1.6.0 - 4 June 2004 work just fine.
I have also noticed that in SOME instances it flags two spare devices when
I specify only (for example) 3 RAID devices and 1 spare. It creates the RAID 5 array
with just 2 disks and marks the other two as spares (which is clearly incorrect,
both physically and logically).
This bug is filed against RHEL 3, which is in the maintenance phase.
During the maintenance phase, only security errata and select mission-critical
bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
For more information on the RHEL errata support policy, please visit:
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.