Description of problem:
This problem occurs only on the 2.4 kernel (AS3); the same tests pass on the 2.6 kernel (AS4) with no problems.
NOTE: RAID 0 and RAID 1 arrays work fine with no problems.
Essentially, when you create a RAID 5 array using the mdadm utility, a lot of bugs show up. First, let me explain my setup. I have a RAID head populated with 12 SATA drives connected directly (direct connect) to the host through a QLogic HBA (qla2340). Using the RAID-head utilities I create 12 NRAID arrays (or a single RAID 5 array partitioned into 4+ separate partitions) and map all the logical drives/LUNs created from the arrays to the HBA on the host. When I modprobe to re-initialize the driver, the OS picks up all the /dev/sd(x) devices with no problems. I then use the sfdisk utility to partition at least 4 of the LUNs as Linux RAID partitions, again with no problem. Using the mdadm utility I run the following:
mdadm --create /dev/md0 --level=5 --raid-devices=3 --spare-devices=1 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
The array created through mdadm begins building (with 3 devices and one hot spare) and can be monitored through /proc/mdstat.
Whether or not you wait for it to finish initializing, ALMOST the same results are obtained. md0 is formatted and mounted, and I then run 8 processes of I/O against the mounted device. As routine practice, I fail a drive in the array (by physically pulling it out of the RAID-head enclosure or by removing the LUN mapping) in order to verify its basic functionality.
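(For reference, a member can also be failed in software rather than by pulling the drive. This is a sketch using mdadm's manage mode, with device names from my setup; it obviously requires root and the live array, so take it as illustration only.)

```shell
# Mark one member of the array faulty, then remove it
mdadm /dev/md0 --fail /dev/sda1
mdadm /dev/md0 --remove /dev/sda1

# Watch whether the hot spare (/dev/sdd1) is pulled in and a rebuild starts
cat /proc/mdstat
```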
If the array is NOT 100% initialized, all 8 I/O processes get killed and the hot spare does not take over. The array has failed and is not being rebuilt.
If the array IS 100% initialized, all 8 I/O processes remain active, but /proc/mdstat does not update properly:
[root@rochester root]# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath]
read_ahead 1024 sectors
md0 : active raid5 sdc1 sdd1 sdb1 sda1
76196096 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
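(Note that the [3/3] [UUU] status above still reports all three members healthy even after the pull. A degraded member would normally show up as an underscore in that field; a quick check, sketched here against the captured text, shows nothing is flagged.)

```shell
# /proc/mdstat text captured above; on a live system read /proc/mdstat itself.
mdstat='md0 : active raid5 sdc1 sdd1 sdb1 sda1
      76196096 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]'

# Extract the per-member status field, e.g. [UUU] (all up) or [UU_] (one down)
status=$(printf '%s\n' "$mdstat" | grep -o '\[[U_]*\]')
echo "member status: $status"

case "$status" in
  *_*) echo "array degraded" ;;
  *)   echo "array reports all members up" ;;  # what we see even after the failure
esac
```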
ALSO, when checking the details through the mdadm utility, I noticed:
[root@rochester root]# mdadm --detail /dev/md0
Version : 00.90.00
Creation Time : Tue Nov 8 08:41:33 2005
Raid Level : raid5
Array Size : 76196096 (72.67 GiB 78.02 GB)
Device Size : 38098048 (36.33 GiB 39.01 GB)
Raid Devices : 3
Total Devices : 5
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Tue Nov 8 09:45:05 2005
State : dirty, no-errors
Active Devices : 3
Working Devices : 4
Failed Devices : 1
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
2 8 33 2 active sync /dev/sdc1
3 8 49 3 spare /dev/sdd1
UUID : ddd00a22:b39a6436:16400bc5:8be85b74
Events : 0.2
I get some unexpected results: even though I created the array with 4 devices (raid-devices=3 and spare-devices=1), the mdadm detail output shows a pseudo device, bringing the total to 5, and automatically marks the 5th as failed.
There is STILL no sign of the known failed device being taken over by the hot spare. The OS is not picking it up.
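(A quick arithmetic check against the --detail output above makes the device-count mismatch explicit; this is sketched against the captured text rather than a live query.)

```shell
# Fields captured from `mdadm --detail /dev/md0` above
detail='   Raid Devices : 3
  Total Devices : 5
 Failed Devices : 1
  Spare Devices : 1'

raid=$(printf '%s\n' "$detail"  | awk -F' : ' '/Raid Devices/  {print $2}')
total=$(printf '%s\n' "$detail" | awk -F' : ' '/Total Devices/ {print $2}')
spare=$(printf '%s\n' "$detail" | awk -F' : ' '/Spare Devices/ {print $2}')

# 3 raid devices plus 1 spare were requested, so 4 total is expected
echo "expected total: $((raid + spare)), reported total: $total"
```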
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create 4+ NRAID arrays (or a single RAID 5 array partitioned into 4+ separate partitions). Map the LUNs to the host's HBA and modprobe the driver to pick up the device changes.
2. Partition the LUNs, then create, format, and mount the RAID array using the mdadm utility.
3. Fail a drive.
Actual Results: See description above.
Expected Results: Hotspare should have taken over and RAID 5 array should have been rebuilding.
I forgot to add that this is an issue with mdadm v1.5.0 (22 Jan 2004);
versions v1.0.1 (20 May 2002) and v1.6.0 (4 June 2004) work just fine.
I have also noticed that in SOME instances it flags two spare devices when
I specified only (e.g.) 3 raid devices and 1 spare. It creates the RAID 5 array
with just 2 disks and marks the other two as spares (which is obviously physically
and logically incorrect).
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet these criteria, it is now being closed.
For more information on the RHEL errata support policy, please visit:
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.