From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050623 Fedora/1.0.4-5 Firefox/1.0.4

Description of problem:
lsraid and /proc/mdstat show different things about an array. I'm not sure how reproducible this is, but it's the third time I have encountered it.

mdstat:

[root@Linux201 root]# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
Event: 7
md6 : active raid1 sdd3[1] sdb3[0]
      19486720 blocks [2/2] [UU]
md5 : active raid1 sdd2[1] sdb2[0]
      11727360 blocks [2/2] [UU]
md4 : active raid1 sdd1[1] sdb1[0]
      39069952 blocks [2/2] [UU]
md3 : active raid1 sdc4[2] sda4[0]
      3911744 blocks [2/1] [U_]
md2 : active raid1 sdc3[2] sda3[0]
      3911744 blocks [2/1] [U_]
md1 : active raid1 sdc2[2] sda2[0]
      62508800 blocks [2/1] [U_]
md0 : active raid1 sdc1[1] sda1[0]
      24000 blocks [2/2] [UU]
unused devices: <none>

[root@Linux201 root]# lsraid -a /dev/md2
[dev 9, 2] /dev/md2   4B509E8E.1ED7E78D.40F6EE70.6B0D0F7A online
[dev 8, 3] /dev/sda3  4B509E8E.1ED7E78D.40F6EE70.6B0D0F7A good
[dev ?, ?] (unknown)  00000000.00000000.00000000.00000000 missing

The question is: why doesn't /dev/md2 show the spare device? md1 shows this:

[root@Linux201 root]# lsraid -a /dev/md1
[dev 9, 1] /dev/md1   096AE65D.AD8A29EE.C7BDADA3.3A2784B9 online
[dev 8, 2] /dev/sda2  096AE65D.AD8A29EE.C7BDADA3.3A2784B9 good
[dev ?, ?] (unknown)  00000000.00000000.00000000.00000000 missing
[dev 8, 34] /dev/sdc2 096AE65D.AD8A29EE.C7BDADA3.3A2784B9 spare

Which is fine. So if I want to monitor an md device, what should I trust? /proc/mdstat? lsraid? Both? Or neither?

Another thing I'd like to note: in the current state, running raidstop /dev/md2 will crash the system (kernel panic).

Version-Release number of selected component (if applicable):
raidtools-1.00.3-7

How reproducible:
Sometimes

Steps to Reproduce:
1. When the array is in that state, cat /proc/mdstat and run lsraid on a device
2. lsraid will not report the spare device although it should
3. Try to stop the 'problematic' md device and receive a panic

Actual Results:
/proc/mdstat and lsraid show different outputs instead of reporting the same thing.

Expected Results:
This is what I wanted to see in lsraid:

[root@Linux201 root]# lsraid -a /dev/md2
[dev 9, 2] /dev/md2   4B509E8E.1ED7E78D.40F6EE70.6B0D0F7A online
[dev 8, 3] /dev/sda3  4B509E8E.1ED7E78D.40F6EE70.6B0D0F7A good
[dev ?, ?] (unknown)  00000000.00000000.00000000.00000000 missing
[dev 8, 34] /dev/sdc3 4B509E8E.1ED7E78D.40F6EE70.6B0D0F7A spare

Needless to say, I didn't want to receive that panic either.

Additional info:
I'm currently running RHEL3U2, but this problem has happened to me twice on RHEL3U5 as well.

First, lsraid is part of the deprecated raidtools package and likely will not be updated, whether or not there is a bug there. mdadm is the preferred tool to use, and mdadm -E --brief will probably give you what you are looking for.

Now, lsraid is probably not seeing the spare because the drive hasn't started the resync process yet. Since md1 is already resyncing, the process of adding sdc3 to md2 would be delayed, and therefore the disk would not be registered as an active spare yet.

As to which to trust: they are both correct. The mdstat file shows the current kernel state. The kernel knows about the new spare disks, but isn't doing anything with them yet. When lsraid queries the disk array, it doesn't see the spare because the spare isn't live yet.

The oops on shutting down the md2 array is another matter entirely, and I believe I already know what that problem is. I believe you'll find that the oops has already been reported in bz #134736.

Since I don't see a bug in this report, other than the oops which is already reported elsewhere, I'm closing this bug report out. Thank you for the report, and please add yourself to the above-mentioned bugzilla if you would like to be kept abreast of when the "shutting down a raid1 array while a rebuild is taking place" bug gets fixed.
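
For monitoring, the kernel state in /proc/mdstat can be read with a short script. What follows is only a minimal Python sketch, assuming the 2.4-style mdstat layout quoted in the report ("mdN : active raid1 ..." followed by a "[total/active] [U_]" status). The names parse_mdstat, summarize and SAMPLE are made up for illustration and are not part of raidtools or mdadm. Treating a member whose bracketed number is at or above the configured disk count as a spare that is not yet active is a heuristic that matches the output in this report, not documented behaviour.

```python
import re

# Array header, e.g. "md2 : active raid1 sdc3[2] sda3[0]" (layout assumed from the report).
ARRAY_RE = re.compile(r"^(md\d+)\s*:\s*(\w+)\s+(\w+)\s+(.*)$")
# Status field, e.g. "[2/1] [U_]": configured disks / active disks, then one flag per slot.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
# One member device, e.g. "sdc3[2]".
MEMBER_RE = re.compile(r"(\w+)\[(\d+)\]")

# Excerpt of the mdstat output quoted in the report.
SAMPLE = """\
Personalities : [raid1]
read_ahead 1024 sectors
Event: 7
md2 : active raid1 sdc3[2] sda3[0]
      3911744 blocks [2/1] [U_]
md0 : active raid1 sdc1[1] sda1[0]
      24000 blocks [2/2] [UU]
unused devices: <none>
"""


def parse_mdstat(text):
    """Parse mdstat text into {array_name: info_dict}."""
    arrays = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = ARRAY_RE.match(line)
        if header:
            name, state, level, rest = header.groups()
            current = {
                "state": state,
                "level": level,
                "members": {dev: int(num) for dev, num in MEMBER_RE.findall(rest)},
                "expected": None,
                "active": None,
            }
            arrays[name] = current
        # The status may sit on the header line or on the line after it.
        status = STATUS_RE.search(line)
        if status and current is not None:
            current["expected"] = int(status.group(1))
            current["active"] = int(status.group(2))
    return arrays


def summarize(arrays):
    """Flag degraded arrays and list members that look like waiting spares."""
    report = {}
    for name, info in arrays.items():
        if info["expected"] is None:
            continue  # no status field found for this array
        pending = sorted(
            dev for dev, num in info["members"].items() if num >= info["expected"]
        )
        report[name] = {
            "degraded": info["active"] < info["expected"],
            "pending_spares": pending,
        }
    return report


if __name__ == "__main__":
    for name, item in sorted(summarize(parse_mdstat(SAMPLE)).items()):
        print(name, item)
```

On a live system the text would come from reading /proc/mdstat instead of SAMPLE. Newer kernels add fields to this file, so the regular expressions would likely need adjusting there.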