Description of problem:
After upgrading to mdadm-2.6.7.1-1.fc9, the following command:

  mdadm --assemble /dev/md0 /dev/sda4 /dev/sdb4

fails with "Device or resource busy". Downgrading to mdadm-2.6.4-4.fc9 restores the ability to assemble the volume (after a reboot).

Version-Release number of selected component (if applicable):
mdadm-2.6.7.1-1.fc9

Steps to Reproduce:
1. Upgrade to mdadm-2.6.7.1-1.fc9
2. Run: mdadm --assemble /dev/md0 /dev/sda4 /dev/sdb4

Actual results:
mdadm: cannot open device /dev/sda4: Device or resource busy

Expected results:
The array assembles and can be mounted.

Additional info:
Package versions and full command output are in the first comment below.
kernel: 2.6.27.5-41.fc9.i686
mdadm:  2.6.7.1-1.fc9.i386

# mdadm --assemble /dev/md0 /dev/sda4 /dev/sdb4
mdadm: cannot open device /dev/sda4: Device or resource busy
mdadm: /dev/sda4 has no superblock - assembly aborted

After a downgrade to mdadm-2.6.4-4.fc9.i386:

# mdadm --assemble /dev/md0 /dev/sda4 /dev/sdb4
# mount /dev/md0 /shared/hltdir1/disk1
# df -k /shared/hltdir1/disk1
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/md0             532768096 151304328 354400680  30% /shared/hltdir1/disk1
Can you get me the output of /proc/mdstat when it's failing?
Also, the output of mdadm -E /dev/sda4 would help too.
With mdadm-2.6.7.1-1.fc9.i386:

# dmesg | grep md | grep -v bmdma
md: bind<sda4>
md: md0 stopped.

# cat /proc/mdstat
Personalities :
md_d0 : inactive sda4[0](S)
      270630912 blocks
unused devices: <none>

# mdadm -E /dev/sda4 | more
/dev/sda4:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 7fac9e67:54bef915:6385bb5a:d6009b91
  Creation Time : Mon Jul  7 13:45:56 2008
     Raid Level : raid0
  Used Dev Size : 0
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Update Time : Mon Jul  7 13:45:56 2008
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 480306d9 - correct
         Events : 1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8        4        0      active sync   /dev/sda4
   0     0       8        4        0      active sync   /dev/sda4
   1     1       8       20        1      active sync   /dev/sdb4

With mdadm-2.6.4-4.fc9.i386:

# dmesg | grep md | grep -v bmdma
md: md0 stopped.
md: bind<sdb4>
md: bind<sda4>
md: raid0 personality registered for level 0
md0: setting max_sectors to 128, segment boundary to 32767
raid0 : md_size is 541261824 blocks.
EXT3 FS on md0, internal journal

# cat /proc/mdstat
Personalities : [raid0]
md0 : active raid0 sda4[0] sdb4[1]
      541261824 blocks 64k chunks
unused devices: <none>

*** Note the dmesg output differences for a possible clue: the working setup says "md0 stopped" and then two binds happen; the failing setup does a bind and then a stop. After that we can't read the disks properly.
A little more information: can you get me the output of "ls -l /dev/md*" after the failed attempt to assemble?

Also, if you create an /etc/mdadm.conf file with the single line:

DEVICE partitions

and then run:

mdadm -Eb /dev/sda4 >> /etc/mdadm.conf

and then edit the ARRAY line to have the right device name, does "mdadm -As /dev/md0" work with the new mdadm?

Another test: without the mdadm.conf file, does "mdadm -A --auto=md /dev/md0 /dev/sda4 /dev/sdb4" work?
# ls -l md*
brw-rw---- 1 root disk   9, 0 2008-11-20 20:45 md0
brw-rw---- 1 root disk 254, 0 2008-11-20 20:44 md_d0
lrwxrwxrwx 1 root root      7 2008-11-20 20:44 md_d0p1 -> md/d0p1
lrwxrwxrwx 1 root root      7 2008-11-20 20:44 md_d0p2 -> md/d0p2
lrwxrwxrwx 1 root root      7 2008-11-20 20:44 md_d0p3 -> md/d0p3
lrwxrwxrwx 1 root root      7 2008-11-20 20:44 md_d0p4 -> md/d0p4

md:
total 0
brw------- 1 root root 254, 0 2008-11-20 20:44 d0
brw------- 1 root root 254, 1 2008-11-20 20:44 d0p1
brw------- 1 root root 254, 2 2008-11-20 20:44 d0p2
brw------- 1 root root 254, 3 2008-11-20 20:44 d0p3
brw------- 1 root root 254, 4 2008-11-20 20:44 d0p4

/etc/mdadm.conf:
DEVICE /dev/sda4 /dev/sdb4
ARRAY /dev/md0 level=raid0 num-devices=2 UUID=7fac9e67:54bef915:6385bb5a:d6009b91

# mdadm -As /dev/md0
mdadm: /dev/md0 assembled from 1 drive - not enough to start the array.
# rm /etc/mdadm.conf
# mdadm -A --auto=md /dev/md0 /dev/sda4 /dev/sdb4
mdadm: cannot open device /dev/sda4: Device or resource busy
mdadm: /dev/sda4 has no superblock - assembly aborted
Can you also post the output of mdadm -E /dev/sdb4 so I can compare it with sda4's superblock?

Also, I can't reproduce this here; everything works for me with or without an entry in mdadm.conf. The distinctive point, though, is that I don't get either of the two errors we've seen on your system:

# mdadm -As /dev/md0
mdadm: /dev/md0 assembled from 1 drive - not enough to start the array.

This one shows that even with the drive identified in mdadm.conf (which increases mdadm's ability to assemble devices due to greater confidence that it has the right array members), we still didn't think /dev/sdb4 was a valid member of the array and ignored it for some reason.

# mdadm -A --auto=md /dev/md0 /dev/sda4 /dev/sdb4
mdadm: cannot open device /dev/sda4: Device or resource busy
mdadm: /dev/sda4 has no superblock - assembly aborted

And this one looks like we can't actually read /dev/sda4 on this attempt to open it. However, that may just be because it was locked from an earlier failed attempt to assemble the device. When you posted the contents of /proc/mdstat I saw this:

# cat /proc/mdstat
Personalities :
md_d0 : inactive sda4[0](S)
      270630912 blocks

which looks to me like sda4 is being held exclusively by a *different* raid device than /dev/md0: /dev/md_d0 instead (a partitionable device).

So, what I think is happening is this: because your array isn't listed in your mdadm.conf at reboot, and because you don't use it for any device required at initrd time, the array isn't being started by either of the calls to mdadm that exist on other systems (one in the initrd, and another in rc.sysinit that starts any devices listed in mdadm.conf that aren't started by the initrd). As a result, the udev rule from 70-mdadm.rules kicks in when the system processes the partitions on /dev/sda and /dev/sdb. Since the udev rule doesn't know whether you want a partitioned device or not, it creates a partitioned raid array and then attempts incremental assembly of the array.
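For reference, the incremental-assembly hook being described works roughly like the fragment below. This is an illustrative approximation, not the actual contents of the Fedora 9 package's 70-mdadm.rules (the exact rule text varies by release): when udev sees a new block device whose filesystem signature marks it as a Linux RAID member, it hands the device to "mdadm -I" for incremental assembly, which is how a half-built array can end up holding a member device.

```
# Illustrative sketch of a 70-mdadm.rules incremental-assembly rule
# (approximate; the real rule text differs between mdadm releases).
SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", \
    RUN+="/sbin/mdadm -I $env{DEVNAME}"
```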
This is how md_d0 is getting created. Once sda4 is in md_d0 waiting for the rest of the array members to be found, it's locked out from being used as part of /dev/md0. Now, this would all be fine if the incremental assembly finished, but for some reason /dev/sdb4 is not being considered a valid array member, so the assembly fails. Once that happens, things are locked out. If you run mdadm -S /dev/md_d0, though, hand assembly on the command line should work again without needing a reboot.

So the real thing to figure out is why your array won't assemble automatically; solving that should solve your problem. However, you'll be better off in the end if you have the array listed in mdadm.conf, so mdadm doesn't accidentally choose the wrong type of array to create (partitioned or not partitioned) during the udev-triggered assembly.
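As a concrete sketch of that last suggestion, here is a minimal mdadm.conf of the kind being recommended (on a real system you would generate the ARRAY line with "mdadm -Eb /dev/sda4 >> /etc/mdadm.conf" as described earlier; the example below writes to a local file and reuses the UUID reported in this bug purely for illustration):

```shell
# Sketch: a minimal mdadm.conf so boot-time assembly (initrd, rc.sysinit,
# and the udev rule) knows the array's type and members up front.
# Written to a local example file here; a real system uses /etc/mdadm.conf.
cat > mdadm.conf.example <<'EOF'
DEVICE partitions
ARRAY /dev/md0 level=raid0 num-devices=2 UUID=7fac9e67:54bef915:6385bb5a:d6009b91
EOF

# Sanity-check that exactly one ARRAY line made it into the file.
grep -c '^ARRAY' mdadm.conf.example
```

With this in place, whichever mdadm invocation runs first at boot has an authoritative description of the array instead of having to guess between a partitioned and a non-partitioned device.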
I prevented mdadm from being run by udev, and the mdadm --assemble is successful with the new mdadm. So my problem is definitely the grab of sda4 by udev - it won't let go of it if it can't figure out what to do. I think the older mdadm WAS letting go of sda4 when it couldn't decide what to do, and then my manual approach would work.

After creating an mdadm.conf like so:

# more /etc/mdadm.conf
DEVICE /dev/sda4 /dev/sdb4
ARRAY /dev/md0 devices=/dev/sda4,/dev/sdb4

the udev grab works because it has a better hint; it assembles md0 at boot, and I am able to mount the device OK.

So I think a better description of the bug is that something in the new mdadm isn't letting go of sda4 when it is confused, whereas the old mdadm did let go. I think that is still a problem worth correcting.

# mdadm -E /dev/sdb4
/dev/sdb4:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 7fac9e67:54bef915:6385bb5a:d6009b91
  Creation Time : Mon Jul  7 13:45:56 2008
     Raid Level : raid0
  Used Dev Size : 0
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Update Time : Mon Jul  7 13:45:56 2008
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 480306eb - correct
         Events : 1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       20        1      active sync   /dev/sdb4
   0     0       8        4        0      active sync   /dev/sda4
   1     1       8       20        1      active sync   /dev/sdb4
This message is a reminder that Fedora 9 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 9. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 9 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.