Bug 472400 - mdadm --assemble fails after mdadm-2.6.7.1-1.fc9 update
Summary: mdadm --assemble fails after mdadm-2.6.7.1-1.fc9 update
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: mdadm
Version: 9
Hardware: i386
OS: Linux
Priority: medium
Severity: low
Target Milestone: ---
Assignee: Doug Ledford
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-11-20 17:21 UTC by Mark Hittinger
Modified: 2009-07-14 14:07 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-07-14 14:07:19 UTC
Type: ---
Embargoed:



Description Mark Hittinger 2008-11-20 17:21:27 UTC
Description of problem:

After upgrading to mdadm-2.6.7.1-1.fc9, the following command:

mdadm --assemble /dev/md0 /dev/sda4 /dev/sdb4

fails with "Device or resource busy".

A downgrade to mdadm-2.6.4-4.fc9 restores the ability to assemble
the volume (after a reboot).

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Mark Hittinger 2008-11-20 22:36:29 UTC
kernel: 2.6.27.5-41.fc9.i686   mdadm: 2.6.7.1-1.fc9.i386

# mdadm --assemble /dev/md0 /dev/sda4 /dev/sdb4
mdadm: cannot open device /dev/sda4: Device or resource busy
mdadm: /dev/sda4 has no superblock - assembly aborted

downgrade to mdadm 2.6.4-4.fc9.i386

# mdadm --assemble /dev/md0 /dev/sda4 /dev/sdb4
# mount /dev/md0 /shared/hltdir1/disk1
# df -k /shared/hltdir1/disk1
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/md0             532768096 151304328 354400680  30% /shared/hltdir1/disk1

Comment 2 Doug Ledford 2008-11-20 22:42:56 UTC
Can you get me the output of /proc/mdstat when it's failing?

Comment 3 Doug Ledford 2008-11-20 22:44:11 UTC
Also, the output of mdadm -E /dev/sda4 would help too.

Comment 4 Mark Hittinger 2008-11-21 01:20:38 UTC
with mdadm-2.6.7.1-1.fc9.i386

# dmesg | grep md | grep -v bmdma
md: bind<sda4>
md: md0 stopped.

# cat /proc/mdstat
Personalities :
md_d0 : inactive sda4[0](S)
      270630912 blocks

unused devices: <none>

# mdadm -E /dev/sda4 | more
/dev/sda4:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 7fac9e67:54bef915:6385bb5a:d6009b91
  Creation Time : Mon Jul  7 13:45:56 2008
     Raid Level : raid0
  Used Dev Size : 0
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
 
    Update Time : Mon Jul  7 13:45:56 2008
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 480306d9 - correct
         Events : 1
 
     Chunk Size : 64K
 
      Number   Major   Minor   RaidDevice State
this     0       8        4        0      active sync   /dev/sda4
 
   0     0       8        4        0      active sync   /dev/sda4
   1     1       8       20        1      active sync   /dev/sdb4
 
with mdadm-2.6.4-4.fc9.i386:

# dmesg | grep md | grep -v bmdma
md: md0 stopped.
md: bind<sdb4>
md: bind<sda4>
md: raid0 personality registered for level 0
md0: setting max_sectors to 128, segment boundary to 32767
raid0 : md_size is 541261824 blocks.
EXT3 FS on md0, internal journal
 
# cat /proc/mdstat
Personalities : [raid0]
md0 : active raid0 sda4[0] sdb4[1]
      541261824 blocks 64k chunks
 
unused devices: <none>

*** Note the dmesg output differences for a possible clue.

The working setup says md0 stopped and then two binds happen.  The failing
setup does a bind and then a stop.  After that we can't read the disks properly.

Comment 5 Doug Ledford 2008-11-21 02:22:24 UTC
For a little more information, can you get me the output of

ls -l /dev/md*

after the failed attempt to assemble?

Also, if you create an /etc/mdadm.conf file with the single line:
DEVICE partitions
then run mdadm -Eb /dev/sda4 >> /etc/mdadm.conf and edit the ARRAY line to have the right device name, does mdadm -As /dev/md0 work with the new mdadm?
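
Spelled out, that sequence would look roughly like this (a sketch; the ARRAY line that -Eb appends carries this array's actual UUID and may need its device name edited to /dev/md0):

# echo "DEVICE partitions" > /etc/mdadm.conf
# mdadm -Eb /dev/sda4 >> /etc/mdadm.conf
# vi /etc/mdadm.conf          # make sure the ARRAY line names /dev/md0
# mdadm -As /dev/md0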

Another test, without the mdadm.conf file, does mdadm -A --auto=md /dev/md0 /dev/sda4 /dev/sdb4 work?

Comment 6 Mark Hittinger 2008-11-21 02:58:41 UTC
# ls -l md*
brw-rw---- 1 root disk   9, 0 2008-11-20 20:45 md0
brw-rw---- 1 root disk 254, 0 2008-11-20 20:44 md_d0
lrwxrwxrwx 1 root root      7 2008-11-20 20:44 md_d0p1 -> md/d0p1
lrwxrwxrwx 1 root root      7 2008-11-20 20:44 md_d0p2 -> md/d0p2
lrwxrwxrwx 1 root root      7 2008-11-20 20:44 md_d0p3 -> md/d0p3
lrwxrwxrwx 1 root root      7 2008-11-20 20:44 md_d0p4 -> md/d0p4

md:
total 0
brw------- 1 root root 254, 0 2008-11-20 20:44 d0
brw------- 1 root root 254, 1 2008-11-20 20:44 d0p1
brw------- 1 root root 254, 2 2008-11-20 20:44 d0p2
brw------- 1 root root 254, 3 2008-11-20 20:44 d0p3
brw------- 1 root root 254, 4 2008-11-20 20:44 d0p4

/etc/mdadm.conf:
DEVICE /dev/sda4 /dev/sdb4
ARRAY /dev/md0 level=raid0 num-devices=2 UUID=7fac9e67:54bef915:6385bb5a:d6009b91

# mdadm -As /dev/md0
mdadm: /dev/md0 assembled from 1 drive - not enough to start the array.

# rm /etc/mdadm.conf
# mdadm -A --auto=md /dev/md0 /dev/sda4 /dev/sdb4
mdadm: cannot open device /dev/sda4: Device or resource busy
mdadm: /dev/sda4 has no superblock - assembly aborted

Comment 7 Doug Ledford 2008-11-21 13:30:57 UTC
Can you also post the output of mdadm -E /dev/sdb4 so I can compare it with sda4's superblock?  Also, I can't reproduce this here; everything works for me with or without an entry in mdadm.conf.  The distinctive point, though, is that I don't get either of the two errors we've seen on your system:

# mdadm -As /dev/md0
mdadm: /dev/md0 assembled from 1 drive - not enough to start the array.

This one shows that even with the drives identified in mdadm.conf (which increases mdadm's ability to assemble devices because it has greater confidence that it has the right array members), mdadm still didn't think /dev/sdb4 was a valid member of the array and ignored it for some reason.
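
As an aside, running the scan assembly with mdadm's verbose flag is one way to see more detail about why a member gets skipped; a minimal sketch, assuming the same mdadm.conf as above:

# mdadm -Asv /dev/md0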

# mdadm -A --auto=md /dev/md0 /dev/sda4 /dev/sdb4
mdadm: cannot open device /dev/sda4: Device or resource busy
mdadm: /dev/sda4 has no superblock - assembly aborted

And this one looks like we can't actually open /dev/sda4 on this attempt.  However, that may just be because it was locked by an earlier failed attempt to assemble the device.  When you posted the contents of /proc/mdstat I saw this:

# cat /proc/mdstat
Personalities :
md_d0 : inactive sda4[0](S)
      270630912 blocks

That looks to me like sda4 is being held exclusively by a *different* raid device than /dev/md0: /dev/md_d0, a partitionable device.

So, what I think is happening is this: because your array isn't listed in your mdadm.conf at reboot, and because you don't use it for any device required at initrd time, the array isn't being started by either of the calls to mdadm that exist on other systems (one in the initrd, and another in rc.sysinit that starts any devices listed in mdadm.conf that weren't started by the initrd).  As a result, the udev rule from 70-mdadm.rules kicks in when the system processes the partitions on /dev/sda and /dev/sdb.  Since the udev rule doesn't know whether you want a partitioned device or not, it creates a partitioned raid array and then attempts incremental assembly of the array.  This is how md_d0 is getting created.

Once sda4 is in md_d0 waiting for the rest of the array members to be found, it is locked out from being used as part of /dev/md0.  Now, this would all be fine if the incremental assembly finished, but for some reason /dev/sdb4 is not being considered a valid array member, so the assembly fails.  Once that happens, things are locked out.  If you call mdadm -S /dev/md_d0, though, the hand assembly on the command line should work again without needing a reboot.

So, the real thing to figure out is why your array won't assemble automatically, and that should solve your problem.  However, you'll be better off in the end if you have the array listed in mdadm.conf so mdadm doesn't accidentally choose the wrong type of array to create (partitioned or not partitioned) during the udev-triggered assembly.
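
Putting that recovery suggestion into commands, a minimal sketch (assuming md_d0 is the stray partitionable array shown in /proc/mdstat above):

# mdadm -S /dev/md_d0                              # stop the auto-created partitionable array, releasing sda4
# mdadm --assemble /dev/md0 /dev/sda4 /dev/sdb4    # hand assembly should now work without a reboot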

Comment 8 Mark Hittinger 2008-11-21 16:36:25 UTC
I prevented mdadm from being run by udev, and the mdadm --assemble is
successful with the new mdadm.  So my problem is definitely the grab of
sda4 by udev: it won't let go of it if it can't figure out what to do.
I think the older mdadm WAS letting go of sda4 when it couldn't decide
what to do, and then my manual approach would work.

After creating an mdadm.conf like so:
# more /etc/mdadm.conf
DEVICE /dev/sda4 /dev/sdb4
ARRAY /dev/md0 devices=/dev/sda4,/dev/sdb4

the udev grab works because it has a better hint: it assembles md0 at boot,
and I am able to mount the device OK.

So I think a better description of the bug is that something in the new mdadm
isn't letting go of sda4 when it is confused, whereas the old mdadm did let go.
I think that is still a problem worth correcting.
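
For completeness, an equivalent mdadm.conf keyed on the array UUID (the UUID taken from the -E output earlier in this bug) should give the udev-triggered assembly the same hint without hard-coding device names; a sketch:

DEVICE partitions
ARRAY /dev/md0 level=raid0 num-devices=2 UUID=7fac9e67:54bef915:6385bb5a:d6009b91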

# mdadm -E /dev/sdb4
/dev/sdb4:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 7fac9e67:54bef915:6385bb5a:d6009b91
  Creation Time : Mon Jul  7 13:45:56 2008
     Raid Level : raid0
  Used Dev Size : 0
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

    Update Time : Mon Jul  7 13:45:56 2008
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 480306eb - correct
         Events : 1

     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       20        1      active sync   /dev/sdb4

   0     0       8        4        0      active sync   /dev/sda4
   1     1       8       20        1      active sync   /dev/sdb4

Comment 9 Bug Zapper 2009-06-10 03:19:15 UTC
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 9 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 10 Bug Zapper 2009-07-14 14:07:19 UTC
Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

