Bug 1018272 - md devices get incorrect names, system stops booting
Status: CLOSED DUPLICATE of bug 1015204
Product: Fedora
Classification: Fedora
Component: kernel
Version: 18
Assigned To: Jes Sorensen
QA Contact: Fedora Extras Quality Assurance
Duplicates: 1020134
Reported: 2013-10-11 11:22 EDT by David Woodhouse
Modified: 2013-10-20 06:40 EDT
CC: 10 users

Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-10-14 09:57:54 EDT

Attachments: None
Description David Woodhouse 2013-10-11 11:22:32 EDT
My RAID devices, which have always been /dev/md0, /dev/md1 and /dev/md2, are now appearing as md125, md126 and md127, which means that, since my fstab referred to them by name, my machine no longer boots.

I booted back into the old 3.9.9-201 kernel, and 'mdadm --detail /dev/md0' shows:

Preferred Minor : 0
    Persistence : Superblock is persistent

I changed my fstab to mount by UUID, and now when I boot the *new* 3.10.14-100 kernel, the same device (now /dev/md126) says:

Preferred Minor : 126
    Persistence : Superblock is persistent
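
For comparison, an fstab entry that mounts by UUID looks like the line below; note that fstab's UUID= is the *filesystem* UUID reported by blkid, not the md superblock UUID that mdadm prints (the UUID and mount point here are made up):

```
# /etc/fstab (hypothetical values; get the real UUID from 'blkid /dev/md126')
UUID=0a3b5c7d-1234-5678-9abc-def012345678  /boot  ext4  defaults  1 2
```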

The output of 'mdadm --examine /dev/sda1' *also* changes when in the new kernel, and now reports:

Preferred Minor : 126

If I boot back into the old kernel, the same 'mdadm --examine' command gives:

Preferred Minor : 1


I think we're using RAID autorun, aren't we? So this is a kernel problem rather than dracut or mdadm?

The 3.9.9 kernel says:
[    4.938535] md: bind<sda3>
[    4.964794] md: bind<sdb1>
[    4.966724] md: bind<sda2>
[    4.976304] md: bind<sda1>
[    4.978911] md: raid1 personality registered for level 1
[    4.979066] md/raid1:md0: active with 2 out of 2 mirrors
[    4.979097] md0: detected capacity change from 0 to 518979584
[    4.980154] md: bind<sdb2>
[    4.981249] md/raid1:md1: active with 2 out of 2 mirrors
[    4.981282] md1: detected capacity change from 0 to 210232082432
[    4.981740]  md0: unknown partition table
[    4.983541]  md1: unknown partition table
[    4.992081] md: bind<sdb3>
[    4.993175] md/raid1:md2: active with 2 out of 2 mirrors
[    4.993205] md2: detected capacity change from 0 to 3787862245376
[    4.994269]  md2: unknown partition table


While the 3.10.14 kernel says:

[    5.032557] md: bind<sda3>
[    5.035050] md: bind<sda2>
[    5.037492] md: bind<sda1>
[    5.111210] md: bind<sdb3>
[    5.113929] md: raid1 personality registered for level 1
[    5.114112] md/raid1:md127: active with 2 out of 2 mirrors
[    5.114147] md127: detected capacity change from 0 to 3787862245376
[    5.114206] RAID1 conf printout:
[    5.114208]  --- wd:2 rd:2
[    5.114210]  disk 0, wo:0, o:1, dev:sdb3
[    5.114211]  disk 1, wo:0, o:1, dev:sda3
[    5.119245]  md127: unknown partition table
[    5.150050] md: bind<sdb2>
[    5.151286] md/raid1:md126: active with 2 out of 2 mirrors
[    5.151332] md126: detected capacity change from 0 to 210232082432
[    5.151607] RAID1 conf printout:
[    5.151609]  --- wd:2 rd:2
[    5.151611]  disk 0, wo:0, o:1, dev:sdb2
[    5.151612]  disk 1, wo:0, o:1, dev:sda2
[    5.205191]  md126: unknown partition table
[    5.221123] md: bind<sdb1>
[    5.222510] md/raid1:md125: active with 2 out of 2 mirrors
[    5.222546] md125: detected capacity change from 0 to 518979584
[    5.222637] RAID1 conf printout:
[    5.222638]  --- wd:2 rd:2
[    5.222640]  disk 0, wo:0, o:1, dev:sdb1
[    5.222641]  disk 1, wo:0, o:1, dev:sda1
[    5.310186]  md125: unknown partition table

/etc/mdadm.conf contains:

# mdadm.conf written out by anaconda
DEVICE partitions
MAILADDR root

ARRAY /dev/md1 level=raid1 num-devices=2 UUID=4ec0bafa:acd9dd7f:933cad26:7eb047ee
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=f8d8cf6d:800aeedb:c802007b:12a7127d
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=20166bf1:24f712c0:57559a00:6d37fb89
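
The file above binds each array name to a superblock UUID. Purely as an illustration (the ARRAY lines are copied from the mdadm.conf above), the name-to-UUID mapping can be pulled out with a one-liner; on a live system, `mdadm --detail --scan` regenerates ARRAY lines of this form from the running arrays:

```shell
# Print "name UUID" for each ARRAY line (lines copied from this bug's mdadm.conf).
printf '%s\n' \
  'ARRAY /dev/md1 level=raid1 num-devices=2 UUID=4ec0bafa:acd9dd7f:933cad26:7eb047ee' \
  'ARRAY /dev/md2 level=raid1 num-devices=2 UUID=f8d8cf6d:800aeedb:c802007b:12a7127d' \
  'ARRAY /dev/md0 level=raid1 num-devices=2 UUID=20166bf1:24f712c0:57559a00:6d37fb89' |
awk '{sub("UUID=","",$NF); print $2, $NF}'
# → /dev/md1 4ec0bafa:acd9dd7f:933cad26:7eb047ee  (one line per array)
```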
Comment 1 Josh Boyer 2013-10-11 12:09:41 EDT
This sounds familiar, but I don't recall why and I can't find a previous bug that matches this.

David noted in IRC that both dracut and mdadm were updated as well.
Comment 2 Jes Sorensen 2013-10-14 04:52:07 EDT
David,

Can you provide the following info please:

mdadm version
/proc/mdstat output
mdadm --examine </dev/sdX> output of the devices of the arrays

Thanks,
Jes
Comment 3 David Woodhouse 2013-10-14 05:15:11 EDT
Under the 3.10.14-100 kernel:

[root@twosheds ~]# mdadm --examine /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 20166bf1:24f712c0:57559a00:6d37fb89
  Creation Time : Thu Jul  3 03:29:41 2008
     Raid Level : raid1
  Used Dev Size : 506816 (495.02 MiB 518.98 MB)
     Array Size : 506816 (495.02 MiB 518.98 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 125

    Update Time : Sun Oct 13 01:00:06 2013
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 4d944be9 - correct
         Events : 2010


      Number   Major   Minor   RaidDevice State
this     1       8        1        1      active sync   /dev/sda1

   0     0       8       17        0      active sync   /dev/sdb1
   1     1       8        1        1      active sync   /dev/sda1
[root@twosheds ~]# mdadm --examine /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 20166bf1:24f712c0:57559a00:6d37fb89
  Creation Time : Thu Jul  3 03:29:41 2008
     Raid Level : raid1
  Used Dev Size : 506816 (495.02 MiB 518.98 MB)
     Array Size : 506816 (495.02 MiB 518.98 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 125

    Update Time : Sun Oct 13 01:00:06 2013
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 4d944bf7 - correct
         Events : 2010


      Number   Major   Minor   RaidDevice State
this     0       8       17        0      active sync   /dev/sdb1

   0     0       8       17        0      active sync   /dev/sdb1
   1     1       8        1        1      active sync   /dev/sda1


As described in comment 0, running 'mdadm --examine' under the older kernel (with the initramfs that was built with the older mdadm and dracut) shows *different* results for 'Preferred Minor'.
Comment 4 Jes Sorensen 2013-10-14 05:38:59 EDT
Ok,

So old 0.9 metadata arrays. This does sound a bit similar to BZ#1015204
where mdadm.conf isn't copied into the initramfs and dracut assembles the
arrays as it sees fit.

I am not sure whether dracut respects the preferred minor number in the
superblock when assembling the array.

Harald, do you have any input on this?

Thanks,
Jes
Comment 5 David Woodhouse 2013-10-14 09:21:09 EDT
Indeed, if I unpack each initramfs image, only the old one has a copy of mdadm.conf.

But why would it not respect the preferred minor number? And why in $DEITY's name is the preferred minor number *different* under different kernels?

When we boot the new kernel/initramfs, does it not only ignore the preferred minor but also *update* the preferred minor in each individual device of the array to match the new numbers it pulled out of its posterior?
Comment 6 Jes Sorensen 2013-10-14 09:35:50 EDT
What happens if you manually add the mdadm.conf to the initramfs? Do you get
the old behaviour back?

The preferred minor number is just a hint and not guaranteed for anything.
In fact, there is no such thing in v1.2 metadata superblocks; I'm not sure
about v1.0/v1.1.

Then again, I suspect this is happening in dracut/mdadm not in the kernel.
I don't know if dracut explicitly picks a device name when it assembles the
array or if it relies on mdadm for picking one for it.
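
A sketch of the suggestion above, assuming the dracut shipped here supports its documented mdadmconf option (the drop-in file name is made up):

```
# Hypothetical /etc/dracut.conf.d/mdadmconf.conf
# Ask dracut to copy the host's /etc/mdadm.conf into generated images.
mdadmconf="yes"
```

After adding this, regenerating the image with `dracut --force` should produce an initramfs that carries mdadm.conf, so the arrays get assembled under their configured names.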
Comment 7 Jes Sorensen 2013-10-14 09:57:54 EDT
Looks like this is a dupe of 1015204

*** This bug has been marked as a duplicate of bug 1015204 ***
Comment 8 Josh Boyer 2013-10-17 07:37:05 EDT
*** Bug 1020134 has been marked as a duplicate of this bug. ***
Comment 9 Ta-chang 2013-10-20 06:40:44 EDT
I'm the person who encountered the same issue and had already reported it
in bug 1020134, which was marked as a duplicate of this one.

Although I'm not sure how relevant this is to this report, I tried
regenerating the existing initramfs with the dracut --force option,
but no luck...
