Red Hat Bugzilla – Bug 503443
boot hangs with root=/dev/md/root (raid1 device)
Last modified: 2010-01-12 10:31:34 EST
Description of problem:
The root partition of a server lives in a raid1 device with metadata format 1.0 named "root", built from two primary partitions.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. mdadm --create /dev/md/root --metadata=1.0 --level=raid1 /dev/sda1 /dev/sdb1
2. mdadm --examine --scan /dev/md0 >>/etc/mdadm.conf
3. Edit grub.conf and set "root=/dev/md/root"
Boot hangs just after auto-assembling the md device. No error messages are printed.
Boot works if I specify "root=/dev/sda1" or "root=/dev/sdb1".
There are several other md devices in the system, including one with 0.90 metadata.
You have to trick mkinitrd into doing what you want when you are switching your root device like this. Specifically, you missed the step where you needed to update /etc/fstab. You also missed the fact that mkinitrd tries to determine what your root filesystem needs to boot, and it uses a combination of information from /etc/fstab and the current list of found devices to do that. Whenever I've switched root devices like this, I've had to either hand-edit the script that mkinitrd puts into the initrd image (when doing it on a running system), or boot a CD into rescue mode, make all the changes to all the files, and then run mkinitrd.
Regardless, I'm about 100% positive this isn't an mdadm issue. I don't think you could even call it an issue with any one component; there are simply too many steps needed to make this work, and you have to get all of them right.
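Since mkinitrd's choices depend partly on what /etc/fstab lists for /, one quick sanity check before rerunning it is to compare that entry with the root= argument the kernel will be given. Below is a minimal sketch under that assumption; fstab_root_dev and check_root_consistency are hypothetical helpers, not part of mkinitrd, and they compare the fstab entry literally (a UUID= or LABEL= entry is treated as plain text, not resolved to a device).

```shell
#!/bin/sh
# Hypothetical helper (not mkinitrd code): print the device listed for the
# "/" mount point in an fstab-format file (default /etc/fstab).
fstab_root_dev() {
    fstab=${1:-/etc/fstab}
    # Skip comment and blank lines; field 1 is the device, field 2 the mount point.
    awk '$1 !~ /^#/ && NF >= 2 && $2 == "/" { print $1; exit }' "$fstab"
}

# Hypothetical helper: compare the fstab root entry with the root= word in a
# kernel command line. Prints "ok: ..." and returns 0 on a match, otherwise
# prints "mismatch: ..." and returns 1.
check_root_consistency() {
    fstab=$1
    cmdline=$2
    fs_root=$(fstab_root_dev "$fstab")
    arg_root=
    # Word-split the command line and take the first root=... word.
    for word in $cmdline; do
        case $word in
            root=*) arg_root=${word#root=}; break ;;
        esac
    done
    if [ "$fs_root" = "$arg_root" ]; then
        echo "ok: $fs_root"
    else
        echo "mismatch: fstab=$fs_root root=$arg_root"
        return 1
    fi
}
```

For example, check_root_consistency /etc/fstab "ro root=/dev/md/root" checks the kernel line you intend to put in grub.conf; passing "$(cat /proc/cmdline)" instead checks the currently running boot.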
Thanks for the extensive explanation.
For the record, the fstab entry is correct, so it must be something else.
The init script that mkinitrd generated also looks fine (note that swap is also on md):
mount -t proc /proc /proc
echo Mounting proc filesystem
echo Mounting sysfs filesystem
mount -t sysfs /sys /sys
echo Creating /dev
mount -o mode=0755 -t tmpfs /dev /dev
mount -t devpts -o gid=5,mode=620 /dev/pts /dev/pts
echo Creating initial device nodes
mknod /dev/null c 1 3
mknod /dev/zero c 1 5
mknod /dev/systty c 4 0
mknod /dev/tty c 5 0
mknod /dev/console c 5 1
mknod /dev/ptmx c 5 2
mknod /dev/fb c 29 0
mknod /dev/hvc0 c 229 0
mknod /dev/tty0 c 4 0
mknod /dev/tty1 c 4 1
mknod /dev/tty2 c 4 2
mknod /dev/tty3 c 4 3
mknod /dev/tty4 c 4 4
mknod /dev/tty5 c 4 5
mknod /dev/tty6 c 4 6
mknod /dev/tty7 c 4 7
mknod /dev/tty8 c 4 8
mknod /dev/tty9 c 4 9
mknod /dev/tty10 c 4 10
mknod /dev/tty11 c 4 11
mknod /dev/tty12 c 4 12
mknod /dev/ttyS0 c 4 64
mknod /dev/ttyS1 c 4 65
mknod /dev/ttyS2 c 4 66
mknod /dev/ttyS3 c 4 67
daemonize --ignore-missing /bin/plymouthd
echo "Loading i2c-core module"
modprobe -q i2c-core
echo "Loading i2c-algo-bit module"
modprobe -q i2c-algo-bit
echo "Loading drm module"
modprobe -q drm
echo "Loading radeon module"
modprobe -q radeon
echo Setting up hotplug.
echo Creating block device nodes.
echo Creating character device nodes.
echo "Loading raid1 module"
modprobe -q raid1
echo "Loading pata_acpi module"
modprobe -q pata_acpi
echo "Loading ata_generic module"
modprobe -q ata_generic
mdadm -As --auto=yes --run /dev/md127
mdadm -As --auto=yes --run /dev/md126
echo Creating root device.
mkrootdev -t ext4 -o noatime,data=ordered,acl,user_xattr,ro /dev/md/root
echo Mounting root filesystem.
cond -ne 0 plymouth --hide-splash
echo Setting up other filesystems.
echo Switching to new root and running init.
echo Booting has failed.
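In the script above, both arrays are assembled by number (/dev/md127 and /dev/md126), while mkrootdev is given the named path /dev/md/root. A hypothetical diagnostic for that pattern is sketched below; check_init_md_names is a made-up name, and it only understands the simple one-command-per-line layout shown in this script.

```shell
#!/bin/sh
# Hypothetical diagnostic (not part of mkinitrd): flag an initrd init script
# that mounts root via a /dev/md/NAME path but never assembles that name.
check_init_md_names() {
    script=$1
    # Root device is the last argument on the mkrootdev line.
    rootdev=$(awk '$1 == "mkrootdev" { print $NF; exit }' "$script")
    case $rootdev in
        /dev/md/*) ;;
        *) echo "root is not a named md device: $rootdev"; return 0 ;;
    esac
    # Look for an mdadm line whose last argument is exactly the named path.
    if grep -q "^mdadm .*[[:space:]]$rootdev\$" "$script"; then
        echo "ok: $rootdev is assembled by name"
        return 0
    fi
    echo "problem: $rootdev is never assembled by name; its symlink may never appear"
    return 1
}
```

Run against a copy of the init script extracted from the initrd, it would report a problem for the lines shown above, because only /dev/md127 and /dev/md126 appear on mdadm lines.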
Created attachment 346152
Fix for named md device bringup in mkinitrd
I forgot that you also need this patch to mkinitrd (I don't know if it will apply cleanly to a rawhide system; I made the patch against the latest F9 mkinitrd package, but the intent should be obvious).
There are two things to keep in mind. Named md devices (as opposed to numbered devices) are always symlinks to whatever random number was used to assemble the device *this* time (here, /dev/md127). That number can change, but the name stays the same, and regardless of the number, the name will always point to the right device. We still have to deal with the numbered device, and intentionally avoid using it for assembly as this patch does, because it's transient. Unfortunately, /proc/mdstat reports the transient numbered device instead of the name symlink, and so do the contents of /sys/block. So mkinitrd needs to detect when the md device name is a symlink, work back to the real device, and use the real device to resolve device drivers and such, but use the name of the array for assembly. This also means you need to make sure that mdadm.conf has an ARRAY line that uses the name of the device, not the transient md device number.
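The resolution step can be sketched in shell as follows. This illustrates the idea only and is not the code from the attached patches: md_resolve and md_components are hypothetical names, readlink -f is assumed to be available (GNU coreutils or busybox), and md_components relies on the standard sysfs layout in which /sys/block/<mdN>/slaves lists an array's component devices. The optional sysfs-root argument exists only so the function can be exercised against a fake tree.

```shell
#!/bin/sh
# Hypothetical helper: given an md device path, print
#   <kernel name> <path to use for assembly>
# e.g. "md127 /dev/md/root" when /dev/md/root is a symlink to ../md127.
md_resolve() {
    dev=$1
    if [ -L "$dev" ]; then
        # Named array: follow the symlink to the transient numbered node,
        # which is the name /proc/mdstat and /sys/block use.
        real=$(readlink -f "$dev")
        echo "$(basename "$real") $dev"
    else
        # Numbered (or plain) device: no stable name to fall back on.
        echo "$(basename "$dev") $dev"
    fi
}

# Hypothetical helper: list the component devices of md kernel device $1
# (e.g. md127) from sysfs; $2 optionally overrides the sysfs root.
md_components() {
    sysroot=${2:-/sys}
    ls "$sysroot/block/$1/slaves"
}
```

If /dev/md/root pointed at md127, md_resolve /dev/md/root would print "md127 /dev/md/root": md127 is what to look up under /sys/block when working out drivers, while /dev/md/root is what belongs on the assembly line.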
So, make sure that your array lines are of the type:
ARRAY /dev/md/root <blah>
and not:
ARRAY /dev/md127 <blah>
Then make sure fstab uses the name (which you already said it does), apply this patch (or hand-apply it if it doesn't apply cleanly), and rerun mkinitrd; you should be good to go. In the meantime, you should also be able to pass root=/dev/md127 (or /dev/md126) and boot with the existing initrd. However, it will drop to a "Repair filesystem" prompt when it gets to checking the filesystems, because /etc/fstab is looking for something else as root and that device won't exist. mdadm won't create the /dev/md/* symlinks when it's called as mdadm -As --auto=yes --run /dev/md127, as the boot log messages show; it has to be called without a device name, or with the symlink name, if you want the symlink created. The mkinitrd patch corrects that.
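Before rerunning mkinitrd, mdadm.conf can be checked for ARRAY lines that still name a numbered device. A minimal sketch, assuming the usual one-line "ARRAY <device> <options>" format; find_numbered_arrays is a hypothetical helper, not an mdadm option.

```shell
#!/bin/sh
# Hypothetical check: print ARRAY lines in an mdadm.conf-format file whose
# device is a transient numbered node (/dev/mdN) rather than /dev/md/NAME.
# Returns 1 if any such line is found, 0 otherwise.
find_numbered_arrays() {
    conf=${1:-/etc/mdadm.conf}
    # Field 1 is the ARRAY keyword, field 2 the device path.
    awk '$1 == "ARRAY" && $2 ~ /^\/dev\/md[0-9]+$/ { print; found = 1 }
         END { exit (found ? 1 : 0) }' "$conf"
}
```

Any line it prints should be rewritten to use the /dev/md/NAME form. Following the same reasoning, the assembly command in the initrd should name the array too, e.g. mdadm -As --auto=yes --run /dev/md/root rather than /dev/md127.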
Thanks, you are the best maintainer ever!
I'm attaching a version of the patch rebased against Fedora 11. Unfortunately, I haven't been able to test the fix yet because I'm not physically near that server today.
Created attachment 346156
patch rebased against the F11 version of mkinitrd
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.
More information and reason for this action is here:
This is a mass edit of all mkinitrd bugs.
Thanks for taking the time to file this bug report (and/or commenting on it).
As you may have heard, in Fedora 12 mkinitrd has been replaced by dracut. In Fedora 12 the mkinitrd package is still around because some programs depend on certain libraries it provides, but mkinitrd itself is no longer used.
In Fedora 13 mkinitrd will be removed completely. This means that all work on initrd has stopped.
Rather than keeping mkinitrd bugs open and giving false hope that they might get fixed, we are mass-closing them to clearly communicate that no more work will be done on mkinitrd. We apologize for any inconvenience this may cause.
If you are using Fedora 11 and are experiencing a mkinitrd bug you cannot work around, please upgrade to Fedora 12. If you experience problems with the initrd in Fedora 12, please file a bug against dracut.