I can't get recent kernels to boot because they don't seem to find the necessary devices ("no devices found for /dev/mdX"). The status is as follows:

works -> kernel-2.6.25-0.87.rc3.git4.fc9
fails -> kernel-2.6.25-0.101.rc4.git3.fc9
fails -> kernel-2.6.25-0.105.rc5.fc9
fails -> kernel-2.6.25-0.113.rc5.git2.fc9

(mkinitrd was at version 6.0.35-1.fc9 when installing the 0.113 version of the kernel)

These are simple unencrypted RAID-1 devices:

[root@nexus t]# uname -a
Linux nexus 2.6.25-0.78.rc3.git1.fc9 #1 SMP Fri Feb 29 02:19:15 EST 2008 i686 athlon i386 GNU/Linux
[root@nexus t]# rpm -q mkinitrd
mkinitrd-6.0.34-1.fc9.i386
[root@nexus t]# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sda2[0] sdb2[1]
      104320 blocks [2/2] [UU]

md1 : active raid1 sda3[0] sdb3[1]
      1052160 blocks [2/2] [UU]

md2 : active raid1 sda5[0] sdb5[1]
      307628544 blocks [2/2] [UU]

unused devices: <none>

Relevant entries from fstab:

/dev/md2   /       ext3   defaults,noatime   1 1
/dev/md0   /boot   ext3   defaults,noatime   1 2
/dev/md1   swap    swap   defaults           0 0

Relevant entries from grub.conf:

title Fedora (2.6.25-0.105.rc5.fc9)
        root (hd0,1)
        kernel /vmlinuz-2.6.25-0.105.rc5.fc9 ro root=/dev/md2
        initrd /initrd-2.6.25-0.105.rc5.fc9.img
title Fedora (2.6.25-0.78.rc3.git1.fc9)
        root (hd0,1)
        kernel /vmlinuz-2.6.25-0.78.rc3.git1.fc9 ro root=/dev/md2
        initrd /initrd-2.6.25-0.78.rc3.git1.fc9.img

[dennis@nexus ~]$ cat /etc/mdadm.conf
# mdadm.conf written out by anaconda
DEVICE partitions
MAILADDR root
ARRAY /dev/md2 level=raid1 num-devices=2 uuid=e76e2ddd:0704b19b:5f2c9cac:8880bf5c
ARRAY /dev/md0 level=raid1 num-devices=2 uuid=ec127fd1:2f891ce6:3bdd9733:c16732b0
ARRAY /dev/md1 level=raid1 num-devices=2 uuid=a5cd1ca6:d38b9418:949ca280:86ec588e
Created attachment 297883: dmesg output of kernel-2.6.25-0.87.rc3.git4.fc9
I may be seeing a similar problem, though I have a more complicated setup in that I am running encryption over RAID. I can get things to work using a rescue disk (with the .105 kernel), but a normal boot fails with a similar message. What I would like to try next is figuring out whether mkinitrd loads the proper modules for my SATA disk drives, though I am not sure which ones those should be.
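One way to check which modules actually made it into an image is to list the `.ko` files in the archive. A minimal sketch, assuming the mkinitrd-era image format (a gzip-compressed cpio archive) and GNU cpio; the function name `initrd_modules` is hypothetical:

```bash
#!/bin/bash
# List the kernel modules (.ko files) packed into an initrd image.
# Assumes the image is a gzip-compressed cpio archive, as mkinitrd builds them.
initrd_modules() {
    zcat "$1" | cpio -it --quiet |   # list archive members without extracting
        grep '\.ko$' |               # keep only kernel modules
        sed 's#.*/##; s#\.ko$##' |   # strip directory and .ko suffix
        sort -u
}
```

For example, `initrd_modules /boot/initrd-2.6.25-0.105.rc5.fc9.img` could be compared against the SATA driver that `lsmod` reports on a working boot (libata drivers such as ahci or ata_piix, depending on the hardware).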
My home machine seems like it might also be having this issue, and it is just using RAID-1 (no encryption). It failed to boot with a .105 kernel but did boot with a .95 kernel. However, my old work machine is also running RAID-1 and works fine with a .105 kernel. One potential difference is that the old work machine has one file system that isn't on top of RAID. Both of the other machines only have file systems that run on top of RAID devices (in some cases with an intervening encryption layer).
I just tested reinstalling the .95 kernel on my new work machine (using a recent mkinitrd) and it failed to find devices when booting up. So in my case it looks like the problem is more likely tied to mkinitrd than the kernel.
I tried adding all of the modules that were loaded by the rescue image. This resulted in a more normal prompt for the swap and root LUKS keys, and there wasn't an error with the root pivot. However, right afterwards there was a problem finding /bin/sh, followed by a kernel panic. I expect that loading a bunch of modules in a random order could cause problems, but it does suggest that some needed modules are not getting added by mkinitrd.
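Rather than adding every module the rescue image loaded, the two lists can be diffed to see which ones the initrd lacks. A minimal sketch; the function name `missing_modules` and the two input files are hypothetical, and the hyphen-to-underscore folding assumes lsmod reports names like `ehci_hcd` for a file named `ehci-hcd.ko`:

```bash
#!/bin/bash
# Print modules listed in $1 (loaded in the rescue environment) that are
# absent from $2 (packed into the initrd), one per line.
# Hyphens are folded to underscores so "ehci-hcd" and "ehci_hcd" compare equal.
missing_modules() {
    awk '{ gsub(/-/, "_") }                  # normalize the whole line
         NF == 0 { next }                    # skip blank lines
         FILENAME == ARGV[1] { have[$1] = 1; next }   # first file: initrd list
         !($1 in have) { print $1 }' "$2" "$1" | sort -u
}
```

In the rescue environment, `lsmod | awk 'NR > 1 { print $1 }'` produces the first list; `zcat <initrd> | cpio -it --quiet | sed -n 's#.*/##; s#\.ko$##p'` produces the second.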
This seems almost certainly a mkinitrd problem. Installing kernel-2.6.25-0.113.rc5.git2.fc9 with mkinitrd-6.0.34 installed results in a kernel installation that fails to boot. I downgraded to mkinitrd-6.0.33 and reinstalled the kernel-2.6.25-0.113.rc5.git2.fc9 and it boots fine.
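After downgrading mkinitrd (for example with `rpm -Uvh --oldpackage` on the 6.0.33 packages; the matching nash package may need to be downgraded alongside it), regenerating the initrd should have the same effect as reinstalling the kernel. A minimal sketch; `rebuild_initrd` and the `MKINITRD` override are hypothetical, and it assumes the /boot/initrd-VERSION.img naming shown in grub.conf above:

```bash
#!/bin/bash
# Regenerate the initrd for an already-installed kernel version, overwriting
# the existing image (-f) instead of reinstalling the kernel package.
# MKINITRD may be overridden (e.g. MKINITRD=echo) to preview the command.
rebuild_initrd() {
    local ver=$1
    "${MKINITRD:-/sbin/mkinitrd}" -f "/boot/initrd-$ver.img" "$ver"
}
```

Usage, as root: `rebuild_initrd 2.6.25-0.113.rc5.git2.fc9`.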
There may be a problem with properly walking back through the device list (to use RAID you need to be able to use the underlying devices, and similarly for LUKS). I am using encryption over RAID, and some modules are not getting included. I have tried several recent versions of mkinitrd, including 6.0.33, and am consistently seeing that problem. My next-to-latest test was munging /etc/fstab to see how that affected things. Changing the swap line to indicate it was on /dev/sdb2 instead of a LUKS device resulted in the init file including at least some of the missing modules. I then tried saying both / and swap were directly on RAID devices, and the modules went missing again. (This last one was with an unmodified 6.0.34.)
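The kernel exposes this device stacking through the `slaves` directory under /sys/block for device-mapper and md devices, so the chain that mkinitrd has to walk can be inspected directly. A minimal sketch, assuming a kernel that provides those links; `walk_slaves` and the `SYSBLOCK` override are hypothetical. Partitions such as sda5 live under their parent disk in sysfs, so the walk stops there:

```bash
#!/bin/bash
# Recursively print "parent -> underlying" edges for a stacked block device,
# e.g. dm-0 (LUKS) -> md2 (RAID-1) -> sda5/sdb5, using sysfs "slaves" links.
walk_slaves() {
    local dev=$1 slave name
    for slave in "${SYSBLOCK:-/sys/block}/$dev"/slaves/*; do
        [ -e "$slave" ] || continue      # glob did not match: no slaves
        name=${slave##*/}
        echo "$dev -> $name"
        walk_slaves "$name"              # descend into the underlying device
    done
}
```

Running `walk_slaves dm-0` on an encryption-over-RAID box shows which devices, and therefore which drivers, the root filesystem ultimately depends on.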
I would really like to ssh into an affected machine and attempt to figure out what's going on here. I would need root access and the ability to reboot. Willing to work with you over IRC while I do it. http://togami.com/~warren/id_dsa.pub.asc Here is my ssh public key, GPG signed so you can verify that it is really me. If you want to give me ssh access please send me an e-mail.
Ah, never mind, pjones said he might have fixed this. Please test the latest builds here and report back: http://koji.fedoraproject.org/packages/mkinitrd/
Looks like there is a bug in that version:

Running Transaction
  Installing: kernel-devel      ######################### [1/4]
  Updating  : kernel-headers    ######################### [2/4]
  Installing: kernel            ######################### [3/4]
/sbin/mkinitrd: line 1559: [: too many arguments
  Cleanup   : kernel-headers    ######################### [4/4]
Trivial patch:

diff -Naur old/sbin/mkinitrd new/sbin/mkinitrd
--- old/sbin/mkinitrd	2008-03-15 02:11:55.000000000 +0100
+++ new/sbin/mkinitrd	2008-03-15 02:11:40.000000000 +0100
@@ -69,6 +69,7 @@
 PREMODS=""
 DMDEVS=""
 ncryptodevs=0
+nlatecryptodevs=0
 NET_LIST=""
 LD_SO_CONF=/etc/ld.so.conf
Unfortunately the resulting installed kernel still doesn't boot. :( (I can confirm though that kernels installed with mkinitrd 6.0.33 do boot correctly. Not sure why I haven't thought about *downgrading* mkinitrd myself. Doh!)
The machine I am playing with has an encrypted root and swap, so you can't reboot it without someone being present to enter the keys and, if things go south, to boot it again with the rescue disk. I won't be able to provide that until Monday. In principle, though, I don't have a problem letting you try stuff on it: it is a fresh install with yum updates, so there is nothing confidential on it yet. The only other tricky thing is that I have been hanging it off my old desktop machine, and it doesn't currently have a direct connection to the internet, but I can work around that if we go that route. In the meantime I'll take a look at the latest mkinitrd and see if the initrd images it creates look reasonable.
I tried out 6.0.36, both as-is and with the above fix, and neither generated an initrd that loaded the modules needed to read my disk drives. So this needs more work, at least for the encryption-over-RAID case.
I tried out 6.0.36 with the above fix on a different machine that has both / and swap on software RAID devices, and booting failed similarly. However, the kernel modules for accessing the disk drives looked to be included in the initrd image, so there might be two separate problems going on.
Created attachment 298326: Diff of initrd contents from 6.0.33 to 6.0.37

You reported mkinitrd-6.0.33 above as working and mkinitrd-6.0.37 as broken. However, running both versions of mkinitrd on your box, I don't see how 6.0.33 could work, as its initrd is missing everything encryption-related. Am I missing something?
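A comparison like this can be reproduced by unpacking each image into its own directory and running `diff -r` over the two trees. A minimal sketch, assuming gzip-compressed cpio images and GNU cpio/coreutils; `unpack_initrd` is a hypothetical helper. Device nodes in the archive may only be recreated when run as root:

```bash
#!/bin/bash
# Unpack a gzip-compressed cpio initrd ($1) into directory $2.
unpack_initrd() {
    local img
    img=$(readlink -f "$1") || return 1   # absolute path, since we cd below
    mkdir -p "$2" && (cd "$2" && zcat "$img" | cpio -id --quiet)
}
```

Usage: unpack the 6.0.33 and 6.0.37 images into /tmp/i33 and /tmp/i37, then `diff -ru /tmp/i33 /tmp/i37`; the difference in the `init` script is usually the most telling part.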
Bruno, look in /tmp/mkinitrd-rpms. That is where I put together the above comparison between 6.0.33 and 6.0.37. How did you get 6.0.33 to produce a working initrd earlier?
*** From 6.0.37 ***
echo Setting up disk encryption: /dev/md2
cryptsetup luksOpen /dev/md2 luks-md2
echo Setting up disk encryption: /dev/md1
cryptsetup luksOpen /dev/md1 luks-md1
resume mapper/luks-md1
echo Creating root device.
mkrootdev -t ext3 -o defaults,ro /dev/mapper/luks-md2

resume mapper/luks-md1   <--- Is this line supposed to be missing the preceding "/dev/"?
    # find the first swap dev which would get used for swsusp
    swsuspdev=$(awk '/^[ \t]*[^#]/ { if ($3 == "swap") { print $1; exit }}' $fstab)
    if [[ "$swsuspdev" =~ ^(UUID=|LABEL=) ]]; then
        swsuspdev=$(resolve_device_name "$swsuspdev")
    fi
@   suspdev=$(findblockdevinsys "$swsuspdev")
@   suspdev=${suspdev##*/dev/}
    if [ -n "$suspdev" ]; then
        swsuspdev="$suspdev"
    fi
    unset suspdev
    if [ -n "$swsuspdev" ]; then
        handlelvordev "$swsuspdev"
    fi
fi

The second line marked with @ sets suspdev to mapper/luks-md1. handlelvordev then does nothing with the resulting $swsuspdev because the underlying device is plain RAID (neither LVM nor a raw device), so an invalid name without "/dev/" is later emitted as the resume device. This is likely a different bug, though?
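If the init script's `resume` command does require an absolute path (the open question above), one option would be to normalize the name before it is emitted. A minimal illustrative sketch; `with_dev_prefix` is hypothetical and is not the fix that went into mkinitrd:

```bash
#!/bin/bash
# Turn a device name such as "mapper/luks-md1" into an absolute path,
# leaving names that already start with /dev/ unchanged.
with_dev_prefix() {
    [ -n "$1" ] || return 0          # nothing to do for an empty name
    case $1 in
        /dev/*) printf '%s\n' "$1" ;;
        *)      printf '/dev/%s\n' "$1" ;;
    esac
}
```

Applied right after the `${suspdev##*/dev/}` step, this would emit `resume /dev/mapper/luks-md1` instead of `resume mapper/luks-md1`.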
Found what might be the actual problem: this initrd is devoid of any disk controller driver. I'm too tired to think of a fix now, and I can't reboot your machine to test it anyway. Please test the following workarounds:

mkinitrd --preload=mptsas /tmp/initrd-test.img 2.6.25-0.121.rc5.git4.fc9
mkinitrd --with-avail=block /tmp/initrd-test.img 2.6.25-0.121.rc5.git4.fc9

The initrd produced by the first command seems to contain mptsas and many of the required modules. It lacks sd_mod; I'm not sure whether that's needed. The second command pulls all available block device drivers into the initrd and will attempt to load drivers for the system's devices after it finishes loading all the RAID drivers. Does either of these initrds work any better?
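Whether sd_mod is needed can be checked against a running system by walking up from the disk's sysfs device directory and reading each driver's `module` link; for a SCSI/SAS disk this typically yields the disk driver plus the host controller driver. A minimal sketch, assuming sysfs provides `device`, `driver`, and `driver/module` links as 2.6-era kernels do; built-in drivers have no module link and are skipped. `disk_modules` and the `SYSBLOCK` override are hypothetical:

```bash
#!/bin/bash
# Print the module behind every driver between a disk (e.g. sda) and the
# root of the sysfs device tree, nearest driver first.
disk_modules() {
    local dir
    dir=$(readlink -f "${SYSBLOCK:-/sys/block}/$1/device") || return 1
    while [ -n "$dir" ] && [ "$dir" != / ]; do
        if [ -e "$dir/driver/module" ]; then
            basename "$(readlink -f "$dir/driver/module")"
        fi
        dir=${dir%/*}                 # move to the parent device
    done
}
```

The names it prints could then be passed to mkinitrd with `--with=` or `--preload=`, as in the workarounds above.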
pjones might have fixed it in 6.0.39 which I installed on your box. Please try running it and reboot to see how it goes.
My first try with the 121 kernel failed, but I am not sure that this was the initrd's fault; I saw this failure on another machine. I forgot to run mkinitrd for both kernels, but I should have time to test the 113 kernel before I leave. The error message was:

/bin/sh: ro: No such file or directory
Kernel panic - not syncing: Attempted to kill init!

Sorry about not getting to this earlier today, but I had some other stuff keeping me busy. If this next test fails, I'll have more time tomorrow to do test reboots.
A second test with the 113 kernel failed. The buses don't run very late this week because of spring break here, so that is about all the reboot testing I can do before I have to leave. I'll get the machine back up in rescue mode so that you can try other stuff. I'll try 6.0.39 at home, since I am no longer sure the problem is encryption, and see what it does on a RAID system without it.
No, the problem is specifically with the combination of encryption and RAID.
After installing 6.0.39 new kernels boot properly on my machine.
I tried 6.0.39 at home on a RAID system (no encryption, no LVM) and it worked OK. I'll look at the init file on the encryption-over-RAID machine to see if I spot anything odd before going back to sleep.
I didn't see anything obviously odd. I should have some time to try things later today, but could use some suggestions as to what.
I am running rpm -Va to check whether anything has gotten messed up, e.g. whether only part of bash is installed. This seems unlikely since it works in rescue mode, but it is easy enough to check.
Bruno, you didn't state if 6.0.39 works for you with raid and encryption. Does it?
No it didn't. That was what I was trying to say in comments 22 and 23. But things were better in that I got asked for swap and root's keys and then switchroot seemed to happen. But it looks like it was unable to run /bin/sh for some reason.
I ran rpm -Va and saw some diffs. While none looked like a particular problem, I am going to reinstall the affected packages and then see if that makes a difference. If that doesn't fix things, I'll try a reinstall with the latest boot.iso file and see how things work after that.
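rpm -Va reports file paths rather than package names, so the flagged files have to be mapped back to their owning packages before reinstalling. A minimal sketch; `flagged_files` and `owning_packages` are hypothetical helpers, and paths containing spaces are not handled:

```bash
#!/bin/bash
# Extract the file paths from "rpm -Va" output on stdin (the last field of
# each line that names a file), ignoring other messages.
flagged_files() {
    awk '$NF ~ /^\// { print $NF }' | sort -u
}

# Map file paths on stdin to the names of the packages that own them.
owning_packages() {
    xargs -r rpm -qf --queryformat '%{NAME}\n' | sort -u
}
```

Usage: `rpm -Va | flagged_files | owning_packages`. Config files marked `c` in the rpm -Va output are often edited on purpose, so the list is worth reviewing before reinstalling everything on it.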
Yeah! After reinstalling all packages flagged by rpm -Va and the kernel (for good measure), rebooting worked. I suspect I got a corrupted copy of something or an update didn't work correctly. So I think this one is really fixed now. Thanks! I am off to do a fresh install and to unroot Warren.
I completed a fresh install from this morning's boot iso and encryption over raid is working. Thanks for making this feature work.
I can confirm this works with the Beta release of Fedora 9.
6.0.40 works for me with two RAID-1 devices and LVM.
6.0.40 is emitting a warning for me every time a new kernel is installed: "resolveDevice: device spec expected"

Here's how I can reproduce it:

sudo /sbin/mkinitrd /tmp/test 2.6.25-0.150.rc6.git7.fc9
resolveDevice: device spec expected
Changing version to '9' as part of upcoming Fedora 9 GA. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Rob, I think your problem with 6.0.40 is unrelated and also fixed in a later release. Is it still a problem for you?
I think this bug can be counted as fixed => closing