Bug 241949
Summary: (With patch to fix problem) F7 setup installs bad initrd, fails to boot after install
Product: Fedora
Component: mkinitrd
Version: 7
Hardware: i386
OS: Linux
Status: CLOSED NEXTRELEASE
Severity: high
Priority: urgent
Reporter: Jan Hlavaty <hlavac>
Assignee: Peter Jones <pjones>
CC: amk, amlau, bill.gertz, bobgus, donaldp, jannes.faber, markf78, mishu, nerijus, noldoli, piskozub, redhat, redhat-bugzilla, ron, russell, saul, zing
Doc Type: Bug Fix
Last Closed: 2007-10-29 18:27:51 UTC
Description (Jan Hlavaty, 2007-05-31 21:19:03 UTC)
See also bug 242043.

I guess I have to rebuild my initrd. Do you have any specific hints on this process?

I rebuilt my initrd with the following command:

    mkinitrd /boot/initrd-2.6.21-1.3194.fc7-fixed.img 2.6.21-1.3194.fc7

Then I edited grub/grub.conf to load the modified file. The result is 6 bytes smaller than the original, and it also does not boot.

Created attachment 155927 [details]
initrd.log from mkinitrd -v ...
I used the command:

    mkinitrd -v --force-raid-probe --force-lvm-probe /boot/initrd-2.6.21-1.3194.fc7-fixed2.img 2.6.21-1.3194.fc7 2>&1 | tee /boot/initrd.log

and still no successful boot. Attached is the file 'initrd.log'. It does contain the mdadm and ld-lsb.so.3 files. The output of mkinitrd is shown in the attachment above.

See bug 237415. The problem may be with a bad mkinitrd (kernel > 2.6.20-1.3069); see the 3rd-from-last comment in bug 237415. Solved thanks to bbaetz.

I'm not using mdadm (no RAID), therefore the "adding mdadm -Es output to /etc/mdadm.conf" solution has no effect. I rebuilt the initrd as above ... also no effect, I still can't boot. kernel-2.6.21-1.3194.fc7 as above, and using LVM. The same setup boots fine with FC5. All filesystems do have e2labels and the correct entries in fstab. The system in question is a server, so I'm very keen to get this working ASAP, as you can imagine. Further details available if required.

One of the comments made on one of the referenced bugs seemed to indicate that fstab files with LABELs were a problem for a piece of software in the initial install chain (which one? anaconda? mkinitrd as used by anaconda, which may be different from mkinitrd used 'in the wild'?). You might try editing out the LABELs in fstab and then running mkinitrd (and changing the name in grub/grub.conf to reflect the changed name of initrd<..>.img).

What are your symptoms? Are you getting /dev/rootvg/root not found?

I get:

    Creating root device.
    Mounting root filesystem.
    mount: could not find filesystem '/dev/root'

Re: grub.conf, I've tried using either labels or device names; no luck either way.

I've finally managed to get my server to boot F7, but it's an ugly hack; however, it is also quite revealing. I had the foresight to back up my (FC5) /boot filesystem from the previous install on the same machine (and exactly the same disk layout), so I thought I'd try something a bit hacky. I --force installed kernel-2.6.20-1.2316.fc5 then rebooted. Same problem: couldn't find /dev/root. So then I copied my backup of the original initrd-2.6.20-1.2316.fc5.img over the current one and ... success. The system will now boot, albeit with an old kernel. IOW mkinitrd is b0rked ... badly, at least with LVM systems. And I can confirm that the mkinitrd used *post install* is also broken, so it's not just Anaconda. I could do a fresh initrd and post it, along with the working one, as an attachment here, if you'd like to examine them.

Look through bug 237415 for background; it is filed under mkinitrd rather than anaconda.

I think your problem is due to some config file that is just not written out correctly by anaconda prior to the creation of the initrd for your system. In my case it was the /etc/mdadm.conf file, but your case is different as you are not running RAID. Look closely at /etc/lvm/lvm.conf, particularly the differences between the file generated by anaconda and the file in your FC5 system.

See the process by Andy Baumhauer in the note http://fcp.surfsite.org/modules/newbb/viewtopic.php?viewmode=threaded&order=DESC&topic_id=36690&forum=12&move=prev&topic_time=1178039157

Below is his debugging process. You can use this process to unpeel the contents of the initrd file:

"I debugged my problem by:

    cd /tmp
    cp /boot/initrd-<kernel release version>.img /tmp/initrd-<kernel release version>.img.gz
    gunzip initrd-<kernel release version>.img.gz
    mkdir initrd
    cd initrd
    cpio -cid -I ../initrd-<kernel release version>.img

now examine the init nash script against a working script (from CentOS). My bug is https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=237415 The module to file against is most likely mkinitrd, if the problem I had is causing your problem. I hope that this points you in the right direction. Andy"

Here's the diff for lvm.conf. This is all a foreign language to me, so maybe you can make sense of it:

    --- /etc/lvm/lvm.conf   2007-03-19 21:54:11.000000000 +0000
    +++ /mnt/WD_Passport/sky.backup/etc/lvm/lvm.conf    2007-01-02 23:07:29.000000000 +0000
    @@ -56,14 +56,10 @@
         # filter = [ "a|^/dev/hda8$|", "r/.*/" ]

         # The results of the filtering are cached on disk to avoid
    -    # rescanning dud devices (which can take a very long time).
    -    # By default this cache is stored in the /etc/lvm/cache directory
    -    # in a file called '.cache'.
    -    # It is safe to delete the contents: the tools regenerate it.
    -    # (The old setting 'cache' is still respected if neither of
    -    # these new ones is present.)
    -    cache_dir = "/etc/lvm/cache"
    -    cache_file_prefix = ""
    +    # rescanning dud devices (which can take a very long time). By
    +    # default this cache file is hidden in the /etc/lvm directory.
    +    # It is safe to delete this file: the tools regenerate it.
    +    cache = "/etc/lvm/.cache"

         # You can turn off writing this cache file by setting this to 0.
         write_cache_state = 1
    @@ -83,12 +79,6 @@
         # software RAID (md) devices by looking for md superblocks.
         # 1 enables; 0 disables.
         md_component_detection = 1
    -
    -    # If, while scanning the system for PVs, LVM2 encounters a device-mapper
    -    # device that has its I/O suspended, it waits for it to become accessible.
    -    # Set this to 1 to skip such devices. This should only be needed
    -    # in recovery situations.
    -    ignore_suspended_devices = 0
     }

     # This section that allows you to configure the nature of the
    @@ -192,9 +182,6 @@
         # command.  Defaults to off.
         test = 0

    -    # Default value for --units argument
    -    units = "h"
    -
         # Whether or not to communicate with the kernel device-mapper.
         # Set to 0 if you want to use the tools to manipulate LVM metadata
         # without activating any logical volumes.

Well here's the answer ...

    --- working/initrd/init 2007-06-03 23:56:32.000000000 +0100
    +++ borked/initrd/init  2007-06-03 23:57:29.000000000 +0100
    [snip]
    -echo Scanning logical volumes
    -lvm vgscan --ignorelockingfailure
    -echo Activating logical volumes
    -lvm vgchange -ay --ignorelockingfailure cumulous
     echo Creating root device.
    -mkrootdev -t ext3 -o defaults,ro /dev/cumulous/Eagle
    +mkrootdev -t ext3 -o defaults,ro dm-0

mkinitrd is not creating the necessary lvm commands in nash, and that's even *with* the --force-lvm-probe switch. Also, it looks like the mkrootdev entry is wrong. I take it that I can simply edit this, then cpio/gzip it back up again and use it to boot with, right?

Hey, give it a try (it's not my system :-) ). Seriously, check the version numbers of lvm between FC5 and F7 (that is a big gap). If different, check the man pages for both versions (if these are still available). There may be syntax and command variations between the two versions which would mess up your simple insert. It does appear that the steps preceded by '-' are necessary. Also, the '+' step appears to be for a RAID system (dm-0). I wonder how that got there? Good luck.

It worked!!!
Here's a summary of what I did:

    diff -ur borked-fc7/initrd working-fc7/initrd | grep -v "special file"
    Only in working-fc7/initrd/bin: lvm
    Only in working-fc7/initrd/etc: lvm

    diff -ur borked-fc7/initrd/init working-fc7/initrd/init
    --- borked-fc7/initrd/init  2007-06-04 01:49:52.000000000 +0100
    +++ working-fc7/initrd/init 2007-06-04 01:07:35.000000000 +0100
    @@ -75,12 +75,18 @@
     insmod /lib/dm-zero.ko
     echo "Loading dm-snapshot.ko module"
     insmod /lib/dm-snapshot.ko
    +echo Making device-mapper control node
    +mkdmnod
     insmod /lib/scsi_wait_scan.ko
     rmmod scsi_wait_scan
     mkblkdevs
    +echo Scanning logical volumes
    +lvm vgscan --ignorelockingfailure
    +echo Activating logical volumes
    +lvm vgchange -ay --ignorelockingfailure cumulous
     resume LABEL=SWAPSPACE2
     echo Creating root device.
    -mkrootdev -t ext3 -o defaults,ro dm-0
    +mkrootdev -t ext3 -o defaults,ro /dev/cumulous/Eagle
     echo Mounting root filesystem.
     mount /sysroot
     echo Setting up other filesystems.

    Only in working-fc7/initrd/sbin: lvm

I used the lvm.static in /sbin, and the lvm.conf in /etc/lvm/, to replace those missing files in the initrd (presumably that is what mkinitrd is supposed to do anyway). I edited the init as indicated above, then cpio'ed and gzipped the file, then moved it to /boot, and edited grub. Voila!

Now, if someone could please investigate why mkinitrd is not doing this automatically ... pretty please. This has got to be the longest OS install I've ever done. What I initially thought would take me about 30 minutes has actually taken me 3 days!!! Should this bug get moved to mkinitrd, or should I file a new one?

Congratulations. You could probably add a reference to this bug to bug 237415 with the surrounding details (no RAID, but LVM). I wonder whether the real problem is not in mkinitrd, but somewhere in the chain of commands performed by anaconda to prepare the files for mkinitrd to pack up. Anaconda is run very infrequently by the user classes. It is so hard to get good testers these days.

Will have a look at bug 237415. Also note my comment 9 above: this mkinitrd problem happens *post-install* as well, so this is *not* Anaconda specific.

Reassigning to proper component. Read ya, Phil

I was able to use mkinitrd to rebuild my initrd with no problem, but only after I supplied the correct mdadm.conf with the command 'mdadm -Es'. That is why I think the problem is outside of mkinitrd. It is also quite possible that there is more than one buggy component.

I had a brief look at mkinitrd just out of curiosity, but there's a lot to absorb, so it'll take me some time. To state the obvious, it seems mkinitrd has a problem with both LVM *and* mdadm. Whether this is because of a change to the mkinitrd script itself, or to something it relies on, is a matter for investigation. One titbit of info I neglected to reveal before is that during my "hack" session, one of the things I tried was --force installing the FC5 version of mkinitrd. I can tell you that an initrd made on F7 with the FC5 version of mkinitrd also produces the same problem (no LVM components in the initrd). I'm going to look at what mkinitrd uses externally, and see what's happening. I hope it's not python though. I suck at python :)
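For anyone following along: the "cpio'ed and gzipped the file" repacking step mentioned above is never spelled out in this thread. A minimal sketch, assuming the image was unpacked into /tmp/initrd as in Andy Baumhauer's recipe; the output file name here is just an example, adjust it to your kernel version:

    cd /tmp/initrd
    # repack the edited tree as a newc-format cpio archive and gzip it,
    # which is the layout the kernel expects for an initrd image
    find . | cpio -o -H newc | gzip -9 > /boot/initrd-2.6.21-1.3194.fc7-fixed.img
    # then point the initrd line of the matching kernel entry in
    # /boot/grub/grub.conf at the new image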
Your comment "I can tell you that an initrd made on F7 with the FC5 version of mkinitrd also produces the same problem (no LVM components in the initrd)" could indicate a long-standing bug in mkinitrd. But, if mkinitrd just packs up data and code placed by other parts of the anaconda chain, and that data was garbage, a totally correct mkinitrd would still pack up that garbage. The old GIGO.

I too am having the very same problem (running RAID1 on /boot, RAID1 + LVM for other partitions). This worked without a problem with FC6, but on F7 it chokes as Jan describes and Bob and Keith have discussed. So I didn't see this problem with mkinitrd on FC6 using the very same setup; FC6 installed and ran without a hitch. I'll have a go at unpacking, hacking and repacking my initrd based on the discussion above. Unfortunately, I chalked up my problems as my own install/upgrade error and did a fresh install (FC6 was a new install so not much was lost; little did I realize ...) so I have no working initrd to compare with. I am surprised at the bug priority of LOW; it seems inappropriate, since this pretty much kills F7 as a server platform using RAID + LVM.

Yes, the priority should be higher, but that does not seem to be a field that can be easily changed by peons. I had problems with RAID + LVM on FC6 as well, but only because anaconda would not accept a degraded array (one disk, not two, for RAID1). It ran perfectly well in a degraded state. I found a disk on the internet that matched my dead drive and voilà, I was able to install FC6. The F7 situation is much more serious. The Fedora folks were talking about various different 'spins' of the install set for F7. They need to get one into circulation that works on RAID + LVM pretty quick. The RAID + LVM problem puts a cloud over Fedora 7 when it competes with $$$ software for the big server market.

There may be a different bug for this, but I have a related but different problem. I'm not using RAID or LVM. I had an FC6 system which I upgraded using 'yum update' after installing the new fedora-release. My / is on /dev/hda2 and my /boot is on /dev/hda1. I got the familiar error:

    mount: could not find filesystem '/dev/root'

I tried editing the 'init' to mount things differently, and now just get:

    mount: could not find filesystem '/dev/hda2'

Not sure if it is helpful for tracking this down to know that it isn't tied to RAID or LVM.

Please ignore the last comment. I found the note about the ata module, and the fact that references to /dev/hda needed to be changed to /dev/sda.

An F7 LVM system installed using the Fedora 7 KDE Live CD (i686). After the install it wouldn't boot: kernel panic about being unable to mount rootfs, trying to kill init, etc. Booted with the same Live CD again, mounted the /boot of the installed unbootable system, and ran mkinitrd with --force-lvm to generate a new initrd image, with which the F7 system is now able to boot just fine.

Did a fresh install of Fedora 7 from DVD today with two hard disks and a RAID-1 (/dev/md*) setup. After the first reboot, the machine kernel panics because it cannot handle the RAID-1. As a lot of people correctly said, initrd is missing some files. mkinitrd on the official Fedora 7 DVD is broken for RAID/LVM systems. This can be fixed by booting from the DVD in rescue mode, but there's an easier workaround. When the package installation has finished and you are expected to click on the "Reboot" button, do NOT reboot, but switch to the console shell (Ctrl-Alt-F3) and fix the initrd right now:

    chroot /mnt/sysimage
    /sbin/mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)

Don't know about LVM, but regular RAID (/dev/md*) doesn't need any additional options.
LVM users or users with LVM+RAID may want to add the options "--force-raid-probe --force-lvm-probe --with=lvm --with=raid". The new initrd now includes /bin/mdadm and other stuff needed to handle RAID devices. Now switch back to the GUI (Ctrl-Alt-F5?) and click on "Reboot". Works.

Because this is a very severe bug, I'm confused why this hasn't been mentioned on http://fedoraproject.org/wiki/Bugs/F7Common yet.

Good idea. To: sundaram. See bug numbers 247415, 241949, 242043. They are all independent encounters with problems installing F7 on systems with RAID and/or LVM: kernel panic unless you follow the instructions given in the bug reports. Bob G

(In reply to comment #26) To slightly clarify comment #26 for others who will need to follow this valuable advice (use Ctrl+Alt+F2, not ...F3 as originally posted):

> but switch to the console shell (Ctrl-Alt-F2) and fix initrd right now:

Yes:

    chroot /mnt/sysimage
    /sbin/mkinitrd -f --force-lvm-probe /boot/initrd-$(uname -r).img $(uname -r)

will produce the desired results for a system with only LVM, e.g. from a previous Fedora install.

> Now switch back to GUI (Ctrl-Alt-F5?) and click on "Reboot".

Hmmm. I found none of [Ctrl-Alt-F1] through [Ctrl-Alt-F7] led back to the installer GUI, so I tried typing "reboot" in F2's shell, and when that failed settled for "sync" followed by the power button:

    sh-3.2# reboot
    WARNING: could not determine runlevel - doing soft reboot
    (it's better to use shutdown instead of reboot from the command line)
    shutdown: /dev/initctl: No such file or directory
    init: /dev/initctl: No such file or directory
    sh-3.2# /sbin/shutdown -r now
    shutdown: /dev/initctl: No such file or directory
    init: /dev/initctl: No such file or directory
    sh-3.2# sync
    sh-3.2# sync
    sh-3.2# sync

Then pressed the power button. System booted correctly with the repaired initrd.

(In reply to comment #28)
> Hmmm. I found none of [Ctrl-Alt-F1] through [Ctrl-Alt-F7] led back
> to the installer GUI, so I tried typing "reboot" in F2's shell, and

That's because it's Alt-F7, not Ctrl-Alt-F7, when you are already in text console mode. You have to use Ctrl-Alt-F# from X/graphical mode and just Alt-F# from text mode to switch to a console.

Is there a solution for someone who has already installed F7 and rebooted? I did not find this posting until after installing and kernel panic'ing.

*** Bug 243900 has been marked as a duplicate of this bug. ***

Perhaps someone could post a patched initrd img?

Re: comment #30. Yes, that is how we all found our way here. Re: comment #32. Yes, we are waiting for a respin of the distribution with fixes. Hopefully it will arrive before F8.

> is there a solution for someone who has already installed f7 and rebooted?

The same as in comment #26, but you boot from the rescue CD first.

> perhaps someone could post a patched initrd img?

No, an initrd is specific to your system. You should regenerate it yourself.

I've upgraded from Core 6 to 7, with a mirrored disk on a logical volume, using a Fedora 7 DVD. I've used mdadm -Es to add the 2nd disk to mdadm.conf. I've rebuilt the initrd using the mkinitrd command (with all the --force... and --with=... options as suggested in bug 241949). There are no LABEL= entries in fstab, but the system still can't boot. I'm getting:

    Unable to access resume device (/dev/VolGroup00/LogVol00)
    mount: could not find filesystem '/dev/root'
    setuproot: moving /dev failed: No such file or directory
    ...
    Kernel panic - not syncing: Attempted to kill init!
Using the rescue image on the DVD, I can determine the following. fstab has these entries:

    /dev/VolGroup00/LogVol00  swap     swap  defaults  0 0
    /dev/VolGroup00/LogVol01  /home    ext3  defaults  1 2
    /dev/VolGroup00/LogVol02  /var     ext3  defaults  1 2
    /dev/VolGroup00/LogVol03  /        ext3  defaults  1 1
    /dev/VolGroup00/LogVol04  /backup  ext3  defaults  1 2

which is how the original Core 6 configuration was set up. But in /dev/VolGroup00 there are no entries for LogVol00, LogVol01, LogVol02 or LogVol04, only an entry for /dev/VolGroup00/LogVol03. Is this likely to be caused by the same sort of problem, or is it another bug? Any ideas how to fix it?

Check back over this bug report. Comment #10 shows a debugging process from Andy Baumhauer. Comments #12 and #14 may also be useful. Those folks edited lvm.conf to make sure that it has the right information about virtual disks. This was necessary even though the --force-lvm-probe switch was used. Comment #25 might also be a shortcut. There appear to be several problems, all related to missing config information for RAID and LVM volumes. Unwrap the initrd, check the conf files (diff against an older working file), edit your file(s) and rewrap the initrd file.

I would like to move to F7 from F5 but am waiting only because of this bug. Any news on a respin date would be appreciated. In the meantime, could someone explain the workaround of comment #28? I like to know what I am doing. #28 suggests:

    chroot /mnt/sysimage

(I understand the command above but not why it is necessary)

    /sbin/mkinitrd -f --force-lvm-probe /boot/initrd-$(uname -r).img $(uname -r)

(I don't see "--force-lvm-probe" in my man pages for mkinitrd, I don't understand the $(uname -r) syntax, and it isn't documented in the man page.)

The 'chroot /mnt/sysimage' command just changes your root from '/' to '/mnt/sysimage/'. This means that when you do a command like

    vim /boot/grub/grub.conf

it is actually doing

    vim /mnt/sysimage/boot/grub/grub.conf

You can get out of the chroot state by just doing an extra 'exit'. (The above is an oversimplification.) Your mkinitrd is most probably too elderly to have the --force-lvm-probe option implemented. Anyway, the --force-lvm-probe option did not work for me. Do some experiments with your install disks or the live CD, but just don't go all the way. The live CD should have the up-to-date mkinitrd, which will recognize more options. There are respins available (I don't have the links at the moment), but I don't know if any have been blessed by the Fedora folks. (All the folks cc'ed on this bug are already running F7 and don't need a respin.) At least this problem is on the Fedora list of known problems.

If you do (from your terminal now)

    echo "initrd-$(uname -r).img"

you will see the logic of that syntax.

I installed Fedora 7 on my other system (400 MHz, 768 MB, 2 SCSI 36G disks in RAID 1) following the procedure in comment #10 above. Install Fedora 7 as an update to the existing system. When you come to the final boot up, choose rescue mode. When you get to the # prompt, do the suggested 'chroot /mnt/sysimage' command. Then follow the (slightly modified) instructions below:

    cd /tmp
    cp /boot/initrd-2.6.21-1.3194.fc7.img /tmp/initrd-2.6.21-1.3194.fc7.img.gz
    # Note: the .img file IS gzipped, but without the .gz extension.
    gunzip initrd-2.6.21-1.3194.fc7.img.gz
    mkdir initrd
    cd initrd
    cpio -cid -I ../initrd-2.6.21-1.3194.fc7.img
    # Note: the etc below is within the unpeeled initrd image
    cd etc
    cat mdadm.conf
    # Note: only 2 lines - missing the last disk description line
    mdadm -Es >> mdadm.conf
    vi mdadm.conf
    # Delete the duplicate first two disk description lines.
    # Make sure there are 3 disk lines instead of two.
    # (My system has /dev/md0 as /boot, /dev/md1 as swap, /dev/md2 as /dev/rootvg/root.)
    # Now copy the mdadm.conf over to the main /etc directory
    cp mdadm.conf /etc/mdadm.conf
    # Now navigate to the /boot directory and recreate the initrd file
    cd /boot
    mv initrd-2.6.21-1.3194.fc7.img initrd-2.6.21-1.3194.orig.fc7.img
    mkinitrd initrd-2.6.21-1.3194.fc7.img 2.6.21-1.3194.fc7
    # Now exit out of chroot and the rescue shell and reboot.
    # Note that you can reboot without changing the /boot/grub/grub.conf file,
    # but you may have to reboot twice and be quick about selecting the correct
    # image.
    exit
    exit

It worked for me.

I get the same symptoms because I have no swap partition but a swap file on the ext3 partition instead (which was a mistake in hindsight, because it also makes suspend to disk impossible). At first (after setup) I fixed it using the mkinitrd commands I found in earlier comments. The problem reoccurs every time there's a kernel update, however. To prevent that from happening during an update, I comment out the swap line in /etc/fstab and then do:

    swapoff -a
    yum update

After the kernel has been updated, I restore /etc/fstab and re-enable swap (or reboot). Hope it helps someone...

What does your /etc/mdadm.conf look like? Does it make sense? Maybe a touch-up there would eliminate the need for your /etc/fstab fooling around. This bug is mislabeled. In my experience mkinitrd works fine; it is just that the information supplied to mkinitrd (/etc/mdadm.conf) is bad. GIGO.

Sorry, but mkinitrd does *not* work on my server. I need to manually rebuild the initrd using gzip/cpio, editing "init" to include the lvm commands, and manually adding the lvm.conf and lvm binary to the initrd. I have to do this every time, from the original Fedora 7 distributed kernel up to and including 2.6.22.1-41.fc7 and mkinitrd-6.0.9-7.1. This was a clean install, as were the last two versions of Fedora on this server (FC6 and FC5), neither of which exhibited this problem. I do not have, nor have I ever had, an "/etc/mdadm.conf". If I'm supposed to, then it was not created during the install, and I have no idea what it should contain (I'm not using any sort of RAID). Perhaps this is the root of the problem. The "/etc/lvm/lvm.conf" is correct and does work, once it is manually added to the initrd. Using the --with=lvm and --force-lvm-probe flags with mkinitrd has zero effect on the problem. If I manually rebuild the initrd in this way, I can get the server to boot; otherwise (i.e. with every kernel update) the kernel panics trying to find the rootfs. Whatever it is that mkinitrd uses to grok LVM, it simply isn't working on my server (my guess would be a bug in nash). I can confirm that the lvm (and lvm.static) commands do correctly identify the LVM, obviously, since it does actually work (once manually inserted).
Specifically, I need to manually add the following commands to init:

    mkdmnod
    lvm vgscan --ignorelockingfailure
    lvm vgchange -ay --ignorelockingfailure cumulous

And change the following line from:

    mkrootdev -t ext3 -o defaults,ro dm-0

to:

    mkrootdev -t ext3 -o defaults,ro /dev/cumulous/Eagle

Then I need to:

    cp /sbin/lvm.static initrd/sbin/lvm
    mkdir initrd/etc/lvm
    cp /etc/lvm/lvm.conf initrd/etc/lvm/

Then I cpio/gzip it up, copy it to /boot and edit grub.conf accordingly. This is very frustrating.

That is curious. One would think that the most 'tested' configuration would be a bare disk, default partitions, default install. This results in an LVM volume for root. How much different can your system be from the case where you 'upgrade' an existing default disk configuration (with an LVM root partition)? Your statement "The "/etc/lvm/lvm.conf" is correct and does work, once it is manually added to the initrd" indicates that whatever process is used to create /etc/lvm/lvm.conf prior to the mkinitrd stage is not working. Do you have an old system around there that you can load F7 up on? Tell anaconda to wipe the disk and create the default partition layout. Then compare the resulting configuration files (/etc/lvm/lvm.conf, ...) with your system. You could also :-) install a default FC6 and then upgrade it to F7 to see if that works. If not, it is a sad day for Fedora.

I tried all the above several times (upgrade, clean, check diffs on configs, etc); it made no difference. AFAICT the "/etc/lvm/lvm.conf" file is boilerplate that does not differ from one system to another. I haven't modified it from the original (installed by Anaconda/RPM). Within the same version of the LVM package, the lvm.conf files are all identical. The version that comes with Fedora 7 only differs from the FC6 version by two or three lines, all just comment lines AFAICT. Somehow I don't think the lvm.conf file has anything to do with this problem. Like I said, LVM works on this system; it's just mkinitrd that stubbornly refuses to include the necessary LVM components when creating the initrd. The "process used to create lvm.conf" is (AFAIK) Anaconda/RPM during install, and it does not change after that. This has worked on this server for the two previous versions of Fedora, but now doesn't.

Your words from comment #44 above: "Somehow I don't think the lvm.conf file has anything to do with this problem. Like I said, LVM works on this system, it's just mkinitrd that stubbornly refuses to include the necessary LVM components when creating the initrd. The "process used to create lvm.conf" is (AFAIK) Anaconda/RPM during install, and does not change after that. This has worked on this server for the two previous versions of Fedora, but now doesn't." Yes, I think you have it. mkinitrd cannot include lvm.conf if it is not in the pot of data that mkinitrd wraps up into the initrd.

I had somewhat the same problem. Background: I installed F7 x64 from the Live CD, and upon rebooting I got an error that read:

    device-mapper: table: 253:0:0 striped: Couldn't parse stripe destination

and from there other errors about how LVM can't find the volumes, etc.
My setup: a 3-drive ICH7-based hardware RAID-0 with the default partition layout created during install. After hours of trying everything I could think of to fix it (including recompiling the newest kernel, trying an older kernel (2.6.20-1), messing with mkinitrd [cpio, gzip...], repartitioning the system to run / on a non-LVM partition, etc.), what solved it for me was installing F7 again, but this time from DVD instead of the Live CD. Hopefully this will help others running into the same problem. I can't explain why it works, but the DVD install did the trick for me while nothing else worked. Good luck.

I did use the i386 DVD. The Live CD simply doesn't work at all on this server (spontaneous reboot, no error message). I've triple-checked the hardware, which I seriously doubt is the issue. Like I said, this system worked (and still works) perfectly with FC5 and 6. It also works perfectly with Fedora 7 ... once I take the manual steps described above. IMHO nash is broken (at least for this configuration). What I would like to do is step-trace through the mkinitrd script in verbose mode, carefully examining the output from each stage (i.e. the intermediate nash probes), to see exactly what results are being returned from nash. The only problem is that I would need to essentially "disassemble" the script and rewrite it to "probe and echo" only, and this is a time-consuming and involved process. Merely specifying the "-v" flag will not return the intermediate nash values. I may take a crack at it later in the week, when I'm less busy.

As a quickie, you might try

    sh -x /sbin/mkinitrd <args>

and then redirect the debug output to a file, then look it over and see what is happening.

Created attachment 161151 [details]
mkinitrd log file
OK, I've nailed it. Short answer: mkinitrd assumes the LVM group *label* is the same as the LVM *block device* name, which in my case it isn't (I gave it a name). Longer answer:

    ++ awk '/^[ \t]*[^#]/ { if ($2 == "/") { print $1; }}' /etc/fstab
    + rootdev=LABEL=Eagle
    + '[' ext3 == nfs -a x == x ']'
    + '[' LABEL=Eagle '!=' Eagle -o LABEL=Eagle '!=' LABEL=Eagle ']'
    ++ echo defaults
    ++ sed -e 's/^r[ow],//' -e s/,_netdev// -e s/_netdev// -e 's/,r[ow],$//' -e 's/,r[ow],/,/' -e 's/^r[ow]$/defaults/' -e 's/$/,ro/'
    + rootopts=defaults,ro
    ++ resolve_device_name LABEL=Eagle
    ++ /sbin/nash --forcequiet
    ++ echo nash-resolveDevice LABEL=Eagle
    + devname=/dev/mapper/cumulous-Eagle
    ++ get_numeric_dev dec /dev/mapper/cumulous-Eagle
    + majmin=253:0
    + '[' -n 253:0 ']'
    ++ findall /sys/block -name dev
    ++ echo nash-find /sys/block -name dev
    ++ sed -e 's,.*/\([^/]\+\)/dev,\1,'
    ++ read device
    ++ /sbin/nash --force --quiet
    ++ echo 253:0
    [snip lots of block devs]
    ++ cmp -s /sys/block/dm-0/dev
    ++ echo /sys/block/dm-0/dev
    ++ read device
    ++ echo 253:0
    [snip lots more block devs]
    + dev=dm-0
    + '[' -n dm-0 ']'
    + vecho 'Found root device dm-0 for LABEL=Eagle'
    + NONL=
    + '[' 'Found root device dm-0 for LABEL=Eagle' == -n ']'
    + '[' -n -v ']'
    + echo 'Found root device dm-0 for LABEL=Eagle'
    Found root device dm-0 for LABEL=Eagle
    + rootdev=dm-0
    + '[' ext3 '!=' nfs ']'
    + handlelvordev dm-0
    ++ lvshow dm-0
    ++ lvm.static lvs --ignorelockingfailure --noheadings -o vg_name dm-0
    ++ egrep -v '^ *(WARNING:|Volume Groups with)'
    + local vg=

Oops. IOW:

    [root@sky ~]# lvs --ignorelockingfailure --noheadings -o vg_name dm-0
      Volume group "dm-0" not found

There *is* no volume group called dm-0; that is merely the block device name. The LVM group *label* is "cumulous" (in my case):

    [root@sky ~]# lvs --ignorelockingfailure
      LV      VG       Attr   LSize   Origin Snap%  Move Log Copy%
      Eagle   cumulous -wi-ao  10.00G
      home    cumulous -wi-ao  10.00G
      scratch cumulous -wi-ao  20.00G
      shared  cumulous -wi-ao 174.00G
      usr     cumulous -wi-ao   5.00G
      var     cumulous -wi-ao  76.00G

So the value of "vg=" becomes null, and the rest is history. So the question is, why has this behaviour changed in mkinitrd (it used to work)? My LVM group has not changed since FC5, where this setup worked fine. I've attached the full log file, for the curious. Meanwhile I'll have to hardcode the *actual* volume group *label* into mkinitrd, and see if that cures the problem.

Very nice. I think you have gone a long way toward solving the problem. Following your path, I did some experiments on my system:

    [root@hoho2 log]# /usr/sbin/lvs --ignorelockingfailure
      LV   VG     Attr   LSize  Origin Snap%  Move Log Copy%
      root rootvg -wi-ao 64.50G
    [root@hoho2 log]# /usr/sbin/lvs --ignorelockingfailure --noheadings -o vg_name
      rootvg
    [root@hoho2 log]# /usr/sbin/lvs --ignorelockingfailure --noheadings -o lv_name
      root
    [root@hoho2 log]#

Note that when you leave out the dm-0 at the end of the last command, lvs comes up with the right answer. Looking at 'man lvs', it seems as though 'dm-0' is not needed at all (and just messes things up). Maybe that was a change in lvs; previously it may have ignored extra arguments, now it is picky. The (new) code should also consider the case where the user has more than one volume group.

I think there may be more than one problem, though. Your system exercised the path where you have an LVM setup but no RAID. I have a RAID system (also LVM). When I corrected the /etc/mdadm.conf file, the mkinitrd code followed a different path and did not stumble on the problem you found.
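For anyone who wants to check whether they are hitting the same mismatch: a rough sketch of mapping the dm-N name that mkinitrd digs out of /sys/block back to the real VG/LV path, assuming an LVM root and assuming that lvdisplay -c reports the LV's major and minor numbers in fields 12 and 13 of its colon-separated output (the field positions are an assumption here, not something verified in this thread):

    # the major:minor pair that mkinitrd's /sys/block scan turns up
    cat /sys/block/dm-0/dev            # e.g. 253:0
    # find the LV whose major/minor match and print its real device path,
    # i.e. the name that lvs/vgscan actually understand (not "dm-0")
    lvdisplay -c | awk -F: '$12 == 253 && $13 == 0 { gsub(/^ +/, "", $1); print $1 }'
    # on the system described above this would print /dev/cumulous/Eagle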
Well, that would be a problem on my system, since I don't have an mdadm.conf at all; therefore mkinitrd cannot depend on it for the correct path. As I indicated earlier, this may be the root of the problem. Perhaps I *should* have an mdadm.conf, and then this problem wouldn't exist (was an assumption made by the maintainers of mkinitrd?).

RAID issues aside, I think I've just solved the problem (for LVM anyway, and possibly for RAID / mdadm too). "dm-0" is not boilerplate; it is groked from the following function:

    findstoragedriver () {
        for device in $@ ; do
            case " $handleddevices " in
                *" $device "*)
                    continue ;;
                *) handleddevices="$handleddevices $device" ;;
            esac
            if [[ "$device" =~ "md[0-9]+" ]]; then
                vecho "Found RAID component $device"
                handleraid "$device"
                continue
            fi
            vecho "Looking for driver for device $device"
            sysfs=$(findone -type d /sys/block -name $device)
            [ -z "$sysfs" ] && return
            pushd $sysfs >/dev/null 2>&1
            findstoragedriverinsys
            popd >/dev/null 2>&1
        done
    }

Now check the following carefully:

    lvshow() {
        lvm.static lvs --ignorelockingfailure --noheadings -o vg_name \
            $1 2>/dev/null | head -n 1 | egrep -v '^ *(WARNING:|Volume Groups with)'
    }

    vgdisplay() {
        lvm.static vgdisplay --ignorelockingfailure -v $1 2>/dev/null | sed -n 's/PV Name//p'
    }

    handlelvordev() {
        local vg=$(lvshow $1)
        if [ -n "$vg" ]; then
            vg=`echo $vg` # strip whitespace
            case " $vg_list " in
                *" $vg "*) ;;
                *) vg_list="$vg_list $vg"
                   for device in $(vgdisplay $vg) ; do
                       findstoragedriver ${device##/dev/}
                   done
                   ;;
            esac
        else
            findstoragedriver ${1##/dev/}
        fi
    }

The problem is that looking in /sys/block/ will just return the block device dm-0 (from the id number); e.g. in my case it is looking for the block device number "253:0", which *is* dm-0:

    [root@sky ~]# cat /sys/block/dm-0/dev
    253:0

But lvs only works with volume group *names*, not block device names (unless they *happen* to be the same). So it passes "dm-0" to lvs, and naturally it cannot find a volume group with that name (which on my system is actually labeled "cumulous"). How about this?

    [root@sky ~]# rootdev=$(lvs | grep $(awk '/^[ \t]*[^#]/ { if ($2 == "/") { print $1; }}' /etc/fstab | sed 's/LABEL=//') | awk '{ print "/dev/"$2"/"$1}')
    [root@sky ~]# echo $rootdev
    /dev/cumulous/Eagle

Hmm, but that makes two assumptions: 1) that the rootdev is on an LVM, and 2) that the rootfs is denoted by a "LABEL=" in fstab. I think the key to this is how the following works:

    devname=$(resolve_device_name $rootdev)
    majmin=$(get_numeric_dev dec $devname)
    if [ -n "$majmin" ]; then
        dev=$(findall /sys/block -name dev | while read device ; do \
              echo "$majmin" | cmp -s $device && echo $device ; done \
              | sed -e 's,.*/\([^/]\+\)/dev,\1,' )
        if [ -n "$dev" ]; then
            vecho "Found root device $dev for $rootdev"
            rootdev=$dev

It needs to change to include a check for mapper devices, so you'd replace the last line ("rootdev=$dev") with something like this:

    ######
    mapper=$(lvdisplay -c | grep "$majmin")
    if [ $(echo "$mapper" | cut -d':' -f12- | grep -q "$majmin"; echo $?) = 0 ]
    then
        rootdev=$(echo "$mapper" | cut -d':' -f1)
    else
        rootdev=$dev
    fi
    ######

^^^ Is that the fix? ^^^ Perhaps something similar can be added to check for mdadm RAID systems.

Sorry, I'm a bit tired. Of course you could replace that with just this:

    mapper=$(lvdisplay -c | grep "$majmin")
    if [ $? = 0 ]
    then
        rootdev=$(echo "$mapper" | cut -d':' -f1)
    else
        rootdev=$dev
    fi
Also, I've just been playing around with mdadm (although I don't have RAID), and I'd guess that the following could be used in a similar fashion to the above:

    mdadm -D "$dev"

It works!!! Now if someone (Bob Gustafson from comment #53) could send me their output from:

    mdadm -D "$dev"
    mdadm -Q "$dev"

then I think we can sew this one up.

Bump. Any comments on the fix proposed in comment #55? Could this be rolled into a mkinitrd release?

In reply to comment #56, for more information:

    [root@hoho2 ~]# /sbin/mdadm --detail /dev/md0
    /dev/md0:
            Version : 00.90.03
      Creation Time : Wed Apr 26 14:29:30 2006
         Raid Level : raid1
         Array Size : 104320 (101.89 MiB 106.82 MB)
      Used Dev Size : 104320 (101.89 MiB 106.82 MB)
       Raid Devices : 2
      Total Devices : 2
    Preferred Minor : 0
        Persistence : Superblock is persistent
        Update Time : Sat Aug 25 23:56:30 2007
              State : clean
     Active Devices : 2
    Working Devices : 2
     Failed Devices : 0
      Spare Devices : 0
               UUID : 47bba70b:b76ffd5f:816f55b8:cf2ee184
             Events : 0.1206
        Number   Major   Minor   RaidDevice State
           0       8       17        0      active sync   /dev/sdb1
           1       8        1        1      active sync   /dev/sda1

    [root@hoho2 ~]# /sbin/mdadm --detail /dev/md1
    /dev/md1:
            Version : 00.90.03
      Creation Time : Wed Apr 26 14:29:51 2006
         Raid Level : raid1
         Array Size : 3911744 (3.73 GiB 4.01 GB)
      Used Dev Size : 3911744 (3.73 GiB 4.01 GB)
       Raid Devices : 2
      Total Devices : 2
    Preferred Minor : 1
        Persistence : Superblock is persistent
        Update Time : Sat Aug 25 23:52:39 2007
              State : clean
     Active Devices : 2
    Working Devices : 2
     Failed Devices : 0
      Spare Devices : 0
               UUID : 36c22074:b238a704:85d99d8b:e9dafa99
             Events : 0.2814
        Number   Major   Minor   RaidDevice State
           0       8       18        0      active sync   /dev/sdb2
           1       8        2        1      active sync   /dev/sda2

    [root@hoho2 ~]# /sbin/mdadm --detail /dev/md2
    /dev/md2:
            Version : 00.90.03
      Creation Time : Wed Apr 26 14:30:10 2006
         Raid Level : raid1
         Array Size : 67665664 (64.53 GiB 69.29 GB)
      Used Dev Size : 67665664 (64.53 GiB 69.29 GB)
       Raid Devices : 2
      Total Devices : 2
    Preferred Minor : 2
        Persistence : Superblock is persistent
        Update Time : Sun Aug 26 22:11:18 2007
              State : clean
     Active Devices : 2
    Working Devices : 2
     Failed Devices : 0
      Spare Devices : 0
               UUID : 49f39f6e:b3fbca37:a77d34bb:e5975c53
             Events : 0.6382018
        Number   Major   Minor   RaidDevice State
           0       8       19        0      active sync   /dev/sdb3
           1       8        3        1      active sync   /dev/sda3

Also:

    [root@hoho2 ~]# /usr/sbin/lvdisplay
      --- Logical volume ---
      LV Name                /dev/rootvg/root
      VG Name                rootvg
      LV UUID                ZNEpYD-qv0J-ohPA-27mD-NwnD-M7EO-W9q6pi
      LV Write Access        read/write
      LV Status              available
      # open                 1
      LV Size                64.50 GB
      Current LE             16512
      Segments               1
      Allocation             inherit
      Read ahead sectors     0
      Block device           253:0

(I think someone should send you a pair of disks so you can play with RAID too..)

I think you are making progress. Since you are using both RAID and LVM, AFAICT the fix in comment 55 should work as is in your case (i.e. "/dev/rootvg/root"). What about the case where someone is using /only/ RAID? I'm flying blind here, since I have no spare disks to play with RAID. I'm just trying to wrap my head around this. Scenario: you have two disks, sda and sdb. You configure them as RAIDx, dm-0. You then create multiple filesystems on dm-0. On systems /without/ LVM, do these partitions get referenced as /dev/mapper/xxx1, /dev/mapper/xxx2, etc.? Or is there some other convention? What does the kernel line look like in grub.conf on such systems? Can one /label/ the RAID device, like one does with LVM?
I guess what I'm really getting at is, is there any reason why a wrong/incomplete mdadm.conf file would cause mkinitrd to /not/ find the rootfs? What does a typical mdadm.conf look like? Look again at this piece of code, it is crucial:

    devname=$(resolve_device_name $rootdev)
    majmin=$(get_numeric_dev dec $devname)
    if [ -n "$majmin" ]; then
        dev=$(findall /sys/block -name dev | while read device ; do \
              echo "$majmin" | cmp -s $device && echo $device ; done \
              | sed -e 's,.*/\([^/]\+\)/dev,\1,' )
        if [ -n "$dev" ]; then
            vecho "Found root device $dev for $rootdev"
            rootdev=$dev

Under what circumstances would that /not/ be able to find the rootfs on a RAID-/only/ system? My gut feeling is that the /correct/ place to look is *still* /dev/mapper, rather than /sys/block, although /sys/block/dm-0/dev would indeed give the correct DevID to compare against /dev/mapper/xxx. So assuming the above code can in fact return the correct DevID of the rootfs, we need to make a check comparison between this and the output of mdadm --detail "$dev". But that returns the UUID rather than the Maj:Min number. So how does one convert a UUID into a Maj:Min DevID? Should we even be doing this, or should we assume that mdadm.conf is always correct (and if it isn't, then that's a separate bug)? Too many questions. Stack overflow.

    [root@hoho2 ~]# ls /sys/block
    dm-0  md0  md2   ram1   ram11  ram13  ram15  ram3  ram5  ram7  ram9  sdb
    fd0   md1  ram0  ram10  ram12  ram14  ram2   ram4  ram6  ram8  sda   sr0
    [root@hoho2 ~]# ls /sys/block/dm-0
    capability  holders  removable  slaves  subsystem
    dev         range    size       stat    uevent
    [root@hoho2 ~]# cat /sys/block/dm-0/size
    135266304
    [root@hoho2 ~]# cat /sys/block/dm-0/dev
    253:0
    [root@hoho2 ~]# cat /sys/block/md0/dev
    9:0
    [root@hoho2 ~]# cat /sys/block/md1/dev
    9:1
    [root@hoho2 ~]# cat /sys/block/md2/dev
    9:2
    [root@hoho2 ~]# cat /sys/block/sda/dev
    8:0
    [root@hoho2 ~]# cat /sys/block/sdb/dev
    8:16
    [root@hoho2 ~]# cat /sys/block/fd0/dev
    2:0
    [root@hoho2 ~]# ls /dev/mapper
    control  rootvg-root
    [root@hoho2 ~]# ls -l /dev/mapper/rootvg-root
    brw-rw---- 1 root disk 253, 0 2007-08-25 23:56 /dev/mapper/rootvg-root
    [root@hoho2 ~]#

Looks like there is more information in /dev/mapper. cat /dev/mapper/rootvg-root isn't so useful though..

Executing your code, I get:

    [root@hoho2 ~]# sh -x bug.sh
    ++ resolve_device_name
    bug.sh: line 1: resolve_device_name: command not found
    + devname=
    ++ get_numeric_dev dec
    bug.sh: line 2: get_numeric_dev: command not found
    + majmin=
    bug.sh: line 10: syntax error: unexpected end of file
    [root@hoho2 ~]#

Sorry, I didn't make it clear. Those are "nash" commands, part of mkinitrd. To use my changes, you'd need to apply the following patch:

    --- /sbin/mkinitrd.old  2007-08-13 06:10:50.000000000 +0100
    +++ /sbin/mkinitrd      2007-08-13 08:43:11.000000000 +0100
    @@ -1020,7 +1020,11 @@
                 | sed -e 's,.*/\([^/]\+\)/dev,\1,' )
             if [ -n "$dev" ]; then
                 vecho "Found root device $dev for $rootdev"
    -            rootdev=$dev
    +            mapper=$(lvdisplay -c | grep "$majmin")
    +            if [ $? = 0 ]; then
    +                rootdev=$(echo "$mapper" | cut -d':' -f1)
    +            else rootdev=$dev
    +            fi
             fi
         fi
     else

Then run mkinitrd in the usual way. An examination (gunzip|cpio) of the new initrd's init file should reveal whether or not it works (or just reboot using the new initrd).
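For completeness, a minimal sketch of that examination step, i.e. checking that a freshly built image actually contains the LVM pieces before trusting it for a reboot. The paths inside the image are the ones seen in the working initrd earlier in this thread and are an assumption; adjust the image name to your kernel version:

    mkdir /tmp/initrd-check && cd /tmp/initrd-check
    # unpack a copy of the image without touching the original in /boot
    zcat /boot/initrd-$(uname -r).img | cpio -id
    # the init script should now contain the lvm vgscan/vgchange calls and a
    # mkrootdev line pointing at the real /dev/<vg>/<lv> path, not dm-0
    grep -nE 'lvm vgscan|lvm vgchange|mkrootdev' init
    # and the lvm binary plus its config should be present in the tree
    ls -l bin/lvm sbin/lvm etc/lvm/lvm.conf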
    [root@hoho2 ~]# uname -r
    2.6.22.4-65.fc7
    [root@hoho2 ~]# ./mkinitrd.new initrd-new.img 2.6.22.4-65.fc7
    [root@hoho2 ~]# ls -l initrd-new.img
    -rw------- 1 root root 3804885 2007-08-27 00:14 initrd-new.img
    [root@hoho2 ~]# ls -l /boot/initrd-2.6.22.4-65.fc7.img
    -rw------- 1 root root 3803798 2007-08-25 23:51 /boot/initrd-2.6.22.4-65.fc7.img
    [root@hoho2 ~]#

Looks like there is a size difference.

Note: at the moment, I don't have a problem with new kernels. They just work. There is no problem with booting; the initrd in /boot works fine for me. I think just correcting the /etc/mdadm.conf in my original installation (update) of F7 has stuck and is providing enough (correct) information so that the stock mkinitrd works. My /etc/mdadm.conf is below:

    [root@hoho2 ~]# cat /etc/mdadm.conf
    # mdadm.conf written out by anaconda
    DEVICE partitions
    MAILADDR root
    ARRAY /dev/md0 level=raid1 num-devices=2 uuid=47bba70b:b76ffd5f:816f55b8:cf2ee184
    ARRAY /dev/md1 level=raid1 num-devices=2 uuid=36c22074:b238a704:85d99d8b:e9dafa99
    ARRAY /dev/md2 level=raid1 num-devices=2 UUID=49f39f6e:b3fbca37:a77d34bb:e5975c53
    [root@hoho2 ~]#

The line saying 'written out by anaconda' is of course BS, because I had to use mdadm -Es; see comment #18. I am going to sleep now.

Since my problem stems from a bug in anaconda, I don't think that any patch applied to mkinitrd will help my initial no-boot coming out of an install (running anaconda). Fixing the symptom of the anaconda problem (by running mdadm -Es and tucking the output into /etc/mdadm.conf) did the trick for me. Anaconda is the least tested of all of the pieces; it only runs at install. mkinitrd, on the other hand, probably runs at every update of a kernel for every system out there.

Created attachment 173281 [details]
mkinitrd LVM rootfs patch
+1 submit for testing
FYI, this patch fixed my system, which is running both LVM and RAID. Thanks!

I'm waiting for Fedora 8 :-)

I just successfully did an install with root on LVM on RAID with current rawhide and everything worked. There have definitely been some fixes in those areas, so closing NEXTRELEASE.