241949 – (With patch to fix problem) F7 setup installs bad initrd, fails to boot after install

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	mkinitrd
Sub Component:
Version:	7
Hardware:	i386
OS:	Linux
Priority:	urgent
Severity:	high
Target Milestone:	---
Assignee:	Peter Jones
QA Contact:
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	243900
Depends On:
Blocks:

Reported:	2007-05-31 21:19 UTC by Jan Hlavaty
Modified:	2007-11-30 22:12 UTC
CC List:	17 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2007-10-29 18:27:51 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
initrd.log from mkinitrd -v ... (8.56 KB, text/plain) 2007-06-01 19:37 UTC, Bob Gustafson
mkinitrd log file (199.43 KB, text/plain) 2007-08-13 04:26 UTC, Keith G. Robertson-Turner
mkinitrd LVM rootfs patch (577 bytes, patch) 2007-08-27 06:37 UTC, Keith G. Robertson-Turner

Description Jan Hlavaty 2007-05-31 21:19:03 UTC

Description of problem:

After installing F7 on a RAID+LVM root volume, the system cannot find something in
the initrd, so it fails to find its root volume and the kernel panics.

Workaround/fix: Boot the rescue disk, chroot to the system and rebuild initrd.

Version-Release number of selected component (if applicable):

Release version of F7 i386 DVD with kernel 2.6.21-1.3194.fc7

How reproducible:

Install having root on SW RAID1 LVM volume. I actually have quite complex setup
- two volume groups, one being on RAID1 and other on RAID0. RAID1 one contains
/, RAID0 /tmp. Plus there are two RAID1s for /boot and swap.

Steps to Reproduce:
1. Install.
2. Reboot.
3. Panic.
  
Actual results:

Boot hangs in kernel panic having not found the root file system.

Expected results:

Normal boot.

Additional info:

Having diffed the bad and good version of the initrd, I found these differences:

file /bin/mdadm is missing in bad initrd.
lib  /lib/ld-lsb.so.3 is missing in bad initrd.

Maybe it is some problem with installation order? I believe initrd is generated
when installing kernel package. If packages containing the above files (mdadm,
redhat-lsb) were not installed yet at the moment kernel package was installed,
it could result in the above symptoms.

Maybe it would be nice to regenerate the initrd as the last step of the setup?
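
The comparison above can be sketched as a quick check of an initrd's file listing. This is only a sketch: it assumes the initrd is a gzip-compressed cpio archive (which is how the unpacking steps in the comments on this bug treat it), and the helper names list_initrd and missing_from_listing are made up for illustration.

```shell
# Hypothetical helpers, not part of mkinitrd.

# Print the paths stored in a gzip-compressed cpio initrd, one per line,
# with any leading "./" removed.
list_initrd() {
    gzip -dc "$1" | cpio -it 2>/dev/null | sed 's|^\./||'
}

# Read a file listing on stdin and print each required path that is absent.
missing_from_listing() {
    listing=$(cat)
    for want in "$@"; do
        printf '%s\n' "$listing" | grep -qxF -- "$want" || printf '%s\n' "$want"
    done
}

# Example:
#   list_initrd /boot/initrd-2.6.21-1.3194.fc7.img |
#       missing_from_listing bin/mdadm lib/ld-lsb.so.3
```

An empty result means every listed path is present; any path printed is missing from the image.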

Comment 1 Bob Gustafson 2007-06-01 15:37:30 UTC

See also bug 242043

I guess I have to rebuild my initrd..

Do you have any specific hints on this process?

Comment 2 Bob Gustafson 2007-06-01 16:21:20 UTC

I rebuilt my initrd with the following command:

mkinitrd /boot/initrd-2.6.21-1.3194.fc7-fixed.img 2.6.21-1.3194.fc7

Then edited grub/grub.conf to load the modified file.

The result is 6 bytes smaller than the original - and it also does not boot.

Comment 3 Bob Gustafson 2007-06-01 19:37:00 UTC

Created attachment 155927 [details]
initrd.log from mkinitrd -v ...

Comment 4 Bob Gustafson 2007-06-01 19:42:16 UTC

I used the command:

mkinitrd -v --force-raid-probe --force-lvm-probe \
  /boot/initrd-2.6.21-1.3194.fc7-fixed2.img 2.6.21-1.3194.fc7 2>&1 | tee /boot/initrd.log

and still no successful boot. Attached is the file 'initrd.log'

It does contain the mdadm and ld-lsb.so.3 files

The output of the mkinitrd is shown as the attachment above

Comment 5 Bob Gustafson 2007-06-02 03:43:45 UTC

See bug 237415

Problem may be with bad mkinitrd (kernel > 2.6.20-1.3069)

Comment 6 Bob Gustafson 2007-06-02 12:30:57 UTC

See 3rd from last comment in bug 237415

Solution due to bbaetz

Comment 7 Keith G. Robertson-Turner 2007-06-03 14:14:38 UTC

I'm not using mdadm (no RAID), therefore the "adding mdadm -Es output to
/etc/mdadm.conf" solution has no effect.

I rebuilt the initrd as above ... also no effect, I still can't boot.

kernel-2.6.21-1.3194.fc7 as above, and using LVM.

The same setup boots fine with FC5.

All filesystems do have e2labels and the correct entries in fstab.

The system in question is a server, so I'm very keen to get this working ASAP,
as you can imagine.

Further details if required.

Comment 8 Bob Gustafson 2007-06-03 18:09:05 UTC

One of the comments made on one of the referenced bugs seemed to indicate that
fstab files with LABELs were a problem for a piece of software in the initial
install chain (which ?? - anaconda?, mkinitrd as used by anaconda - may be
different from mkinitrd used 'in the wild')

You might try editing out the LABELs in fstab and then do mkinitrd (and change
the name in grub/grub.conf to reflect the changed name of initrd<..>.img).

What are your symptoms? Are you getting /dev/rootvg/root/ not found?

Comment 9 Keith G. Robertson-Turner 2007-06-03 21:00:12 UTC

I get:

Creating root device.
Mounting root filesystem.
mount: could not find filesystem '/dev/root'

Ref: grub.conf. I've tried using either labels or device names, no luck either way.

I've finally managed to get my server to boot F7, but it's an ugly hack, however
it is also quite revealing.

I had the foresight to back up my (FC5) /boot filesys from the previous install
on the same machine (and exactly the same disk layout), so I thought I'd try
something a bit hacky. I --force installed kernel-2.6.20-1.2316.fc5 then
rebooted. Same problem: couldn't find /dev/root. So then I copied my backup of
the original initrd-2.6.20-1.2316.fc5.img over the current one and ... success.
The system will now boot, albeit with an old kernel.

IOW mkinitrd is b0rked ... badly - at least with LVM systems. And I can confirm
that the mkinitrd used *post install* is also farked, so it's not just Anaconda.

I could do a fresh initrd and post it, along with the working one, as an
attachment here, if you'd like to examine them.

Comment 10 Bob Gustafson 2007-06-03 22:02:59 UTC

Look through bug 237415 for background, it is filed under mkinitrd rather than
anaconda.

------------------------------

I think your problem is due to some config file that is just not written out
correctly by anaconda prior to the creation of initrd for your system. In my
case, it was the /etc/mdadm.conf file, but your case is different as you are not
running raid.

Look closely at /etc/lvm/lvm.conf, particularly the differences between the file
generated by anaconda and the file in your FC5 system.

---------------------------
See the process by Andy Baumhauer in the note
http://fcp.surfsite.org/modules/newbb/viewtopic.php?viewmode=threaded&order=DESC&topic_id=36690&forum=12&move=prev&topic_time=1178039157

Below is his debugging process. You can use this process to unpeel the contents
of the initrd file.

"I debugged my problem by:

cd /tmp
cp /boot/initrd-<kernel release version>.img /tmp/initrd-<kernel release version>.img.gz
gunzip initrd-<kernel release version>.img.gz
mkdir initrd
cd initrd
cpio -cid -I ../initrd-<kernel release version>.img

now examine the init nash script against a working script (from CentOS).

My bug is https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=237415

The module to file against is most likely mkinitrd, if the problem I had
is causing your problem.

I hope that this points you in the right direction.

Andy"
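
The unpacking steps quoted above can be collapsed into one helper. This is a minimal sketch, assuming the image is a gzip-compressed cpio archive as those steps imply. The function name unpack_initrd is hypothetical.

```shell
# Hypothetical helper: unpack a gzip-compressed cpio initrd into a directory.
# Usage: unpack_initrd /boot/initrd-<kernel release version>.img /tmp/initrd
unpack_initrd() {
    img=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # gzip -dc reads the image in place, so no copy or rename to .gz is needed.
    # cpio -i extracts; -d creates leading directories as required.
    gzip -dc "$img" | (cd "$dest" && cpio -id)
}
```

After unpacking, the init script sits at the top of the destination directory and can be compared against one from a working initrd with diff -u.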

Comment 11 Keith G. Robertson-Turner 2007-06-03 22:51:57 UTC

Here's the diff for lvm.conf. This is all a foreign language to me, so maybe you
can make sense of it:

--- /etc/lvm/lvm.conf   2007-03-19 21:54:11.000000000 +0000
+++ /mnt/WD_Passport/sky.backup/etc/lvm/lvm.conf        2007-01-02 23:07:29.000000000 +0000
@@ -56,14 +56,10 @@
     # filter = [ "a|^/dev/hda8$|", "r/.*/" ]
 
     # The results of the filtering are cached on disk to avoid
-    # rescanning dud devices (which can take a very long time).
-    # By default this cache is stored in the /etc/lvm/cache directory
-    # in a file called '.cache'.
-    # It is safe to delete the contents: the tools regenerate it.
-    # (The old setting 'cache' is still respected if neither of
-    # these new ones is present.)
-    cache_dir = "/etc/lvm/cache"
-    cache_file_prefix = ""
+    # rescanning dud devices (which can take a very long time).  By
+    # default this cache file is hidden in the /etc/lvm directory.
+    # It is safe to delete this file: the tools regenerate it.
+    cache = "/etc/lvm/.cache"
 
     # You can turn off writing this cache file by setting this to 0.
     write_cache_state = 1
@@ -83,12 +79,6 @@
     # software RAID (md) devices by looking for md superblocks.
     # 1 enables; 0 disables.
     md_component_detection = 1
-
-    # If, while scanning the system for PVs, LVM2 encounters a device-mapper
-    # device that has its I/O suspended, it waits for it to become accessible.
-    # Set this to 1 to skip such devices.  This should only be needed
-    # in recovery situations.
-    ignore_suspended_devices = 0
 }
 
 # This section that allows you to configure the nature of the
@@ -192,9 +182,6 @@
     # command.  Defaults to off.
     test = 0
 
-    # Default value for --units argument
-    units = "h"
-
     # Whether or not to communicate with the kernel device-mapper.
     # Set to 0 if you want to use the tools to manipulate LVM metadata 
     # without activating any logical volumes.

Comment 12 Keith G. Robertson-Turner 2007-06-03 23:23:32 UTC

Well here's the answer ...

--- working/initrd/init 2007-06-03 23:56:32.000000000 +0100
+++ borked/initrd/init  2007-06-03 23:57:29.000000000 +0100
[snip]
-echo Scanning logical volumes
-lvm vgscan --ignorelockingfailure
-echo Activating logical volumes
-lvm vgchange -ay --ignorelockingfailure  cumulous
 echo Creating root device.
-mkrootdev -t ext3 -o defaults,ro /dev/cumulous/Eagle
+mkrootdev -t ext3 -o defaults,ro dm-0

mkinitrd is not creating the necessary lvm commands in nash, and that's even
*with* the --force-lvm-probe switch. Also it looks like the mkrootdev entry is
wrong.

I take it that I can simply edit this, then cpio/gzip it back up again and use
it to boot with, right?

Comment 13 Bob Gustafson 2007-06-03 23:58:52 UTC

Hey, give it a try (it's not my system :-) )

Seriously - check the version numbers of lvm between FC5 and F7. (That is a big
gap). If different, check the man pages for both versions (if these are still
available..)

There may be syntax and command variations between the two versions which would
mess up your simple insert.

It does appear that the steps preceded by '-' are necessary. Also the '+' step
appears to be for a raid system (dm-0). I wonder how that got there?

Good luck.

Comment 14 Keith G. Robertson-Turner 2007-06-04 01:16:19 UTC

It worked!!!

Here's a summary of what I did:

diff -ur borked-fc7/initrd working-fc7/initrd | grep -v "special file"

Only in working-fc7/initrd/bin: lvm
Only in working-fc7/initrd/etc: lvm
diff -ur borked-fc7/initrd/init working-fc7/initrd/init
--- borked-fc7/initrd/init      2007-06-04 01:49:52.000000000 +0100
+++ working-fc7/initrd/init     2007-06-04 01:07:35.000000000 +0100
@@ -75,12 +75,18 @@
 insmod /lib/dm-zero.ko
 echo "Loading dm-snapshot.ko module"
 insmod /lib/dm-snapshot.ko
+echo Making device-mapper control node
+mkdmnod
 insmod /lib/scsi_wait_scan.ko
 rmmod scsi_wait_scan
 mkblkdevs
+echo Scanning logical volumes
+lvm vgscan --ignorelockingfailure
+echo Activating logical volumes
+lvm vgchange -ay --ignorelockingfailure  cumulous
 resume LABEL=SWAPSPACE2
 echo Creating root device.
-mkrootdev -t ext3 -o defaults,ro dm-0
+mkrootdev -t ext3 -o defaults,ro /dev/cumulous/Eagle
 echo Mounting root filesystem.
 mount /sysroot
 echo Setting up other filesystems.
Only in working-fc7/initrd/sbin: lvm

I used the lvm.static in /sbin, and the lvm.conf in /etc/lvm/ to replace those
missing files in the initrd (presumably that is what mkinitrd is supposed to do
anyway). I edited the init as indicated above, then cpio'ed and gzipped the
file, then moved it to boot, and edited grub. Voila!

Now, if someone could please investigate why mkinitrd is not doing this
automatically ... pretty please.

This has got to be the longest OS install I've ever done. What I initially
thought would take me about 30 minutes, has actually taken me 3 days!!!

Should this bug get moved to mkinitrd, or should I file a new one?
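
The manual edit to init summarized above could be scripted roughly as follows. This is a sketch only: patch_init is a made-up name, the volume group and root device are passed in by hand, and placing mkdmnod directly before mkblkdevs is an assumption (in the diff it sits a few lines earlier, before the scsi_wait_scan lines). It does not check whether the LVM lines are already present.

```shell
# Hypothetical filter: read a nash init script on stdin and write a patched
# copy to stdout. $1 is the volume group, $2 the root logical volume device.
patch_init() {
    awk -v vg="$1" -v root="$2" '
        $0 == "mkblkdevs" {
            print "echo Making device-mapper control node"
            print "mkdmnod"
            print
            print "echo Scanning logical volumes"
            print "lvm vgscan --ignorelockingfailure"
            print "echo Activating logical volumes"
            print "lvm vgchange -ay --ignorelockingfailure " vg
            next
        }
        # Replace the last field of the mkrootdev line (e.g. dm-0) with the
        # real root device.
        $1 == "mkrootdev" { $NF = root }
        { print }
    '
}

# Example:
#   patch_init cumulous /dev/cumulous/Eagle < init > init.new
```

The lvm binary and /etc/lvm/lvm.conf still have to be copied into the unpacked tree separately, as described above.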

Comment 15 Bob Gustafson 2007-06-04 02:03:10 UTC

Congratulations.

You could probably add a reference to this bug to bug 237415 with the
surrounding details (no raid, but lvm).

I wonder whether the real problem is not in mkinitrd, but somewhere in the chain
of commands performed by anaconda to prepare the files for mkinitrd to pack up.

Anaconda is run very infrequently by the user classes.  It is so hard to get
good testers these days.

Comment 16 Keith G. Robertson-Turner 2007-06-04 10:48:50 UTC

Will have a look at bug 237415.

Also note my comment 9 above, this mkinitrd problem happens *post-install* as
well, so this is *not* Anaconda specific.

Comment 17 Phil Knirsch 2007-06-04 12:29:46 UTC

Reassigning to proper component.

Read ya, Phil

Comment 18 Bob Gustafson 2007-06-04 14:42:10 UTC

I was able to use mkinitrd to rebuild my initrd - no problem.

But only after I supplied the correct mdadm.conf with the command 'mdadm -Es'

That is why I think the problem is outside of mkinitrd.

It is also quite possible that there is more than one buggy component.

Comment 19 Keith G. Robertson-Turner 2007-06-04 20:42:04 UTC

I had a brief look at mkinitrd just out of curiosity, but there's a lot to
absorb so it'll take me some time. To state the obvious, it seems mkinitrd has a
problem with both LVM *and* mdadm. Whether this is because of a change to the
mkinitrd script itself, or something it relies on, is a matter for investigation.

One titbit of info I neglected to reveal before, is that during my "hack"
session, one of the things I tried was --force installing the FC5 version of
mkinitrd. I can tell you that an initrd made on F7 with the FC5 version of
mkinitrd also produces the same problem (no LVM components in the initrd).

I'm going to look at what mkinitrd uses externally, and see what's happening. I
hope it's not python though. I suck at python :)

Comment 20 Bob Gustafson 2007-06-05 00:06:12 UTC

Your comment " I can tell you that an initrd made on F7 with the FC5 version of
mkinitrd also produces the same problem (no LVM components in the initrd)."

Could indicate a long-standing bug in mkinitrd.

But, if mkinitrd just packs up data and code placed by other parts of the
anaconda chain, if that data was garbage, a totally correct mkinitrd would still
pack up that garbage. The old GIGO.

Comment 21 Bill Gertz 2007-06-06 11:08:46 UTC

I too, am having the very same problem (running RAID1 on /boot, RAID1 + LVM 
for other partitions). This worked without a problem with fc6 but on f7 it 
chokes as Jan describes and Bob and Keith have discussed. So I didn't see 
this problem with mkinitrd with fc6 using the very same setup - fc6 installed 
and ran without a hitch. 

I'll give a go at unpacking, hacking and repacking my initrd based on the 
discussion above. Unfortunately, I chalked up my problems as my install error/ 
upgrade error and did a fresh install (fc6 was a new install so not much was 
lost - little did I realize....) so have no working initrd to compare with.

I am surprised at bug priority of LOW; it seems inappropriate since this 
pretty much kills f7 as a server platform using RAID + LVM.

Comment 22 Bob Gustafson 2007-06-06 13:38:07 UTC

Yes, the priority should be higher, but that does not seem to be a field that
can be easily changed by peons.

I had problems with RAID + LVM on FC6 as well, but only because anaconda would
not accept a degraded array (one disk, not two for RAID1). It ran perfectly well
in a degraded state. I found a disk on the internet that matched my dead drive
and voilà, I was able to install FC6.

The F7 situation is much more serious. The Fedora folks were talking about
various different 'spins' of the install set for F7. They need to get one into
circulation that works on RAID + LVM pretty quick. The RAID + LVM problem puts a
cloud over Fedora7 when it competes with $$$ software for the big server market.

Comment 23 Russell McOrmond 2007-06-07 20:19:59 UTC

There may be a different bug for this, but I have a related but different
problem.  I'm not using raid or LVM.  I had a FC6 system which I upgraded using
'yum update' after installing the new fedora-release.

My / is on /dev/hda2 and my /boot is on /dev/hda1

I got the familiar error: mount: could not find filesystem '/dev/root'

I tried editing the 'init' to mount things differently, and now just get:
mount: could not find filesystem '/dev/hda2'


Not sure if it is helpful for tracking this down to know that it isn't tied to
RAID or LVM.

Comment 24 Russell McOrmond 2007-06-07 20:47:08 UTC

Please ignore the last comment.  I found the note about the ata module, and the
fact that references to /dev/hda needed to be changed to /dev/sda

Comment 25 Srihari Vijayaraghavan 2007-06-09 01:16:39 UTC

An F7 LVM system installed using Fedora 7 KDE Live CD i686. After the install
wouldn't boot, kernel panic about being unable to mount rootfs, trying to kill
init etc.

Booted with the same Live CD again, mounted the /boot of the installed
unbootable system. Ran mkinitrd with --force-lvm to generate a new initrd image,
with which now the F7 system is able to boot just fine.

Comment 26 Andreas M. Kirchwitz 2007-06-18 20:24:03 UTC

Did a fresh install of Fedora 7 from DVD today with two harddisks
and RAID-1 (/dev/md*) setup. After the first reboot, the machine
kernel panics because it cannot handle the RAID-1.

As a lot of people correctly said, initrd is missing some files.
mkinitrd on the official Fedora 7 DVD is broken for RAID/LVM systems.
This can be fixed with booting from DVD in rescue mode, but there's
an easier workaround.

When the package installation has finished and you are expected to
click on the "Reboot" button, do NOT reboot, but switch to the
console shell (Ctrl-Alt-F3) and fix initrd right now:

        chroot /mnt/sysimage
        /sbin/mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)

Don't know about LVM, but regular RAID (/dev/md*) doesn't need any
additional options. LVM users or users with LVM+RAID may want to add
options "--force-raid-probe --force-lvm-probe --with=lvm --with=raid".
The new initrd now includes /bin/mdadm and other stuff needed to
handle RAID devices.

Now switch back to GUI (Ctrl-Alt-F5?) and click on "Reboot". Works.

Because this is a very severe bug, I'm confused why this hasn't
been mentioned on http://fedoraproject.org/wiki/Bugs/F7Common yet.

Comment 27 Bob Gustafson 2007-06-18 21:40:41 UTC

Good idea

To: sundaram

See bug numbers

247415
241949
242043

They are all independent encounters with problems installing F7 on systems with
RAID and or LVM

Kernel panic unless you follow the instructions given in the bug reports.

Bob G

Comment 28 xunilarodef 2007-06-21 04:17:09 UTC

(In reply to comment #26)
  To slightly clarify Comment #26 for others who will need to follow
this valuable advice (use Ctrl+Alt+F2, not ...F3 as originally posted):

> but switch to the console shell (Ctrl-Alt-F2) and fix initrd right now:

Yes:

  chroot /mnt/sysimage
  /sbin/mkinitrd -f --force-lvm-probe /boot/initrd-$(uname -r).img $(uname -r)

will produce the desired results for a system with only LVM, e.g.
as a previous Fedora install.
 :
> Now switch back to GUI (Ctrl-Alt-F5?) and click on "Reboot".

Hmmm.  I found none of [Ctrl-Alt-F1] through [Ctrl-Alt-F7] led back
to the installer GUI, so I tried typing "reboot" in F2's shell, and
when that failed settled for "sync" followed by the power button:

sh-3.2# reboot
WARNING: could not determine runlevel - doing soft reboot
  (it's better to use shutdown instead of reboot from the command line)
shutdown: /dev/initctl: No such file or directory
init: /dev/initctl: No such file or directory
sh-3.2# /sbin/shutdown -r now
shutdown: /dev/initctl: No such file or directory
init: /dev/initctl: No such file or directory
sh-3.2# sync
sh-3.2# sync
sh-3.2# sync

Then pressed the power button.  System booted correctly with the
repaired initrd.

Comment 29 Jan Hlavaty 2007-06-21 18:14:07 UTC

(In reply to comment #28)

> Hmmm.  I found none of [Ctrl-Alt-F1] through [Ctrl-Alt-F7] led back
> to the installer GUI, so I tried typing "reboot" in F2's shell, and

That's because it's Alt-F7, not Ctrl-Alt-F7, when you are already in text console
mode. You have to use Ctrl-Alt-F# from X/graphical mode and just Alt-F# from
text mode to switch to a console.

Comment 30 mark 2007-06-21 20:04:41 UTC

is there a solution for someone who has already installed f7 and rebooted? i did
not find this posting until after installing and kernel panic'ing.

Comment 31 mark 2007-06-21 20:06:09 UTC

*** Bug 243900 has been marked as a duplicate of this bug. ***

Comment 32 mark 2007-06-21 20:16:44 UTC

perhaps someone could post a patched initrd img?

Comment 33 Bob Gustafson 2007-06-21 20:22:00 UTC

Re: Comment #30.  Yes, that is how we all found our way here.

Re: Comment #32. Yes, we are waiting for a respin of the distribution with
fixes. Hopefully it will arrive before F8.

Comment 34 Nerijus Baliūnas 2007-06-21 20:57:58 UTC

> is there a solution for someone who has already installed f7 and rebooted?

The same as in comment #26, but you boot from rescue CD first.

> perhaps someone could post a patched initrd img?

No, initrd is specific to your system. You should regenerate it yourself.

Comment 35 Donald Pickett 2007-07-10 21:51:42 UTC

I've upgraded from Core 6 to 7 with mirrored disk on a logical volume using a
Fedora 7 DVD.

I've used mdadm -Es to add the 2nd disk to mdadm.conf.

I've rebuilt the initrd using the mkinitrd command (with all the --force.. and
--with=... as suggested in Bug 241949).

There are no LABEL= entries in fstab but the system still can't boot.

I'm getting

Unable to access resume device (/dev/VolGroup00/LogVol00)
mount: could not find filesystem '/dev/root'
setuproot: moving /dev failed: No such file or directory
...
...
Kernel panic - not syncing: Attempted to kill init!

Using the rescue image on the DVD I can determine the following:

fstab has the following entries
/dev/VolGroup00/LogVol00 swap    swap defaults 0 0
/dev/VolGroup00/LogVol01 /home   ext3 defaults 1 2
/dev/VolGroup00/LogVol02 /var    ext3 defaults 1 2
/dev/VolGroup00/LogVol03 /       ext3 defaults 1 1
/dev/VolGroup00/LogVol04 /backup ext3 defaults 1 2

which is how the original Core 6 configuration was configured.

but in /dev/VolGroup00 there are no entries for LogVol00, LogVol01, LogVol02 or
LogVol04, only an entry for /dev/VolGroup00/LogVol03

Is this likely to be caused by the same sort of problem or is it another bug?

Any ideas how to fix it?
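
The manual comparison between fstab and /dev/VolGroup00 described above can be sketched as a small check. The helper name fstab_lv_devices is hypothetical, and the pattern only recognises /dev/<vg>/<lv> style names. It does not explain why a node is missing, only which ones are.

```shell
# Hypothetical helper: print the /dev/<vg>/<lv> style devices named in an
# fstab read from stdin. Comment lines and /dev/mapper paths are skipped.
fstab_lv_devices() {
    awk '$1 ~ "^/dev/[^/]+/[^/]+$" && $1 !~ "^/dev/mapper/" { print $1 }'
}

# Example: report fstab entries whose device node does not exist.
#   fstab_lv_devices < /etc/fstab | while read -r dev; do
#       [ -e "$dev" ] || echo "missing: $dev"
#   done
```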

Comment 36 Bob Gustafson 2007-07-11 00:51:51 UTC

Check back over this bug report. Comment #10 shows a debugging process from Andy
Baumhauer. Comment #12 and #14 may also be useful.

Those folks edited the lvm.conf to make sure that it has the right information
about virtual disks. This was necessary even though the --force-lvm-probe switch
was used.

Comment #25 might also be a shortcut.

There appear to be several problems - related to missing config information for
RAID and LVM volumes. Unwrap the initrd, check the conf files (diff against an
older working file), edit your file(s) and rewrap the initrd file.

Comment 37 Mario DePillis Jr. 2007-07-28 00:46:57 UTC

I would like to move to F7 from F5 but am waiting only because of this bug.  Any
news on a respin date would be appreciated.

In the meantime, could someone explain the workaround in comment #28?  I like
to know what I am doing.  #28 suggests:
   
     chroot /mnt/sysimage
(I understand the command above but not why it is necessary)
 
  /sbin/mkinitrd -f --force-lvm-probe /boot/initrd-$(uname -r).img $(uname -r)
(I don't see "--force-lvm-probe" in my man pages for mkinitrd, I don't
understand the $(uname -r) syntax and it isn't documented in the man page.)

Comment 38 Bob Gustafson 2007-07-28 12:22:09 UTC

The 'chroot /mnt/sysimage' command just changes your root from '/' to
'/mnt/sysimage/'  This means that when you do a command like

  vim /boot/grub/grub.conf

It is actually doing

  vim /mnt/sysimage/boot/grub/grub.conf

You can go out of the chroot state by just doing an extra 'exit'

----- the above is an oversimplification ----

Your mkinitrd is most probably too elderly to have the --force-lvm-probe option
implemented. Anyway, the --force-lvm-probe option did not work for me.

Do some experiments with your install disks or the live-cd, but just don't go
all the way. The live-cd should have the uptodate mkinitrd which will recognize
more options.

There are respins available (I don't have the links at the moment), but I don't
know if any have been blessed by the Fedora folks. (all the folks cc'ed to this
bug are already running F7 and don't need a respin..)

At least this problem is on the Fedora list of known problems...

--

If you do (from your terminal now)
  echo "initrd-$(uname -r).img"
you will see the logic of that syntax
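
To spell that out a little: $(uname -r) is shell command substitution. The shell runs uname -r and pastes its output, the version of the currently running kernel, into the command line. A minimal sketch follows; the helper name initrd_path is made up for illustration.

```shell
# Hypothetical helper: build the initrd path for a given kernel version.
initrd_path() {
    printf '/boot/initrd-%s.img\n' "$1"
}

# Example: initrd_path "$(uname -r)"
# The shell first runs uname -r, then passes its output as the argument.
```

Note that uname -r reports the kernel that is running right now. From install or rescue media this gives the right file name only if that media runs the same kernel version as the installed system; otherwise the version can be typed out explicitly.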

Comment 39 Bob Gustafson 2007-08-08 23:40:55 UTC

I installed Fedora7 in my other system (400 MHz, 768 MB, 2 SCSI 36G disks in RAID 1)

Following the procedure in comment #10 above

Install Fedora7 as an update to existing system
When you come to the final boot up, choose the rescue mode.

When you get to the # prompt, do the suggested 'chroot /mnt/sysimage' command.

Then, follow the (slightly modified) instructions below:

cd /tmp
cp /boot/initrd-2.6.21-1.3194.fc7.img /tmp/initrd-2.6.21-1.3194.fc7.img.gz
# Note, the .img file IS gzipped, but without the .gz extension..

gunzip initrd-2.6.21-1.3194.fc7.img.gz
mkdir initrd
cd initrd
cpio -cid -I ../initrd-2.6.21-1.3194.fc7.img

# Note the etc below is within the unpeeled initrd image
cd etc
cat mdadm.conf
# Note only 2 lines - missing last disk description line

mdadm -Es >> mdadm.conf

vi mdadm.conf
# Delete the duplicate first two disk description lines.
# Make sure there are 3 disk lines instead of two.
# (my system has /dev/md0 as /boot, /dev/md1 as swap, /dev/md2 as /dev/rootvg/root)

# Now copy over the mdadm.conf to the main /etc directory
cp mdadm.conf /etc/mdadm.conf

# Now navigate to the /boot directory and recreate the initrd file

cd /boot
mv initrd-2.6.21-1.3194.fc7.img initrd-2.6.21-1.3194.orig.fc7.img

mkinitrd initrd-2.6.21-1.3194.fc7.img 2.6.21-1.3194.fc7

# Now exit out of chroot and the rescue shell and reboot.
# Note that you can reboot without changing the /boot/grub/grub.conf file
# but you may have to reboot twice and be quick about selecting the correct
# image.

exit
exit

=========
it worked for me..
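
The append-then-delete-duplicates step for mdadm.conf above could be done in one pass. This is a rough sketch, assuming each array appears as a single line starting with ARRAY followed by the device name. The helper name merge_mdadm_arrays is hypothetical.

```shell
# Hypothetical helper: print the existing mdadm.conf ($1) followed by any
# ARRAY lines from stdin whose device is not already listed in that file.
merge_mdadm_arrays() {
    awk '$1 == "ARRAY" && seen[$2]++ { next } { print }' "$1" -
}

# Example:
#   mdadm -Es | merge_mdadm_arrays /etc/mdadm.conf > /tmp/mdadm.conf.new
```

Writing to a new file and reviewing it before copying it over /etc/mdadm.conf keeps the original intact if the result looks wrong.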

Comment 40 Jannes Faber 2007-08-09 07:01:50 UTC

I get the same symptoms because I have no swap partition but a swap file on the
ext3 partition instead (which was a mistake in hindsight because it also makes
suspend to disk impossible).

At first (after setup) I fixed it using the mkinitrd commands I found in
earlier comments. The problem reoccurs every time there's a kernel update,
however.

To prevent that from happening during an update, I comment out the swap line in
/etc/fstab and then do:

swapoff -a
yum update

After the kernel has been updated, I restore /etc/fstab and reenable swap (or 
reboot).

Hope it helps someone...

Comment 41 Bob Gustafson 2007-08-09 22:10:58 UTC

What does your /etc/mdadm.conf look like?

Does it make sense? Maybe a touchup there would eliminate the need for your
/etc/fstab fooling around.


This bug is mislabeled. In my experience mkinitrd works fine. It is just that
the information supplied to mkinitrd (/etc/mdadm.conf) is bad. GIGO.

Comment 42 Keith G. Robertson-Turner 2007-08-09 23:18:39 UTC

Sorry but mkinitrd does *not* work on my server. I need to manually rebuild the
initrd using gzip/cpio, editing "init" to include the lvm command, and manually
adding the lvm.conf and lvm binary to the initrd. I have to do this every time,
from the original Fedora 7 distributed kernel, up to and including
2.6.22.1-41.fc7 and mkinitrd-6.0.9-7.1.

This was a clean install, as were the last two versions of Fedora on this server
(FC6 and FC5), neither of which exhibited this problem. I do not have, nor have
I ever had an "/etc/mdadm.conf". If I'm supposed to, then it was not created
during the install, and I have no idea what it should contain (I'm not using any
sort of RAID). Perhaps this is the root of the problem. The "/etc/lvm/lvm.conf"
is correct and does work, once it is manually added to the initrd. Using the
--with=lvm and --force-lvm-probe flags with mkinitrd has zero effect on the problem.

If I manually rebuild the initrd in this way, I can get the server to boot,
otherwise (i.e. with every kernel update) the kernel panics trying to find the
rootfs.

Whatever it is that mkinitrd uses to grok LVM, it simply isn't working on my
server (my guess would be a bug in nash). I can confirm that the lvm (and
lvm.static) commands do correctly identify the LVM, obviously since it does
actually work (once manually inserted).

Specifically, I need to manually add the following commands to init:

mkdmnod
lvm vgscan --ignorelockingfailure
lvm vgchange -ay --ignorelockingfailure  cumulous

And change the following line from:

mkrootdev -t ext3 -o defaults,ro dm-0

To:

mkrootdev -t ext3 -o defaults,ro /dev/cumulous/Eagle

Then I need to:

cp /sbin/lvm.static initrd/sbin/lvm
mkdir initrd/etc/lvm
cp /etc/lvm/lvm.conf initrd/etc/lvm/

Then I cpio/gzip it up, copy it to /boot and edit grub.conf accordingly.

This is very frustrating.
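
For anyone repeating this, the final "cpio/gzip it up" step might look like the sketch below. The function name repack_initrd is made up. The newc cpio format is used because that is what the kernel expects for an initramfs image.

```shell
# Hypothetical helper: pack the directory tree $1 into a gzip-compressed
# cpio image written to $2.
repack_initrd() {
    src=$1
    img=$2
    [ -d "$src" ] || return 1
    # The redirection is opened before the subshell changes directory, so a
    # relative image path is resolved against the current directory.
    # Keep the image outside the source tree, or find would pick it up.
    (cd "$src" && find . | cpio -o -H newc | gzip -9) > "$img"
}

# Example:
#   repack_initrd /tmp/initrd /boot/initrd-2.6.22.1-41.fc7-fixed.img
```

Writing to a new file name and pointing a separate grub.conf entry at it leaves the original image available as a fallback.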

Comment 43 Bob Gustafson 2007-08-09 23:57:55 UTC

That is curious.

One would think that the most 'tested' configuration would be a bare disk,
default partitions, default install. This results in an lvm volume for root.

How much different can your system be from the case where you 'upgrade' an
existing default disk configuration (with lvm root partition)?

Your statement "The "/etc/lvm/lvm.conf"
is correct and does work, once it is manually added to the initrd." indicates
that whatever process is used to create /etc/lvm/lvm.conf prior to the mkinitrd
stage, is not working.

Do you have an old system around there that you can load up F7 on? Tell anaconda
to wipe disk and create default partition layout.

Then compare resulting configuration files (/etc/lvm/lvm.conf..) with your system.

You could also :-) install a default FC6 and then upgrade it to F7 to see if
that works. If not, it is a sad day for Fedora.

Comment 44 Keith G. Robertson-Turner 2007-08-10 00:31:08 UTC

I tried all the above several times (upgrade, clean, check diffs on configs,
etc), it made no difference.

AFAICT the "/etc/lvm/lvm.conf" file is a boilerplate that does not differ from
one system to another. I haven't modified it from the original (installed by
Anaconda/RPM). Within the same version of the LVM package, the lvm.conf files
are all identical. The version that comes with Fedora 7 only differs from the
FC6 version by two or three lines, all just comment lines AFAICT.

Somehow I don't think the lvm.conf file has anything to do with this problem.
Like I said, LVM works on this system, it's just mkinitrd that stubbornly
refuses to include the necessary LVM components when creating the initrd.

The "process used to create lvm.conf" is (AFAIK) Anaconda/RPM during install,
and does not change after that. This has worked on this server for the two
previous versions of Fedora, but now doesn't.

Comment 45 Bob Gustafson 2007-08-10 06:46:05 UTC

Your words from comment #44 above:

"Somehow I don't think the lvm.conf file has anything to do with this problem.
Like I said, LVM works on this system, it's just mkinitrd that stubbornly
refuses to include the necessary LVM components when creating the initrd.

The "process used to create lvm.conf" is (AFAIK) Anaconda/RPM during install,
and does not change after that. This has worked on this server for the two
previous versions of Fedora, but now doesn't."

Yes, I think you have it. mkinitrd cannot include lvm.conf if it is not in the
pot of data that mkinitrd wraps up into initrd.

Comment 46 Boris Capitanu 2007-08-11 15:55:08 UTC

I had somewhat the same problem:

Background:  Installed FC7 x64 from Live CD and upon rebooting I got an error
that read: 

device-mapper: table: 253:0:0 striped: Couldn't parse stripe destination

and from here other errors about how LVM can't find the volumes, etc.

My setup:  running a 3 drive ICH7-based hardware RAID-0 with the default
partition layout created during install

After hours of trying everything I could think of to fix it (including
recompiling the newest kernel,  trying an older kernel (2.6.20-1), messing with
mkinitrd [cpio,gzip...], repartitioning the system to run / on a non-LVM
partition...etc),  what solved it for me was installing FC7 again, but this time
from DVD instead of the Live CD.

Hopefully this will help others running into the same problem.
I can't explain why it works, but the DVD install did the trick for me while
nothing else worked.

Good luck.

Comment 47 Keith G. Robertson-Turner 2007-08-12 20:41:57 UTC

I did use the i386 DVD. The LiveCD simply doesn't work at all on this server
(spontaneous reboot, no error message).

I've triple checked the hardware, which I seriously doubt is the issue. Like I
said, this system worked (and still works) perfectly with FC5 and 6. It also
works perfectly with Fedora 7 ... once I take the manual steps required above.

IMHO nash is broken (at least for this configuration).

What I would like to do, is step trace through the mkinitrd script, in verbose
mode, carefully examining the output from each stage (i.e. the intermediate nash
probes), to see exactly what results are being returned from nash. The only
problem is that I would need to essentially "disassemble" the script and rewrite
it to "probe and echo" only, and this is a time-consuming and involved process.
Merely specifying the "-v" flag will not return the intermediate nash values.

I may take a crack at it later in the week, when I'm less busy.

Comment 48 Bob Gustafson 2007-08-12 21:15:37 UTC

As a quicky, you might try

  sh -x /sbin/mkinitrd <args>

And then redirect the debug output to a file, and then look it over and see what
is happening.
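
For anyone following along, a minimal sketch of that suggestion. The tiny demo script below is a hypothetical stand-in for /sbin/mkinitrd so that the commands run anywhere; the only point is that "sh -x" writes its trace to stderr, so redirecting file descriptor 2 captures it in a file.

```shell
# Hypothetical stand-in for /sbin/mkinitrd, so the sketch runs on any system.
demo=$(mktemp)
trace=$(mktemp)
cat > "$demo" <<'EOF'
rootdev=LABEL=Eagle
vg=
[ -n "$vg" ] || echo "no volume group found for $rootdev"
EOF

# The -x trace goes to stderr: send it to a file, discard normal output.
sh -x "$demo" > /dev/null 2> "$trace"

# Traced commands start with '+'. An assignment with nothing after '='
# is a variable that ended up empty, which is worth a closer look.
empty_assignments=$(grep -c '^[+]* *[A-Za-z_]*=$' "$trace")
echo "empty assignments in trace: $empty_assignments"
```

Against the real script the same redirection applies: run "sh -x /sbin/mkinitrd <args>" with "2>" pointing at a log file of your choice, then search that file.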

Comment 49 Keith G. Robertson-Turner 2007-08-13 04:26:08 UTC

Created attachment 161151 [details]
mkinitrd log file

Comment 50 Keith G. Robertson-Turner 2007-08-13 04:26:56 UTC

OK, I've nailed it.

Short answer: mkinitrd assumes the LVM group *label* is the same as the LVM
*block device* name, which in my case, it isn't (I gave it a name).

Longer answer:

++ awk '/^[ \t]*[^#]/ { if ($2 == "/") { print $1; }}' /etc/fstab
+ rootdev=LABEL=Eagle
+ '[' ext3 == nfs -a x == x ']'
+ '[' LABEL=Eagle '!=' Eagle -o LABEL=Eagle '!=' LABEL=Eagle ']'
++ echo defaults
++ sed -e 's/^r[ow],//' -e s/,_netdev// -e s/_netdev// -e 's/,r[ow],$//' -e 's/,r[ow],/,/' -e 's/^r[ow]$/defaults/' -e 's/$/,ro/'
+ rootopts=defaults,ro
++ resolve_device_name LABEL=Eagle
++ /sbin/nash --forcequiet
++ echo nash-resolveDevice LABEL=Eagle
+ devname=/dev/mapper/cumulous-Eagle
++ get_numeric_dev dec /dev/mapper/cumulous-Eagle
+ majmin=253:0
+ '[' -n 253:0 ']'
++ findall /sys/block -name dev
++ echo nash-find /sys/block -name dev
++ sed -e 's,.*/\([^/]\+\)/dev,\1,'
++ read device
++ /sbin/nash --force --quiet
++ echo 253:0

[snip lots of block devs]

++ cmp -s /sys/block/dm-0/dev
++ echo /sys/block/dm-0/dev
++ read device
++ echo 253:0

[snip lots more block devs]

+ dev=dm-0
+ '[' -n dm-0 ']'
+ vecho 'Found root device dm-0 for LABEL=Eagle'
+ NONL=
+ '[' 'Found root device dm-0 for LABEL=Eagle' == -n ']'
+ '[' -n -v ']'
+ echo 'Found root device dm-0 for LABEL=Eagle'
Found root device dm-0 for LABEL=Eagle
+ rootdev=dm-0
+ '[' ext3 '!=' nfs ']'
+ handlelvordev dm-0
++ lvshow dm-0
++ lvm.static lvs --ignorelockingfailure --noheadings -o vg_name dm-0
++ egrep -v '^ *(WARNING:|Volume Groups with)'
+ local vg=

Oops.

IOW:

[root@sky ~]# lvs --ignorelockingfailure --noheadings -o vg_name dm-0
  Volume group "dm-0" not found

There *is* no volume group called dm-0, that is merely the block device name,
the LVM group *label* is "cumulous" (in my case):

[root@sky ~]# lvs --ignorelockingfailure
  LV      VG       Attr   LSize   Origin Snap%  Move Log Copy% 
  Eagle   cumulous -wi-ao  10.00G                              
  home    cumulous -wi-ao  10.00G                              
  scratch cumulous -wi-ao  20.00G                              
  shared  cumulous -wi-ao 174.00G                              
  usr     cumulous -wi-ao   5.00G                              
  var     cumulous -wi-ao  76.00G

So the value of "vg=" becomes null, and the rest is history.
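
To illustrate why the empty value goes unnoticed, here is a small sketch. fake_lvs is a hypothetical stand-in for "lvm.static lvs", hard-coded to know only the volume group from the listing above, and lvshow_sketch is a simplified imitation of the lookup in the trace. Because stderr is discarded, a failed lookup looks exactly like "this is not an LVM device".

```shell
# Hypothetical stand-in for "lvm.static lvs ... -o vg_name <name>":
# it only knows the volume group "cumulous" from the listing above.
fake_lvs() {
    case "$1" in
        cumulous|cumulous/*|/dev/cumulous/*)
            echo "  cumulous" ;;
        *)
            echo "  Volume group \"$1\" not found" >&2
            return 5 ;;
    esac
}

# Simplified version of the lookup: the error message is thrown away.
lvshow_sketch() {
    fake_lvs "$1" 2>/dev/null | head -n 1
}

vg_from_kernel_name=$(lvshow_sketch dm-0)
vg_from_lv_path=$(lvshow_sketch /dev/cumulous/Eagle)

echo "dm-0                -> '${vg_from_kernel_name}'"
echo "/dev/cumulous/Eagle -> '${vg_from_lv_path}'"
```

The first lookup yields an empty string with no visible error, which is the situation shown in the trace.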

So the question is, why has this behaviour changed in mkinitrd (it used to
work)? My LVM group has not changed since FC5, where this setup worked fine.

I've attached the full log file, for the curious.

Meanwhile I'll have to hardcode the *actual* volume group *label* into mkinitrd,
and see if that cures the problem.

Comment 51 Bob Gustafson 2007-08-13 05:37:35 UTC

Very nice. I think you have gone a long way to solving the problem.

Following your path, I did some experiments on my system:

[root@hoho2 log]# /usr/sbin/lvs --ignorelockingfailure
  LV   VG     Attr   LSize  Origin Snap%  Move Log Copy% 
  root rootvg -wi-ao 64.50G                              
[root@hoho2 log]# /usr/sbin/lvs --ignorelockingfailure --noheadings -o vg_name
  rootvg 
[root@hoho2 log]# /usr/sbin/lvs --ignorelockingfailure --noheadings -o lv_name
  root 
[root@hoho2 log]# 


Note that when you leave out the dm-0 at the end of the last command, lvs comes
up with the right answer.

Looking at 'man lvs' it seems as though 'dm-0' is not needed at all (and just
messes things up). Maybe that was a change in lvs - previously it may have
ignored extra arguments, now it is picky.

Comment 52 Bob Gustafson 2007-08-13 05:40:31 UTC

The (new) code should also consider the case where the user has more than one
volume group.
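
A rough sketch of what handling several volume groups could look like. The sample text is made-up output in the shape of "lvs --noheadings -o vg_name" (one line per logical volume); the VG names and the function name unique_vgs are hypothetical.

```shell
# Made-up sample: one line per logical volume, as printed by
# "lvs --noheadings -o vg_name" on a box with two volume groups.
sample_lvs_output='  rootvg
  rootvg
  datavg
  rootvg'

# Collapse to one entry per volume group, keeping first-seen order.
unique_vgs() {
    awk 'NF && !seen[$1]++ { print $1 }'
}

vg_list=$(printf '%s\n' "$sample_lvs_output" | unique_vgs | tr '\n' ' ')
vg_list=${vg_list% }   # drop the trailing space
echo "volume groups: $vg_list"
```

Each name in the resulting list could then be handled in turn, rather than assuming a single answer.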

Comment 53 Bob Gustafson 2007-08-13 05:47:17 UTC

I think there may be more than one problem though.

Your system exercised the path where you have an LVM setup, but no RAID.

I have a RAID system (also LVM). When I corrected the /etc/mdadm.conf file, the
mkinitrd code followed a different path and did not stumble on the problem you
found.

Comment 54 Keith G. Robertson-Turner 2007-08-13 07:15:46 UTC

Well that would be a problem on my system, since I don't have mdadm.conf at all,
therefore mkinitrd cannot depend on it for the correct path.

As I indicated earlier, this may be the root of the problem. Perhaps I *should*
have a mdadm.conf, and therefore this problem wouldn't exist (was an assumption
made by the maintainers of mkinitrd?).

RAID issues aside, I think I've just solved the problem (for LVM anyway, and
possibly for RAID / mdadm too):

"dm-0" is not a boilerplate, it is groked from the following function:

findstoragedriver () {
    for device in $@ ; do
        case " $handleddevices " in
            *" $device "*)
                continue ;;
            *) handleddevices="$handleddevices $device" ;;
        esac
        if [[ "$device" =~ "md[0-9]+" ]]; then
            vecho "Found RAID component $device"
            handleraid "$device"
            continue
        fi
        vecho "Looking for driver for device $device"
        sysfs=$(findone -type d /sys/block -name $device)
        [ -z "$sysfs" ] && return
        pushd $sysfs >/dev/null 2>&1
        findstoragedriverinsys
        popd >/dev/null 2>&1
    done
}

Now check the following carefully:

lvshow() {
    lvm.static lvs --ignorelockingfailure --noheadings -o vg_name \
        $1 2>/dev/null | head -n 1 | egrep -v '^ *(WARNING:|Volume Groups with)'
}

vgdisplay() {
    lvm.static vgdisplay --ignorelockingfailure -v $1 2>/dev/null |
        sed -n 's/PV Name//p'
}

handlelvordev() {
    local vg=$(lvshow $1)
    if [ -n "$vg" ]; then
        vg=`echo $vg` # strip whitespace
        case " $vg_list " in
        *" $vg "*) ;;
        *)  vg_list="$vg_list $vg"
            for device in $(vgdisplay $vg) ; do
                findstoragedriver ${device##/dev/}
            done
            ;;
        esac
    else
        findstoragedriver ${1##/dev/}
    fi
}

The problem is, that looking in /sys/block/ will just return the block device
dm-0 (from the id number), e.g. in my case it is looking for the block device
number "253:0", which *is* dm-0:

[root@sky ~]# cat /sys/block/dm-0/dev 
253:0

But lvs only works with volume group *names*, not block device names (unless
they *happen* to be the same). So it passes "dm-0" to lvs, and naturally it
cannot find a volume group with that name (which on my system is actually
labeled "cumulous").
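
The lookup described above can be reproduced in isolation. This sketch builds a throwaway directory laid out like /sys/block (dm-0 carries the number shown above; the sda and sdb entries are illustrative) and searches it for a major:minor pair. find_block_name is a hypothetical helper, not something from mkinitrd.

```shell
# Throwaway tree laid out like /sys/block; sda and sdb are illustrative.
fake_sys=$(mktemp -d)
mkdir -p "$fake_sys/dm-0" "$fake_sys/sda" "$fake_sys/sdb"
echo 253:0 > "$fake_sys/dm-0/dev"
echo 8:0   > "$fake_sys/sda/dev"
echo 8:16  > "$fake_sys/sdb/dev"

# Print the kernel name of the block device whose "dev" file holds
# exactly the given major:minor pair.
find_block_name() {
    local root=$1 majmin=$2 f
    for f in "$root"/*/dev; do
        [ -r "$f" ] || continue
        if [ "$(cat "$f")" = "$majmin" ]; then
            basename "$(dirname "$f")"
            return 0
        fi
    done
    return 1
}

# The result is the kernel name only; it carries no VG or LV name.
find_block_name "$fake_sys" 253:0
```

Whatever comes back from such a search is a kernel device name, so anything that expects an LVM name needs a further translation step.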

How about this?:

[root@sky ~]# rootdev=$(lvs | grep $(awk '/^[ \t]*[^#]/ { if ($2 == "/") { print $1; }}' /etc/fstab | sed 's/LABEL=//') | awk '{ print "/dev/"$2"/"$1}')
[root@sky ~]# echo $rootdev
/dev/cumulous/Eagle

Hmm, but that makes two assumptions; 1) that the rootdev is on an LVM and 2)
that the rootfs is denoted by a "LABEL=" in fstab.
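
One way around the second assumption would be to look at the form of the fstab entry before deciding how to resolve it. A sketch, using a made-up fstab written to a temporary file; root_spec and classify_spec are hypothetical names, and only the awk expression is taken from the trace in comment #50.

```shell
# Made-up fstab so the sketch does not touch the real /etc/fstab.
fstab=$(mktemp)
cat > "$fstab" <<'EOF'
# comment line
LABEL=Eagle   /      ext3  defaults  1 1
/dev/md0      /boot  ext3  defaults  1 2
EOF

# Same awk expression as in the mkinitrd trace: first field of the "/" entry.
root_spec() {
    awk '/^[ \t]*[^#]/ { if ($2 == "/") { print $1; }}' "$1"
}

# Decide how the entry names the root filesystem.
classify_spec() {
    case "$1" in
        LABEL=*) echo "label ${1#LABEL=}" ;;
        UUID=*)  echo "uuid ${1#UUID=}" ;;
        /dev/*)  echo "device $1" ;;
        *)       echo "unknown $1" ;;
    esac
}

classify_spec "$(root_spec "$fstab")"
```

Only the "label" case would need the LABEL= prefix stripped; a plain device path could be passed along unchanged.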

I think the key to this is how the following works:

        devname=$(resolve_device_name $rootdev)
        majmin=$(get_numeric_dev dec $devname)
        if [ -n "$majmin" ]; then
            dev=$(findall /sys/block -name dev | while read device ; do \
                  echo "$majmin" | cmp -s $device && echo $device ; done \
                  | sed -e 's,.*/\([^/]\+\)/dev,\1,' )
            if [ -n "$dev" ]; then
                vecho "Found root device $dev for $rootdev"
                rootdev=$dev

It needs to change to include a check for mapper devices, so you'd replace the
last line ("rootdev=$dev") with something like this:

######
mapper=$(lvdisplay -c | grep "$majmin")
if [ $(echo "$mapper" | cut -d':' -f12- | grep -q "$majmin"; echo $?) = 0 ]
   then
      rootdev=$(echo "$mapper" | cut -d':' -f1)
   else
      rootdev=$dev
fi
######

^^^ Is that the fix? ^^^

Perhaps something similar can be added to check for mdadm RAID systems.

Comment 55 Keith G. Robertson-Turner 2007-08-13 07:35:11 UTC

Sorry, I'm a bit tired. Of course you could replace that with just this:

mapper=$(lvdisplay -c | grep "$majmin")
if [ $? = 0 ]
   then
      rootdev=$(echo "$mapper" | cut -d':' -f1)
   else
      rootdev=$dev
fi
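
A possible refinement, offered as a sketch rather than a tested replacement: a plain grep for "$majmin" can also hit other colon-separated fields, so comparing the major and minor fields exactly is a little safer. This assumes "lvdisplay -c" prints the major and minor numbers as the 12th and 13th fields (which is what the "cut -f12-" in comment #54 relies on). The sample lines are made up, and lv_for_majmin is a hypothetical name.

```shell
# Made-up lines in the colon-separated shape of "lvdisplay -c".
# In the first line two middle fields happen to read "253:0" as well.
sample='  /dev/cumulous/home:cumulous:3:1:-1:1:253:0:-1:0:0:253:1
  /dev/cumulous/Eagle:cumulous:3:1:-1:1:20971520:320:-1:0:0:253:0'

# Print the LV path whose major:minor (fields 12 and 13) equals $1.
lv_for_majmin() {
    awk -F: -v want="$1" '
        { sub(/^[ \t]+/, "", $1) }              # strip leading indentation
        ($12 ":" $13) == want { print $1; exit }
    '
}

majmin=253:0
hits=$(printf '%s\n' "$sample" | grep -c "$majmin")
echo "lines matched by plain grep: $hits"

rootdev=$(printf '%s\n' "$sample" | lv_for_majmin "$majmin")
echo "rootdev=$rootdev"
```

With the contrived first line, grep matches both entries while the field comparison picks only the volume that really sits on 253:0.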

Also, I've just been playing around with mdadm (although I don't have RAID), and
I'd guess that the following could be used in a similar fashion to the above:

mdadm -D "$dev"

Comment 56 Keith G. Robertson-Turner 2007-08-13 07:59:58 UTC

It works!!!

Now if someone (Bob Gustafson from comment #53) could send me their output from:

mdadm -D "$dev"
mdadm -Q "$dev"

Then I think we can sew this one up.

Comment 57 Keith G. Robertson-Turner 2007-08-26 20:58:04 UTC

Bump.

Any comments on the fix proposed in comment #55?

Could this be rolled into a mkinitrd release?

Comment 58 Bob Gustafson 2007-08-27 03:15:14 UTC

In reply to comment #56 for more information:

[root@hoho2 ~]# /sbin/mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Wed Apr 26 14:29:30 2006
     Raid Level : raid1
     Array Size : 104320 (101.89 MiB 106.82 MB)
  Used Dev Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sat Aug 25 23:56:30 2007
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 47bba70b:b76ffd5f:816f55b8:cf2ee184
         Events : 0.1206

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8        1        1      active sync   /dev/sda1
[root@hoho2 ~]# 

[root@hoho2 ~]# /sbin/mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90.03
  Creation Time : Wed Apr 26 14:29:51 2006
     Raid Level : raid1
     Array Size : 3911744 (3.73 GiB 4.01 GB)
  Used Dev Size : 3911744 (3.73 GiB 4.01 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Sat Aug 25 23:52:39 2007
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 36c22074:b238a704:85d99d8b:e9dafa99
         Events : 0.2814

    Number   Major   Minor   RaidDevice State
       0       8       18        0      active sync   /dev/sdb2
       1       8        2        1      active sync   /dev/sda2
[root@hoho2 ~]# 

[root@hoho2 ~]# /sbin/mdadm --detail /dev/md2
/dev/md2:
        Version : 00.90.03
  Creation Time : Wed Apr 26 14:30:10 2006
     Raid Level : raid1
     Array Size : 67665664 (64.53 GiB 69.29 GB)
  Used Dev Size : 67665664 (64.53 GiB 69.29 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Sun Aug 26 22:11:18 2007
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 49f39f6e:b3fbca37:a77d34bb:e5975c53
         Events : 0.6382018

    Number   Major   Minor   RaidDevice State
       0       8       19        0      active sync   /dev/sdb3
       1       8        3        1      active sync   /dev/sda3
[root@hoho2 ~]# 

Also

[root@hoho2 ~]# /usr/sbin/lvdisplay
  --- Logical volume ---
  LV Name                /dev/rootvg/root
  VG Name                rootvg
  LV UUID                ZNEpYD-qv0J-ohPA-27mD-NwnD-M7EO-W9q6pi
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                64.50 GB
  Current LE             16512
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:0
   
[root@hoho2 ~]# 

(I think someone should send you a pair of disks so you can play with Raid
too..) I think you are making progress.

Comment 59 Keith G. Robertson-Turner 2007-08-27 04:10:42 UTC

Since you are using both RAID and LVM, then AFAICT the fix in comment 55 should
work as is, in your case (i.e. "/dev/rootvg/root").

What about the case where someone is using /only/ RAID?

I'm flying blind here, since I have no disks spare to play with RAID. I'm just
trying to wrap my head around this.

Scenario...

You have two disks, sda and sdb.
You configure them as RAIDx, dm-0
You then create multiple filesystems on dm-0.

On systems /without/ LVM, do these partitions get referenced as
/dev/mapper/xxx1, /dev/mapper/xxx2, etc.? Or is there some other convention?

What does the kernel line look like in grub.conf, on such systems?

Can one /label/ the RAID device, like one does with LVM?

I guess what I'm really getting at, is there any reason why a wrong/incomplete
mdadm.conf file would cause mkinitrd to /not/ find the rootfs?

What does a typical mdadm.conf look like?

Look again at this piece of code, it is crucial:

        devname=$(resolve_device_name $rootdev)
        majmin=$(get_numeric_dev dec $devname)
        if [ -n "$majmin" ]; then
            dev=$(findall /sys/block -name dev | while read device ; do \
                  echo "$majmin" | cmp -s $device && echo $device ; done \
                  | sed -e 's,.*/\([^/]\+\)/dev,\1,' )
            if [ -n "$dev" ]; then
                vecho "Found root device $dev for $rootdev"
                rootdev=$dev

Under what circumstances would that /not/ be able to find the rootfs on a RAID
/only/ system?

My gut feeling is that the /correct/ place to look is *still* /dev/mapper,
rather than /sys/block, although /sys/block/dm-0/dev would indeed give the
correct DevID to compare against /dev/mapper/xxx.

So assuming the above code can in fact return the correct DevID of the rootfs,
we need to make a check comparison between this and the output of mdadm --detail
"$dev". But that returns the UUID rather than the Maj:Min number. So how does
one convert a UUID into a Maj:Min DevID?
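
One possible answer, sketched under assumptions: mdadm.conf already ties each UUID to a device node, and a device node's numbers can be read with stat. The config text below is a made-up fragment in the ARRAY-line format (the UUIDs are the ones from the mdadm --detail output in comment #58). md_for_uuid and majmin_of are hypothetical helpers, and "stat -c" is the GNU coreutils form.

```shell
# Made-up mdadm.conf fragment in the usual ARRAY-line format.
conf=$(mktemp)
cat > "$conf" <<'EOF'
DEVICE partitions
ARRAY /dev/md0 level=raid1 num-devices=2 uuid=47bba70b:b76ffd5f:816f55b8:cf2ee184
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=49f39f6e:b3fbca37:a77d34bb:e5975c53
EOF

# Print the device named on the ARRAY line carrying the given UUID.
# This sketch accepts the key spelled either "uuid=" or "UUID=".
md_for_uuid() {
    awk -v want="$2" '
        $1 == "ARRAY" {
            for (i = 3; i <= NF; i++)
                if (tolower($i) == ("uuid=" tolower(want))) { print $2; exit }
        }' "$1"
}

# Decimal major:minor of a device node (GNU stat reports them in hex).
majmin_of() {
    local hex
    hex=$(stat -c '%t %T' "$1") || return 1
    printf '%d:%d\n' "0x${hex% *}" "0x${hex#* }"
}

md_for_uuid "$conf" 49f39f6e:b3fbca37:a77d34bb:e5975c53
majmin_of /dev/null    # any existing device node works for a dry run
```

On a RAID system the two steps would be chained: look up the device for the UUID, then pass that device to majmin_of and compare the result with "$majmin".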

Should we even be doing this, or should we assume that mdadm.conf is always
correct (and if it isn't then that's a separate bug)?

Too many questions. Stack overflow.

Comment 60 Bob Gustafson 2007-08-27 04:33:55 UTC

[root@hoho2 ~]# ls /sys/block
dm-0  md0  md2   ram1   ram11  ram13  ram15  ram3  ram5  ram7  ram9  sdb
fd0   md1  ram0  ram10  ram12  ram14  ram2   ram4  ram6  ram8  sda   sr0
[root@hoho2 ~]# ls /sys/block/dm-0
capability  holders  removable  slaves  subsystem
dev         range    size       stat    uevent
[root@hoho2 ~]# cat /sys/block/dm-0/size
135266304
[root@hoho2 ~]# cat /sys/block/dm-0/dev
253:0
[root@hoho2 ~]# cat /sys/block/md0/dev
9:0
[root@hoho2 ~]# cat /sys/block/md1/dev
9:1
[root@hoho2 ~]# cat /sys/block/md2/dev
9:2
[root@hoho2 ~]# cat /sys/block/sda/dev
8:0
[root@hoho2 ~]# cat /sys/block/sdb/dev
8:16
[root@hoho2 ~]# cat /sys/block/fd0/dev
2:0

[root@hoho2 ~]# ls /dev/mapper
control  rootvg-root
[root@hoho2 ~]# 

[root@hoho2 ~]# ls -l /dev/mapper/rootvg-root
brw-rw---- 1 root disk 253, 0 2007-08-25 23:56 /dev/mapper/rootvg-root
[root@hoho2 ~]# 
======================
Looks like there is more information in /dev/mapper

cat /dev/mapper/rootvg-root isn't so useful though..

Comment 61 Bob Gustafson 2007-08-27 04:37:02 UTC

Executing your code, I get

[root@hoho2 ~]# sh -x bug.sh
++ resolve_device_name
bug.sh: line 1: resolve_device_name: command not found
+ devname=
++ get_numeric_dev dec
bug.sh: line 2: get_numeric_dev: command not found
+ majmin=
bug.sh: line 10: syntax error: unexpected end of file
[root@hoho2 ~]#

Comment 62 Keith G. Robertson-Turner 2007-08-27 04:47:25 UTC

Sorry, I didn't make it clear. Those are "nash" commands, part of mkinitrd.

To use my changes, you'd need to apply the following patch:

--- /sbin/mkinitrd.old  2007-08-13 06:10:50.000000000 +0100
+++ /sbin/mkinitrd      2007-08-13 08:43:11.000000000 +0100
@@ -1020,7 +1020,11 @@
                   | sed -e 's,.*/\([^/]\+\)/dev,\1,' )
             if [ -n "$dev" ]; then
                 vecho "Found root device $dev for $rootdev"
-                rootdev=$dev
+                mapper=$(lvdisplay -c | grep "$majmin")
+                if [ $? = 0 ]; then
+                    rootdev=$(echo "$mapper" | cut -d':' -f1)
+                else rootdev=$dev
+                fi
             fi
         fi
     else

Then run mkinitrd in the usual way. An examination (gunzip|cpio) of the new
initrd's init file should reveal whether or not it works (or just reboot using
the new initrd).
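
For reference, a sketch of that examination. It assumes the initrd is a gzip-compressed cpio archive with a top-level file called init (as the gunzip|cpio hint implies), and that LVM activation shows up in init as a line mentioning vgchange; the latter is an assumption about the generated script. initrd_activates_lvm is a hypothetical helper.

```shell
# Return 0 if the init script inside the given initrd mentions vgchange.
initrd_activates_lvm() {
    local img=$1 dir rc
    case $img in
        /*) ;;
        *)  img=$PWD/$img ;;   # make the path absolute before changing directory
    esac
    dir=$(mktemp -d) || return 2
    # Unpack into a scratch directory: -i extracts, -d creates directories.
    ( cd "$dir" && gunzip -c "$img" | cpio -id 2>/dev/null )
    if grep -q 'vgchange' "$dir/init" 2>/dev/null; then
        rc=0
    else
        rc=1
    fi
    rm -rf "$dir"
    return $rc
}

# Example call (the path is illustrative):
#   initrd_activates_lvm /boot/initrd-$(uname -r).img && echo "LVM is activated"
```

Running it against both the old and the new image gives a quick before/after comparison without rebooting.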

Comment 63 Bob Gustafson 2007-08-27 05:23:24 UTC

[root@hoho2 ~]# uname -r
2.6.22.4-65.fc7
[root@hoho2 ~]# ./mkinitrd.new initrd-new.img 2.6.22.4-65.fc7
[root@hoho2 ~]# ls -l initrd-new.img
-rw------- 1 root root 3804885 2007-08-27 00:14 initrd-new.img
[root@hoho2 ~]# ls -l /boot/initrd-2.6.22.4-65.fc7.img
-rw------- 1 root root 3803798 2007-08-25 23:51 /boot/initrd-2.6.22.4-65.fc7.img
[root@hoho2 ~]# 

Looks like there is a size difference.

Note - At the moment, I don't have a problem with new kernels. They just work.
There is no problem with booting. The initrd in /boot works fine for me. I think
just correcting the /etc/mdadm.conf in my original installation (update) of F7
has stuck and is providing enough (correct) information so that the stock
mkinitrd works.

My /etc/mdadm.conf is below:

[root@hoho2 ~]# cat /etc/mdadm.conf

# mdadm.conf written out by anaconda
DEVICE partitions
MAILADDR root
ARRAY /dev/md0 level=raid1 num-devices=2 uuid=47bba70b:b76ffd5f:816f55b8:cf2ee184
ARRAY /dev/md1 level=raid1 num-devices=2 uuid=36c22074:b238a704:85d99d8b:e9dafa99
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=49f39f6e:b3fbca37:a77d34bb:e5975c53
[root@hoho2 ~]# 

The line saying 'written out by anaconda' is of course BS because I had to use
mdadm -Es
see comment #18

I am going to sleep now.

Comment 64 Bob Gustafson 2007-08-27 05:43:40 UTC

Since my problem stems from a bug in anaconda, I don't think that any patch
applied to mkinitrd will help my initial no boot coming out of an install
(running anaconda).

Fixing the symptom of the anaconda problem (by running mdadm -Es and tucking the
output into /etc/mdadm.conf) did the trick for me.

Anaconda is the least tested of all of the pieces. It only runs at install.
mkinitrd on the other hand probably runs at every update of a kernel for every
system out there.

Comment 65 Keith G. Robertson-Turner 2007-08-27 06:37:49 UTC

Created attachment 173281 [details]
mkinitrd LVM rootfs patch

+1 submit for testing

Comment 66 saul 2007-08-31 20:57:45 UTC

FYI, this patch fixed my system, which is running both LVM and raid. 


Thanks!

Comment 67 Bob Gustafson 2007-10-09 23:23:35 UTC

I'm waiting for Fedora 8

:-)

Comment 68 Jeremy Katz 2007-10-29 18:27:51 UTC

I just successfully did an install with root on LVM on raid with current rawhide
and everything worked.  There have definitely been some fixes in those areas, so
closing NEXTRELEASE
