Bug 435228

Summary: mkinitrd doesn't grab dm modules for LVs listed by LABEL or UUID
Product: [Fedora] Fedora
Reporter: Will Woods <wwoods>
Component: mkinitrd
Assignee: Peter Jones <pjones>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low
Priority: low
Version: rawhide
CC: amlau, bruno, dcantrell, dlehman, jeff, john.ellson, jonstanley, jwboyer, katzj, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard: hotissue
Doc Type: Bug Fix
Last Closed: 2008-03-17 23:55:30 UTC
Bug Blocks: 235706, 430962    
Attachments:
Description | Flags
git-format-patch that fixes this problem | none
patch for mkinitrd | none
patch to make findstoragedriver() consider "mapper/*" devices dm devs | none
/proc/partitions | none
/etc/fstab | none
dmsetup table output | none
This is the requested mkinitrd output. | none
Another mkinitrd log file after applying the patch. | none
bash -x output of mkinitrd | none
init file produced by mkinitrd | none

Description Will Woods 2008-02-28 03:16:02 UTC
With current rawhide (mkinitrd 6.0.31 and 2.6.25-rc kernels), if all your LVs are listed in /etc/fstab as LABEL=/ or UUID=XXXX, mkinitrd will make an initrd that lacks the device-mapper modules, which makes your system unbootable.

Current anaconda lists everything with UUID=XXX by default, so this keeps new installs from booting.

The problem seems to be that findblockdevinsys LABEL=/ returns a path in /sys, like
/sys/block/dm-0. When this is passed to handlelvordev(), it is *not* recognized as an LVM logical volume
(even though it is).

Comment 1 Will Woods 2008-02-28 03:23:10 UTC
Created attachment 296151 [details]
git-format-patch that fixes this problem

This patch fixes lvshow() to correctly identify which VolGroup (if any) a dm-X
device belongs to.

Comment 2 Peter Jones 2008-02-28 21:51:16 UTC
Created attachment 296268 [details]
patch for mkinitrd

This patch seems to do the trick, so I think I'll use it instead.

Comment 3 Jeremy Katz 2008-03-03 19:47:15 UTC
Aha, this fails exactly the same way in Fedora 8.  But when I switched anaconda
to use UUID=, it was with the (apparently mistaken :-) impression that we had
fixed this mkinitrd problem before F8.  

So, I've added a heuristic to anaconda to not list logical volumes with UUID= for
now.  We should still fix this, though.

Comment 4 Jon Stanley 2008-03-06 01:15:41 UTC
F9Beta because a default install is unbootable.

Comment 5 Jeremy Katz 2008-03-06 01:23:01 UTC
Not F9beta because the default install isn't writing LABEL or UUID for the
rootfs anymore (we use the same logic we used to use).  That's why Will moved it
off the beta blocker last week.

Comment 6 Josh Boyer 2008-03-07 17:51:02 UTC
FYI this also causes the following output when installing kernels on my machine:

   1:kernel                 ########################################### [100%]
usage: find [-type type] [path [-name file]]
usage: find [-type type] [path [-name file]]
usage: find [-type type] [path [-name file]]
usage: find [-type type] [path [-name file]]
usage: find [-type type] [path [-name file]]
usage: find [-type type] [path [-name file]]
usage: find [-type type] [path [-name file]]
usage: find [-type type] [path [-name file]]
usage: find [-type type] [path [-name file]]
usage: find [-type type] [path [-name file]]
usage: find [-type type] [path [-name file]]
usage: find [-type type] [path [-name file]]
usage: find [-type type] [path [-name file]]

I tracked it down to mkinitrd looking for:

 Looking for driver for device /sys/block/dm-1/

and having an empty $device in that section of script.

It's apparently not fatal since the box still boots, but it's definitely confusing.

Comment 7 Will Woods 2008-03-07 20:57:03 UTC
Created attachment 297247 [details]
patch to make findstoragedriver() consider "mapper/*" devices dm devs

Something else seems to have changed, again - either the LVM tools or /sys
layout. Anyway, it's not recognizing *crypt* devices as dm devices now.
Attached patch fixes.

Comment 8 Bruno Wolff III 2008-03-08 16:01:07 UTC
The latest patch hasn't gotten into koji yet (at least as far as I can tell).
6.0.34 is busted and will break things for people updating their kernels. I hit
this last night and had to reinstall 6.0.32 and then reupdate the kernel to get
things working again.

Comment 9 Jeremy Katz 2008-03-10 17:50:05 UTC
(In reply to comment #7)
> Created an attachment (id=297247) [edit]
> patch to make findstoragedriver() consider "mapper/*" devices dm devs

Is this related to crypted devs or just LVM?  And what's in /etc/fstab for them?

Comment 10 Will Woods 2008-03-10 18:04:39 UTC
Crypted devs specifically, this time.

Normally the crypt modules get pulled in by handledm() if the dm device is
dmtype 'crypt'. 

mkinitrd's vgdisplay() does:
  lvm vgdisplay --ignorelockingfailure -v $1 2>/dev/null | sed -n 's/PV Name//p'
to get the backing PV device for a given VG.

For normal LVM that'll be something like "/dev/sda2" but for crypted PVs it'll
be something like "/dev/mapper/luks-sda2". 

findstoragedriver() then gets "mapper/luks-sda2" as the device name. If the
device name matches 'dm-*' then handledm() gets run. But "mapper/luks-sda2"
doesn't match that, so handledm() never runs for LUKS devices, so we never pull
in crypt modules.

The LABEL/UUID stuff isn't significant here, but it's the same root cause -
failure to properly identify device-mapper devs.
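
To make the matching problem concrete, here is a minimal sketch of the kind of check being described. It is not mkinitrd's actual code: is_dm_device is a hypothetical helper name, and the only assumption is that device names arrive in the forms quoted above ("dm-N" or "mapper/NAME").

```shell
# Hypothetical helper, not taken from mkinitrd: decide whether a device
# name refers to a device-mapper device.
is_dm_device() {
    case "$1" in
        dm-*|mapper/*) return 0 ;;   # kernel name (dm-0) or /dev/mapper name
        *)             return 1 ;;   # anything else, e.g. sda2
    esac
}

for dev in dm-0 mapper/luks-sda2 sda2; do
    if is_dm_device "$dev"; then
        echo "$dev: device-mapper"
    else
        echo "$dev: not device-mapper"
    fi
done
```

Matching only 'dm-*' would report mapper/luks-sda2 as "not device-mapper", which is the failure described above.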

Comment 11 Bruno Wolff III 2008-03-10 19:22:03 UTC
I am not getting prompted for keys when using encryption over raid. I was
getting prompted for keys after installing encryption over LVM and it seemed to
work. The rawhide snapshots weren't the same though. I can get into the system
to do stuff using rescue mode with today's netinst.img. After doing a chroot I
can even run yum to install stuff. Updating to today's kernel didn't help. I
tried both the 6.0.34 and 6.0.32 versions of mkinitrd.
This is probably a different problem than whatever broke my other system, which
just used raid. I fixed that by reinstalling the kernel with 6.0.32, but the
latest kernel was installed with 6.0.34 and it worked. It may have been that the
kernel I installed that had a problem wasn't using either of those versions. And
perhaps if I had just done a reinstall with 6.0.34 it would have also fixed things.

Comment 12 Bruno Wolff III 2008-03-10 20:14:14 UTC
I tried installing the patch to mkinitrd 6.0.34 and then doing a yum reinstall
to update the kernel. The timestamp on the initrd image file changed, so it
looks like it should have been rebuilt using the updated mkinitrd. However I
still don't get prompted for a password during the boot process and the root
switch fails.

Comment 13 Will Woods 2008-03-10 20:26:55 UTC
Does your /etc/fstab list the root/swap devices by UUID= or LABEL=? Because that
will still fail. If so, edit your fstab and change those devices to
/dev/VolGroupXX/LogVolXX.
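
A quick way to see which entries are affected might look like the sketch below. The function name fstab_by_label_or_uuid is made up for illustration; it only assumes the usual whitespace-separated fstab layout with the device in the first field and the mount point in the second.

```shell
# Sketch: print the device and mount point of every fstab entry whose
# device field is given as UUID=... or LABEL=...
# Reads fstab-format text on stdin; the function name is hypothetical.
fstab_by_label_or_uuid() {
    awk '$1 ~ /^(UUID|LABEL)=/ { print $1, $2 }'
}

# Example usage:
#   fstab_by_label_or_uuid < /etc/fstab
```

Any root or swap line it prints is a candidate for rewriting to an explicit device path.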

You can rebuild the initrd by doing:
  mkinitrd -v -f /boot/initrd-[version].img [version]

And you can inspect the contents of the initrd with:
  gzip -dc /boot/initrd-[version].img | pax

Check to be sure dm-mod.ko and dm-crypt.ko are present. If they're both there,
then you've got a different problem.
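
The check could be scripted roughly as follows. This is only a sketch: missing_modules is a hypothetical helper, and it assumes the listing contains one path per line with module files ending in ".ko".

```shell
# Sketch: read an initrd file listing on stdin (one path per line) and
# print the name of each requested module that is not in it.
# missing_modules is a hypothetical helper, not part of mkinitrd.
missing_modules() {
    listing=$(cat)
    for mod in "$@"; do
        # Match "<mod>.ko" as a whole file name at the end of a path.
        if ! printf '%s\n' "$listing" | grep -Eq "(^|/)${mod}\.ko\$"; then
            echo "$mod"
        fi
    done
}

# Example usage (same pipeline as above, image path left as a placeholder):
#   gzip -dc /boot/initrd-[version].img | pax | missing_modules dm-mod dm-crypt
```

No output means both modules were found in the listing.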

Comment 14 Bruno Wolff III 2008-03-10 20:47:56 UTC
In fstab the 4 devices (/,/home,/play and swap) are all named as
/dev/mapper/luks-md? where ? is 1-4.
The crypto modules weren't included.
I ran mkinitrd as above with --with= to get dm-mod and dm-crypt loaded.
I still didn't get asked for a key and the switch root didn't work.
I noticed that mkinitrd has some things it includes only if there is at least
one crypto device. Maybe there is something else that isn't getting included?

Comment 15 Will Woods 2008-03-10 20:56:23 UTC
No, just adding the modules is not enough. You need for mkinitrd to properly
detect the devices as crypt devices so it will add the modules and the proper
scripts to set everything up.

Try rewriting your fstab to use /dev/VolGroupXX device names, as suggested before.

Comment 16 Bruno Wolff III 2008-03-10 21:02:58 UTC
I am not using LVM. Does that advice still apply?

Comment 17 Will Woods 2008-03-10 21:22:08 UTC
Err, hmm. Probably not useful advice, then.

Can you run:

  bash -x /sbin/mkinitrd -v -f /tmp/mkinitrd.img [kernel ver] &> mkinitrd.log

and attach the resulting mkinitrd.log?

The contents of /proc/partitions, /etc/fstab, and the result of "dmsetup table"
would also be useful.

Comment 18 Bruno Wolff III 2008-03-10 21:39:07 UTC
Created attachment 297515 [details]
/proc/partitions

I'll try to get all the requested stuff uploaded shortly.

Comment 19 Bruno Wolff III 2008-03-10 21:41:06 UTC
Created attachment 297517 [details]
/etc/fstab

Comment 20 Bruno Wolff III 2008-03-10 21:43:57 UTC
Created attachment 297518 [details]
dmsetup table output

Comment 21 Bruno Wolff III 2008-03-10 21:49:40 UTC
Created attachment 297520 [details]
This is the requested mkinitrd output.

Note this was with a vanilla 6.0.34 mkinitrd. (I reinstalled to get rid of the
patch included in this bug that I had used for some tests.)

Comment 22 Will Woods 2008-03-10 21:57:06 UTC
"Looking for driver for device mapper/luks-md2"

handledm() never runs ('cuz "mapper/" doesn't match) so you don't get the crypt
setup or modules.

Apply the patch from comment #7 and re-run mkinitrd.

Comment 23 Bruno Wolff III 2008-03-11 00:21:49 UTC
Created attachment 297538 [details]
Another mkinitrd log file after applying the patch.

It still doesn't load the crypto modules even with the patch.

Comment 24 Bruno Wolff III 2008-03-11 09:28:00 UTC
I am trying to figure out mkinitrd and one thing that seems suspicious is the
manipulation of slavedev. For one luks device it first is /sys/block/md1/ and
then a bashism I don't grok is applied and it becomes an empty string. I expect
it is supposed to do something that returns md1 or md1/, but this isn't happening.
I'll ponder on that for a while, but I figured it might save you some time to
mention it here.

Comment 25 Bruno Wolff III 2008-03-11 09:31:50 UTC
OK, I found what ##* means and this looks like an oops in that the device name
is returned with a trailing / so that the whole string is removed. My guess is
that isn't what was intended.
I'll see if I can cook something up that removes a trailing slash first and see
if that gets more reasonable output from mkinitrd.

Comment 26 Bruno Wolff III 2008-03-11 09:41:11 UTC
For a hack I added in the line:
slavedev=${slavedev%*/}
that will break things if they don't end in /, so probably isn't the right fix.
However, this did result in the crypto modules being added and their count
seemed to be getting updated correctly.
I am not physically at the machine right now, so I can't test to see if a reboot
will work (since I can't enter a password remotely), but I will be able to in a
few hours when I get into the office.

Comment 27 Bruno Wolff III 2008-03-11 10:43:45 UTC
It looks like the following is probably a reasonable way to strip off any
trailing /s:
slavedev=`expr match "$slavedev" '\(.*[^/]\)'`
I tried it out with this just before slavedev=${slavedev##*/} and things seem to
work reasonably, though I still haven't tested booting yet; I've just looked at
the included modules.
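
For reference, the behaviour under discussion can be reproduced in isolation with the sketch below. sysfs_leaf is a hypothetical helper name and this is not the change that went into mkinitrd; it strips trailing slashes using shell parameter expansion only, as an alternative to the expr call. Note that a "%/" expansion only removes a suffix that is actually present, so a value without a trailing slash passes through unchanged.

```shell
# Sketch of the slavedev problem; sysfs_leaf is a hypothetical helper name.
sysfs_leaf() {
    path=$1
    # Strip every trailing slash first (but leave a bare "/" alone).
    while [ "${path%/}" != "$path" ] && [ "$path" != "/" ]; do
        path=${path%/}
    done
    # Then keep only the last path component.
    echo "${path##*/}"
}

slavedev=/sys/block/md1/
echo "naive: '${slavedev##*/}'"            # prints: naive: ''
echo "fixed: '$(sysfs_leaf "$slavedev")'"  # prints: fixed: 'md1'
```

With the trailing slash in place, ##*/ removes everything up to and including the final "/", which is the whole string.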

Comment 28 Bruno Wolff III 2008-03-11 16:23:24 UTC
There is still a problem or two.
The swap device wasn't handled.
Eventually a prompt for the password was displayed, but additional output was
displayed indicating that switching the root had failed. I was still able to
enter a passphrase and it could detect the difference between a correct and an
incorrect one. But creating /dev/root failed. A second time there was a message
about the swap device. I expect it to fail the resume, but it said Unable to
access resume device (/dev/mapper/luks-md1) which suggests that encryption or
raid wasn't properly set up, as normally there is a message about not finding a
suspend signature.

Comment 29 Bruno Wolff III 2008-03-11 16:50:43 UTC
Created attachment 297640 [details]
bash -x output of mkinitrd

This is the bash -x output with patch from comment 7 and my check to remove
trailing /s.
I won't be available most of the afternoon so I won't be able to give you the
quick turn around that I was able to yesterday.

Comment 30 Will Woods 2008-03-11 22:51:26 UTC
Peter's proposed patch at http://pjones.fedorapeople.org/mkinitrd-lvlabel.patch
fixes the problem for the default case. 

Not sure if this would fix the LVM-on-encrypted-md0 case - I haven't finished an
install with that setup.

Comment 31 Bruno Wolff III 2008-03-12 02:48:37 UTC
I'll test that patch sometime tomorrow morning. I forgot to start up the ssh
daemon, so I can't try generating the initrd tonight.

Comment 32 Bruno Wolff III 2008-03-12 04:26:37 UTC
I took a look at the patch and I don't think it is going to solve my problem.
While I am having similar symptoms, none of my file systems are specified by
label instead of device.

Comment 33 Bruno Wolff III 2008-03-12 16:18:57 UTC
Created attachment 297801 [details]
init file produced by mkinitrd

I took a look at the generated init file and it looks reasonable to me.

Comment 34 Will Woods 2008-03-12 18:15:42 UTC
Patch is in mkinitrd-6.0.35-1.fc9, I believe.

Comment 35 Bruno Wolff III 2008-03-12 18:44:18 UTC
I looked at things more carefully and saw that mdadm was reporting a problem
before the luksopen prompts. I also noticed that there were usb device notices
displayed, so keyboard and mouse detection was happening around the same time.
The mdadm messages are probably pointing to the problem but perhaps the usb
detection is also an issue. I will compare init files to see if the change I put
in to avoid the bad pattern match for handling crypt devices broke raid devices.
I can also check that version of mkinitrd out.

Comment 36 Bruno Wolff III 2008-03-12 19:04:52 UTC
The mdadm calls in the init files looked the same between my raid only machine
and my encryption over raid machine.
I then compared the etc/mdadm.conf files and  saw some small differences. The
comments were a bit different and there was a blank line only in one of them. Of
particular note, on the encryption over raid machine the UUID keyword was
capitalized, whereas in the man page and on the raid only machine it is lower case.
I don't know if that is a problem or not, but it is something for me to check.

Comment 37 Bruno Wolff III 2008-03-12 19:32:03 UTC
The etc/mdadm.conf file was just a copy of the installed one. I tried changing
the case, but that didn't affect the boot process.
The next step will be to grab the latest mkinitrd from koji. Based on the
comments I don't think it fixes the bug I mentioned above, where too much of the
device name is deleted. But I'll double check this once I get it installed.

Comment 38 Bruno Wolff III 2008-03-12 19:53:47 UTC
mkinitrd-6.0.35-1.fc9.x86_64.rpm still has the problem where deleting to a /
deletes the whole device name because the device name ends with a /.
I'll make a local fix for that again and retest it to see if it impacts the
other problem.

Comment 39 Bruno Wolff III 2008-03-12 20:33:13 UTC
After reapplying the fixes for luks devices, I still get the same problem I had
with 6.0.34. There is a warning about no devices found when it appears to be
trying to start the raid arrays. The uuids in the /etc/mdadm.conf file are correct, as
I checked them in rescue mode using mdadm -D (unless that doesn't actually read
that info from the device).

Comment 40 Will Woods 2008-03-12 23:22:09 UTC
Your problem is almost definitely Something Else, then. We'll need to file a
different bug about that to keep things straight.

Comment 41 Bruno Wolff III 2008-03-13 09:45:48 UTC
OK. Which things should get another bug filed?
Not checking for mapper/ for luks devices in addition to dm- ?
Incorrectly stripping away the whole device name when calculating slavedev
because the device name (at least for md devices) is returned with a trailing / ?
Some unknown problem that appears to be related to correctly setting up raid
when encrypted devices are used on top of raid?

Related to the last issue, my latest guess there is that the proper sata drivers
may not be being loaded and I was wondering if you could suggest a way to test
that theory?

I'll have physical access to the machine on Thursday and Friday, and will be
going pretty close to it on Sunday (and might be able to do a quick test that
evening). sshd works in rescue mode so I can run mkinitrd and look at what it is
building remotely; I just can't test rebooting remotely as I can't enter a
password. (And if a reboot were to fail I'd be locked out until I got physical
access again.)

Comment 42 Bruno Wolff III 2008-03-13 10:25:28 UTC
437231 might be related to my problem and cover the boot problem relating to raid.

Comment 43 Jesse Keating 2008-03-17 23:55:30 UTC
The mkinitrd in rawhide today works with UUID, LVM and even encryption.

Comment 44 Khamit Ardashev 2008-05-15 20:30:04 UTC
I've seen the same thing in FC9 and decided to let go of LVM.
I tried to use Intel fakeraid on an Intel DP35DP motherboard.
Amazingly, it also failed on first boot.

I currently run FC7 and wanted to bump up to FC9 - I CAN'T !!!
I think support for FC7 should be extended until all important bugs are out of FC9.
 
P.S. would you guys stop breaking something that works?