Bug 435228
Summary: | mkinitrd doesn't grab dm modules for LVs listed by LABEL or UUID | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Will Woods <wwoods> |
Component: | mkinitrd | Assignee: | Peter Jones <pjones> |
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | low | Docs Contact: | |
Priority: | low | ||
Version: | rawhide | CC: | amlau, bruno, dcantrell, dlehman, jeff, john.ellson, jonstanley, jwboyer, katzj, wtogami |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | hotissue | ||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2008-03-17 23:55:30 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 235706, 430962 | ||
Attachments: |
Description
Will Woods
2008-02-28 03:16:02 UTC
Created attachment 296151
git-format-patch that fixes this problem
This patch fixes lvshow() to correctly identify which VolGroup (if any) a dm-X
device belongs to.
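lvshow() has to work out which VolGroup a dm device belongs to. Device-mapper names for LVM logical volumes have the form VG-LV, and any hyphen that is part of the VG or LV name is written as "--". The sketch below splits such a name. `dm_vg_lv` is a hypothetical helper, not mkinitrd code, and it only illustrates the naming rule. A name like luks-sda2 splits just as easily, so a real check still has to ask LVM or dmsetup whether the device actually belongs to a volume group.

```bash
#!/bin/bash
# Hypothetical helper, not taken from mkinitrd: split a device-mapper
# name of the form VG-LV into its volume group and logical volume parts.
# LVM writes a literal '-' inside a VG or LV name as '--', so only a
# single '-' separates the two halves.
dm_vg_lv() {
    local name=$1 esc vg lv
    # ':' cannot appear in LVM names, so use it to hide escaped hyphens
    esc=${name//--/:}
    case $esc in
        *-*) ;;
        *) return 1 ;;      # no separator: not a VG-LV style name
    esac
    vg=${esc%%-*}           # everything before the first single '-'
    lv=${esc#*-}            # everything after it
    # turn the hidden escapes back into single hyphens
    printf '%s %s\n' "${vg//:/-}" "${lv//:/-}"
}

dm_vg_lv VolGroup00-LogVol00    # prints "VolGroup00 LogVol00"
dm_vg_lv my--vg-root            # prints "my-vg root"
```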
Created attachment 296268
patch for mkinitrd
This patch seems to do the trick, so I think I'll use it instead.
Aha, this fails exactly the same way in Fedora 8. But when I switched anaconda to use UUID=, it was with the (apparently mistaken :-) impression that we had fixed this mkinitrd problem before F8. So I added a heuristic to anaconda to not list logical volumes with UUID= for now. We should still fix this, though.

F9Beta because a default install is unbootable.

Not F9Beta, because the default install isn't writing LABEL or UUID for the rootfs anymore (we use the same logic we used to use). That's why Will moved it off the beta blocker list last week.

FYI, this also causes the following output when installing kernels on my machine:

1:kernel ########################################### [100%]
usage: find [-type type] [path [-name file]]
usage: find [-type type] [path [-name file]]
usage: find [-type type] [path [-name file]]
usage: find [-type type] [path [-name file]]
usage: find [-type type] [path [-name file]]
usage: find [-type type] [path [-name file]]
usage: find [-type type] [path [-name file]]
usage: find [-type type] [path [-name file]]
usage: find [-type type] [path [-name file]]
usage: find [-type type] [path [-name file]]
usage: find [-type type] [path [-name file]]
usage: find [-type type] [path [-name file]]
usage: find [-type type] [path [-name file]]

I tracked it down to mkinitrd looking for:

Looking for driver for device /sys/block/dm-1/

and having an empty $device in that section of the script. It's apparently not fatal, since the box still boots, but it's definitely confusing.

Created attachment 297247
patch to make findstoragedriver() consider "mapper/*" devices dm devs
Something else seems to have changed, again - either the LVM tools or /sys
layout. Anyway, it's not recognizing *crypt* devices as dm devices now.
Attached patch fixes.
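This patch fixes a pattern-matching gap, which is explained in more detail later in this thread. The PV names that mkinitrd gets from vgdisplay come back as mapper/luks-* rather than dm-*, so they are never treated as dm devices. The sketch below illustrates the idea. `pv_names` and `is_dm_device` are hypothetical helpers, not mkinitrd's functions. To make the sketch runnable without LVM, `pv_names` reads `vgdisplay -v` style text on stdin instead of calling lvm, and the sample input is illustrative only.

```bash
#!/bin/bash
# Hypothetical helpers illustrating the idea behind the patch; these
# are not the functions in mkinitrd itself.

# Print each "PV Name" from `vgdisplay -v` style output on stdin,
# with the leading /dev/ removed (e.g. /dev/sda2 -> sda2).
pv_names() {
    awk '$1 == "PV" && $2 == "Name" { sub(/^\/dev\//, "", $3); print $3 }'
}

# Succeed if a device name (relative to /dev) is a device-mapper node.
# Matching only dm-* misses names such as mapper/luks-sda2.
is_dm_device() {
    case $1 in
        dm-*|mapper/*) return 0 ;;
        *)             return 1 ;;
    esac
}

# Illustrative input only; on a real system this would come from
#   lvm vgdisplay --ignorelockingfailure -v "$vg"
sample='  --- Physical volumes ---
  PV Name               /dev/mapper/luks-sda2
  PV Name               /dev/sdb1'

for dev in $(printf '%s\n' "$sample" | pv_names); do
    if is_dm_device "$dev"; then
        echo "$dev: dm device"
    else
        echo "$dev: plain block device"
    fi
done
```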
The latest patch hasn't gotten into koji yet (at least as far as I can tell). 6.0.34 is busted and will break things for people updating their kernels. I hit this last night and had to reinstall 6.0.32 and then re-update the kernel to get things working again.

(In reply to comment #7)
> Created an attachment (id=297247)
> patch to make findstoragedriver() consider "mapper/*" devices dm devs

Is this related to crypted devs or just LVM? And what's in /etc/fstab for them?

Crypted devs specifically, this time. Normally the crypt modules get pulled in by handledm() if the dm device is dmtype 'crypt'. mkinitrd's vgdisplay() does:

    lvm vgdisplay --ignorelockingfailure -v $1 2>/dev/null | sed -n 's/PV Name//p'

to get the backing PV device for a given VG. For normal LVM that'll be something like "/dev/sda2", but for crypted PVs it'll be something like "/dev/mapper/luks-sda2". findstoragedriver() then gets "mapper/luks-sda2" as the device name. If the device name matches 'dm-*', handledm() gets run. But "mapper/luks-sda2" doesn't match that, so handledm() never runs for LUKS devices, and we never pull in the crypt modules. The LABEL/UUID stuff isn't significant here, but it's the same root cause: failure to properly identify devicemapper devs.

I am not getting prompted for keys when using encryption over raid. I was getting prompted for keys after installing encryption over LVM, and it seemed to work, though the rawhide snapshots weren't the same. I can get into the system to do stuff using rescue mode with today's netinst.img. After doing a chroot I can even run yum to install stuff. Updating to today's kernel didn't help. I tried both the 6.0.34 and 6.0.32 versions of mkinitrd. This is probably a different problem from whatever broke my other system, which just used raid. I fixed that by reinstalling the kernel with 6.0.32, but the latest kernel was installed with 6.0.34 and it worked.
It may have been that the kernel I installed that had a problem wasn't using either of those versions, and perhaps if I had just done a reinstall with 6.0.34 it would also have fixed things. I tried installing the patch to mkinitrd 6.0.34 and then doing a yum reinstall to update the kernel. The timestamp on the initrd image file changed, so it looks like it should have been rebuilt using the updated mkinitrd. However, I still don't get prompted for a password during the boot process, and the root switch fails.

Does your /etc/fstab list the root/swap devices by UUID= or LABEL=? Because that will still fail. If so, edit your fstab and change those devices to /dev/VolGroupXX/LogVolXX. You can rebuild the initrd by doing:

    mkinitrd -v -f /boot/initrd-[version].img [version]

And you can inspect the contents of the initrd with:

    gzip -dc /boot/initrd-[version].img | pax

Check to be sure dm-mod.ko and dm-crypt.ko are present. If they're both there, then you've got a different problem.

In fstab the 4 devices (/, /home, /play and swap) are all named as /dev/mapper/luks-mdN, where N is 1-4. The crypto modules weren't included. I ran mkinitrd as above with --with= to get dm-mod and dm-crypt loaded. I still didn't get asked for a key, and the switch root didn't work. I noticed that mkinitrd has some things it includes only if there is at least one crypto device. Maybe there is something else that isn't getting included?

No, just adding the modules is not enough. You need mkinitrd to properly detect the devices as crypt devices so it will add the modules and the proper scripts to set everything up. Try rewriting your fstab to use /dev/VolGroupXX device names, as suggested before.

I am not using LVM. Does that advice still apply?

Err, hmm. Probably not useful advice, then. Can you run:

    bash -x /sbin/mkinitrd -v -f /tmp/mkinitrd.img [kernel ver] &> mkinitrd.log

and attach the resulting mkinitrd.log?
The contents of /proc/partitions, /etc/fstab, and the result of "dmsetup table" would also be useful.

Created attachment 297515
/proc/partitions
I'll try to get all the requested stuff uploaded shortly.
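Earlier in this thread, the suggested initrd check was to decompress the image, list it with pax, and confirm that dm-mod.ko and dm-crypt.ko are present. That check can be wrapped in a small function. This is a sketch, not part of mkinitrd. `missing_modules` is a hypothetical name, and the function assumes the listing has one path per line, as pax prints in its default list mode. The sample listing is illustrative.

```bash
#!/bin/bash
# Hypothetical helper: read an archive listing (one path per line) on
# stdin and report which of the named kernel modules are missing.
# Intended usage, following the commands suggested in this bug:
#   gzip -dc /boot/initrd-[version].img | pax | missing_modules dm-mod.ko dm-crypt.ko
missing_modules() {
    local listing mod rc=0
    listing=$(cat)
    for mod in "$@"; do
        # compare the module name against the last component of each path
        if ! printf '%s\n' "$listing" |
            awk -F/ -v m="$mod" '$NF == m { found = 1 } END { exit !found }'
        then
            echo "missing: $mod"
            rc=1
        fi
    done
    return $rc
}

# Illustrative listing in which dm-crypt.ko is absent
printf '%s\n' lib/dm-mod.ko lib/ext3.ko |
    missing_modules dm-mod.ko dm-crypt.ko || echo "initrd is missing dm modules"
```

As the thread goes on to point out, having the modules present is necessary but not sufficient. mkinitrd also has to recognize the devices as crypt devices so that it adds the setup scripts.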
Created attachment 297517
/etc/fstab
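Whether a system hits the LABEL=/UUID= case in this bug's summary depends on how fstab names its devices. A minimal sketch for listing such entries follows. It assumes the usual whitespace-separated fstab fields. `fstab_by_tag` is a hypothetical helper, and the sample lines are made up for illustration.

```bash
#!/bin/bash
# Hypothetical helper: print the mount point and source of every fstab
# entry that refers to its device by LABEL= or UUID= instead of a path.
# Reads fstab-formatted text on stdin, e.g.  fstab_by_tag < /etc/fstab
# Commented-out lines start with '#', so they never match.
fstab_by_tag() {
    awk '$1 ~ /^(LABEL|UUID)=/ { print $2, $1 }'
}

# Made-up sample entries
printf '%s\n' \
    'UUID=1234-abcd  /      ext3  defaults  1 1' \
    '/dev/VolGroup00/LogVol01  swap  swap  defaults  0 0' \
    '# LABEL=/boot  /boot  ext3  defaults  1 2' \
    'LABEL=/home  /home  ext3  defaults  1 2' | fstab_by_tag
```

In this bug's case, the reporter with LUKS on RAID had no such entries. That was one of the first hints that their problem had a different cause.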
Created attachment 297518
dmsetup table output
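Each line of `dmsetup table` output for all devices reads `name: start length target args`. So the devices backed by the crypt target, which are the kind handledm() is supposed to handle, can be picked out with a short filter. This is a sketch only: `crypt_dm_names` is a hypothetical helper, and the sample lines are illustrative with the cipher parameters shortened. It also assumes the multi-device form of the command. `dmsetup table <name>` for a single device omits the `name:` prefix, which would shift the columns.

```bash
#!/bin/bash
# Hypothetical helper: read `dmsetup table` output on stdin and print
# the name of every device whose table uses the crypt target.
# Each line has the form  name: start length target args...
crypt_dm_names() {
    awk '$4 == "crypt" { sub(/:$/, "", $1); print $1 }'
}

# Illustrative input (cipher parameters shortened); on a live system:
#   dmsetup table | crypt_dm_names
printf '%s\n' \
    'luks-md1: 0 2096896 crypt aes-cbc-essiv:sha256 0 0 9:1 2056' \
    'VolGroup00-LogVol00: 0 4194304 linear 8:2 384' | crypt_dm_names
```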
Created attachment 297520
This is the requested mkinitrd output.
Note this was with a vanilla 6.0.34 mkinitrd. (I reinstalled to get rid of the
patch included in this bug that I had used for some tests.)
"Looking for driver for device mapper/luks-md2": handledm() never runs ('cuz "mapper/" doesn't match), so you don't get the crypt setup or modules. Apply the patch from comment #7 and re-run mkinitrd.

Created attachment 297538
Another mkinitrd log file after applying the patch.
It still doesn't load the crypto modules even with the patch.
I am trying to figure out mkinitrd, and one thing that seems suspicious is the manipulation of slavedev. For one luks device it is first /sys/block/md1/, and then a bashism I don't grok is applied and it becomes an empty string. I expect it is supposed to return md1 or md1/, but this isn't happening. I'll ponder on that for a while, but I figured it might save you some time to mention it here.

OK, I found what ##* means, and this looks like an oops: the device name is returned with a trailing /, so the whole string is removed. My guess is that isn't what was intended. I'll see if I can cook something up that removes a trailing slash first and see if that gets more reasonable output from mkinitrd.

For a hack I added the line:

    slavedev=${slavedev%*/}

That will break things if they don't end in /, so it probably isn't the right fix. However, it did result in the crypto modules being added, and their count seemed to be updated correctly. I am not physically at the machine right now, so I can't test whether a reboot will work (since I can't enter a password remotely), but I will be able to in a few hours when I get into the office.

It looks like the following is probably a reasonable way to strip off any trailing /s:

    slavedev=`expr match "$slavedev" '\(.*[^/]\)'`

I tried it out with this just before slavedev=${slavedev##*/}, and things seem to work reasonably, though I still haven't tested booting yet; I've just looked at the included modules.

There is still a problem or two. The swap device wasn't handled. Eventually a prompt for the password was displayed, but additional output indicated that switching the root had failed. I was still able to enter a passphrase, and it could detect the difference between a correct and an incorrect one. But creating /dev/root failed. A second time there was a message about the swap device.
I expect it to fail the resume, but it said "Unable to access resume device (/dev/mapper/luks-md1)", which suggests that encryption or raid wasn't properly set up, as normally there is a message about not finding a suspend signature.

Created attachment 297640
bash -x output of mkinitrd
This is the bash -x output with the patch from comment 7 and my check to remove trailing /s. I won't be available most of the afternoon, so I won't be able to give you the quick turnaround that I was able to yesterday.

Peter's proposed patch at http://pjones.fedorapeople.org/mkinitrd-lvlabel.patch fixes the problem for the default case. Not sure if this would fix the LVM-on-encrypted-md0 case; I haven't finished an install with that setup.

I'll test that patch sometime tomorrow morning. I forgot to start up the ssh daemon, so I can't try generating the initrd tonight. I took a look at the patch, and I don't think it is going to solve my problem. While I am having similar symptoms, none of my file systems are specified by label instead of device.

Created attachment 297801
init file produced by mkinitrd
I took a look at the generated init file and it looks reasonable to me.
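The slavedev behaviour described earlier follows from bash's pattern rules. `${var##*/}` removes the longest prefix ending in /, so a path with a trailing slash such as /sys/block/md1/ collapses to an empty string. By the same rules, the `${slavedev%*/}` hack removes only a single trailing slash and leaves paths without one unchanged. Below is a minimal sketch of a version that tolerates trailing slashes, using only parameter expansion. `last_component` is a hypothetical name, and this is not necessarily what mkinitrd ended up doing.

```bash
#!/bin/bash
# Return the last path component, ignoring any trailing slashes.
# Illustrative helper; the name last_component is hypothetical.
last_component() {
    local p=$1
    # Drop trailing slashes one at a time (stop at "/" itself).
    while [ "$p" != "/" ] && [ "${p%/}" != "$p" ]; do
        p=${p%/}
    done
    # Now the longest prefix ending in '/' is just the directory part.
    printf '%s\n' "${p##*/}"
}

slavedev=/sys/block/md1/
echo "naive: '${slavedev##*/}'"               # prints naive: '' (whole string removed)
echo "fixed: '$(last_component "$slavedev")'" # prints fixed: 'md1'
```

A loop is used instead of a single `${p%/}` so that paths ending in more than one slash are also handled. The `expr match` approach in the thread achieves a similar result with an external command.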
Patch is in mkinitrd-6.0.35-1.fc9, I believe.

I looked at things more carefully and saw that mdadm was reporting a problem before the luksOpen prompts. I also noticed that USB device notices were displayed, so keyboard and mouse detection was happening around the same time. The mdadm messages are probably pointing to the problem, but perhaps the USB detection is also an issue. I will compare init files to see if the change I put in to avoid the bad pattern match for handling crypt devices broke raid devices. I can also check that version of mkinitrd out.

The mdadm calls in the init files looked the same between my raid-only machine and my encryption-over-raid machine. I then compared the etc/mdadm.conf files and saw some small differences. The comments were a bit different, and there was a blank line in only one of them. Of particular note, on the encryption-over-raid machine the UUID keyword was capitalized, whereas in the man page and on the raid-only machine it is lower case. I don't know if that is a problem or not, but it is something for me to check. The etc/mdadm.conf file was just a copy of the installed one. I tried changing the case, but that didn't affect the boot process. The next step will be to grab the latest mkinitrd from koji. Based on the comments, I don't think there is a fix for the bug I mentioned above where too much of the device name gets deleted, but I'll double-check this once I get it installed.

mkinitrd-6.0.35-1.fc9.x86_64.rpm still has the problem where deleting up to a / deletes the whole device name, because the device name ends with a /. I'll make a local fix for that again and retest to see if it affects the other problem.

After reapplying the fixes for luks devices, I still get the same problem I had with 6.0.34. There is a warning about no devices found when it appears to be trying to start the raid arrays.
The UUIDs in the /etc/mdadm.conf file are correct, as I checked them in rescue mode using mdadm -D (unless that doesn't actually read that info from the device).

Your problem is almost definitely Something Else, then. We'll need to file a different bug about that to keep things straight.

OK. Which things should get another bug filed?
Not checking for mapper/ for luks devices in addition to dm-?
Incorrectly stripping away the whole device name when calculating slavedev, because the device name (at least for md devices) is returned with a trailing /?
Some unknown problem that appears to be related to correctly setting up raid when encrypted devices are used on top of raid?

Related to the last issue, my latest guess is that the proper SATA drivers may not be getting loaded, and I was wondering if you could suggest a way to test that theory. I'll have physical access to the machine on Thursday and Friday, and will be passing pretty close to it on Sunday (and might be able to do a quick test that evening). sshd works in rescue mode, so I can run mkinitrd and look at what it is building remotely; I just can't test rebooting remotely, as I can't enter a password. (And if a reboot were to fail, I'd be locked out until I got physical access again.)

Bug 437231 might be related to my problem and cover the boot problem relating to raid.

The mkinitrd in rawhide today works with UUID, LVM and even encryption.

I've seen the same thing in FC9 and decided to let go of LVM. I tried to use Intel fakeraid on an Intel DP35DP motherboard; amazingly, it also failed on first boot. I currently run FC7 and wanted to bump up to FC9, but I CAN'T! I think support for FC7 should be extended until all important bugs are out of FC9. P.S. Would you guys stop breaking something that works?