Bug 501198 - boot hangs with encrypted lvm pv
Summary: boot hangs with encrypted lvm pv
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: mkinitrd
Version: 11
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Peter Jones
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 518551
Depends On:
Blocks:
Reported: 2009-05-17 22:04 UTC by Martin Ebourne
Modified: 2010-01-12 16:22 UTC (History)
CC List: 13 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2010-01-12 16:22:26 UTC
Type: ---
Embargoed:


Attachments
Patch to work around empty/malformed crypttab. (913 bytes, patch)
2009-05-18 14:46 UTC, Peter Jones
initrd image (3.83 MB, application/octet-stream)
2009-05-21 13:33 UTC, Bruno Wolff III
init script built with -83 (works as expected) (2.65 KB, text/plain)
2009-05-26 19:59 UTC, Bruno Wolff III
init script made by -86 (does not work properly) (2.85 KB, text/plain)
2009-05-26 20:00 UTC, Bruno Wolff III
init made with -86 on home machine that works properly (2.71 KB, application/octet-stream)
2009-05-27 03:35 UTC, Bruno Wolff III
patch against mkinitrd-6.0.86-2.fc11.i586 (389 bytes, patch)
2009-06-10 13:15 UTC, Stéphane Lesimple

Description Martin Ebourne 2009-05-17 22:04:27 UTC
Description of problem:
With the latest mkinitrd the system fails to boot: I get a blank screen and a hung system at the point where plymouth would usually start.

The difference is due to this incantation which has changed in the init script:

@@ -66,8 +67,10 @@
 modprobe scsi_wait_scan
 rmmod scsi_wait_scan
 mkblkdevs
-echo Setting up disk encryption: /dev/sda2
-plymouth ask-for-password --command "cryptsetup luksOpen /dev/sda2 luks-3f98a609-85a7-4c62-8692-5100ad7a9f8e"
+setDeviceEnv LUKSUUID 
+echo Setting up disk encryption: $LUKSUUID
+buildEnv LUKSUUID cryptsetup luksOpen $LUKSUUID luks-3f98a609-85a7-4c62-8692-5100ad7a9f8e
+plymouth ask-for-password --command $LUKSUUID
 echo Scanning logical volumes
 lvm vgscan --ignorelockingfailure
 echo Activating logical volumes

If I edit the initrd and revert that change back to the two-line version, the system boots as normal.

Version-Release number of selected component (if applicable):
mkinitrd-6.0.84-1.fc11.x86_64

How reproducible:
Every time

Comment 1 Peter Jones 2009-05-18 14:18:24 UTC
Can you show us what /etc/crypttab says?

Comment 2 Peter Jones 2009-05-18 14:46:27 UTC
Created attachment 344449
Patch to work around empty/malformed crypttab.

Also, can you test this patch to see if it correctly generates a working initrd?

Comment 3 Martin Ebourne 2009-05-18 19:29:34 UTC
=== /etc/crypttab dated 2008-04-26 ===
luks-sda2               /dev/sda2       none
======

The machine was installed using an encrypted PV pretty much as soon as anaconda supported that, back in the F9 or maybe F8 timeframe. I would have used manual partitioning in anaconda.

I'll try the patch out shortly.

Comment 4 Martin Ebourne 2009-05-18 20:11:07 UTC
Confirmed patch in comment #2 is good.

Comment 5 Bruno Wolff III 2009-05-19 15:29:24 UTC
mkinitrd-6.0.85-1.fc11 is still broken for me.

[bruno@cerberus ~]$ more /etc/crypttab
luks-0459b95f-cc7b-4229-8f07-c3012582c726                /dev/md3        none
luks-b93e9fce-0ef0-4d55-a5c8-56deabaf2f61                /dev/md4        none
[bruno@cerberus ~]$ df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/luks-9cfcdd2a-d2ff-4c30-818c-344764c7cc21
                      41283648  35398808   5465412  87% /
/dev/mapper/luks-b93e9fce-0ef0-4d55-a5c8-56deabaf2f61
                     147153724  91442492  54216232  63% /home
/dev/mapper/luks-0459b95f-cc7b-4229-8f07-c3012582c726
                      41283648   8196692  32667528  21% /play
/dev/md0                256586     28478    225459  12% /boot
tmpfs                  1018580       112   1018468   1% /dev/shm

Comment 6 Bruno Wolff III 2009-05-19 18:21:11 UTC
I confirmed that rebuilding with mkinitrd-6.0.83-1.fc11.x86_64 makes the kernel entry I started having problems with work again (the problems began after rebuilding with mkinitrd-6.0.84-1.fc11.x86_64 and then mkinitrd-6.0.85-1.fc11.x86_64). This suggests that my problem is caused or triggered by the mkinitrd change (and not, say, by the recent plymouth changes).

Comment 9 Will Woods 2009-05-20 22:06:03 UTC
Are you sure the broken initrd was built with mkinitrd-6.0.85? Can you compare the working init script with the one generated by mkinitrd-6.0.85 and post the results here?

Comment 10 Bruno Wolff III 2009-05-21 03:11:05 UTC
I believe so, but I want to retest to make sure. I want to do a kernel update tonight and I can combine a retest with -85 at the same time. I'll report back what I find.

Comment 11 Bruno Wolff III 2009-05-21 05:49:26 UTC
I tried it on my home machine and -85 worked. But the format of my /etc/crypttab at home is noticeably different. I'll retest it at work tomorrow. It may be that almost no one will have that format. I went through some intermediate periods of the initial encrypted root setup and some more when plymouth came to be. And adjusting things for changes may have left it in a state that worked, but which is very unlikely for other systems to be in.
Just to compare for my home system:
bash-4.0$ cat /etc/crypttab
luks-585ccbdd-26aa-4d06-ac88-e412c7dc6135 UUID=585ccbdd-26aa-4d06-ac88-e412c7dc6135 none
luks-f022434a-2aef-438a-836d-109e7b4ce931 UUID=f022434a-2aef-438a-836d-109e7b4ce931 none
luks-58aa4879-4d9f-4074-ac4b-173be649c36d UUID=58aa4879-4d9f-4074-ac4b-173be649c36d none
luks-bb224e36-976e-49fc-86a6-ee8c23b0694f UUID=bb224e36-976e-49fc-86a6-ee8c23b0694f none
bash-4.0$ df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/luks-58aa4879-4d9f-4074-ac4b-173be649c36d
                      41291328  38952824   1919212  96% /
/dev/mapper/luks-f022434a-2aef-438a-836d-109e7b4ce931
                     214509116 104551932  99060732  52% /home
/dev/mapper/luks-585ccbdd-26aa-4d06-ac88-e412c7dc6135
                      41291328  33636412   7235624  83% /otheros
/dev/md4                256586     43302    200036  18% /boot
tmpfs                  1031548       172   1031376   1% /dev/shm

Comment 12 Bruno Wolff III 2009-05-21 13:33:56 UTC
Created attachment 344958
initrd image

It turns out it isn't as bad as I thought. I was repeatedly offered a password prompt, but because there was a message after the first one and not after the second, I kept going. After entering it four times (once for each encrypted file system, each having the same passphrase) it booted successfully.

In the past I only needed to enter the password once and it would try it on each file system in turn.

I didn't have to do this on my home machine, which has a very similar setup. So I think the format of the crypttab file is influencing things. If you want, I can try changing the format to see if that rectifies the issue. I can try adding an entry for all 4 file systems and/or using UUID= instead of a path.

Comment 13 Bruno Wolff III 2009-05-21 15:55:55 UTC
I have been seeing memory allocations fail this morning. I suspect that swap might not have been set up correctly. So for now my plan is to switch the format of crypttab to match my home system, run mkinitrd and reboot. I'll report back how that works. If you want some other configuration tested let me know.

Comment 14 Bruno Wolff III 2009-05-21 16:27:04 UTC
I confirmed that the swap fs was not luksopened. I changed my crypttab format to use UUID specifications and to include entries for the swap and root devices, enabled swap and rebuilt the initrd image. However I am still seeing the same symptoms. I need to enter a password for all 4 devices and the swap device is not opened after the reboot.
This machine has a very similar setup to the machine at home and things are working as expected there. The work machine is x86_64 and the home machine is i386.

Comment 15 Bruno Wolff III 2009-05-21 19:45:37 UTC
I retested with mkinitrd-6.0.86-1.fc11.x86_64 and am still seeing the same symptoms.

Comment 16 Bruno Wolff III 2009-05-26 19:59:51 UTC
Created attachment 345512
init script built with -83 (works as expected)

I am going to include the two extracted init scripts from the initrd file to make it easier to compare the one that works as expected and the one that doesn't.
I am starting with the one that works.

Comment 17 Bruno Wolff III 2009-05-26 20:00:42 UTC
Created attachment 345513
init script made by -86 (does not work properly)

Comment 18 Bruno Wolff III 2009-05-27 03:35:53 UTC
Created attachment 345552
init made with -86 on home machine that works properly

initrd files built with the -86 version of mkinitrd on my home machine work properly.

Comment 19 Bug Zapper 2009-06-09 16:00:18 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 20 Stéphane Lesimple 2009-06-10 13:15:08 UTC
Created attachment 347218
patch against mkinitrd-6.0.86-2.fc11.i586

Argh, took a couple of hours to track the problem I had after an upgrade from F10 to F11 down to this bug...

The initrd created on my machine with mkinitrd-6.0.86-2.fc11.i586 won't boot; the breakage is caused by the change in mkinitrd's emitcrypto(), as Martin Ebourne pointed out.

The cause is pretty stupid, and so is the fix: my /etc/crypttab has its fields separated by TABs instead of spaces, hence `grep "^$2 " /etc/crypttab` will always return nothing. I created my /etc/crypttab by hand and I did not use UUIDs (I know, I should, I'll change this...) so the workaround proposed by Peter Jones doesn't work either.

Plymouth ask-for-password ends up trying to execute "cryptsetup luksOpen <device> <luksname>" with <device> being an empty string, which has no chance of working.

Attached patch fixes the problem.
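
For readers without the attachment, here is a minimal sketch of the failure mode described in this comment. It is not the attached patch. The helper name crypttab_device and the throwaway sample file are hypothetical; the only thing taken from the comment is the literal-space grep pattern. The sketch contrasts that pattern with a lookup that accepts any run of spaces or TABs between fields.

```shell
#!/bin/bash
# Hypothetical helper (not mkinitrd code): print the device field (column 2)
# of the crypttab entry whose mapping name (column 1) equals $1.
# awk splits fields on any run of spaces or TABs, so the separator used in
# the file does not matter.
crypttab_device() {
    local name=$1 file=${2:-/etc/crypttab}
    awk -v name="$name" '$1 == name { print $2; exit }' "$file"
}

# Demo on a throwaway file with TAB-separated fields.
tmp=$(mktemp)
printf 'luks-sda2\t/dev/sda2\tnone\n' > "$tmp"

# Literal-space pattern as quoted above: no match on a TAB-separated file.
grep -c "^luks-sda2 " "$tmp" || true      # prints 0

# Whitespace-agnostic lookup finds the entry.
crypttab_device luks-sda2 "$tmp"          # prints /dev/sda2

rm -f "$tmp"
```

With an empty result from the first form, the generated init script ends up with an empty device argument, which matches the symptom described above.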

Comment 21 udo 2009-06-19 14:08:57 UTC
I have this same issue since getting F10 upgraded to F11.
F10 was the original Fedora installed.
I got a line like:
luks-4a65c764-b0b1-4b1f-94fb-c76d1bc3e287 UUID=4a65c764-b0b1-4b1f-94fb-c76d1bc3e287 none

in crypttab while /dev/md1 is the encrypted device.
How is that supposed to work?!
Not even the recent patch, thanks for that, can fix this.
In other words:
How can I make this work again?

This issue is blocking any kernel upgrades and could render a system 99.8% inoperable. So priority and severity are severely understated.

Comment 22 udo 2009-06-19 14:39:38 UTC
Please do not forget to have the installer (anaconda) verified and possibly updated for this issue.

/dev/md1 is not the encrypted device, but md1 carries the encrypted device.
The name of the encrypted device on md1 is what needs to go in place of luks-blabla in the previous comment.
Then rebuild the ramdisk and stuff will work.

Thanks redhat for regression-testing this feature. (yes I know a tiny bit about testing)

Comment 23 Bruno Wolff III 2009-06-26 15:07:52 UTC
For some additional information, I am seeing blkid differences between my system that works properly and the one that doesn't. In particular /dev/md1 is identified as a swap device in one case (when it shouldn't be, as it is encrypted and should show as an encrypted device) and not the other.

For the system that works:
bash-4.0$ blkid -s TYPE
/dev/sda3: TYPE="mdraid" 
/dev/sda1: TYPE="mdraid" 
/dev/sda2: TYPE="mdraid" 
/dev/md9: TYPE="crypt_LUKS" 
/dev/md4: TYPE="ext3" 
/dev/md8: TYPE="crypt_LUKS" 
/dev/sda5: TYPE="mdraid" 
/dev/sda6: TYPE="mdraid" 
/dev/sdb1: TYPE="mdraid" 
/dev/sdb2: TYPE="mdraid" 
/dev/sdb3: TYPE="mdraid" 
/dev/sdb5: TYPE="mdraid" 
/dev/sdb6: TYPE="mdraid" 
/dev/md6: TYPE="crypt_LUKS" 
/dev/md10: TYPE="crypt_LUKS" 
/dev/dm-0: TYPE="ext3" 
/dev/dm-1: TYPE="swap" 
/dev/dm-2: TYPE="ext3" 
/dev/dm-3: TYPE="ext3" 
/dev/mapper/luks-585ccbdd-26aa-4d06-ac88-e412c7dc6135: TYPE="ext3" 
/dev/mapper/luks-f022434a-2aef-438a-836d-109e7b4ce931: TYPE="ext3" 
/dev/mapper/luks-bb224e36-976e-49fc-86a6-ee8c23b0694f: TYPE="swap" 
/dev/mapper/luks-58aa4879-4d9f-4074-ac4b-173be649c36d: TYPE="ext3" 
bash-4.0$ 

For the system that doesn't work properly:
[bruno@cerberus ~]$ blkid -s TYPE
/dev/md2: TYPE="crypt_LUKS" 
/dev/sdb1: TYPE="mdraid" 
/dev/sdb2: TYPE="mdraid" 
/dev/sdb3: TYPE="mdraid" 
/dev/sdb5: TYPE="mdraid" 
/dev/sdb6: TYPE="mdraid" 
/dev/md1: TYPE="swap" 
/dev/sda3: TYPE="mdraid" 
/dev/sda1: TYPE="mdraid" 
/dev/sda2: TYPE="mdraid" 
/dev/sda5: TYPE="mdraid" 
/dev/sda6: TYPE="mdraid" 
/dev/md3: TYPE="crypt_LUKS" 
/dev/md0: TYPE="ext3" 
/dev/md4: TYPE="crypt_LUKS" 
/dev/loop0: TYPE="ext4" 
/dev/dm-0: TYPE="ext3" 
/dev/dm-1: TYPE="ext3" 
/dev/dm-2: TYPE="ext3" 
/dev/mapper/luks-0459b95f-cc7b-4229-8f07-c3012582c726: TYPE="ext3" 
/dev/mapper/luks-b93e9fce-0ef0-4d55-a5c8-56deabaf2f61: TYPE="ext3" 
/dev/mapper/luks-9cfcdd2a-d2ff-4c30-818c-344764c7cc21: TYPE="ext3" 
/dev/dm-3: TYPE="swap" 
/dev/dm-4: TYPE="ext4" 
[bruno@cerberus ~]$

Comment 24 Bruno Wolff III 2009-06-26 16:59:56 UTC
Since a previous comment mentioned tabs, I checked and found that I had tabs in the /etc/crypttab file on the problem machine. However replacing them with spaces didn't fix the problem.

Since the blkid output suggested a swap signature was being picked up on /dev/md1 and since it was just a swap area, I wiped the /dev/md1 block device and used cryptsetup to create a new luks device on it. And then made that device a swap device. I adjusted /etc/fstab and /etc/crypttab to use the new luks UUID. And now things work normally.

So it looks like the swap signature wasn't a problem before, but something changed to make it a problem.
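
A quick way to spot this kind of mismatch is sketched below, assuming `blkid -s TYPE` output in the format pasted in comment 23. The helper name non_luks_devices is hypothetical, and the list of devices expected to hold LUKS volumes has to be supplied by hand, since it depends on the local setup.

```shell
#!/bin/bash
# Hypothetical helper: given `blkid -s TYPE` output on stdin and, as
# arguments, the devices expected to hold LUKS volumes, print each listed
# device whose reported TYPE is something other than crypt_LUKS.
non_luks_devices() {
    awk -v wanted="$*" '
        BEGIN {
            n = split(wanted, w, " ")
            for (i = 1; i <= n; i++) expect[w[i]] = 1
        }
        {
            dev = $1;  sub(/:$/, "", dev)              # strip trailing colon
            type = $2; gsub(/^TYPE="|"$/, "", type)    # strip TYPE="..." wrapper
            if ((dev in expect) && type != "crypt_LUKS") print dev, type
        }'
}

# Usage with two lines in the format shown in comment 23:
printf '%s\n' '/dev/md2: TYPE="crypt_LUKS" ' '/dev/md1: TYPE="swap" ' \
    | non_luks_devices /dev/md1 /dev/md2      # prints: /dev/md1 swap
```

On a live system the input would come from `blkid -s TYPE` itself. A device reported here would be a candidate for the wipe-and-recreate procedure described in this comment, which destroys whatever is on the device.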

Comment 25 Bruno Wolff III 2009-06-26 17:01:19 UTC
(And I rebuilt the initrd image.)

Comment 26 Hans de Goede 2009-08-21 11:42:02 UTC
*** Bug 518551 has been marked as a duplicate of this bug. ***

Comment 27 Tomas Hoger 2009-10-19 08:54:21 UTC
(In reply to comment #2)
> Created an attachment (id=344449) [details]
> Patch to work around empty/malformed crypttab.
> 
> Also, can you test this patch to see if it correctly generates a working
> initrd.  

Peter, would it be possible to emit some warning during the initrd generation when no matching entry is found in crypttab?  Or at least when $2 does not look like an anaconda-generated name (i.e. luks-UUID, so no luks- prefix)?

I had a problem with this setup: encrypted / using a luks device named "root" on top of the /dev/vg0/root LV. But I had no entry for root in /etc/crypttab (yes, I know that's probably a misconfiguration on my side, crypttab was hand-made), as F10 mkinitrd had no problems with it and was able to figure out the correct "cryptsetup luksOpen" arguments for / even without an entry in crypttab. F11 mkinitrd generated an initrd that tried to luksOpen UUID=root, and that failed.
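
To make the request concrete, here is a rough sketch of such a build-time check. It is not mkinitrd code: the function name warn_crypttab and the message wording are made up for illustration, and the luks-<UUID> pattern is only an assumption about what an anaconda-generated name looks like, based on the names quoted in this bug.

```shell
#!/bin/bash
# Hypothetical check: warn when a LUKS mapping name has no crypttab entry,
# and when it does not look like an anaconda-style "luks-<UUID>" name.
# Returns 1 if no entry was found, 0 otherwise.
warn_crypttab() {
    local name=$1 file=${2:-/etc/crypttab} status=0
    local uuid_re='^luks-[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'

    # Match on the first field only; any run of spaces or TABs separates fields.
    if ! awk -v name="$name" '$1 == name { found = 1 } END { exit !found }' "$file"; then
        echo "warning: no entry for '$name' in $file" >&2
        status=1
    fi
    if [[ ! $name =~ $uuid_re ]]; then
        echo "warning: '$name' is not of the form luks-<UUID>" >&2
    fi
    return $status
}
```

Called as `warn_crypttab root` on the setup described above, this would print both warnings while the initrd is being generated, instead of leaving the problem to show up at boot.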

Comment 28 Hans de Goede 2010-01-12 15:32:44 UTC
This is a mass edit of all mkinitrd bugs.

Thanks for taking the time to file this bug report (and/or commenting on it).

As you may have heard, in Fedora 12 mkinitrd has been replaced by dracut. In Fedora 12 the mkinitrd package is still around as some programs depend on certain libraries it provides, but mkinitrd itself is no longer used.

In Fedora 13 mkinitrd will be removed completely. This means that all work on initrd has stopped.

Rather than keeping mkinitrd bugs open and giving false hope they might get fixed, we are mass closing them, so as to clearly communicate that no more work will be done on mkinitrd. We apologize for any inconvenience this may cause.

If you are using Fedora 11 and are experiencing a mkinitrd bug you cannot work around, please upgrade to Fedora 12. If you experience problems with the initrd in Fedora 12, please file a bug against dracut.

Comment 29 udo 2010-01-12 15:52:27 UTC
Are we sure that the core issue that causes this bug does not cause harm with
dracut?

Please explain why the root cause is not a problem for dracut.

Comment 30 Hans de Goede 2010-01-12 16:22:26 UTC
(In reply to comment #61)
> Are we sure that the core issue that causes this bug does not cause harm with
> dracut?
> Just closing a few bugs because a package disappears does not show a dedication
> for quality.
> 
> Please explain why the root cause is not a problem for dracut.    

dracut is a complete rewrite, not re-using any code, using a completely different
principle to find the rootfs to boot. And the initial comment points out that
the problem is in the mkinitrd generated script, which is gone now.

Closing again.

