Bug 692198 - LVM-on-LUKS detection borked
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: initscripts
Version: 15
Assignee: Bill Nottingham
QA Contact: Fedora Extras Quality Assurance
 
Reported: 2011-03-30 16:42 UTC by Tim Waugh
Modified: 2014-03-17 03:27 UTC

Fixed In Version: initscripts-9.29-1.fc15
Doc Type: Bug Fix
Last Closed: 2011-04-15 21:13:36 UTC


Attachments:
/etc/fstab (1.29 KB, text/plain), 2011-04-01 10:37 UTC, Tim Waugh
/proc/cmdline (297 bytes, text/plain), 2011-04-01 10:37 UTC, Tim Waugh
dmesg (66.90 KB, text/plain), 2011-04-01 10:39 UTC, Tim Waugh
/var/log/messages (148.97 KB, text/plain), 2011-04-01 10:40 UTC, Tim Waugh
systemd --dump (524.29 KB, text/plain), 2011-04-01 10:41 UTC, Tim Waugh

Description Tim Waugh 2011-03-30 16:42:38 UTC
Description of problem:
Not entirely sure where to file this, but here's what I'm seeing.

During boot, only one of the encrypted volume groups on my system is correctly activated.  All encrypted devices share the same global passphrase, which I entered when prompted at boot.

Some boots, the other volume groups are not seen at all until I run 'vgscan'.

Once they are seen, I have to run 'vgchange -a y' and then 'systemd-tty-ask-password-agent' in order to get them activated.

Version-Release number of selected component (if applicable):
kernel-2.6.38.2-8.fc15.x86_64
systemd-20-2.fc15.x86_64
cryptsetup-luks-1.2.0-2.fc15.x86_64
device-mapper-1.02.63-1.fc15.x86_64
lvm2-2.02.84-1.fc15.x86_64
plymouth-0.8.4-0.20110304.1.fc15.x86_64
initscripts-9.28-1.fc15.x86_64
dracut-009-2.fc15.noarch

How reproducible:
Every boot.

Steps to Reproduce:
1. Boot

Additional info:
This is what 'dmsetup ls --tree' says after booting:

luks-cd4fe06b-cad2-4a42-8925-9d309ca30c62 (253:3)
 └─ (8:17)
luks-9ea37ae9-877d-4fd0-9506-029d6fbaf1ca (253:2)
 └─vg_worm01-LogVol00 (253:1)
    └─luks-4ea0a3b7-5868-426d-8f12-ac58e2911efb (253:0)
       └─ (8:3)
luks-2a3bee45-582c-4045-9042-2e48d5febf50 (253:4)
 └─ (8:6)
luks-75bf1f20-e934-4066-9369-192114dc8653 (253:5)
 └─ (8:5)

This is what it says after I have successfully activated the volume groups:

luks-cd4fe06b-cad2-4a42-8925-9d309ca30c62 (253:5)
 └─ (8:17)
vg_worm00-LogVol00 (253:8)
 └─luks-2a3bee45-582c-4045-9042-2e48d5febf50 (253:7)
    └─ (8:6)
luks-9ea37ae9-877d-4fd0-9506-029d6fbaf1ca (253:2)
 └─vg_worm01-LogVol00 (253:1)
    └─luks-4ea0a3b7-5868-426d-8f12-ac58e2911efb (253:0)
       └─ (8:3)
luks-2fd63944-5dc4-406f-b383-b6f7ba626b83 (253:6)
 └─vg_worm-LogVol00 (253:4)
    └─luks-75bf1f20-e934-4066-9369-192114dc8653 (253:3)
       └─ (8:5)

Comment 1 Jóhann B. Guðmundsson 2011-03-30 16:53:15 UTC
Could you try the latest systemd release in Koji, both with and without SELinux enabled?

http://koji.fedoraproject.org/koji/buildinfo?buildID=236648

Comment 2 Tim Waugh 2011-03-31 11:48:56 UTC
Here's what happened after I upgraded to systemd-21-2.fc15 and booted with 'enforcing=0':

I was asked (via plymouth) for a passphrase, and got normal boot-up.

'dmsetup ls' only showed the luks devices associated with vg_worm01-LogVol00 (i.e. 253:0 and 253:2).

I ran 'systemd-tty-ask-password-agent' and was prompted for passphrases for the other luks devices.  After that, 'dmsetup ls' did show the other luks devices.

However, the logical volumes were still not active.  After running 'vgchange -a y', and again running systemd-tty-ask-password-agent, the logical volumes were finally all present and active.

So it still isn't working, I'm afraid.

[twaugh@worm ~]$ rpm -q systemd
systemd-21-2.fc15.x86_64
[twaugh@worm ~]$ getenforce 
Permissive

Comment 3 Jóhann B. Guðmundsson 2011-03-31 12:31:05 UTC
I'll need to ask you to follow the "all bug reports" section here:

http://fedoraproject.org/wiki/How_to_debug_Systemd_problems

Please also provide /etc/fstab.

Comment 4 Tim Waugh 2011-04-01 10:37:22 UTC
Created attachment 489350
/etc/fstab

Comment 5 Tim Waugh 2011-04-01 10:37:58 UTC
Created attachment 489352
/proc/cmdline

Comment 6 Tim Waugh 2011-04-01 10:39:08 UTC
Created attachment 489353
dmesg

Comment 7 Tim Waugh 2011-04-01 10:40:15 UTC
Created attachment 489354
/var/log/messages

Comment 8 Tim Waugh 2011-04-01 10:41:25 UTC
Created attachment 489355
systemd --dump

Comment 9 Jóhann B. Guðmundsson 2011-04-01 10:59:09 UTC
Could you also provide the output of 'ls -alZh /dev/disk/by-uuid/'? (Just a check to see whether each of your UUIDs has a matching symlink to a device.)

What happens if you don't use UUIDs? Does it work then?

Comment 10 Tim Waugh 2011-04-01 11:08:49 UTC
Here is the output of that command after I've used "vgchange -a y" to enable the extra logical volumes:

# ls -alZh /dev/disk/by-uuid
drwxr-xr-x. root root system_u:object_r:device_t:s0    .
drwxr-xr-x. root root system_u:object_r:device_t:s0    ..
lrwxrwxrwx. root root system_u:object_r:device_t:s0    2a3bee45-582c-4045-9042-2e48d5febf50 -> ../../sda6
lrwxrwxrwx. root root system_u:object_r:device_t:s0    2fd63944-5dc4-406f-b383-b6f7ba626b83 -> ../../dm-7
lrwxrwxrwx. root root system_u:object_r:device_t:s0    4ea0a3b7-5868-426d-8f12-ac58e2911efb -> ../../sda3
lrwxrwxrwx. root root system_u:object_r:device_t:s0    66412111-7bc5-404f-a072-0f764d8007fa -> ../../sda2
lrwxrwxrwx. root root system_u:object_r:device_t:s0    69e2f8b7-d7b9-4e63-947a-ead09b7ae744 -> ../../dm-3
lrwxrwxrwx. root root system_u:object_r:device_t:s0    75bf1f20-e934-4066-9369-192114dc8653 -> ../../sda5
lrwxrwxrwx. root root system_u:object_r:device_t:s0    8eb3fd79-e971-43ce-94b4-1ea875853ac1 -> ../../dm-2
lrwxrwxrwx. root root system_u:object_r:device_t:s0    9ea37ae9-877d-4fd0-9506-029d6fbaf1ca -> ../../dm-1
lrwxrwxrwx. root root system_u:object_r:device_t:s0    bb1e75a0-e1a3-4db0-a41a-50c8b3b0098f -> ../../dm-6
lrwxrwxrwx. root root system_u:object_r:device_t:s0    bd946eb5-2d51-4424-ba3d-a1b37dc797e5 -> ../../dm-8
lrwxrwxrwx. root root system_u:object_r:device_t:s0    cc6e7172-7f4e-4edf-8f08-a067ce88202e -> ../../sda1
lrwxrwxrwx. root root system_u:object_r:device_t:s0    cd4fe06b-cad2-4a42-8925-9d309ca30c62 -> ../../sdb1

As for using UUID:
The /mnt/backup partition is identified by label instead of UUID, and its logical volume is one of those that are not automatically activated at boot.  Does that answer your question, or is there another method I should try?

Comment 11 Jóhann B. Guðmundsson 2011-04-01 12:48:06 UTC
What does e2label say for the device, i.e. is the device labelled? (If not, you can set a label with 'e2label /dev/$foo <label>' or 'tune2fs -L <label> /dev/$foo'.)

What I'm trying to figure out is:

A) Does everything work with direct entries, as in /dev/mapper/$foo or /dev/sd$n or equivalent?

B) Does everything work with UUID= as opposed to direct entries or labels? (Hence the check that the relevant symlinks are in place.)

C) Does everything work with Label=? (Hence my asking you to check the device label with e2label.)

The next step is to check whether things work (or break) if you have only direct entries, as opposed to UUIDs or labels.

After that, check with UUID entries (see if things break), and then with labels (see if things break).

Comment 12 Tim Waugh 2011-04-01 13:25:13 UTC
Oh, sorry, I mis-spoke: /mnt/backup is not on a logical volume but instead is a plain LUKS device with an ext3 fs.  So ignore that one.

So with, for example, my /home filesystem:

vg_worm00-LogVol00 (253:6)
 └─luks-2a3bee45-582c-4045-9042-2e48d5febf50 (253:5)
    └─ (8:6)

I currently have this in /etc/fstab for it:

/dev/mapper/vg_worm00-LogVol00  /home  ext4  noauto,defaults  1  2

and after booting, "mount /home" fails.  Should I try changing it to this?:

/dev/mapper/luks-2a3bee45-582c-4045-9042-2e48d5febf50  /home  ext4  noauto,defaults  1  2

(Note: it's noauto just because when the boot fails to mount it, it's harder to fix it up again otherwise...)

I can try with UUID= if I know which UUID to use.  Is this it?:

# tune2fs -l /dev/mapper/vg_worm00-LogVol00 | grep UUID
Filesystem UUID:          bb1e75a0-e1a3-4db0-a41a-50c8b3b0098f

I don't think there's a label on it:

# e2label /dev/mapper/vg_worm00-LogVol00

# 

but I could add one and try LABEL=... for it and see what that does.

Comment 13 Jóhann B. Guðmundsson 2011-04-01 13:51:28 UTC
Start by uncommenting this line in /etc/fstab and/or moving it above the Label= line:

/dev/mapper/luks-2fd63944-5dc4-406f-b383-b6f7ba626b83 /mnt/f14

It seems to be what cryptsetup is waiting on, and what you add when you activate it manually:


-> Job 55:
	Action: cryptsetup.target -> start
	State: waiting
	Forced: no
-> Job 56:
	Action: cryptsetup@luks\x2d2fd63944\x2d5dc4\x2d406f\x2db383\x2db6f7ba626b83.service -> start
	State: waiting
	Forced: no
-> Job 57:
	Action: dev-disk-by\x2duuid-2fd63944\x2d5dc4\x2d406f\x2db383\x2db6f7ba626b83.device -> start
	State: running
	Forced: no
-> Job 58:
	Action: dev-mapper-luks\x2d2fd63944\x2d5dc4\x2d406f\x2db383\x2db6f7ba626b83.device -> start
	State: waiting
	Forced: no
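
The "\x2d" sequences in the job names above come from systemd's unit-name escaping of device paths: each "/" becomes "-", and a literal "-" in the path becomes "\x2d". Below is a minimal Python sketch of that rule as described in systemd.unit(5). The function name escape_path is hypothetical, and this is an illustration rather than systemd's own code; later systemd releases provide 'systemd-escape --path' for the same job.

```python
import string

# Characters that systemd leaves unescaped in unit names.
_SAFE = set(string.ascii_letters + string.digits + ":_.")


def escape_path(path):
    """Escape a filesystem path the way systemd names path-based units."""
    # Leading, trailing and duplicate slashes are dropped first.
    parts = [p for p in path.split("/") if p]
    if not parts:
        return "-"  # the root directory is encoded as a single dash
    trimmed = "/".join(parts)
    out = []
    for i, byte in enumerate(trimmed.encode("utf-8")):
        ch = chr(byte)
        if ch == "/":
            out.append("-")
        elif ch in _SAFE and not (i == 0 and ch == "."):
            out.append(ch)
        else:
            # Everything else, including "-", becomes a C-style \xXX escape.
            out.append("\\x%02x" % byte)
    return "".join(out)


if __name__ == "__main__":
    uuid_path = "/dev/disk/by-uuid/2fd63944-5dc4-406f-b383-b6f7ba626b83"
    print(escape_path(uuid_path) + ".device")
```

Applied to the by-uuid path of the waiting device, this reproduces the unit name shown in Job 57.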

Comment 14 Jóhann B. Guðmundsson 2011-04-01 13:56:20 UTC
Btw, you can run blkid to find out which UUID is assigned to which /dev entry.

Comment 15 Tim Waugh 2011-04-01 15:09:44 UTC
(In reply to comment #13)
> Start by uncommenting this line in /etc/fstab and/or moving it above the Label= line:
> 
> /dev/mapper/luks-2fd63944-5dc4-406f-b383-b6f7ba626b83 /mnt/f14

I commented it out (is that what you meant?) and tried booting again.  No change.  I still need to run 'vgchange -a y' on boot to get the logical volumes activated.

With /home, I tried all these different lines, booting separately with each one uncommented at a time, and it was exactly the same story.  After 'vgchange -a y' (and, of course, systemd-tty-ask-password-agent), each of the lines worked fine and I was able to "mount /home" without problems.

Comment 16 Jóhann B. Guðmundsson 2011-04-01 16:46:23 UTC
Yeah, sorry if I was unclear; uncommenting it from fstab will of course require you to activate it manually.

Move it above the Label= line to see if it gets loaded, since it's the next line that gets parsed after Label= from fstab.

If it loads fine before the Label= line, then we know that Label= is the cause; if not, then we know that it isn't. (You can also just uncomment the label line and see if everything but what is defined there gets activated and mounted correctly.)

Just to be clear: are all the LVM partitions failing to be unlocked, activated, and mounted, or just specific ones?

Comment 17 Tim Waugh 2011-04-02 11:25:10 UTC
(In reply to comment #16)
> Move it above the Label= line to see if it gets loaded, since it's the next line
> that gets parsed after Label= from fstab.

OK, I'll try that.

> Just to be clear: are all the LVM partitions failing to be unlocked, activated, and
> mounted, or just specific ones?

The only one that's mounted is the one holding "/": vg_worm01-LogVol00.

Comment 18 Tim Waugh 2011-04-04 11:35:41 UTC
(In reply to comment #16)
> Move it above the Label= line to see if it gets loaded, since it's the next line
> that gets parsed after Label= from fstab.

That made no difference.

Comment 19 Jóhann B. Guðmundsson 2011-04-04 12:09:42 UTC
So we have narrowed it down: only one encrypted drive/partition gets unlocked during bootup (vg_worm01-LogVol00), and all encrypted drives/partitions share the same global passphrase.

Comment 20 Tim Waugh 2011-04-04 12:20:43 UTC
Yes.

Comment 21 Lennart Poettering 2011-04-06 00:57:41 UTC
Uh, am I understanding this correctly, this is LVM on top of LUKS? Not the other way round? Urks.

Comment 22 Lennart Poettering 2011-04-06 01:05:45 UTC
If this is LVM on top of LUKS I really don't see any easy way to fix this. LVM is written in this old, obsolete style in which it assumes it is run when "all devices have been found", instead of listening for hotplug devices coming and going. The effect of that is that we'd have to know which crypto volumes to wait for before we invoke the storage scripts, and which ones we don't have to wait for. The ones below the LVM are the ones to wait for; the ones above the LVM are the ones not to wait for.

I do wonder how this was solved previously and I have no clue how to fix this in future as long as LVM is still this 90s program that can't deal with devices coming and going properly.

Comment 23 Lennart Poettering 2011-04-06 01:06:51 UTC
WTF, this is even worse! This is LUKS on top of LVM on top of LUKS according to your tree? What's that supposed to be? A way to waste CPU cycles?

Comment 24 Lennart Poettering 2011-04-06 01:31:50 UTC
Hmm, OK, so in F14 to deal with this problem we did this:

1. Setup crypto
2. Setup LVM
3. Setup crypto again

In F15 we currently do this:

1. Setup LVM
2. While doing that: setup crypto when the devices pop up.

Now, the problem here is that we don't wait for crypto devices before starting LVM. Hence LVM won't usually see any, unless it wins the race. We could of course order LVM so that it runs after all crypto, but then we'd break the much more common LUKS-on-LVM in order to support LVM-on-LUKS. 

My suggested fix is now to do both: run LVM once early, and once after cryptsetup. Which gets us to this solution:

1. Setup LVM
2. While doing that: setup crypto when the devices pop up
3. After all crypto devices popped up, run LVM again.

Of course, we won't support setups with arbitrary stacks with this, but F14 didn't either. In F14, crypto on LVM on crypto works, or any subset of this; in my proposal, LVM on crypto on LVM works, or any subset of it.
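
The three orderings above can be compared with a toy model. This is only a sketch under simplifying assumptions that are not part of initscripts or systemd: a storage stack is a bottom-to-top list of "crypto" and "lvm" layers, each boot step activates every consecutive layer of its own type that has become reachable, and LVM is assumed to always lose the race in the current F15 ordering. All names are hypothetical.

```python
# Boot orderings as sequences of setup passes, per the lists above.
F14 = ["crypto", "lvm", "crypto"]
F15_CURRENT = ["lvm", "crypto"]           # assumes LVM runs before crypto is ready
F15_PROPOSED = ["lvm", "crypto", "lvm"]   # second LVM pass after cryptsetup.target


def layers_activated(stack, passes):
    """Count how many layers of `stack` (bottom first) come up during `passes`."""
    active = 0
    for kind in passes:
        # A pass brings up the next layers only while they match its type.
        while active < len(stack) and stack[active] == kind:
            active += 1
    return active


def boots_fully(stack, passes):
    """True if every layer of the stack ends up active."""
    return layers_activated(stack, passes) == len(stack)


if __name__ == "__main__":
    lvm_on_luks = ["crypto", "lvm"]   # bottom first: LUKS below, LVM on top
    luks_on_lvm = ["lvm", "crypto"]
    for name, passes in [("F14", F14), ("F15 current", F15_CURRENT),
                         ("F15 proposed", F15_PROPOSED)]:
        print(name,
              "| LVM-on-LUKS:", boots_fully(lvm_on_luks, passes),
              "| LUKS-on-LVM:", boots_fully(luks_on_lvm, passes))
```

In this model the current F15 ordering handles LUKS-on-LVM but not LVM-on-LUKS, while the proposed ordering handles both; neither F14 nor the proposal handles every three-layer stack, which matches the trade-off described above.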

Anyway, reassigning to initscripts for now.

Bill, can we get /lib/systemd/system/fedora-storage-init-late.service as a copy of /lib/systemd/system/fedora-storage-init.service with the only difference that this new service is "After=cryptsetup.target"? cryptsetup.target is a target that is ordered after all crypto devices.

The proper fix is to get LVM and friends updated to actually deal with hotplug events properly. But given that they didn't get the memo in the last 5 years I kinda doubt they'll get it anytime soon, hence the double "vgchange -ay" is the price people have to pay for using LVM.

Bill, does that make sense to you?

Comment 25 Bill Nottingham 2011-04-06 19:49:03 UTC
Added as http://git.fedorahosted.org/git/?p=initscripts.git;a=commitdiff;h=8df4ee072641b1153852c8b8b778ef4a02edb8bd, will be in 9.29-1.

(Can't wait to deploy something like stc and shoot this all in F16.)

Comment 26 Fedora Update System 2011-04-06 20:32:00 UTC
initscripts-9.29-1.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/initscripts-9.29-1.fc15

Comment 27 Fedora Update System 2011-04-07 02:20:44 UTC
Package initscripts-9.29-1.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing initscripts-9.29-1.fc15'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/initscripts-9.29-1.fc15
then log in and leave karma (feedback).

Comment 28 Fedora Update System 2011-04-15 21:13:24 UTC
initscripts-9.29-1.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.

