Bug 741655 - RFE: provide a way to unlock encrypted logical volume inside encrypted volume during boot
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: lvm2
Version: 19
Hardware: Unspecified
OS: Unspecified
Assignee: LVM and device-mapper development team
QA Contact: Fedora Extras Quality Assurance
Blocks: F16-accepted, F16FinalFreezeExcept
Reported: 2011-09-27 14:16 UTC by Kamil Páral
Modified: 2013-12-09 13:35 UTC (History)

Doc Type: Bug Fix
Last Closed: 2013-12-09 13:35:22 UTC


Attachments
boot messages with systemd debug (29.42 KB, text/plain)
2011-09-27 14:18 UTC, Kamil Páral
rpm -qa output (5.73 KB, text/plain)
2011-09-27 14:18 UTC, Kamil Páral
anaconda-ks.cfg (1.18 KB, text/plain)
2011-09-27 14:46 UTC, Kamil Páral
install.log (10.13 KB, text/plain)
2011-09-27 14:46 UTC, Kamil Páral
install.log.syslog (4.87 KB, text/plain)
2011-09-27 14:47 UTC, Kamil Páral
logs from /var/log (1.04 MB, application/x-tar)
2011-09-27 14:47 UTC, Kamil Páral

Description Kamil Páral 2011-09-27 14:16:06 UTC
Description of problem:
I installed a system with the following setup:

vda1 GPT partition
vda2 /boot ext4 partition
vda3 vg_main (unencrypted)
vda4 vg_enc (encrypted)

vg_main:
lv_root /root ext4 partition
lv_swap swap partition

vg_enc:
lv_opt /opt ext4 partition (encrypted)

As you can see, I have an encrypted partition, lv_opt, inside an encrypted volume group, vg_enc. Both share the same password.

Anaconda installs this setup without problems, and anaconda rescue mode can mount all the partitions, but the installed system does not boot.

When I try to boot the system, I am asked for a password, and then the system freezes. See the attached debug log from systemd.

Version-Release number of selected component (if applicable):
anaconda 16.19
Fedora 16 Beta RC3 i686 DVD
KVM machine

How reproducible:
always

Steps to Reproduce:
1. install the system using the mentioned layout
2. try to boot
3.
  
Additional info:
I also tried an encrypted lv_root inside an encrypted vg_main, and that works fine. The problem occurs only with an additional non-root partition and volume group.

Comment 1 Kamil Páral 2011-09-27 14:18:14 UTC
Created attachment 525139 [details]
boot messages with systemd debug

The important part of the log:

Please enter passphrase for disk luks-e02c3e63-ddf8-4b83-84a3-6068aeb790a6!:********
[   40.132807] systemd-cryptsetup[517]: Set cipher aes, mode xts-plain64, key size 512 bits for device /dev/disk/by-uuid/e02c3e63-ddf8-4b83-84a3-6068aeb790a6.
Started Cryptography Setup for luks-e02c3e63-ddf8-4b83-84a3-6068aeb790a6.
[   96.742794] systemd[1]: Job dev-mapper-luks\x2dccdd0f39\x2d4795\x2d4586\x2da501\x2d4fd359df170b.device/start timed out.
Starting Cryptography Setup for luks-ccdd0f39-4795-4586-a501-4fd359df170b aborted because a dependency failed.
Starting /opt aborted because a dependency failed.

Comment 2 Kamil Páral 2011-09-27 14:18:33 UTC
Created attachment 525140 [details]
rpm -qa output

Comment 3 Kamil Páral 2011-09-27 14:22:04 UTC
I am not sure this is a cryptsetup bug. It just fails during boot, but works during anaconda rescue. So cryptsetup, systemd, dracut, ...?

This layout is very atypical, but the system still doesn't boot, so I am proposing this as a Beta/Final blocker.

Comment 4 Kamil Páral 2011-09-27 14:46:24 UTC
Now attaching logs from the installed system (accessed via anaconda rescue mode): anaconda logs from /root and all logs from /var/log.

Comment 5 Kamil Páral 2011-09-27 14:46:45 UTC
Created attachment 525149 [details]
anaconda-ks.cfg

Comment 6 Kamil Páral 2011-09-27 14:46:51 UTC
Created attachment 525150 [details]
install.log

Comment 7 Kamil Páral 2011-09-27 14:47:02 UTC
Created attachment 525151 [details]
install.log.syslog

Comment 8 Kamil Páral 2011-09-27 14:47:59 UTC
Created attachment 525152 [details]
logs from /var/log

Comment 9 Michal Schmidt 2011-09-27 15:18:20 UTC
It reminds me of bug 708684. We fixed it in F15 with a patch that was supposed to be temporary, because a proper fix was expected in lvm soon.
We need to check whether this is indeed the same problem and whether we have to forward-port the fix from F15.

Comment 10 Adam Williamson 2011-09-27 19:16:20 UTC
Kamil: do you have to use custom partitioning to achieve this layout? If so it's probably a Final blocker: we pretty much consider all 'custom partitioning' path bugs to block Final, not Alpha or Beta.

Comment 11 Michal Schmidt 2011-09-27 23:46:53 UTC
(In reply to comment #9)
> It reminds me of bug 708684.

The bug is different and the patch from F15 won't help.

In fact this configuration (encrypted LV in encrypted PV) fails in the same way with Fedora 15. I don't think this is blocker material. The configuration is hardly practical.

What happens on boot in such a configuration is this (simplified):
1. fedora-storage-init.service calls "vgchange -a y", thus discovers the LVs
   on vg_main. It cannot see vg_enc yet because vda4 has not been unlocked.
2. cryptsetup@luks-$UUID_of_vda4.service unlocks vda4.
3. cryptsetup@luks-$UUID_of_opt.service is waiting for its device (lv_opt)
   to appear in order to unlock it.
4. cryptsetup.target waits on all cryptsetup@.service instances.
5. fedora-storage-init-late.service waits on cryptsetup.target.
   But f-s-i-late.service is the thing that is supposed to discover lv_opt.

There is a deadlock: 5 -> 4 -> 3 -> 5.
We wouldn't have this problem if LVM were able to assemble itself incrementally as devices appear.
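
The cycle in steps 3 to 5 can be illustrated with a small sketch that feeds the ordering constraints to tsort(1). The labels below are shortened names chosen for this illustration, not the real unit names, and the edges are only a simplified reading of the list above.

```shell
# Ordering constraints from steps 3-5, one "before after" pair per line.
# Labels are shortened for this sketch; they are not real systemd unit names.
boot_order() {
    tsort <<'EOF'
storage-init-late lv_opt
lv_opt cryptsetup@opt
cryptsetup@opt cryptsetup.target
cryptsetup.target storage-init-late
EOF
}

# tsort exits non-zero when the constraints contain a loop.
if boot_order >/dev/null 2>&1; then
    echo "a valid start order exists"
else
    echo "ordering cycle: no valid start order"
fi
```

Each pair says "the left item must be ready before the right one". Since the late storage init both produces lv_opt and waits for cryptsetup.target, no valid start order exists.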

As a workaround we could simulate incremental assembly by calling "vgchange -a y" every time we unlock an encrypted PV.

Comment 12 Lennart Poettering 2011-09-28 02:19:45 UTC
I don't think this is really blocker material since having lvm-on-crypto-on-lvm is nothing we really supported ever to my knowledge.

Also, I am tempted to just reassign all these bugs to LVM since the LVM folks really need to get their stuff in order and watch devices come and go instead of this broken scan logic.

Comment 13 Kay Sievers 2011-09-28 09:11:28 UTC
Reassigning to LVM. Systemd has no knowledge about LVM or RAID in
general, it handles only crypt devices. If this is a supported setup,
the LVM tools need to provide the tools to handle device assembly
during bootup.

Comment 14 Michal Schmidt 2011-09-28 15:29:14 UTC
(In reply to comment #12)
> I don't think this is really blocker material since having lvm-on-crypto-on-lvm
> is nothing we really supported ever to my knowledge.

It's not lvm-on-crypto-on-lvm, it's crypto-on-lvm-on-crypto. It worked with F14's rc.sysinit. Still, it's an unreasonable configuration.

Comment 15 Robyn Bergeron 2011-09-28 16:32:25 UTC
-1, looks more like it would hit final to me with this configuration

Comment 16 Adam Williamson 2011-09-28 16:41:04 UTC
we also have a -1 from southern_gentleman in IRC, so five -1s on this. Rejected as a blocker. I'll propose as a final blocker, per the fairly strict "The installer must be able to create and install to any workable partition layout using any file system offered in a default installer configuration, LVM, software, hardware or BIOS RAID, or combination of the above" criterion.

Comment 17 Milan Broz 2011-09-28 16:50:05 UTC
This is all about a policy daemon that will activate LVs according to some system policy. In the meantime Fedora can activate everything, but that is not a final solution.

It is not even an LVM-specific problem: you can probably reproduce a similar situation with RAID (well, I think all MD devices are currently activated, so a similar workaround applies there).

(btw bug #708684 should no longer exist in rawhide, lvm should take list of devices from udev so any cache update workarounds should be obsolete.)

Comment 18 Michal Schmidt 2011-09-28 19:13:50 UTC
(In reply to comment #17)
> This is all about policy daemon which will activate LV according to some
> system policy.

No such daemon exists in Fedora 16, does it?

> In the meantime Fedora can activate everything but it is not final solution.

If this bug is considered a final release blocker and the policy daemon is not expected to be ready soon, we have to find a temporary workaround.
Some workarounds that come to mind:
 - forbid the creation of the crypto-on-lvm-on-crypto layout in anaconda
 - call "vgchange -a y" from cryptsetup@...service after a PV gets unlocked

> Even it is not lvm problem - you can probably simulate similar situation with
> raid (well, currently all I think all MD devices are activated, so similar
> workaround here.)

I don't understand. I have not tested it, but I'd expect crypto-on-md-on-crypto to work fine, because the md array should be assembled from md's udev rules as soon as its component devices are unlocked (I see some "mdadm -I ..." calls there.)

> (btw bug #708684 should no longer exist in rawhide, lvm should take list of
> devices from udev so any cache update workarounds should be obsolete.)

I can confirm that. We do not have the workaround for it in F16 and bug 708684 does not manifest.

Comment 19 Milan Broz 2011-09-28 20:20:33 UTC
> No such daemon exists in Fedora 16, does it?

No, and neither did F14, where this worked. So systemd depends either on everything being activated immediately when it appears, or on a not-yet-existing "assembly" daemon.
(See the hack /lib/systemd/fedora-storage-init that was added there; it activates everything.)

>  - forbid the creation of the crypto-on-lvm-on-crypto layout in anaconda

No, please do not limit possibilities here.

>  - call "vgchange -a y" from cryptsetup@...service after a PV gets unlocked

Perhaps something like this but fine tuned.
Activating just the VG that is on that PV could look like this:

blkid -t TYPE=LVM2_member $device >/dev/null && vgchange -a y $(pvs --noheadings -o vg_name $device)

(blkid can be replaced by a udev attribute query. Just do not waste time running lvm when the device is not an LVM device, and do not call plain "vgchange -a y", which can be expensive and can activate more than the user wants after plugging in a LUKS device.)

(It is incredibly ugly but should do its job. We should not expect a partial VG here, but it will work even for a partial one: it will just print some warnings and activate the VG once its last part appears.)
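
As a rough sketch only, the one-liner above could be wrapped in a shell function with quoting and an empty-name guard. The function name activate_vg_on_pv is made up for this illustration; the blkid, pvs and vgchange invocations are the ones from the one-liner, and nothing here is taken from an actual Fedora script.

```shell
# Hypothetical helper: activate only the VG that lives on the given device.
activate_vg_on_pv() {
    device=$1

    # Skip devices that are not LVM physical volumes, so lvm is not run needlessly.
    blkid -t TYPE=LVM2_member "$device" >/dev/null || return 0

    # pvs pads its output with whitespace; strip it to get the bare VG name.
    vg=$(pvs --noheadings -o vg_name "$device" | tr -d '[:space:]')

    # A PV that is not part of any VG yields an empty name; nothing to activate.
    [ -n "$vg" ] || return 0

    # Activate just this VG instead of calling plain "vgchange -a y".
    vgchange -a y "$vg"
}

# Intended use (as root), once a LUKS device has been unlocked:
#   activate_vg_on_pv /dev/mapper/<unlocked-device>
```

The early returns keep the function a no-op for non-LVM devices, which matches the advice not to run lvm when the device is not an LVM device.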

> I don't understand. I have not tested it, but I'd expect crypto-on-md-on-crypto
> to work fine, because the md array should be assembled from md's udev rules as
> soon as its component devices are unlocked (I see some "mdadm -I ..." calls
> there.)

Because mdadm now activates everything, and that is simply wrong as well.
It should activate only the devices the administrator has configured that way, not everything (back to the assembly policy/daemon, though ;-)

> > (btw bug #708684 should no longer exist in rawhide, lvm should take list of
> > devices from udev so any cache update workarounds should be obsolete.)
> 
> I can confirm that. We do not have the workaround for it in F16 and bug 708684
> does not manifest.

OK. So I would like to ask the udev/systemd gurus not to repeat the song that lvm scans /dev. If it needs a device list, it reads the active block devices from libudev. No vgscans should be needed. /etc/lvm/.cache is now obsolete and is no longer used in udev mode.

Comment 20 Michal Schmidt 2011-09-28 21:58:14 UTC
(In reply to comment #19)
> > No such daemon exists in Fedora 16, does it?
> 
> No. Neither in F14 - where this worked.

Right. In F14 this worked using the sequence in rc.sysinit:
 - start udev, wait for udev settle, wait for scsi_wait_scan
 - unlock the crypto devices found so far (attempt #1)
 - "vgchange -a y --sysinit"  # i.e. LVM activate everything
 - unlock the crypto devices found so far (attempt #2)
(later there's actually attempt #3, but it's irrelevant for this discussion)

In Fedora>=15 the crypto devices are unlocked as they appear.
"vgchange -a y --sysinit" is run twice:
 - firstly, right after udev settle and scsi_wait_scan
   fedora-storage-init.service - it activates the VGs on non-encrypted PVs.
 - secondly, after all configured crypto devices have been unlocked or
   timed out.
   fedora-storage-init-late.service - activates the VGs on encrypted PVs.
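
To illustrate why the F14 ordering copes with the reporter's layout, here is a toy model of the unlock / activate / unlock sequence. The device labels luks-vda4 and luks-opt and all function names are invented for this sketch. It only tracks which devices are visible and does not call any real tool.

```shell
# Toy model: $visible is a space-separated list of block devices known so far.
visible="vda3 vda4"

has() { case " $visible " in *" $1 "*) return 0;; *) return 1;; esac; }
add() { has "$1" || visible="$visible $1"; }

# One cryptsetup pass: unlock every LUKS container whose backing device is visible.
unlock_pass() {
    if has vda4; then add luks-vda4; fi
    if has lv_opt; then add luks-opt; fi
}

# One "vgchange -a y" pass: activate the LVs of every VG whose PV is visible.
vgchange_pass() {
    if has vda3; then add lv_root; add lv_swap; fi
    if has luks-vda4; then add lv_opt; fi
}

# F14 rc.sysinit order: unlock (attempt #1), activate, unlock (attempt #2).
unlock_pass
vgchange_pass
unlock_pass

has luks-opt && echo "F14 order: /opt can be mounted"
```

The second unlock pass is what picks up lv_opt, which only becomes visible after the VG on the unlocked vda4 has been activated.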

> [...]
> (See that hack /lib/systemd/fedora-storage-init which was added there - this
> activates everything.)

F14 also activates everything. fedora-storage-init is almost an exact copy of the storage part of F14's rc.sysinit. The difference in F15 is not about what the script, but _when_ it does it with respect to other boot tasks (such as cryptsetup).

> Activate just VG which is on that PV can look like this:
> 
> blkid -t TYPE=LVM2_member $device >/dev/null && vgchange -a y $(pvs
> --noheadings -o vg_name $device)
> 
> [...more advice...]

Great suggestions, thanks. I'll try something like that for F16.
(It may even make fedora-storage-init-late.service redundant, but there may be another reason for its continued existence.)

Comment 21 Michal Schmidt 2011-09-28 22:00:42 UTC
(In reply to comment #20)
> The difference in F15 is not about what the script, but ...

Should be:
The difference in F15 is not about what the script does, but ...

Comment 22 Adam Williamson 2011-09-30 19:26:55 UTC
Discussed at 2011-09-30 blocker review meeting. Agreed this is not a blocker: the relevant criterion would be "The installer must be able to create and install to any workable partition layout using any file system offered in a default installer configuration, LVM, software, hardware or BIOS RAID, or combination of the above", but that criterion does not specify encryption, and we agreed it's probably correct not to require any possible encryption layout to work, as that's a pretty complex target and involves systemd, plymouth and other components as well as anaconda.

The only requirement at present for encryption is for the checkbox on the partition method selection screen to work, not for complex manually-defined encryption setups to work.

Accepted as NTH as it would be worthwhile to fix this, and it obviously can't be done with an update.

Comment 23 Fedora End Of Life 2013-01-16 15:05:27 UTC
This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 16 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 24 Fedora End Of Life 2013-02-13 16:27:00 UTC
Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 25 Kamil Páral 2013-02-14 07:42:44 UTC
Fedora 18's anaconda doesn't support such complicated setups, so it's not easy to re-test this. But if I understand the discussion correctly, the problem is well understood: it is a design flaw in how we currently unlock partitions during boot. It doesn't seem that anybody has fixed it yet. Reassigning to Rawhide.

Comment 26 Fedora End Of Life 2013-04-03 19:32:03 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 27 Ondrej Kozina 2013-12-09 13:35:22 UTC
Well, actually this is already fixed in Fedora 19, where the lvmetad daemon is enabled by default.

Apart from having lvmetad enabled and active, the only additional requirement for such a setup to work is a valid /etc/crypttab record for every LUKS device involved. For every device in crypttab, systemd will start the activation service as soon as the device appears in the system.
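
As an illustration of what such records might look like, the sketch below prints one crypttab line per LUKS UUID. The helper name crypttab_line is made up here. The luks-<UUID> mapping name follows the names seen in the boot log in comment 1, and "none" in the third field means the passphrase is asked for at boot.

```shell
# Hypothetical helper: print one /etc/crypttab record for a LUKS device.
# Fields: mapping name, source device, key file ("none" = prompt for passphrase).
crypttab_line() {
    printf 'luks-%s UUID=%s none\n' "$1" "$1"
}
```

With the two UUIDs from the boot log in comment 1 (the first appears to be the LUKS container on vda4, the second the one on lv_opt), calling crypttab_line for each would print:

luks-e02c3e63-ddf8-4b83-84a3-6068aeb790a6 UUID=e02c3e63-ddf8-4b83-84a3-6068aeb790a6 none
luks-ccdd0f39-4795-4586-a501-4fd359df170b UUID=ccdd0f39-4795-4586-a501-4fd359df170b none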

(I mistook this bug for another one and wrongly marked it with the RFE/FutureFeature tags.)

