Red Hat Bugzilla – Bug 741655
RFE: provide a way to unlock encrypted logical volume inside encrypted volume during boot
Last modified: 2013-12-09 08:35:50 EST
Description of problem:
I have installed system with following setup:
vda1 GPT partition
vda2 /boot ext4 partition
vda3 vg_main (unencrypted)
vda4 vg_enc (encrypted)
lv_root /root ext4 partition
lv_swap swap partition
lv_opt /opt ext4 partition (encrypted)
As you can see, I have an encrypted logical volume (lv_opt) inside the encrypted volume group vg_enc. Both share the same password.
Anaconda installs with this setup just fine. Also anaconda rescue mode can mount all partitions just fine. But I can't boot the system with it.
When I try to boot the system, I am asked for password, and then the system freezes. See attached debug log from systemd.
Version-Release number of selected component (if applicable):
Fedora 16 Beta RC3 i686 DVD
Steps to Reproduce:
1. install the system using the mentioned layout
2. try to boot
I also tried to have encrypted lv_root inside encrypted vg_main. That works fine. The problem is only with additional non-root partition and volume group.
Created attachment 525139
boot messages with systemd debug
The important part of the log:
Please enter passphrase for disk luks-e02c3e63-ddf8-4b83-84a3-6068aeb790a6!:********
[ 40.132807] systemd-cryptsetup: Set cipher aes, mode xts-plain64, key size 512 bits for device /dev/disk/by-uuid/e02c3e63-ddf8-4b83-84a3-6068aeb790a6.
Started Cryptography Setup for luks-e02c3e63-ddf8-4b83-84a3-6068aeb790a6.
[ 96.742794] systemd: Job dev-mapper-luks\x2dccdd0f39\x2d4795\x2d4586\x2da501\x2d4fd359df170b.device/start timed out.
Starting Cryptography Setup for luks-ccdd0f39-4795-4586-a501-4fd359df170b aborted because a dependency failed.
Starting /opt aborted because a dependency failed.
Created attachment 525140
rpm -qa output
I am not sure this is a cryptsetup bug. It just fails during boot, but works during anaconda rescue. So cryptsetup, systemd, dracut, ...?
This layout is very atypical, but the system still doesn't boot, so I am proposing this as a Beta/Final blocker.
Now attaching logs from the installed system (accessed via anaconda rescue mode): anaconda logs from /root and all logs from /var/log.
Created attachment 525149
Created attachment 525150
Created attachment 525151
Created attachment 525152
logs from /var/log
It reminds me of bug 708684. We fixed it in F15 with a patch that was supposed to be temporary, because a proper fix was expected in lvm soon.
We need to check whether this is indeed the same problem and whether we have to forward-port the fix from F15.
Kamil: do you have to use custom partitioning to achieve this layout? If so it's probably a Final blocker: we pretty much consider all 'custom partitioning' path bugs to block Final, not Alpha or Beta.
(In reply to comment #9)
> It reminds me of bug 708684.
The bug is different and the patch from F15 won't help.
In fact this configuration (encrypted LV in encrypted PV) fails in the same way with Fedora 15. I don't think this is blocker material. The configuration is hardly practical.
What happens on boot in such a configuration is this (simplified):
1. fedora-storage-init.service calls "vgchange -a y", thus discovers the LVs
   on vg_main. It cannot see vg_enc yet because vda4 has not been unlocked.
2. cryptsetup@luks-$UUID_of_vda4.service unlocks vda4.
3. cryptsetup@luks-$UUID_of_opt.service is waiting for its device (lv_opt)
   to appear in order to unlock it.
4. cryptsetup.target waits on all cryptsetup@.service instances.
5. fedora-storage-init-late.service waits on cryptsetup.target.
But f-s-i-late.service is the thing that is supposed to discover lv_opt.
There is a deadlock: 5 -> 4 -> 3 -> 5.
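The cycle described above can be checked mechanically. The following is only a rough sketch, not anything systemd itself runs: it writes the ordering constraints as "A B" pairs (A must finish before B) and feeds them to tsort, relying on tsort exiting non-zero when it finds a loop, as the GNU coreutils implementation does. The unit and device names are shortened placeholders for the real luks-$UUID names, and boot_order_edges is a made-up helper.

```shell
# Each line "A B" means: A must finish before B can start.
# Names are shortened placeholders, not the real luks-<UUID> unit names.
boot_order_edges() {
  cat <<'EOF'
cryptsetup@vda4.service cryptsetup.target
cryptsetup@opt.service cryptsetup.target
cryptsetup.target fedora-storage-init-late.service
fedora-storage-init-late.service lv_opt.device
lv_opt.device cryptsetup@opt.service
EOF
}

# tsort tries to produce a valid start order; it fails if the graph has a loop.
if boot_order_edges | tsort >/dev/null 2>&1; then
  echo "ordering is satisfiable"
else
  echo "ordering contains a cycle"
fi
```

Dropping the last edge (lv_opt no longer has to wait for the late storage init) removes the loop, which is what incremental assembly would achieve.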
We wouldn't have this problem if LVM were able to assemble itself incrementally as devices appear.
As a workaround we could simulate incremental assembly by calling "vgchange -a y" every time we unlock an encrypted PV.
I don't think this is really blocker material since having lvm-on-crypto-on-lvm is nothing we really supported ever to my knowledge.
Also, I am tempted to just reassign all these bugs to LVM since the LVM folks really need to get their stuff in order and watch devices come and go instead of this broken scan logic.
Reassigning to LVM. Systemd has no knowledge about LVM or RAID in
general, it handles only crypt devices. If this is a supported setup,
the LVM tools need to provide the tools to handle device assembly
(In reply to comment #12)
> I don't think this is really blocker material since having lvm-on-crypto-on-lvm
> is nothing we really supported ever to my knowledge.
It's not lvm-on-crypto-on-lvm, it's crypto-on-lvm-on-crypto. It worked with F14's rc.sysinit. Still, it's an unreasonable configuration.
-1, looks more like it would hit final to me with this configuration
we also have a -1 from southern_gentleman in IRC, so five -1s on this. Rejected as a blocker. I'll propose as a final blocker, per the fairly strict "The installer must be able to create and install to any workable partition layout using any file system offered in a default installer configuration, LVM, software, hardware or BIOS RAID, or combination of the above" criterion.
This is all about a policy daemon which will activate LVs according to some system policy. In the meantime Fedora can activate everything, but that is not a final solution.
It is not even an LVM-only problem: you can probably simulate a similar situation with RAID (well, I think all MD devices are currently activated, so a similar workaround is in place there.)
(btw bug #708684 should no longer exist in rawhide, lvm should take list of devices from udev so any cache update workarounds should be obsolete.)
(In reply to comment #17)
> This is all about policy daemon which will activate LV according to some
> system policy.
No such daemon exists in Fedora 16, does it?
> In the meantime Fedora can activate everything but it is not final solution.
If this bug will be considered a final release blocker and if the policy daemon is not expected to be ready soon, we have to find a temporary workaround.
Some workarounds that come to mind:
- forbid the creation of the crypto-on-lvm-on-crypto layout in anaconda
- call "vgchange -a y" from cryptsetup@...service after a PV gets unlocked
> Even it is not lvm problem - you can probably simulate similar situation with
> raid (well, currently I think all MD devices are activated, so similar
> workaround here.)
I don't understand. I have not tested it, but I'd expect crypto-on-md-on-crypto to work fine, because the md array should be assembled from md's udev rules as soon as its component devices are unlocked (I see some "mdadm -I ..." calls there.)
> (btw bug #708684 should no longer exist in rawhide, lvm should take list of
> devices from udev so any cache update workarounds should be obsolete.)
I can confirm that. We do not have the workaround for it in F16 and bug 708684 does not manifest.
> No such daemon exists in Fedora 16, does it?
No. Neither in F14 - where this worked. So systemd just depends either on everything being activated immediately when it appears or on a not-yet-existing "assembly" daemon.
(See that hack /lib/systemd/fedora-storage-init which was added there - this activates everything.)
> - forbid the creation of the crypto-on-lvm-on-crypto layout in anaconda
No, please do not limit possibilities here.
> - call "vgchange -a y" from cryptsetup@...service after a PV gets unlocked
Perhaps something like this but fine tuned.
Activating just the VG which is on that PV can look like this:
blkid -t TYPE=LVM2_member $device >/dev/null && vgchange -a y $(pvs --noheadings -o vg_name $device)
(blkid can be replaced by a udev attribute query. Just do not waste time running lvm when the device is not an LVM device, and do not call plain "vgchange -a y", which can be expensive and can activate more than the user wants after plugging in a LUKS device.)
(It is incredibly ugly but should do its job. We should not expect a partial VG here, but it will work even for a partial one: it will just print some warnings and activate the VG once its last part appears.)
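For illustration, the one-liner above could be wrapped in a small function with quoting and an early exit. This is a sketch only: the function name activate_vg_on_pv is made up here, and how (or whether) it would be hooked into cryptsetup@.service is left open. The blkid, pvs and vgchange invocations are the ones from the one-liner.

```shell
# Activate only the VG that lives on the just-unlocked device.
# Hypothetical helper; the argument is the unlocked device node.
activate_vg_on_pv() {
  pv_device=$1

  # Skip quickly when the device is not an LVM physical volume.
  blkid -t TYPE=LVM2_member "$pv_device" >/dev/null 2>&1 || return 0

  # pvs pads its output with whitespace; strip it.
  vg_name=$(pvs --noheadings -o vg_name "$pv_device" | tr -d '[:space:]')

  # A PV that is not part of any VG reports an empty name: nothing to do.
  [ -n "$vg_name" ] || return 0

  vgchange -a y "$vg_name"
}
```

It would be called with the device that has just been unlocked, e.g. `activate_vg_on_pv /dev/mapper/luks-$UUID_of_vda4`.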
> I don't understand. I have not tested it, but I'd expect crypto-on-md-on-crypto
> to work fine, because the md array should be assembled from md's udev rules as
> soon as its component devices are unlocked (I see some "mdadm -I ..." calls
Because mdadm now activates everything, and that is simply wrong as well.
It should activate only the devices the administrator has set up this way, not everything (back to the assembly policy/daemon though ;-)
> > (btw bug #708684 should no longer exist in rawhide, lvm should take list of
> > devices from udev so any cache update workarounds should be obsolete.)
> I can confirm that. We do not have the workaround for it in F16 and bug 708684
> does not manifest.
OK. So I would like to ask the udev/systemd gurus not to repeat the song that lvm does some scans of /dev. If it needs a device list, it reads the active block devices from libudev. No vgscans should be needed. /etc/lvm/.cache is now obsolete and is no longer used in udev mode.
(In reply to comment #19)
> > No such daemon exists in Fedora 16, does it?
> No. Neither in F14 - where this worked.
Right. In F14 this worked using the sequence in rc.sysinit:
- start udev, wait for udev settle, wait for scsi_wait_scan
- unlock the crypto devices found so far (attempt #1)
- "vgchange -a y --sysinit" # i.e. LVM activate everything
- unlock the crypto devices found so far (attempt #2)
(later there's actually attempt #3, but it's irrelevant for this discussion)
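Why the second unlock attempt matters can be replayed with a toy model. This is a sketch under assumptions, not the real rc.sysinit: the device labels (luks-vda4, luks-opt) are shortened placeholders, the placement of lv_root and lv_swap in vg_main is taken from the reported layout, and activate_pass merely stands in for "vgchange -a y --sysinit".

```shell
# Toy replay of the F14 rc.sysinit storage steps. Device names are
# shortened, hypothetical labels, not the real luks-<UUID> names.
visible="vda3 vda4"   # block devices present once udev has settled

# Add a device to the visible set unless it is already there.
add() {
  case " $visible " in
    *" $1 "*) ;;
    *) visible="$visible $1" ;;
  esac
}

# Open every LUKS container whose backing device is currently visible.
unlock_pass() {
  for dev in $visible; do
    case $dev in
      vda4)   add luks-vda4 ;;  # encrypted PV that holds vg_enc
      lv_opt) add luks-opt ;;   # encrypted LV inside vg_enc
    esac
  done
}

# Stand-in for "vgchange -a y --sysinit": expose the LVs of every visible PV.
activate_pass() {
  for dev in $visible; do
    case $dev in
      vda3)      add lv_root; add lv_swap ;;  # vg_main
      luks-vda4) add lv_opt ;;                # vg_enc
    esac
  done
}

unlock_pass     # attempt #1 opens vda4
activate_pass   # lv_opt becomes visible
unlock_pass     # attempt #2 opens lv_opt
echo "visible after attempt #2: $visible"
```

With only one unlock pass before activation and none after it, luks-opt never enters the visible set in this model.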
In Fedora>=15 the crypto devices are unlocked as they appear.
"vgchange -a y --sysinit" is run twice:
- firstly, right after udev settle and scsi_wait_scan
  fedora-storage-init.service - it activates the VGs on non-encrypted PVs.
- secondly, after all configured crypto devices have been unlocked or
  fedora-storage-init-late.service - activates the VGs on encrypted PVs.
> (See that hack /lib/systemd/fedora-storage-init which was added there - this
> activates everything.)
F14 also activates everything. fedora-storage-init is almost an exact copy of the storage part of F14's rc.sysinit. The difference in F15 is not about what the script, but _when_ it does it with respect to other boot tasks (such as cryptsetup).
> Activate just VG which is on that PV can look like this:
> blkid -t TYPE=LVM2_member $device >/dev/null && vgchange -a y $(pvs
> --noheadings -o vg_name $device)
> [...more advice...]
Great suggestions, thanks. I'll try something like that for F16.
(It may even make fedora-storage-init-late.service redundant, but there may be another reason for its continued existence.)
(In reply to comment #20)
> The difference in F15 is not about what the script, but ...
The difference in F15 is not about what the script does, but ...
Discussed at 2011-09-30 blocker review meeting. Agreed this is not a blocker: the relevant criterion would be "The installer must be able to create and install to any workable partition layout using any file system offered in a default installer configuration, LVM, software, hardware or BIOS RAID, or combination of the above", but that criterion does not specify encryption, and we agreed it's probably correct not to require any possible encryption layout to work, as that's a pretty complex target and involves systemd, plymouth and other components as well as anaconda.
The only requirement at present for encryption is for the checkbox on the partition method selection screen to work, not for complex manually-defined encryption setups to work.
Accepted as NTH as it would be worthwhile to fix this, and it obviously can't be done with an update.
This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '16'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 16's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 16 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to click on
"Clone This Bug" and open it against that version of Fedora.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.
Fedora 18's anaconda doesn't support such complicated setups, so it's not trivial to re-test this. But if I understand the discussion correctly, the problem in question is well understood and it's a design flaw in how we currently unlock partitions during boot. It doesn't seem that anybody has fixed it yet. Reassigning to Rawhide.
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.
(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)
Well, actually this is already fixed in Fedora 19, where the lvmetad daemon is enabled by default.
The only additional requirement for such a setup to work, apart from having lvmetad enabled and active, is a valid /etc/crypttab record for every LUKS device involved. For every device in crypttab, systemd will start the activation service as soon as the device appears in the system.
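A quick way to check the crypttab requirement is sketched below. It assumes the entries refer to their devices in the UUID=... form (entries that use a device path instead would not be matched), and the function name missing_crypttab_entries is made up for this example. The two UUIDs in the usage comment are the ones from the boot log attached earlier.

```shell
# Print every UUID from the argument list that has no UUID= entry
# in the given crypttab-style file. Commented-out lines do not count.
missing_crypttab_entries() {
  crypttab_file=$1
  shift
  for uuid in "$@"; do
    # Match "UUID=<uuid>" only when no '#' precedes it on the line.
    grep -q "^[^#]*UUID=$uuid" "$crypttab_file" || echo "$uuid"
  done
}

# Example: both LUKS devices from the boot log should have a record.
# missing_crypttab_entries /etc/crypttab \
#   e02c3e63-ddf8-4b83-84a3-6068aeb790a6 \
#   ccdd0f39-4795-4586-a501-4fd359df170b
```

An empty result means every listed LUKS device has a record, so systemd can start its activation service when the device shows up.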
(I have mistaken this bug for another one and I have wrongly marked this one with RFE/FutureFeature tags)