Bug 708684 - LVM VGs on encrypted and correctly unlocked PVs are not found
Summary: LVM VGs on encrypted and correctly unlocked PVs are not found
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 15
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Lennart Poettering
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 718137
Depends On:
Blocks:
 
Reported: 2011-05-28 20:20 UTC by an0nym
Modified: 2011-07-01 11:40 UTC
CC List: 16 users

Fixed In Version: systemd-26-5.fc15
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-06-27 23:56:48 UTC
Type: ---
Embargoed:


Attachments
Selected text from "messages" log. (4.45 KB, text/plain) - 2011-06-07 00:38 UTC, eric
Comment 10 info (425.92 KB, application/x-gzip) - 2011-06-13 17:16 UTC, Norman Smith
[PATCH] allow fedora-storage-init detect encrypted PVs (745 bytes, patch) - 2011-06-17 15:15 UTC, Michal Schmidt

Description an0nym 2011-05-28 20:20:38 UTC
Description of problem:
Fedora fails to luksOpen the encrypted partition. 

Version-Release number of selected component (if applicable):
Installed from Fedora 15 x86_64 live CD.

How reproducible:
Always. 

Steps to Reproduce:
1. Install two HDDs. 
2. Partition them as follows: sda1 512 MB, sda2 11 GB, sdb1 1 GB. 
3. Install Fedora 15 x86_64 from live CD. 
4. While installing, use the following filesystem configuration: 
4.1. format sda1 as ext4 and mount /boot there, 
4.2. format sda2 as PV(1) and encrypt it, 
4.3. format sdb1 as PV(2) and encrypt it,
4.4. create VG(3) on top of (1),
4.5. create VG(4) on top of (2),
4.6. create LV(5) 10 GB on top of (3), format as ext4 and mount / there,
4.7. create LV(6) 1 GB on top of (3), format as swap, 
4.8. create LV(7) 1 GB on top of (4), format as ext4 and mount /home there.
5. Reboot. 
6. Provide the passphrase (the same for both encrypted partitions). 
7. Provide the passphrase again (the same for both encrypted partitions).
8. Wait a minute or two. 
9. You are dropped into emergency mode, because the second encrypted partition was not successfully opened and /home cannot be mounted. 
  
Actual results:
ls -al /dev/mapper shows luks-UUIDforSDA2 pointing to /dev/dm-0, luks-UUIDforSDB1 -> /dev/dm-3, (5) -> /dev/dm-1, (6) -> /dev/dm-2. 
ls -al /dev/dm* shows dm-0, dm-1, dm-2 only. 
cryptsetup luksOpen /dev/sdb1 luks-UUIDforSDB1 successfully creates /dev/dm-3; you exit emergency mode, Fedora retries the boot (without rebooting) and drops you into emergency mode again. And again there is no /dev/dm-3. 

Expected results:
Successful boot. 

Additional info:

Comment 1 an0nym 2011-05-29 11:25:39 UTC
Workaround: 
1. do not mount /home (or anything else) on a separate partition/LV during installation, i.e. have one and only one encrypted partition;
2. boot the installed Fedora 15 from the HDD;
3. add the other encrypted partition(s) to /etc/crypttab;
4. add the mounts to /etc/fstab (for /home you may need to boot into single-user mode; I did) - example lines below;
5. profit.
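
For illustration only (the UUID, VG/LV names and mount point below are placeholders, not values from this report), the added lines look roughly like this:

/etc/crypttab:
  luks-<UUID>  UUID=<UUID>  none

/etc/fstab:
  /dev/mapper/vg_home-lv_home  /home  ext4  defaults  1 2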

Comment 2 Sebastian Krämer 2011-06-02 18:52:42 UTC
Possibly something related to LVM?
I have an encrypted home and it does get mounted if I enter the correct passphrase in time. (However, I too get dropped into emergency mode when the password request times out.) I did an upgrade from F14 with the official F15 x86_64 installer DVD.

Comment 3 Michal Schmidt 2011-06-03 15:26:29 UTC
(In reply to comment #1)
> Workaround: 
> 1. do not mount /home (or anything else) to another partition/LV during
> installation; i. e. have one and only one encrypted partition;
> 2. boot installed Fedora 15 from HDD;
> 3. add other encrypted partition(s) to /etc/crypttab;
> 4. add mounts to /etc/fstab (in case of home you need to boot into single mode
> I think - I did so);

I don't understand why this works. How exactly is crypttab or fstab different when you let Anaconda create them?

Comment 4 an0nym 2011-06-04 00:50:58 UTC
I don't understand why either. I didn't notice anything wrong in crypttab or fstab. Maybe the first boot is somehow broken and everything works fine from the second boot onward. 
I have already reinstalled the system using the workaround above, so I can't help with config dumps anymore.

Comment 5 eric 2011-06-07 00:35:10 UTC
I, too, am having a similar problem.  I created multiple encrypted partitions several Fedora versions ago and have been upgrading my computer ever since.  When I boot up F15 it mounts my other partitions but not /home.

I have occasionally gotten the "emergency mode", but usually the computer will boot all the way to the GUI login and let me log in, with none of my files there (obviously).

I've tried to umount and mount the partition but it won't mount.  Rebooting several times seems to finally mount the partition.

Comment 6 eric 2011-06-07 00:38:48 UTC
Created attachment 503353 [details]
Selected text from "messages" log.

This is selected text from my messages log.  The /home partition is luks-95cd5991-1879-4d48-9d51-dc972b67b6a7.

Comment 7 Sebastian Krämer 2011-06-07 08:29:06 UTC
Hm, if you can't mount the partition manually, how should a startup script be able to? The fact that you can't mount it although it's apparently intact is weird.
Having it work only sometimes might indicate a race condition. Maybe /home fails because root isn't mounted at that time? In any case, I was wondering about the race condition earlier. Sometimes I get prompted immediately for /home, even before "Welcome to F15..", other times I get prompted for SOMEDEVICE_NAME..luks385263.. which appears to be some kind of fallback.

Maybe it's the same problem with different symptoms.

Comment 8 an0nym 2011-06-07 08:39:39 UTC
(In reply to comment #7)
> Hm, if you can't mount the partition manually, how should a startup script be
> able to? The fact that you can't mount it altough it's appearently intact, is
> weird.
> Having it work only sometimes might indicate a race condition. Maybe /home
> fails because root isn't mounted at that time? In any case, I was wondering
> about the race condition earlier. Sometimes I get prompted immediately for
> /home, even before "Welcome to F15..", other times I get prompted for
> SOMEDEVICE_NAME..luks385263.. which appears to be some kind of fallback.
> 
> Maybe it's the same problem with different symptoms.
I'm quite sure the race condition that makes the passphrase prompts for the encrypted partitions appear in varying order and number is somehow connected with this issue.

Comment 9 eric 2011-06-07 12:29:27 UTC
Okay, this is interesting.  I hadn't considered a race condition.

I noted that it appears earlier than most partitions in grub.conf.  Perhaps moving it to the end would remedy this?

Comment 10 Norman Smith 2011-06-13 17:12:09 UTC
I wish this bug had been entered a couple of weeks ago.  I installed and re-installed several times until I found that the encrypted PV was not being opened.  I have attached some additional info for what it's worth.

Comment 11 Norman Smith 2011-06-13 17:16:39 UTC
Created attachment 504502 [details]
Comment 10 info

Comment 12 Michal Schmidt 2011-06-13 18:14:56 UTC
(In reply to comment #11)
> Created attachment 504502 [details]
> Comment 10 info

Thanks.
The description deserves to be seen in full here, not just hidden in
the attachment:

==================
There is a problem with LVM, RAID, and Encryption when the
layout is not simple.  I have used layered layouts with multiple raid
devices, multiple physical LVM volumes and multiple logical LVM volumes
mixed with encrypted and unencrypted devices prior to F15.

I installed a minimal F15 on two disks, and it fails like my attempts at a
more complicated layout.

Test setup:

sata1	80 gig wd drive
sata2	80 gig wd drive
sata3 	dvd burner

1.  Booted F15 386 install DVD into rescue mode.

2.  Wiped partition tables from drives.

    dd if=/dev/zero of=/dev/sda bs=32768 count=1
    dd if=/dev/zero of=/dev/sdb bs=32768 count=1

    reboot

3.  Started F15 install.

4.  Which installation layout....

    Custom layout.

   LVM Volume Groups
    vg_test   32736
        ROOT  28760  /         ext4   format
        SWAP   4096            swap   format
    vg_test00 32736  
        HOME  16384  /home     ext4   format
        Free  16352
 
   RAID Devices
     md0       1022  /boot     ext4   format
     md1      32766  vg_test   phys vol(LVM) format
     md2      32766  vg_test00 phys vol(LVM) ENCRYPT

   Hard Drives
     sda
       sda1    1024  md0  software RAID  format
       sda2   32768  md1  software RAID  format
       sda3   32768  md2  software RAID  format
       Free    9732

     sdb
       sdb1    1024  md0  software RAID  format
       sdb2   32768  md1  software RAID  format
       sdb3   32768  md2  software RAID  format
       Free    9758

5.  At end of install rebooted.

    Welcome to emergency mode.

    I have uploaded photos of the screens as I poked around.
================

Comment 13 Michal Schmidt 2011-06-14 08:21:01 UTC
I am now able to reproduce this reliably.

The important thing that an0nym's and Norman's disk layouts have in common is that they have a filesystem not handled by the initramfs (such as /home) stored on a VG made of an encrypted PV.

(That they also both have two physical disks and more than one VG is not important for the bug.)

A simpler layout like this one is sufficient (tested in a virtual guest):

Physical disk vda:
  vda1 :  ext4, /
  vda2 :  encrypted PV
LVM group vg_encrypted consisting of the encrypted PV on vda2:
  lv_home : ext4, /home

On reboot after installation it manages to unlock the PV, but it fails to find the VG on it. After the timeout, it goes to emergency mode (closing the LUKS mapping in the process).

If you just reboot, the result will be the same. If instead you do:
 cryptsetup luksOpen /dev/vda2 lalala
 vgscan
 reboot
... you will have fixed it; the system will correctly find the VG on boot from now on.

The explanation lies in /etc/lvm/cache/.cache:
On the first boot, the .cache file does not exist. When fedora-storage-init.service runs, it runs "vgchange -a y --sysinit" which does a full scan automatically and builds the cache.
Then the encrypted PV is unlocked.
Then fedora-storage-init-late.service is run and it runs "vgchange -a y --sysinit" again. The assumption is that it should find the newly available PV.

But this is not the case. The second invocation of vgchange finds that the .cache is already present, so it does not run the full scan over the block devices and so it is not aware of the new PV.

A quick fix (but a bit expensive) is to add a call to "vgscan" before "vgchange" in /lib/systemd/fedora-storage-init.
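
A rough sketch of that quick fix (this is not the actual patch; the exact placement inside the script is an assumption):

  # in /lib/systemd/fedora-storage-init, before activating the VGs:
  /sbin/vgscan >/dev/null 2>&1       # full rescan rebuilds /etc/lvm/cache/.cache
  /sbin/vgchange -a y --sysinit      # now sees the VG on the freshly unlocked PV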

Bill, any other ideas? (except for "let's switch to stcd already!")

Comment 14 Milan Broz 2011-06-17 13:41:32 UTC
First, the problem is properly solved in upstream lvm2, which uses udev to get the list of block devices and obsoletes the lvm2 .cache completely.
(And the future is incremental assembly.)

But we cannot update lvm2 in F15 now (there is still no stable release out).

So let's find some temporary workaround:

The problem is that "vgchange -a y" (without an explicit VG specification) trusts the lvm2 cache and does not scan new devices. (If you run "vgchange -ay VG" and the PVs for that VG are not in the lvm2 cache, a full scan is triggered in the end.)

So back to problem:

- there is an existing lvm2 .cache
- a new cryptsetup device (with a PV on top) appears
- vgchange -a y is run; the new device is not yet in the cache -> it will not activate the newly appeared VG

1) One brute-force method is to add vgscan. This will add another deep and complete scan of all block devices and everyone will complain ;-)

2) Use a trick to update the lvm2 cache just for the new device.

When a new cryptsetup device appears, run "pvs -o pv_uuid". It _should_ update the lvm2 cache without a full scan. Note I am reporting _only_ the "pv uuid label" field to avoid VG metadata processing.

Note that if _more_ devices appear this way, maybe vgscan is the better solution (it will update the lvm2 cache in one run).

Can we add "pvs -o pv_uuid <DEVICE> >/dev/null 2>&1" somewhere and see if it helps?
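
To illustrate the idea (device and mapping names are placeholders), the manual equivalent of the sequence is roughly:

  cryptsetup luksOpen /dev/sdb1 luks-<UUID>                 # new PV appears as /dev/mapper/luks-<UUID>
  pvs -o pv_uuid /dev/mapper/luks-<UUID> >/dev/null 2>&1    # updates the lvm2 .cache for this one device
  vgchange -a y --sysinit                                   # activates the VG now that its PV is in the cache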

Comment 15 Bill Nottingham 2011-06-17 14:50:46 UTC
If I'm understanding this right, the pvs command would have to be done in systemd itself in its cryptsetup handling, correct?

Comment 16 Michal Schmidt 2011-06-17 14:59:09 UTC
We can add it as a second ExecStart after ExecStart=/lib/systemd/systemd-cryptsetup... in the generated cryptsetup@... units.

Comment 17 Milan Broz 2011-06-17 15:09:03 UTC
Are you sure that every system has the lvm2 tools installed (probably yes for Fedora)?
Also, if there is no PV on the device, it fails (so it needs to ignore the return code).

It is really meant as a workaround until the new lvm2 code is out.

Comment 18 Michal Schmidt 2011-06-17 15:15:59 UTC
Created attachment 505293 [details]
[PATCH] allow fedora-storage-init detect encrypted PVs

Yes, I am aware of the need to ignore errors from pvs.
I prefixed the command with '-'.
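
For illustration (this is not the attached patch; the device path and mapping name are placeholders), an ExecStart line with the error-ignoring '-' prefix looks like this in a systemd unit:

  ExecStart=-/sbin/pvs -o pv_uuid '/dev/mapper/luks-<UUID>'

systemd treats a non-zero exit status of a '-'-prefixed command as success, so a device that carries no PV will not fail the unit.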

Comment 19 Fedora Update System 2011-06-20 08:32:35 UTC
systemd-26-5.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/systemd-26-5.fc15

Comment 20 Fedora Update System 2011-06-21 17:44:25 UTC
Package systemd-26-5.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-26-5.fc15'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/systemd-26-5.fc15
then log in and leave karma (feedback).

Comment 21 Chad Feller 2011-06-21 19:58:25 UTC
I'm not sure if this belongs here or in a separate bug report.  I'm having a similar issue to the reporter's.  I have a machine that has had encrypted partitions since Fedora 11.  I've upgraded through each Fedora release without issue until now.

Since the upgrade to Fedora 15, my partitions (other than /) do not unlock during the boot process, which causes me to be dropped to an emergency shell during boot.  Only my root partition is unlocked and mounted, so I have to go through each partition manually and unlock and then mount it.  My workaround has been to drop a shell script into the root partition that unlocks and mounts my other partitions.  It is a bit hackish, but it works.
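
A minimal sketch of such a script (the device names and mount points here are made up for illustration, not taken from my setup):

  #!/bin/sh
  # Unlock and mount the partitions that did not come up at boot.
  # Each luksOpen prompts for the passphrase.
  cryptsetup luksOpen /dev/sda5 luks-home && mount /dev/mapper/luks-home /home
  cryptsetup luksOpen /dev/sda6 luks-srv && mount /dev/mapper/luks-srv /srv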

After I get the partitions unlocked and mounted and the system up and running, I see a lot of this in dmesg:

[  156.156548] systemd[1]: Job dev-mapper-luks\x2d3815b69e\x2d94f3\x2d4343\x2db507\x2df520246b7fdf.device/start timed out.
[  156.156564] systemd[1]: Job cryptsetup.target/start failed with result 'dependency'.
[  156.156576] systemd[1]: Job cryptsetup@luks\x2d3815b69e\x2d94f3\x2d4343\x2db507\x2df520246b7fdf.service/start failed with result 'dependency'.
[  156.156587] systemd[1]: Job fedora-autorelabel-mark.service/start failed with result 'dependency'.
[  156.156598] systemd[1]: Job fedora-autorelabel.service/start failed with result 'dependency'.
[  156.156609] systemd[1]: Job local-fs.target/start failed with result 'dependency'.
[  156.156619] systemd[1]: Triggering OnFailure= dependencies of local-fs.target.
[  156.156735] systemd[1]: Job usr-local.mount/start failed with result 'dependency'.
[  156.156772] systemd[1]: Job dev-mapper-luks\x2d3815b69e\x2d94f3\x2d4343\x2db507\x2df520246b7fdf.device/start failed with result 'timeout'.


The difference between the reporter and me is that I'm not using LVM.

I'm wondering if somehow systemd is waiting for more passwords but isn't getting them. The passphrase should be global (but I don't know where that is set or how to check it), so traditionally I enter only one passphrase and all partitions are unlocked with it. However, what happens now is that I enter my passphrase, root is unlocked, and then it just sits there until the boot process times out and I'm dropped to an emergency shell.

I just updated to systemd-26-5.fc15, but the problem remains.

Let me know if you need any additional information.

Comment 22 Michal Schmidt 2011-06-22 06:57:06 UTC
Chad,
please file a new bug and attach your /etc/fstab and /etc/crypttab. Thanks.

Comment 23 Chad Feller 2011-06-23 21:09:16 UTC
Michal,
  Done: Bug #716291
Thanks.

Comment 24 Fedora Update System 2011-06-27 23:56:41 UTC
systemd-26-5.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 25 Michal Schmidt 2011-07-01 11:40:31 UTC
*** Bug 718137 has been marked as a duplicate of this bug. ***

