Bug 695407

Summary: Must not continue boot until pass-phrase has been entered
Product: [Fedora] Fedora
Component: systemd
Version: 15
Hardware: Unspecified
OS: Unspecified
Status: CLOSED CANTFIX
Severity: unspecified
Priority: unspecified
Target Milestone: ---
Target Release: ---
Reporter: David Zeuthen <davidz>
Assignee: Lennart Poettering <lpoetter>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: awilliam, bugs.michael, dmach, johannbg, lpoetter, mcepl, mclasen, metherid, mschmidt, notting, nphilipp, plautrba, tilmann, vondruch
Keywords: Reopened
Whiteboard: RejectedBlocker
Doc Type: Bug Fix
Last Closed: 2011-05-10 20:36:50 UTC
Attachments:
/var/log/messages (flags: none)

Description David Zeuthen 2011-04-11 16:13:07 UTC
Currently the system continues booting even when the passphrase to unlock+mount /home has not been entered.

Details:
 <davidz> mezcalero: so just booted my laptop then went to get coffee... then
   I see the plymouth "ask-for-password" thing (since /home is encrypted) and
   on top of that a GTK+ dialog complaining something about ICE ....
   (I have autologin enabled)

Comment 1 Adam Williamson 2011-04-15 20:15:28 UTC
Discussed at 2011-04-15 blocker review meeting. As we understand this, there's a timeout after which the system will try to continue booting without decrypting the partition, on the basis that it may be a non-critical partition. We don't think this hits any blocker criteria, though improved behaviour may be possible: don't time out if the partition is a 'critical' one (like /home)...though even then, home isn't really *critical* necessarily.



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 2 Lennart Poettering 2011-04-19 19:23:24 UTC
*** Bug 696896 has been marked as a duplicate of this bug. ***

Comment 3 Lennart Poettering 2011-04-28 03:24:21 UTC
in systemd 25 we have everything in place to mark encrypted partitions for non-timed out password queries. This has to be configured at multiple places.

a) To ensure that we ask for a password indefinitely, the encrypted device should be listed in /etc/crypttab with an option of timeout=0.

b) To ensure that we wait for a device to show up indefinitely, the encrypted file system should be listed in /etc/fstab with an option of comment=systemd.device-timeout=0

If both are used in conjunction then we will ask for the password and wait for the device to show up forever. This should be what you have been asking for. However, note that this has drawbacks: if a device never shows up because the hw is borked the boot-up won't time out anymore since we cannot distinguish the case "decrypted device never shows up because user doesn't input his password" from "decrypted device never shows up because HDD is borked".

Why does this timeout need to be configured twice and isn't configurable in a single place only? Well, the reason is simply that in the general case this is impossible. Consider /dev/sda5 being a LUKS disk, on which a file system with the label of "home" is located. We want to mount that to /home, hence we place a line starting with "LABEL=home" in /etc/fstab. But how should systemd now be able to relate this line in fstab with the encrypted device if the label itself is stored encrypted on disk and hence not available before the user entered his password and the disk was decrypted? Thus systemd is unable to apply the timeout option from fstab to the password query and vice versa unable to pass the option from crypttab to the wait-for-device waiting.

Anyway, closing this now. If you want these options to be set by default for crypto disks, file a bug against anaconda. Anaconda is the only place which really knows the connection between the fstab and the crypttab line, and hence can configure both timeouts properly.
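
For anyone who wants to check their own setup, here is a minimal sketch (not part of systemd or anaconda) that looks for the two options named in a) and b) above. The function names are made up for illustration, and it assumes the fstab entry refers to the volume as /dev/mapper/<name>, where <name> is the first field of the crypttab line.

```python
def option_set(options_field, wanted):
    """Return True if `wanted` appears in a comma-separated option field."""
    return wanted in options_field.split(",")


def timeouts_disabled(crypttab_text, fstab_text, mapper_name):
    """Return (crypttab_ok, fstab_ok) for the volume /dev/mapper/<mapper_name>."""
    crypt_ok = False
    for line in crypttab_text.splitlines():
        fields = line.split()
        # crypttab fields: name, device, key file, options
        if len(fields) >= 4 and fields[0] == mapper_name:
            crypt_ok = option_set(fields[3], "timeout=0")

    fstab_ok = False
    for line in fstab_text.splitlines():
        fields = line.split()
        # fstab fields: device, mount point, type, options, dump, pass
        if len(fields) >= 4 and fields[0] == "/dev/mapper/" + mapper_name:
            fstab_ok = option_set(fields[3], "comment=systemd.device-timeout=0")

    return crypt_ok, fstab_ok
```

Both values need to be True for the behaviour described above; if only one is set, the other timeout still applies.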

Comment 4 Vít Ondruch 2011-04-28 05:51:40 UTC
About what version of systemd are you speaking? The latest systemd does not work as you described.

$ rpm -q systemd
systemd-25-1.fc15.x86_64

Comment 5 Michal Schmidt 2011-04-28 09:37:58 UTC
(In reply to comment #4)
> The latest systemd does not work as you described.

What differences from the described behaviour are you seeing?
What do you have in /etc/fstab and /etc/crypttab?

Comment 6 Vít Ondruch 2011-04-28 09:49:57 UTC
Created attachment 495468 [details]
/var/log/messages

The password prompt is displayed, but approximately after 150s the boot simply continues to GDM.


$ cat /etc/crypttab 
luks-59f45798-8b73-4f8d-8ec9-e5ed252dbd47 UUID=59f45798-8b73-4f8d-8ec9-e5ed252dbd47 none timeout=0
$ cat /etc/fstab 

#
# /etc/fstab
# Created by anaconda on Fri Mar 11 14:51:15 2011
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/vg_dhcp251-lv_root_f15 /                       btrfs   defaults        1 1
UUID=dd4ad330-a3b4-4a94-9bcd-7f30ac5eddeb /boot                   ext4    defaults        1 2
/dev/mapper/luks-59f45798-8b73-4f8d-8ec9-e5ed252dbd47 /home                   btrfs   comment=systemd.device-timeout=0 1 2
/dev/mapper/vg_dhcp251-lv_swap_f15 swap                    swap    defaults        0 0
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0

filer-eng.brq.redhat.com:/vol/mirrors/engineering  /mnt/mirror        nfs ro  0 0
filer-eng.brq.redhat.com:/vol/engineering/share    /mnt/archive       nfs ro  0 0
nfs.englab.brq.redhat.com:/exports/scratch         /mnt/scratch       nfs rw  0 0
nfs.englab.brq.redhat.com:/pub                     /mnt/globalsync    nfs ro  0 0
ntap-vader.corp.redhat.com:/vol/engarchive2        /mnt/engarchive2   nfs ro  0 0
curly.devel.redhat.com:/vol/engineering/devarchive/redhat /mnt/redhat nfs ro  0 0

Comment 7 Michal Schmidt 2011-04-28 14:09:45 UTC
From the log:
[    8.475273] systemd[1]: systemd 24 running in system mode.

Make sure you test with 25.

Comment 8 Vít Ondruch 2011-04-29 09:00:55 UTC
I am sure: 

$ rpm -q systemd
systemd-25-1.fc15.x86_64

Please make sure that systemd is reporting proper version number.

Comment 9 Michal Schmidt 2011-04-29 09:47:44 UTC
It reports 25 just fine here.

$ dmesg|grep 'running in system'
[    6.361466] systemd[1]: systemd 25 running in system mode. (+PAM +LIBWRAP +AUDIT +SELINUX +SYSVINIT +LIBCRYPTSETUP; fedora)

$ sudo prelink -u /bin/systemd; ls -l /bin/systemd /sbin/init; md5sum /bin/systemd
-rwxr-xr-x. 1 root root 765920 21. dub 03.51 /bin/systemd
lrwxrwxrwx. 1 root root     14 22. dub 09.14 /sbin/init -> ../bin/systemd
07cdbe26ad043d68f0b2d1262333e3f6  /bin/systemd

$ rpm -V systemd
$

Comment 10 Vít Ondruch 2011-04-29 10:05:28 UTC
Hmmm, "rpm -V systemd" reports a lot of errors. How can that be?

OK, I did

$ sudo yum reinstall 'systemd-*' --enablerepo=updates-testing

and will see what happens.

Comment 11 Vít Ondruch 2011-04-29 16:56:59 UTC
OK, it seems to work now. Not sure what caused the borked update :/ Sorry for the noise.

Comment 12 David Zeuthen 2011-05-04 21:07:01 UTC
(In reply to comment #3)
> in systemd 25 we have everything in place to mark encrypted partitions for
> non-timed out password queries. This has to be configured at multiple places.

Sorry, but it's completely unacceptable that this is something that has to be configured - it really needs to work out of the box without any configuration. Reopening.

Comment 13 David Zeuthen 2011-05-04 21:10:40 UTC
Basically, what I'm looking for is an explanation for why this needs to be configuration _at all_... if you are concerned about servers or systems without a console, note that if you care about your server, then it's fine to assume there's a KVM or serial console.

Comment 14 Lennart Poettering 2011-05-08 23:00:28 UTC
david, read comment #3 and file a bug against anaconda.

Comment 15 David Zeuthen 2011-05-09 17:52:47 UTC
(In reply to comment #14)
> david, read comment #3 and file a bug against anaconda.

Of course I read comment 3. And I still maintain it's unreasonable that this is configurable at all. Please see e.g. http://ometer.com/free-software-ui.html for why it's an evil escape hatch to just throw configuration options at problems. Reopening. Please explain why it needs to be configurable before closing it again.

Comment 16 Lennart Poettering 2011-05-10 20:35:56 UTC
Jeez man. You didn't read #6. 

Let me try this again: if you place LABEL=foo in /etc/fstab, and we never see the device popping up, then this can have two reasons:

a) the device is not encrypted but simply not plugged in/powered on. hence never shows up

b) the device is encrypted, but the user never entered the key.

In case a) you want a timeout, because it is a hardware issue and just because your HDD broke you don't want a frozen system. In case b) you want no timeout, because we actually wait for user input.

Now, the problem is that we have no chance to distinguish the two cases automatically. We cannot know whether LABEL=foo refers to an encrypted or an unencrypted device, because the label is stored encrypted on the device. As long as it is not decrypted we won't know the label of a device. Hence we cannot know whether LABEL=foo will show up on an encrypted disk or not.

Because that is the way it is, we cannot automate this. The only part of the system which actually knows that LABEL=foo is on an encrypted disk is anaconda since it set it up, hence file the bug there.

Please don't reopen this bug all the time. You cannot argue with logic.
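
To make the LABEL= problem concrete, here is a rough sketch (hypothetical helper, not how systemd is implemented) that tries to tie each fstab line to a crypttab line using only what is readable before any passphrase is entered. The only link this sketch can make is a textual match on /dev/mapper/<name>; that such a match is sufficient is an assumption of the sketch. A LABEL= or UUID= of the inner file system gives nothing to match on.

```python
def linkable_mounts(crypttab_text, fstab_text):
    """Map each fstab mount point to the crypttab name it can be tied to, or None."""
    names = set()
    for line in crypttab_text.splitlines():
        fields = line.split()
        if fields and not fields[0].startswith("#"):
            names.add(fields[0])

    prefix = "/dev/mapper/"
    result = {}
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        device, mount_point = fields[0], fields[1]
        if device.startswith(prefix) and device[len(prefix):] in names:
            result[mount_point] = device[len(prefix):]
        else:
            # The label/UUID of the inner file system is stored inside the
            # encrypted container, so it cannot be matched before unlocking.
            result[mount_point] = None
    return result
```

With a crypttab entry "luks-home /dev/sda5 none" and an fstab line starting with "LABEL=home", the sketch returns None for /home: nothing on disk says that label lives on /dev/sda5 until it has been decrypted.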

Comment 17 David Zeuthen 2011-05-11 19:27:56 UTC
My point is that the default behavior should be to _not_ continue booting if some dependency cannot be fulfilled.

Requiring random config files with references to specific devices (so these config files can't be reused in generic images) to have a working system is just completely insane and shows quite a lack of taste on your part.

FYI, even the installer team (historically) agrees with this - historically they've been trying to minimize the number of config files the installer spits out. And with good reason - no-one wants a system where you need to tweak stupid config files to make things work. I'm not going to reopen this - hopefully you will open your eyes and fix this.

Comment 18 David Zeuthen 2011-05-11 19:41:23 UTC
And as I said in comment 13, if the device is not there or the HDD is dead or faulty or whatever, then the user can already fix this by e.g. changing the kernel command line to boot into a bash shell or use some other kind of rescue mode. There's absolutely no need for you to continue the boot process because you have some flawed idea that the user is not capable of doing this.

Comment 19 Nils Philippsen 2011-06-10 10:51:25 UTC
(In reply to comment #16)
> a) the device is not encrypted but simply not plugged in/powered on. hence
> never shows up
> 
> b) the device is encrypted, but the user never entered the key.

What I don't understand is why systemd doesn't do this:

In the case where we have one or more encrypted, not yet unlocked volumes, systemd should know if the unencrypted volumes can fulfil all mount points not marked as "noauto" in fstab. If one of these is missing, it should act on the assumption that the wanted volume may be on one of the yet-to-be unlocked encrypted volumes and wait for the passphrases to be typed in without timing out. In the case where this assumption is correct, it will continue booting afterwards. If this assumption doesn't hold true (device missing or broken), the need to type the password(s) in for encrypted volumes on removable media, or to reboot without these being plugged in is a small cost in my eyes. For cases where the actual owner or admin of a machine doesn't know the passphrase of one or more volumes on a machine, systemd could have a command line option to let asking for passphrases time out like it does now. But this uncommon case should not be the default IMO.

If my assumptions above are correct, I dispute CANTFIX. If not, please enlighten me what I'm missing ;-).
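
Roughly, the decision proposed above could be sketched like this. It is only an illustration of the heuristic; the function and parameter names are invented, and it says nothing about how systemd would actually gather the inputs.

```python
def should_wait_for_passphrase(wanted_sources, visible_sources, locked_volumes):
    """Return True if passphrase prompts should not time out.

    wanted_sources:  fstab device specs for mounts not marked "noauto"
    visible_sources: specs already found on plain or unlocked volumes
    locked_volumes:  number of encrypted volumes still awaiting a passphrase
    """
    missing = set(wanted_sources) - set(visible_sources)
    # Wait only if something is still missing AND it could be hiding
    # on a volume that has not been unlocked yet.
    return bool(missing) and locked_volumes > 0
```

If nothing is missing, or no locked volume is left, the existing timeout behaviour would apply unchanged.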

Comment 20 Michal Schmidt 2011-06-10 11:05:02 UTC
I agree with Nils.

One possible way to implement it would be to set a special property on the password agent services (systemd-ask-password-*.service) that would cause systemd to delay the timeouts for the time the service is running.

An alternative way could be to have systemd-cryptsetup somehow ask systemd to delay the timeouts while it's running.

Lennart, please reconsider.
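
As a very rough illustration of "delay the timeouts while the service is running" (a toy model with invented names, not systemd code), a timeout could simply stop counting while a password query is pending. Timestamps are passed in explicitly so the behaviour is easy to follow.

```python
class PausableTimeout:
    """A timeout whose clock stops while a password prompt is pending."""

    def __init__(self, budget, now):
        self.remaining = budget  # seconds still allowed to elapse
        self.last = now          # time of the last accounting step
        self.paused = False

    def _account(self, now):
        # Only time spent outside a pause counts against the budget.
        if not self.paused:
            self.remaining -= now - self.last
        self.last = now

    def pause(self, now):
        """Call when the password agent starts asking."""
        self._account(now)
        self.paused = True

    def resume(self, now):
        """Call when the password agent has finished."""
        self._account(now)
        self.paused = False

    def expired(self, now):
        self._account(now)
        return self.remaining <= 0
```

In this model the time the user spends away from the prompt costs nothing, while a device that fails to show up after the passphrase was entered still runs into the normal timeout.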

Comment 21 Lennart Poettering 2011-06-15 16:32:17 UTC
(In reply to comment #19)
> (In reply to comment #16)
> > a) the device is not encrypted but simply not plugged in/powered on. hence
> > never shows up
> > 
> > b) the device is encrypted, but the user never entered the key.
> 
> What I don't understand is why systemd doesn't do this:
> 
> In the case where we have one or more encrypted, not yet unlocked volumes,
> systemd should know if the unencrypted volumes can fulfil all mount points not
> marked as "noauto" in fstab. 

So you have LABEL=foobar listed in fstab. To find that fs you need to be able to read the superblock of all file systems plugged in. Now, for the normal case this is easy, you just go and read it. But for the encrypted ones it's not so easy, because you can figure out the label only *after* having decrypted it. And that is the problem. (Similar for UUID=)

Comment 22 Michal Schmidt 2011-06-15 16:59:35 UTC
(In reply to comment #21)
> (In reply to comment #19)
> > In the case where we have one or more encrypted, not yet unlocked volumes,
> > systemd should know if the unencrypted volumes can fulfil all mount points not
> > marked as "noauto" in fstab. 
> 
> So you have LABEL=foobar listed in fstab. To find that fs you need to be able
> to read the superblock of all file systems plugged in. Now, for the normal case
> this is easy, you just go and read it. But for the encrypted ones it's not so
> easy, because you can figure out the label only *after* having decrypted it.
> And that is the problem. (Similar for UUID=)

Lennart,

Everybody agrees with that. Nils does too. That's why he specifically wrote about "unencrypted" (i.e. decrypted, unlocked) volumes in the part you quoted.
However, the most interesting part of Nils's comment is what follows after that and it is the part that has not been answered.

Comment 23 Dr. Tilmann Bubeck 2011-07-19 16:19:47 UTC
This seems to be a regression since FC14, which waited indefinitely for the password. Here is an outline of how rc.sysinit did this: "init_crypto" asked for a password and then ran "luksOpen", and no timeout happened.

[...]
# line 173: 
init_crypto 0

[...]
# line 200:
# Start any MD RAID arrays that haven't been started yet
[...] /sbin/mdadm -IRs
# Setting up Logical Volume Management
[...] /sbin/lvm vgchange -a y --sysinit

init_crypto 0

# line 405: 
fsck -T -t noopts=_netdev -A $fsckoptions
# line 502: 
mount -a -t nonfs,nfs4,smbfs,ncpfs,cifs,gfs,gfs2 -O no_netdev

And then again
init_crypto 1 in /etc/init.d/netfs