Bug 868421

Summary: Dracut should not time out and fail waiting for LUKS decryption
Product: [Fedora] Fedora Reporter: Stephen Gallagher <sgallagh>
Component: dracut Assignee: dracut-maint
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: high    
Version: 18 CC: akostadi, awilliam, bct, bkoz, bruno, bvandenh, cs+rhbz, diego.ml, dracut-maint, eblake, fabrice, hansgeorg.schwibbe, harald, ikke, james.antill, jeff, jmontleo, john.florian, jonathan, kparal, kraymond, mattdm, michele, mjc, nikodll, nonamedotc, pablo.iranzo, pcfe, peter, pfrields, rbergero, robatino, sgrubb, smooge, spoore, tcarter, tflink, theinric, wwoods
Target Milestone: --- Keywords: CommonBugs, Reopened
Target Release: ---   
Hardware: x86_64   
OS: Linux   
See Also: https://bugzilla.redhat.com/show_bug.cgi?id=861123
https://bugzilla.redhat.com/show_bug.cgi?id=881670
https://bugzilla.redhat.com/show_bug.cgi?id=949702
Whiteboard: https://fedoraproject.org/wiki/Common_F18_bugs#encrypt-timeout-rescue RejectedBlocker
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 949697 Environment:
Last Closed: 2013-05-28 10:24:25 EDT Type: Bug
Attachments:
A photo of my monitor after it happened (flags: none)

Description Stephen Gallagher 2012-10-19 15:39:18 EDT
Description of problem:
With full-disk encryption enabled, dracut will fail and drop the user into an emergency recovery mode if they do not enter their LUKS password in time. This is not user-friendly and the timeout should be eliminated.

Version-Release number of selected component (if applicable):
dracut-024-5.git20121019.fc18.x86_64

How reproducible:
Every time

Steps to Reproduce:
1. Install Fedora with full-disk encryption (this may also be the case with individual partition encryption, I haven't tested that)
2. Power on the system
3. Go get a cup of coffee
4. When you return in a few minutes, the system will be sitting at the emergency prompt.
  
Actual results:
The system timed out waiting for the LUKS password to be entered at the prompt and dropped the user into an emergency prompt.

Expected results:
The system should just remain displaying the LUKS password prompt indefinitely until the user provides a password.

Additional info:
I remember this working properly in older versions of Fedora, but I didn't happen to notice exactly when this behavior started. I think during F17.
Comment 1 Harald Hoyer 2012-11-13 09:00:37 EST
Was the initramfs generated with dracut-024-5.git20121019.fc18.x86_64 ??

# lsinitrd | head -4
Comment 2 Stephen Gallagher 2012-11-13 10:47:42 EST
My current system produces:

[sgallagh@sgallagh520:~]$ sudo lsinitrd |head -4
/boot/initramfs-3.6.6-3.fc18.x86_64.img: 19M
========================================================================
dracut-024-5.git20121019.fc18
========================================================================


I haven't checked in the last week or so whether this is still the case. I'll see if it happens to me when I reboot in a little while and then will report it.
Comment 3 Stephen Gallagher 2012-11-13 15:57:34 EST
I can confirm that this is still the case on 

dracut-024-5.git20121019.fc18.x86_64
kernel-3.6.6-3.fc18.x86_64


Which as I look again is clearly the same version I was originally running.

I timed it this time: it times out after 120 seconds.
Comment 4 Harald Hoyer 2012-11-22 07:10:18 EST
which version of systemd is this?
Comment 5 Stephen Gallagher 2012-11-26 07:55:33 EST
I am running systemd-195-7.fc18.x86_64 currently (and I confirmed again this morning that the issue is still occurring).
Comment 6 Tim Flink 2012-11-30 13:22:20 EST
This seems related to the following systemd bugs but approaching the issue from dracut instead of systemd:
 - https://bugzilla.redhat.com/show_bug.cgi?id=861123
 - https://bugzilla.redhat.com/show_bug.cgi?id=881670

861123 was rejected as a blocker and I'm thinking this falls into the same category. It doesn't clearly violate any of the F18 release criteria and could be fixed post-release with an update.

-1 blocker
Comment 7 Adam Williamson 2012-12-01 03:16:55 EST
agreed - the difference is just whether / is encrypted or not. -1 on the basis of decisions on those bugs.
Comment 8 Kevin Fenzi 2012-12-04 18:46:09 EST
-1 blocker, but +1 NTH as it seems an annoying bug.
Comment 9 Robyn Bergeron 2012-12-05 01:01:25 EST
-1 blocker, 0 on NTH - not sure how much a fix could potentially rock the boat.
Comment 10 Kamil Páral 2012-12-05 05:15:24 EST
I think there is a lot of confusion in all these issues (see See Also) and we shouldn't use in-bug voting for this. I'm mainly interested whether this truly can be fixed post-release with an update.
Comment 11 Harald Hoyer 2012-12-05 07:54:12 EST
(In reply to comment #10)
> I think there is a lot of confusion in all these issues (see See Also) and
> we shouldn't use in-bug voting for this. I'm mainly interested whether this
> truly can be fixed post-release with an update.

yes, it can be fixed post-release
Comment 12 Tim Flink 2012-12-05 10:23:27 EST
I count -4 blocker votes, moving it to rejected.
Comment 13 Kamil Páral 2012-12-05 10:49:44 EST
I have talked to Harald on IRC, here are some details:

<kparal> haraldh: hello, I would like to understand the issue better. currently it seems that dracut should be handling unlocking the root partition (and not timing out), and systemd should be handling unlocking all other partitions (timing out if the partition doesn't have a special flag in /etc/fstab). is that correct?
<haraldh> kparal, dracut should fill in crypttab with "timeout=0"
<kparal> haraldh: sorry, but how can you read /etc/crypttab when it is located on the root partition, that is still locked?
<kparal> haraldh: the root= partition should simply never time out, isn't that right?
<haraldh> kparal, it's a generated crypttab in the initramfs
<haraldh> generated from the rd.luks.uuid on the kernel command line
<kparal> haraldh: oh I see. so simple dracut update can fix the issue, and people don't have to modify their /etc/crypttab by hand, right?
<haraldh> yes
<haraldh> we even might patch systemd to no timeout, if it is in the initramfs

Some things are still unknown:
<kparal> haraldh: one more question, do you know why lennart asked anaconda guys to supply x-systemd.device-timeout=0 to all encrypted partitions in /etc/fstab, when we could automatically generate it from /etc/crypttab (add timeout=0 to all lines in /etc/crypttab)?
<kparal> I mean automatically during initrd generation
<kparal> hmm, if systemd is not inside initrd but is running from the disk, then it sees just the original unmodified /etc/crypttab, not the internal one in initrd with the timeout metadata. that might be the reason
<kparal> but I don't really know if that's true or not
<haraldh> kparal, I'll ask Lennart about x-systemd.device-timeout=0
<kparal> haraldh: I think he's on a long vacation now, I was just curious if you know more about it. if you don't, doesn't matter, thanks anyway
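A minimal Python sketch of the mechanism Harald describes: each rd.luks.uuid= entry on the kernel command line becomes a line in the crypttab generated inside the initramfs, carrying timeout=0 so the password prompt waits indefinitely. This is not dracut's actual code (dracut does this in its own shell modules); the function name crypttab_from_cmdline and the handling of an optional "luks-" prefix are assumptions for illustration. The output follows the usual four-field crypttab layout (name, device, key file, options), e.g. "luks-<UUID> /dev/disk/by-uuid/<UUID> - timeout=0".

```python
def crypttab_from_cmdline(cmdline: str) -> list[str]:
    """Hypothetical helper: build initramfs crypttab lines from rd.luks.uuid= entries."""
    lines = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key != "rd.luks.uuid" or not sep:
            continue  # not a LUKS UUID parameter
        # Assumption: the value may carry an optional "luks-" prefix; strip it so
        # the mapper name and the by-uuid path are both built from the bare UUID.
        uuid = value[len("luks-"):] if value.startswith("luks-") else value
        # "-" means no key file (prompt for a password); timeout=0 means the
        # prompt never gives up waiting.
        lines.append(f"luks-{uuid} /dev/disk/by-uuid/{uuid} - timeout=0")
    return lines


# Example usage with a typical Fedora-style command line (UUID shortened).
print(crypttab_from_cmdline("root=/dev/mapper/vg-root rd.luks.uuid=luks-3112725b quiet"))
```

Because the generated file only exists inside the initramfs, a dracut update alone can change the timeout for the root device without anyone editing /etc/crypttab, which is the point Harald makes above.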
Comment 14 Steve Grubb 2012-12-07 09:08:40 EST
This is also happening when only /home is encrypted. I'd think this is a blocker bug since it's rather rude to boot your laptop and get busy with something only to find boot has completely failed instead of waiting for your password.
Comment 15 Kamil Páral 2012-12-10 13:26:20 EST
I tested with just encrypted /home. It always falls into dracut shell, no matter what I try. I added x-systemd.device-timeout=0 to /etc/fstab, to /etc/crypttab and rebuilt initrd by running dracut -f. It still times out.
Comment 16 Adam Williamson 2012-12-10 19:47:00 EST
steve: it's rude, sure, but last I checked, 'rudeness' didn't block release. :) you can just reboot and try again, yes?
Comment 17 Steve Grubb 2012-12-10 20:06:42 EST
Well, I am rebooting and trying again. But I see lots of updates being pushed into F18 fixing trivial bugs while this is a bigger problem. Is there a scratch build for testing or something sitting in the testing repo that we can try to establish confidence in the fix? IOW, if this is going to be fixed post release as a 0 day update, shouldn't we be testing the fix soonish? Thanks.
Comment 18 Harald Hoyer 2012-12-14 04:16:59 EST
*** Bug 884847 has been marked as a duplicate of this bug. ***
Comment 19 Harald Hoyer 2012-12-14 04:19:12 EST
For those with /home encrypted: either have "rd.luks=0", if your root and /usr are not encrypted, or "rd.luks.uuid=<luksofrootorusr>" on the kernel command line. Then dracut will not try to open /home.
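A simplified Python model of the selection Harald relies on here: rd.luks=0 turns off LUKS handling in the initramfs entirely, while rd.luks.uuid= (which may be given more than once) restricts unlocking to the listed UUIDs, so an encrypted /home is left alone. The function name luks_devices_to_open and the idea of passing in a list of already-detected LUKS UUIDs are hypothetical; dracut's real logic lives in its shell modules and also depends on hostonly mode.

```python
def luks_devices_to_open(cmdline: str, detected_uuids: list[str]) -> list[str]:
    """Illustrative model: which LUKS UUIDs the initramfs would try to unlock."""
    tokens = cmdline.split()
    if "rd.luks=0" in tokens:
        # LUKS handling disabled: nothing is unlocked in the initramfs, so an
        # encrypted /home is left for the booted system to handle.
        return []
    wanted = [t.split("=", 1)[1] for t in tokens if t.startswith("rd.luks.uuid=")]
    # Assumption: tolerate an optional "luks-" prefix on the command-line value.
    wanted_set = {w[len("luks-"):] if w.startswith("luks-") else w for w in wanted}
    if not wanted_set:
        # No restriction given: in this simplified model every detected LUKS
        # device is a candidate, including the one holding /home.
        return list(detected_uuids)
    return [u for u in detected_uuids if u in wanted_set]


# Root unlocked in early boot, /home left for later.
print(luks_devices_to_open("rd.luks.uuid=luks-rootuuid quiet", ["rootuuid", "homeuuid"]))
```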
Comment 20 Kamil Páral 2012-12-14 08:48:26 EST
(In reply to comment #19)
> For those with /home encrypted: either have "rd.luks=0", if your root and
> /usr are not encrypted, or "rd.luks.uuid=<luksofrootorusr>" on the kernel
> command line. Then dracut will not try to open /home.

No, rd.luks=0 doesn't help, with just an encrypted /home. Still times out. /etc/fstab contains x-systemd.device-timeout=0.
Comment 21 Harald Hoyer 2012-12-18 08:55:56 EST
(In reply to comment #20)
> (In reply to comment #19)
> > For those with /home encrypted: either have "rd.luks=0", if your root and
> > /usr are not encrypted, or "rd.luks.uuid=<luksofrootorusr>" on the kernel
> > command line. Then dracut will not try to open /home.
> 
> No, rd.luks=0 doesn't help, with just an encrypted /home. Still times out.
> /etc/fstab contains x-systemd.device-timeout=0.

Then the time out does not happen in the initramfs in this case.
Comment 22 Fedora Update System 2012-12-19 11:13:43 EST
dracut-024-15.git20121218.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/FEDORA-2012-20580/dracut-024-15.git20121218.fc18
Comment 23 Kamil Páral 2012-12-19 11:42:06 EST
(In reply to comment #22)
It fixes neither full-disk encryption nor /home-only encryption. The prompt still times out.
Comment 24 Fedora Update System 2012-12-20 00:37:25 EST
dracut-024-15.git20121218.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 25 Travers Carter 2013-01-21 00:18:36 EST
I'm still seeing the timeout with dracut-024-18.git20130102.fc18 (I have / encrypted)
Comment 26 Rahul Sundaram 2013-01-25 16:04:22 EST
*** Bug 904228 has been marked as a duplicate of this bug. ***
Comment 27 Ben Thompson 2013-01-27 08:54:20 EST
Still seeing this problem with dracut-024-23.git20130118.fc18
Comment 28 Ilkka Tengvall 2013-01-28 05:17:38 EST
This happened to me half an hour ago using net upgrade, so it should pull the latest stuff available. I have my entire pv encrypted.
Comment 29 Ilkka Tengvall 2013-01-28 08:04:55 EST
dracut version is 024-23.git20130118.fc18.x86_64
systemd version is 197-1.fc18.1.x86_64
Comment 30 hansgeorg.schwibbe 2013-02-10 06:46:24 EST
Created attachment 695729
A photo of my monitor after it happened

I have the same issue after upgrading from Fedora 17 to Fedora 18. The timeout occurs a couple of minutes after the password prompt appears.
Comment 31 Kamil Páral 2013-02-18 05:54:53 EST
Harald, there is a patch in bug 861123 comment 36 to 38. Can you please look at it? Thanks.
Comment 32 Harald Hoyer 2013-02-28 09:28:37 EST
(In reply to comment #31)
> Harald, there is a patch in bug 861123 comment 36 to 38. Can you please look
> at it? Thanks.

yes
Comment 33 Stephen John Smoogen 2013-04-02 12:59:23 EDT
This is affecting Fedora 19 Alpha, so can we get this fixed?
Comment 34 Benjamin Kosnik 2013-04-05 14:40:48 EDT
I'm going to try to up the severity on this. That you cannot boot encrypted un-attended is quite severe.
Comment 35 Stephen John Smoogen 2013-04-08 13:44:25 EDT
This needs to be fixed if not for Fedora, it is going to be a Red Hat Future Release major problem.
Comment 36 Paul W. Frields 2013-04-08 14:31:00 EDT
Looks like the first patch is in upstream master (post-v200) so F19 would presumably get it in the next packaged version.  The second patch had a bunch of discussion and I couldn't follow all of it, but looked like it was rejected.  Can't tell if both are needed to support the case of not timing out for an encrypted fs.
Comment 37 Harald Hoyer 2013-04-08 14:46:16 EDT
(In reply to comment #35)
> This needs to be fixed if not for Fedora, it is going to be a Red Hat Future
> Release major problem.

no worries ... we will fix it.

let's wait for systemd-201 .. today or tomorrow
Comment 38 Harald Hoyer 2013-04-08 14:47:32 EDT
btw, anaconda should write "timeout=0" in /etc/crypttab for everything.
Comment 39 David Lehman 2013-04-08 18:29:39 EDT
Anaconda already writes x-systemd.device-timeout=0 to the options in /etc/fstab for encrypted devices. That's all we've been asked to do on this front. We have to add something to /etc/crypttab as well? Okay. What else? Let's get it all done in one shot if that's possible.
Comment 40 Harald Hoyer 2013-04-09 02:28:12 EDT
(In reply to comment #39)
> Anaconda already writes x-systemd.device-timeout=0 to the options in
> /etc/fstab for encrypted devices. That's all we've been asked to do on this
> front. We have to add something to /etc/crypttab as well? Okay. What else?
> Let's get it all done in one shot if that's possible.

That's it.
Comment 42 Harald Hoyer 2013-04-09 04:07:23 EDT
(In reply to comment #38)
> btw, anaconda should write "timeout=0" in /etc/crypttab for everything.

i.e.:

luks-3112725b-f982-4f5a-9a05-08873a187202 /dev/disk/by-uuid/3112725b-f982-4f5a-9a05-08873a187202 - timeout=0
Comment 43 Bruno Wolff III 2013-04-09 10:14:57 EDT
I do want to mention that as a side effect of this fix, some other broken cases have gotten worse. I had encrypted home not getting mounted in early boot (probably because of a raid array not being available at that time), but after a timeout the boot would proceed and home would get mounted later. With the change, the boot did not get past this point. I ended up having to explicitly list which luks devices to try to mount in early boot to keep systemd from trying to mount home until later.
Comment 44 Harald Hoyer 2013-04-09 10:46:45 EDT
(In reply to comment #43)
> I do want to mention that as a side effect of this fix, some other broken
> cases have gotten worse. I had encrypted home not getting mounted in early
> boot (probably because of a raid array not being available at that time),
> but after a timeout the boot would proceed and home would get mounted later.
> With the change, the boot did not get past this point. I ended up having to
> explicitly list which luks devices to try to mount in early boot to keep
> systemd from trying to mount home until later.

What is the problem now again?
Does systemd later on fail to mount /home?
Comment 45 Bruno Wolff III 2013-04-09 12:03:30 EDT
I was referring to the fix making the symptoms of bug 919752 worse. I don't think the fix is wrong, but rather that we might get some more questions because of it.

And 919752 has a work around that has been tested and it has a fix today which I plan to test within the hour.
Comment 46 David Lehman 2013-04-09 14:23:57 EDT
Why is this not the default behavior? It does not make sense to ask us to add code to the OS installer to enforce a default that is apparently not a default. Can someone justify this?
Comment 47 Kamil Páral 2013-04-10 11:15:56 EDT
David, IIUC, they want to have a default of timeout=0 for the root partition and encrypted partitions, but timeout=30 for all other partitions. So that even in a case of malfunctioning HDD with non-critical partitions the system still boots.

What I fail to understand, however, is why kernel+plymouth know these things (plymouth displaying a password prompt is the proof that we know what is encrypted and what is not), but dracut+systemd can't use the same approach and need hints in fstab and crypttab. In other words, I don't understand why it can't work automatically.
Comment 48 Kamil Páral 2013-04-10 11:42:22 EDT
(In reply to comment #47)
> What I fail to understand, however, is why kernel+plymouth know these
> things (plymouth displaying a password prompt is the proof that we know what
> is encrypted and what is not), but dracut+systemd can't use the same
> approach and need hints in fstab and crypttab. In other words, I don't
> understand why it can't work automatically.

Lennart answered this in bug 861123 comment 42. TLDR: too complicated.
Comment 49 David Lehman 2013-04-10 11:48:02 EDT
(In reply to comment #47)
> David, IIUC, they want to have a default of timeout=0 for the root
> partition and encrypted partitions, but timeout=30 for all other partitions.

Right, so why can't they make the default for crypttab entries be a timeout value of 0? It's what makes the most sense as a default anyway. I mean, what other default value makes more sense for a timeout?

If all encrypted devices should have a default timeout of 0 then why must this be specified in /etc/crypttab? It shouldn't -- it is a default.
Comment 50 Bruno Wolff III 2013-04-10 12:12:37 EDT
Note that 90crypt/module-setup.sh has been changed to rewrite the initramfs crypttab file to only list devices needed for early boot. This could probably be further changed to include timeout=0 if a timeout wasn't specified in the crypttab file.
Comment 51 Bruno Wolff III 2013-04-10 12:17:28 EDT
I should add this is only done in hostonly mode, but that is the new default.
Comment 52 Bruno Wolff III 2013-04-10 15:30:36 EDT
If I did some work for a patch to dracut to add timeout=0 to the crypttab file used in the initramfs for hostonly if a timeout wasn't provided in the original crypttab, would this be likely to be accepted (or at least use a base for a similar patch)?
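A rough Python model of the rewrite Bruno proposes in comments 50 to 52: for the crypttab copied into the initramfs, append timeout=0 to every entry that does not already set a timeout, and leave explicit timeouts alone. This is not the actual dracut patch (90crypt/module-setup.sh is a shell script); the function name add_default_timeout is hypothetical, and the parsing assumes the usual four-field crypttab layout (name, device, optional key file, optional comma-separated options).

```python
def add_default_timeout(crypttab_text: str) -> str:
    """Hypothetical rewrite: append timeout=0 to entries that set no timeout."""
    out = []
    for line in crypttab_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            out.append(line)  # keep blank lines and comments untouched
            continue
        fields = stripped.split()
        if len(fields) < 2:
            out.append(line)  # malformed entry: leave it for the real tools to reject
            continue
        if len(fields) == 2:
            fields.append("-")  # no key file given: "-" means prompt for a password
        if len(fields) == 3:
            fields.append("timeout=0")  # no options at all
        else:
            options = fields[3].split(",")
            # Respect an explicit timeout= written by the user or the installer.
            if not any(opt.startswith("timeout=") for opt in options):
                fields[3] = ",".join(options + ["timeout=0"])
        out.append(" ".join(fields))
    return "\n".join(out) + "\n"
```

One design consequence of this sketch: rewritten entries are re-joined with single spaces, so tab-aligned columns in the original file are not preserved, which is harmless for a generated initramfs copy.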
Comment 53 David Lehman 2013-04-11 16:00:35 EDT
To make myself clear, I've decided not to patch anaconda or blivet for this purpose until/unless someone convinces me that there is a good reason to do so. That discussion should occur in bug 949702.
Comment 54 Adam Williamson 2013-05-13 13:03:19 EDT
Wiping RejectedBlocker as Smooge re-proposed this.
Comment 55 Adam Williamson 2013-05-13 13:20:20 EDT
Discussed at 2013-05-13 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2013-05-13/f19beta-blocker-review-5.2013-05-13-16.00.log.txt . We're still agreed that this doesn't violate Beta criteria and does not need to block the Beta release. Rejected as a blocker.
Comment 56 Stephen John Smoogen 2013-05-13 13:33:36 EDT
I think this may also be closed. I don't see it anymore with my laptop due to fixes in systemd.
Comment 57 Adam Williamson 2013-12-05 15:41:19 EST
So I noticed this showed up in the dracut harald submitted to fix a bug with decryption keyboard layout on upgrade this morning:

- fixed failing the boot while waiting for password input

http://koji.fedoraproject.org/koji/buildinfo?buildID=483057

Harald, what's the story there? Does it just fix this harder, or what? :) Is it talking about some other 'password'?
Comment 58 Stephen Gallagher 2013-12-05 15:54:37 EST
For the record, this had regressed in Fedora 20. I'd been meaning to dig up this BZ and reopen it, now it looks like I may not need to.
Comment 59 Harald Hoyer 2013-12-06 02:24:48 EST
(In reply to Adam Williamson from comment #57)
> So I noticed this showed up in the dracut harald submitted to fix a bug with
> decryption keyboard layout on upgrade this morning:
> 
> - fixed failing the boot while waiting for password input
> 
> http://koji.fedoraproject.org/koji/buildinfo?buildID=483057
> 
> Harald, what's the story there? Does it just fix this harder, or what? :) Is
> it talking about some other 'password'?

In F20, I let systemd have total control over the password query for encrypted devices. That meant that dracut had to wait for it somehow. That wait procedure was broken until now and should be fixed with the latest release.