Bug 868421
Summary: | Dracut should not time out and fail waiting for LUKS decryption | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Stephen Gallagher <sgallagh>
Component: | dracut | Assignee: | dracut-maint
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 18 | CC: | akostadi, awilliam, bct, bkoz, bruno, bvandenh, cs+rhbz, diego.ml, dracut-maint, eblake, fabrice, hansgeorg.schwibbe, harald, ikke, james.antill, jeff, jmontleo, john, jonathan, kparal, kraymond, mattdm, michele, mjc, nikodll, nonamedotc, pablo.iranzo, pcfe, peter, pfrields, rbergero, robatino, sgrubb, smooge, spoore, tcarter, tflink, theinric, wwoods
Target Milestone: | --- | Keywords: | CommonBugs, Reopened
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | https://fedoraproject.org/wiki/Common_F18_bugs#encrypt-timeout-rescue RejectedBlocker | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 949697 | Environment: |
Last Closed: | 2013-05-28 14:24:25 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
Stephen Gallagher
2012-10-19 19:39:18 UTC
Was the initramfs generated with dracut-024-5.git20121019.fc18.x86_64 ??

# lsinitrd | head -4

My current system produces:

[sgallagh@sgallagh520:~]$ sudo lsinitrd |head -4
/boot/initramfs-3.6.6-3.fc18.x86_64.img: 19M
========================================================================
dracut-024-5.git20121019.fc18
========================================================================

I haven't checked in the last week or so whether this is still the case. I'll see if it happens to me when I reboot in a little while and then will report it.

I can confirm that this is still the case on
dracut-024-5.git20121019.fc18.x86_64
kernel-3.6.6-3.fc18.x86_64

Which as I look again is clearly the same version I was originally running. I did time it this time, it times out after 120 seconds.

which version of systemd is this?

I am running systemd-195-7.fc18.x86_64 currently (and I confirmed again this morning that the issue is still occurring).

This seems related to the following systemd bugs but approaching the issue from dracut instead of systemd:

- https://bugzilla.redhat.com/show_bug.cgi?id=861123
- https://bugzilla.redhat.com/show_bug.cgi?id=881670

861123 was rejected as a blocker and I'm thinking this falls into the same category. It doesn't clearly violate any of the F18 release criteria and could be fixed post-release with an update.

-1 blocker

agreed - the difference is just whether / is encrypted or not. -1 on the basis of decisions on those bugs.

-1 blocker, but +1 NTH as it seems an annoying bug.

-1 blocker, 0 on NTH - not sure how much a fix could potentially rock the boat.

I think there is a lot of confusion in all these issues (see See Also) and we shouldn't use in-bug voting for this. I'm mainly interested whether this truly can be fixed post-release with an update.

(In reply to comment #10)
> I think there is a lot of confusion in all these issues (see See Also) and
> we shouldn't use in-bug voting for this.
> I'm mainly interested whether this
> truly can be fixed post-release with an update.

yes, it can be fixed post-release

I count -4 blocker votes, moving it to rejected.

I have talked to Harald on IRC, here are some details:

<kparal> haraldh: hello, I would like to understand the issue better. currently it seems that dracut should be handling unlocking the root partition (and not timing out), and systemd should be handling unlocking all other partitions (timing out if the partition doesn't have a special flag in /etc/fstab). is that correct?
<haraldh> kparal, dracut should fill in crypttab with "timeout=0"
<kparal> haraldh: sorry, but how can you read /etc/crypttab when it is located on the root partition, that is still locked?
<kparal> haraldh: the root= partition should simply never time out, isn't that right?
<haraldh> kparal, it's a generated crypttab in the initramfs
<haraldh> generated from the rd.luks.uuid on the kernel command line
<kparal> haraldh: oh I see. so simple dracut update can fix the issue, and people don't have to modify their /etc/crypttab by hand, right?
<haraldh> yes
<haraldh> we even might patch systemd to no timeout, if it is in the initramfs

Some things are still unknown:

<kparal> haraldh: one more question, do you know why lennart asked anaconda guys to supply x-systemd.device-timeout=0 to all encrypted partitions in /etc/fstab, when we could automatically generate it from /etc/crypttab (add timeout=0 to all lines in /etc/crypttab)?
<kparal> I mean automatically during initrd generation
<kparal> hmm, if systemd is not inside initrd but is running from the disk, then it sees just the original unmodified /etc/crypttab, not the internal one in initrd with the timeout metadata. that might be the reason
<kparal> but I don't really know if that's true or not
<haraldh> kparal, I'll ask Lennart about x-systemd.device-timeout=0
<kparal> haraldh: I think he's on a long vacation now, I was just curious if you know more about it.
if you don't, doesn't matter, thanks anyway

This is also happening when only /home is encrypted. I'd think this is a blocker bug since it's rather rude to boot your laptop and get busy with something only to find boot has completely failed instead of waiting for your password.

I tested with just encrypted /home. It always falls into dracut shell, no matter what I try. I added x-systemd.device-timeout=0 to /etc/fstab, to /etc/crypttab and rebuilt initrd by running dracut -f. It still times out.

steve: it's rude, sure, but last I checked, 'rudeness' didn't block release. :) you can just reboot and try again, yes?

Well, I am rebooting and trying again. But I see lots of updates being pushed into F18 fixing trivial bugs while this is a bigger problem. Is there a scratch build for testing or something sitting in the testing repo that we can try to establish confidence in the fix? IOW, if this is going to be fixed post release as a 0 day update, shouldn't we be testing the fix soonish? Thanks.

*** Bug 884847 has been marked as a duplicate of this bug. ***

For those with /home encrypted. Either have "rd.luks=0", if your root and /usr is not encrypted or "rd.luks.uuid=<luksofrootorusr>" on the kernel command line. Then dracut will not try to open /home.

(In reply to comment #19)
> For those with /home encrypted. Either have "rd.luks=0", if your root and
> /usr is not encrypted or "rd.luks.uuid=<luksofrootorusr>" on the kernel
> command line. Then dracut will not try to open /home.

No, rd.luks=0 doesn't help, with just an encrypted /home. Still times out. /etc/fstab contains x-systemd.device-timeout=0.

(In reply to comment #20)
> (In reply to comment #19)
> > For those with /home encrypted. Either have "rd.luks=0", if your root and
> > /usr is not encrypted or "rd.luks.uuid=<luksofrootorusr>" on the kernel
> > command line. Then dracut will not try to open /home.
>
> No, rd.luks=0 doesn't help, with just an encrypted /home. Still times out.
> /etc/fstab contains x-systemd.device-timeout=0.

Then the time out does not happen in the initramfs in this case.

dracut-024-15.git20121218.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/FEDORA-2012-20580/dracut-024-15.git20121218.fc18

(In reply to comment #22)
Doesn't fix neither full disk encryption nor just /home encryption. The prompt still times out.

dracut-024-15.git20121218.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.

I'm still seeing the timeout with dracut-024-18.git20130102.fc18 (I have / encrypted)

*** Bug 904228 has been marked as a duplicate of this bug. ***

Still seeing this problem with dracut-024-23.git20130118.fc18

This happened to me half an hour ago using net upgrade, so it should pull the latest stuff available. I have my entire pv encrypted.

dracut version is 024-23.git20130118.fc18.x86_64
systemd version is 197-1.fc18.1.x86_64

Created attachment 695729 [details]
A photo of my monitor after it happened
I have the same issue after upgrading from Fedora 17 to Fedora 18. The timeout issue appears about a couple of minutes after the password prompt appeared.
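
Harald's IRC explanation earlier in this bug (the crypttab inside the initramfs is generated from rd.luks.uuid on the kernel command line, and should carry "timeout=0") can be sketched roughly as follows. This is only an illustrative Python sketch, not dracut's actual shell code: the function names are hypothetical, the luks-<uuid> and /dev/disk/by-uuid/<uuid> naming is assumed from the example crypttab line quoted in this bug, and stripping an optional "luks-" prefix from the argument value is likewise an assumption.

```python
def luks_uuids_from_cmdline(cmdline: str) -> list[str]:
    """Collect the value of every rd.luks.uuid= argument, in order."""
    uuids = []
    for arg in cmdline.split():
        key, sep, value = arg.partition("=")
        if key == "rd.luks.uuid" and sep and value:
            # Assumption: the value may be written as "luks-<uuid>";
            # keep only the bare UUID.
            uuids.append(value.removeprefix("luks-"))
    return uuids


def generate_crypttab(cmdline: str) -> str:
    """Build crypttab text with one never-timing-out entry per LUKS UUID."""
    lines = [
        # Fields: mapping name, source device, key file ("-" = none), options.
        f"luks-{uuid} /dev/disk/by-uuid/{uuid} - timeout=0"
        for uuid in luks_uuids_from_cmdline(cmdline)
    ]
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    example = ("ro root=/dev/mapper/fedora-root "
               "rd.luks.uuid=luks-3112725b-f982-4f5a-9a05-08873a187202 rhgb quiet")
    print(generate_crypttab(example), end="")
```

Only devices named on the command line end up in the generated file, which matches the advice above that listing the root (or /usr) UUID keeps dracut from trying to open /home.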
Harald, there is a patch in bug 861123 comment 36 to 38. Can you please look at it? Thanks.

(In reply to comment #31)
> Harald, there is a patch in bug 861123 comment 36 to 38. Can you please look
> at it? Thanks.

yes

This is affecting Fedora 19 Alpha so can we get this fixed..

I'm going to try to up the severity on this. That you cannot boot encrypted un-attended is quite severe.

This needs to be fixed if not for Fedora, it is going to be a Red Hat Future Release major problem.

Looks like the first patch is in upstream master (post-v200) so F19 would presumably get it in the next packaged version. The second patch had a bunch of discussion and I couldn't follow all of it, but looked like it was rejected. Can't tell if both are needed to support the case of not timing out for an encrypted fs.

(In reply to comment #35)
> This needs to be fixed if not for Fedora, it is going to be a Red Hat Future
> Release major problem.

no worries ... we will fix it. let's wait for systemd-201 .. today or tomorrow

btw, anaconda should write "timeout=0" in /etc/crypttab for everything.

Anaconda already writes x-systemd.device-timeout=0 to the options in /etc/fstab for encrypted devices. That's all we've been asked to do on this front. We have to add something to /etc/crypttab as well? Okay. What else? Let's get it all done in one shot if that's possible.

(In reply to comment #39)
> Anaconda already writes x-systemd.device-timeout=0 to the options in
> /etc/fstab for encrypted devices. That's all we've been asked to do on this
> front. We have to add something to /etc/crypttab as well? Okay. What else?
> Let's get it all done in one shot if that's possible.

That's it.

part2: systemd-201
https://admin.fedoraproject.org/updates/FEDORA-2013-5141/systemd-201-1.fc19,initial-setup-0.3.4-3.fc19?_csrf_token=f9d46dbd50a42787a3bd3fee798ab76c47fbcd32

(In reply to comment #38)
> btw, anaconda should write "timeout=0" in /etc/crypttab for everything.
i.e.:
luks-3112725b-f982-4f5a-9a05-08873a187202 /dev/disk/by-uuid/3112725b-f982-4f5a-9a05-08873a187202 - timeout=0

I do want to mention that as a side effect of this fix, some other broken cases have gotten worse. I had encrypted home not getting mounted in early boot (probably because of a raid array not being available at that time), but after a timeout the boot would proceed and home would get mounted later. With the change, the boot did get past this point. I ended up having to explicitly list which luks devices to try to mount in early boot to keep systemd from trying to mount home until later.

(In reply to comment #43)
> I do want to mention that as a side effect of this fix, some other broken
> cases have gotten worse. I had encrypted home not getting mounted in early
> boot (probably because of a raid array not being available at that time),
> but after a timeout the boot would proceed and home would get mounted later.
> With the change, the boot did get past this point. I ended up having to
> explicitly list which luks devices to try to mount in early boot to keep
> systemd from trying to mount home until later.

What is the problem now again? Does systemd later on fail to mount /home?

I was referring to the fix making the symptoms of bug 919752 worse. I don't think the fix is wrong, but rather that we might get some more questions because of it. And 919752 has a work around that has been tested and it has a fix today which I plan to test within the hour.

Why is this not the default behavior? It does not make sense to ask us to add code to the OS installer to enforce a default that is apparently not a default. Can someone justify this?

David, IIUIC, they want to have a default of timeout=0 for the root partition and encrypted partitions, but timeout=30 for all other partitions. So that even in a case of malfunctioning HDD with non-critical partitions the system still boots.
What I fail to understand, however, is that why kernel+plymouth know these things (plymouth displaying a password prompt is the proof that we know what is encrypted and what is not), but dracut+systemd can't use the same approach and need hints in fstab and crypttab. In other words, I don't understand why it can't work automatically.

(In reply to comment #47)
> What I fail to understand, however, is that why kernel+plymouth know these
> things (plymouth displaying a password prompt is the proof that we know what
> is encrypted and what is not), but dracut+systemd can't use the same
> approach and need hints in fstab and crypttab. In other words, I don't
> understand why it can't work automatically.

Lennart answered this in bug 861123 comment 42. TLDR: too complicated.

(In reply to comment #47)
> David, IIUIC, they want to have a default of timeout=0 for the root
> partition and encrypted partitions, but timeout=30 for all other partitions.

Right, so why can't they make the default for crypttab entries be a timeout value of 0? It's what makes the most sense as a default anyway. I mean, what other default value makes more sense for a timeout? If all encrypted devices should have a default timeout of 0 then why must this be specified in /etc/crypttab? It shouldn't -- it is a default.

Note that 90crypt/module-setup.sh has been changed to rewrite the initramfs crypttab file to only list devices needed for early boot. This could probably be further changed to include timeout=0 if a timeout wasn't specified in the crypttab file.

I should add this is only done in hostonly mode, but that is the new default.

If I did some work for a patch to dracut to add timeout=0 to the crypttab file used in the initramfs for hostonly if a timeout wasn't provided in the original crypttab, would this be likely to be accepted (or at least use a base for a similar patch)?
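
For illustration, the rewrite proposed in the comment above (add timeout=0 to each initramfs crypttab entry unless the original entry already sets a timeout) could look roughly like this. It is a hedged Python sketch, not the dracut patch itself, which would live in the shell code of 90crypt/module-setup.sh. The function name is hypothetical, and treating "-" as an empty key-file or options field is an assumption.

```python
def add_default_timeout(crypttab_text: str, default: str = "timeout=0") -> str:
    """Return crypttab text where entries lacking a timeout get `default`."""
    out = []
    for line in crypttab_text.splitlines():
        stripped = line.strip()
        # Keep blank lines and comments untouched.
        if not stripped or stripped.startswith("#"):
            out.append(line)
            continue
        fields = stripped.split()
        if len(fields) < 2:
            # Not a usable entry (name and device are required); leave as-is.
            out.append(line)
            continue
        # Fields: name, device, key file (optional), options (optional).
        key_file = fields[2] if len(fields) > 2 else "-"
        raw_opts = fields[3] if len(fields) > 3 else ""
        options = [] if raw_opts in ("", "-") else raw_opts.split(",")
        if any(o == "timeout" or o.startswith("timeout=") for o in options):
            # The admin chose a timeout explicitly; respect it.
            out.append(line)
            continue
        options.append(default)
        out.append(" ".join([fields[0], fields[1], key_file, ",".join(options)]))
    return "".join(entry + "\n" for entry in out)


if __name__ == "__main__":
    sample = (
        "luks-aaa /dev/disk/by-uuid/aaa none\n"
        "luks-bbb /dev/disk/by-uuid/bbb - discard,timeout=30\n"
    )
    print(add_default_timeout(sample), end="")
```

Leaving explicit timeouts alone means an administrator can still opt back into a finite wait for a non-critical device.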
To make myself clear, I've decided not to patch anaconda or blivet for this purpose until/unless someone convinces me that there is a good reason to do so. That discussion should occur in bug 949702.

Wiping RejectedBlocker as Smooge re-proposed this.

Discussed at 2013-05-13 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2013-05-13/f19beta-blocker-review-5.2013-05-13-16.00.log.txt . We're still agreed that this doesn't violate Beta criteria and does not need to block the Beta release. Rejected as a blocker.

I think this may also be closed. I don't see it anymore with my laptop due to fixes in systemd.

So I noticed this showed up in the dracut harald submitted to fix a bug with decryption keyboard layout on upgrade this morning:

- fixed failing the boot while waiting for password input

http://koji.fedoraproject.org/koji/buildinfo?buildID=483057

Harald, what's the story there? Does it just fix this harder, or what? :) Is it talking about some other 'password'?

For the record, this had regressed in Fedora 20. I'd been meaning to dig up this BZ and reopen it, now it looks like I may not need to.

(In reply to Adam Williamson from comment #57)
> So I noticed this showed up in the dracut harald submitted to fix a bug with
> decryption keyboard layout on upgrade this morning:
>
> - fixed failing the boot while waiting for password input
>
> http://koji.fedoraproject.org/koji/buildinfo?buildID=483057
>
> Harald, what's the story there? Does it just fix this harder, or what? :) Is
> it talking about some other 'password'?

In F20, I let systemd have total control over the password query for encrypted devices. That meant, that dracut had to wait somehow for it. That wait procedure was broken until now and should be fixed with the latest release.