Red Hat Bugzilla – Full Text Bug Listing
Summary: Please add x-systemd.device-timeout=0 to mount options for encrypted file systems that need a passphrase from the user
Product: [Fedora] Fedora
Reporter: Lennart Poettering <lpoetter>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Version: 18
CC: awilliam, eblake, fabrice, frankly3d, g.kaviyarasu, harald, jaroslav.pulchart, johannbg, john, jonathan, kparal, kraymond, lnykryn, mattdm, metherid, msekleta, nikodll, nonamedotc, notting, plautrba, reklov, rstrode, sct, sparks, systemd-maint, theinric, vanmeeuwen+fedora, vpavlin, vpodzime
Doc Type: Bug Fix
Last Closed: 2013-04-10 16:55:01 EDT
Type: Bug
Description Lennart Poettering 2012-09-27 11:04:43 EDT
In systemd at boot time we watch all devices listed in /etc/fstab and mount them as they appear. At early boot we wait for all file systems listed therein to show up before we continue with the boot.

In order to deal nicely (or as nicely as reasonably possible) with file systems that never show up because a hard disk is bad or some cable is unplugged, we apply a timeout to this logic and boot into emergency mode if the time limit is reached. We think this is the most appropriate behaviour for most hardware issues like this.

This turns out to be problematic for encrypted file systems, however: if a file system takes a long time to show up because it requires a user to enter a passphrase on the console and he isn't around, we treat it like any other file system that doesn't show up and boot into emergency mode after the timeout.

This of course sucks, since the password prompt should not time out like this. Instead, we should just wait indefinitely until the user shows up and enters his password. Now, the problem here is that systemd cannot know whether a file system doesn't show up because of a hardware failure or because of missing passphrase input: before the block device is decrypted we cannot look at its label/UUID, since that label/UUID is encrypted too. So systemd cannot know whether an fs with LABEL=foo is on some hardware that so far hasn't shown up, or in some LUKS device that hasn't been decrypted so far. But to time out the former and not the latter, we'd need to know exactly that. In other words: automatically detecting whether we need to time out waiting for an fs is not possible with just the information currently included in fstab.

Anaconda, on the other hand, does know more about the file systems and where they reside than fstab encodes: Anaconda *does* know that a certain file system is on a crypto disk. I'd thus like Anaconda to add a special mount option for this file system to fstab: "x-systemd.device-timeout=0". This will have the desired effect that systemd waits forever for this specific file system and does not time out the user prompt.

Could you please update Anaconda to add x-systemd.device-timeout=0 to the mount options of all file systems that reside on a crypto disk?
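To make the request concrete, here is a minimal Python sketch of how an installer that already knows which file systems sit on encrypted devices could emit fstab lines carrying the option. The FsEntry type, its on_crypto flag, and the fstab_line() helper are hypothetical names for illustration; this is not Anaconda's actual code.

```python
from dataclasses import dataclass

TIMEOUT_OPT = "x-systemd.device-timeout=0"


@dataclass
class FsEntry:
    """Hypothetical installer-side record for one fstab entry."""
    source: str          # e.g. /dev/mapper/luks-<uuid> or UUID=<uuid>
    mountpoint: str
    fstype: str
    options: str = "defaults"
    dump: int = 0
    passno: int = 0
    on_crypto: bool = False  # known to the installer, not encoded in fstab


def fstab_line(entry: FsEntry) -> str:
    """Render one fstab line, adding the timeout option for encrypted file systems."""
    opts = entry.options.split(",")
    # Only touch entries on crypto devices, and never add the option twice.
    if entry.on_crypto and TIMEOUT_OPT not in opts:
        opts.append(TIMEOUT_OPT)
    return " ".join([entry.source, entry.mountpoint, entry.fstype,
                     ",".join(opts), str(entry.dump), str(entry.passno)])


if __name__ == "__main__":
    root = FsEntry("/dev/mapper/luks-60f6e71a-ea23-4513-b632-c22ef2f8a2bc",
                   "/", "ext4", dump=1, passno=1, on_crypto=True)
    boot = FsEntry("UUID=f67fcd26-9a90-4ee4-a215-34e2c3081fcb",
                   "/boot", "ext4", dump=1, passno=2)
    print(fstab_line(root))
    print(fstab_line(boot))
```

The encrypted root entry comes out with the same shape as the root line in the Anaconda-generated fstab quoted in comment 18, while the unencrypted /boot entry is left alone.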
Comment 1 Vratislav Podzimek 2012-10-05 04:47:25 EDT
(In reply to comment #0)
> Anaconda otoh does know more about the file systems and where they reside
> than fstab encodes: anaconda *does* know that a certain file system is on a
> crypto disk. I'd thus like anaconda to add a special mount option for this
> file system to fstab: "x-systemd.device-timeout=0". This will have the
> desired effect that systemd will wait forever for this specific file system,
> and not time out the user prompt.
>
> Could you please update Anaconda to add x-systemd.device-timeout=0 to the
> mount options of all file systems that reside on a crypto disk?

My experience is that in some cases (Plymouth turned off), the prompt doesn't show at all. Do we really want to hang in that state forever? Wouldn't a sufficiently long timeout be better?
Comment 2 Fedora Update System 2012-10-08 20:22:07 EDT
anaconda-18.14-1.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/anaconda-18.14-1.fc18
Comment 3 Fedora Update System 2012-10-09 13:21:59 EDT
Package anaconda-18.14-1.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing anaconda-18.14-1.fc18'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-15707/anaconda-18.14-1.fc18
then log in and leave karma (feedback).
Comment 4 Lennart Poettering 2012-10-12 17:46:40 EDT
(In reply to comment #1)
> > Anaconda otoh does know more about the file systems and where they reside
> > than fstab encodes: anaconda *does* know that a certain file system is on a
> > crypto disk. I'd thus like anaconda to add a special mount option for this
> > file system to fstab: "x-systemd.device-timeout=0". This will have the
> > desired effect that systemd will wait forever for this specific file system,
> > and not time out the user prompt.
> >
> > Could you please update Anaconda to add x-systemd.device-timeout=0 to the
> > mount options of all file systems that reside on a crypto disk?
>
> My experience is that in some cases (Plymouth turned off), the prompt
> doesn't show at all. Do we really want to hang in such state forever?
> Wouldn't some long enough timeout be better?

Not sure. I mean, we do have ways to handle that specific case: the user should be able to press Ctrl-Alt-Del, reboot, and then enter emergency mode via the kernel command line or so to figure out what is going wrong. I think we should go to emergency mode automatically only if we are really sure that things aren't alright, i.e. when some hardware doesn't respond. But if there's doubt whether the user just went out for lunch, we should probably default to just waiting for good.

That said, I don't have a strong opinion on this. The parameter also accepts time values such as "2h", in case you prefer that.
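Since the option takes systemd time spans and not only 0, the following Python sketch models how an fstab options field could map to an effective device wait. It is a simplified model, not systemd's parser: it assumes only a subset of the time-span syntax (units us, ms, s, min, h, d, w; unitless numbers read as seconds; concatenated values such as "1h 30min" are summed), and it uses the 90-second device default mentioned in comment 36. The function names are hypothetical.

```python
import re
from typing import Optional

# Simplified subset of systemd time-span units, in seconds (assumption).
UNITS = {
    "us": 1e-6, "usec": 1e-6,
    "ms": 1e-3, "msec": 1e-3,
    "s": 1.0, "sec": 1.0,
    "m": 60.0, "min": 60.0,
    "h": 3600.0, "hr": 3600.0,
    "d": 86400.0,
    "w": 604800.0,
}

# Default job timeout for device units, as described in comment 36.
DEFAULT_DEVICE_TIMEOUT = 90.0

_TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([a-z]*)")


def parse_timespan(value: str) -> float:
    """Parse '0', '90', '90s', '2h' or '1h 30min' into seconds."""
    value = value.strip()
    if not value:
        raise ValueError("empty time span")
    total, pos = 0.0, 0
    while pos < len(value):
        m = _TOKEN.match(value, pos)
        if m is None:
            raise ValueError(f"bad time span: {value!r}")
        number, unit = m.groups()
        if unit and unit not in UNITS:
            raise ValueError(f"unknown unit {unit!r} in {value!r}")
        total += float(number) * UNITS.get(unit, 1.0)  # no unit -> seconds
        pos = m.end()
    return total


def device_timeout(options: str) -> Optional[float]:
    """Effective device wait for an fstab options field.

    Returns seconds, or None when the timeout is disabled (wait forever).
    """
    result: Optional[float] = DEFAULT_DEVICE_TIMEOUT
    for opt in options.split(","):
        key, sep, val = opt.partition("=")
        if key == "x-systemd.device-timeout" and sep:
            secs = parse_timespan(val)
            # In this sketch a later occurrence overrides an earlier one.
            result = None if secs == 0 else secs  # 0 disables the timeout
    return result
```

Mapping 0 to None, rather than to a zero-second deadline, keeps the semantics requested in the description explicit: 0 means "never time out", while a value such as "2h" is the compromise Lennart offers here.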
Comment 5 Michal Schmidt 2012-10-17 07:17:17 EDT
*** Bug 866957 has been marked as a duplicate of this bug. ***
Comment 6 Kamil Páral 2012-10-17 07:44:09 EDT
(In reply to comment #5)
> *** Bug 866957 has been marked as a duplicate of this bug. ***

Michal, please make sure you transfer the Blocks: field from duplicates. I'm doing that now.

Because I learned that this is something anaconda should set (and it can therefore be quite difficult to fix with an update), I propose it as a blocker, not just NTH (nice-to-have). Changing the milestone to Final, because this doesn't seem to be serious enough for Beta.

Petr, can you please test the fix?
Comment 7 Harald Hoyer 2012-10-17 08:23:46 EDT
Another incarnation of this bug is fixed by: https://admin.fedoraproject.org/updates/FEDORA-2012-16223/dracut-024-1.fc18
Comment 8 Kamil Páral 2012-10-18 04:29:42 EDT
Lennart, forgive my technical ignorance, but I don't understand one thing: Plymouth knows it is waiting for a password, but systemd doesn't, hence these fstab hacks. Isn't that just a problem in communication between these two components? Can't Plymouth (or the underlying framework) just signal to systemd somehow: "now we are waiting for a password, please don't time out"?
Comment 9 Petr Schindler 2012-10-18 09:05:59 EDT
Created attachment 629364: screenshot of console after boot failure

I updated dracut and tried it. It still fails to boot because of a timeout. It seems to wait longer, but then it still drops into the dracut console.
Comment 10 Harald Hoyer 2012-10-18 09:08:08 EDT
(In reply to comment #9)
> Created attachment 629364: screenshot of console after boot fail
>
> I updated dracut and tried it. It still fails to boot because of time out.
> It seems to wait longer, but then it still falls into dracut console.

What does the journal say?
# journalctl -a
Comment 11 Petr Schindler 2012-10-18 10:31:23 EDT
Created attachment 629475: file with journal logs

Here you have the whole journal log.
Comment 12 Petr Schindler 2012-10-18 10:35:40 EDT
Created attachment 629480: the same log, but in text :)
Comment 13 Harald Hoyer 2012-10-18 11:05:28 EDT
(In reply to comment #12)
> Created attachment 629480: the same log, but in text :)

Oct 18 16:22:31 localhost systemd: /usr/lib/systemd/system-generators/systemd-cryptsetup-generator exited with exit status 1.

Can you please attach /etc/crypttab?
Comment 14 Harald Hoyer 2012-10-18 11:05:56 EDT
(In reply to comment #13)
> (In reply to comment #12)
> > Created attachment 629480: the same log, but in text :)
>
> Oct 18 16:22:31 localhost systemd:
> /usr/lib/systemd/system-generators/systemd-cryptsetup-generator exited with
> exit status 1.
>
> can you please attach /etc/crypttab

I mean from within the dracut rescue shell.
Comment 16 Harald Hoyer 2012-10-19 04:50:42 EDT
Oct 18 16:22:31 localhost systemd: Expecting device dev-mapper-luks\x2d868e79ad\x2dc627\x2d42b5\x2d96a2\x2d89f6c9eb5cfd.device...
Oct 18 16:22:31 localhost systemd: Expecting device dev-mapper-luks\x2d69186cac\x2d4ac6\x2d4b88\x2d9ff7\x2d7c4509384d69.device...
Oct 18 16:22:32 localhost systemd: Started Show Plymouth Boot Screen.
Oct 18 16:22:32 localhost systemd: Started Dispatch Password Requests to Console Directory Watch.
Oct 18 16:22:32 localhost systemd: Starting Forward Password Requests to Plymouth Directory Watch.
Oct 18 16:22:32 localhost systemd: Started Forward Password Requests to Plymouth Directory Watch.
Oct 18 16:22:32 localhost systemd: Starting Forward Password Requests to Plymouth...
Oct 18 16:22:32 localhost systemd: Started Forward Password Requests to Plymouth.
Oct 18 16:22:32 localhost kernel: tsc: Refined TSC clocksource calibration: 2691.283 MHz
Oct 18 16:24:01 localhost systemd: Job dev-mapper-luks\x2d69186cac\x2d4ac6\x2d4b88\x2d9ff7\x2d7c4509384d69.device/start timed out.

Hmm, is this a Plymouth problem? Not showing the password query?
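The device unit names in this log are systemd's escaped form of the /dev/mapper paths: each "/" becomes "-", and a literal "-" becomes \x2d. The Python sketch below approximates that mapping under simplifying assumptions (only ASCII letters, digits, ":", "_" and "." pass through, a leading "." is escaped, and leading, trailing and repeated slashes are dropped); it is an illustration, not systemd's actual escaping code, and the function name is hypothetical.

```python
import string

# Characters left as-is; everything else (including '-') becomes \xNN.
_PLAIN = set(string.ascii_letters + string.digits + ":_.")


def path_to_unit_name(path: str, suffix: str = ".device") -> str:
    """Approximate systemd's path-to-unit-name escaping."""
    parts = [p for p in path.split("/") if p]  # trims and collapses slashes
    if not parts:
        return "-" + suffix  # the root directory maps to '-'
    out = []
    for i, ch in enumerate("/".join(parts)):
        if ch == "/":
            out.append("-")
        elif ch in _PLAIN and not (i == 0 and ch == "."):
            out.append(ch)
        else:
            # Escape each UTF-8 byte as a lowercase \xNN sequence.
            out.extend(f"\\x{b:02x}" for b in ch.encode("utf-8"))
    return "".join(out) + suffix
```

For example, path_to_unit_name("/dev/mapper/luks-69186cac-4ac6-4b88-9ff7-7c4509384d69") yields the name of the device unit whose start job timed out above.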
Comment 17 Petr Schindler 2012-10-19 06:54:46 EDT
I see the password query in Plymouth (graphical password input).
Comment 18 Volker Sobek 2012-10-24 15:51:25 EDT
I did an install from the Fedora 18 Beta TC6 DVD to a KVM guest, using the default settings and just checking the 'encrypt my data' option. Now, when booting, Plymouth shows up correctly, and the Plymouth password prompt also appears correctly. However, if I let it time out, I see the same issue as Petr described.

# cat /etc/crypttab
luks-60f6e71a-ea23-4513-b632-c22ef2f8a2bc UUID=60f6e71a-ea23-4513-b632-c22ef2f8a2bc none
luks-678fdd31-bb1c-4551-90b9-6b1c1dfc7c22 UUID=678fdd31-bb1c-4551-90b9-6b1c1dfc7c22 none

# cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Wed Oct 24 13:56:55 2012
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/luks-60f6e71a-ea23-4513-b632-c22ef2f8a2bc / ext4 defaults,x-systemd.device-timeout=0 1 1
UUID=f67fcd26-9a90-4ee4-a215-34e2c3081fcb /boot ext4 defaults 1 2
/dev/mapper/luks-678fdd31-bb1c-4551-90b9-6b1c1dfc7c22 swap swap defaults,x-systemd.device-timeout=0 0 0
Comment 19 Volker Sobek 2012-10-24 15:52:32 EDT
Created attachment 633012: output of 'journalctl --all' for PROVIDING the correct password, followed by a successful boot
Comment 20 Volker Sobek 2012-10-24 15:59:59 EDT
Created attachment 633014: Error message after not providing the password in time
Comment 21 Volker Sobek 2012-10-24 17:23:58 EDT
Created attachment 633048: journalctl --all for an up-to-date F18, after a failed boot due to LUKS password timeout

I updated the new TC6 installation and rebooted; it still doesn't work. The timeout is longer, but then it fails again. Attaching the full journalctl output for the failed boot after the password prompt timed out.
Comment 22 Matthew Miller 2012-10-29 12:01:43 EDT
*** Bug 871082 has been marked as a duplicate of this bug. ***
Comment 23 Petr Schindler 2012-11-06 06:48:40 EST
No change with Beta TC7 with systemd-195-2.fc18.x86_64
Comment 24 David Lehman 2012-11-15 09:49:24 EST
The remaining issues have nothing to do with anaconda. Reassigning to systemd.
Comment 25 Kamil Páral 2012-11-28 11:41:48 EST
+1 to Final blocker. Placing users into dracut shell because they haven't managed to provide their hdd password in time is just unacceptable.
Comment 26 Adam Williamson 2012-11-28 12:59:12 EST
Discussed at the 2012-11-28 blocker review meeting: http://meetbot.fedoraproject.org/fedora-qa/2012-11-28/f18final-blocker-review-1.2012-11-28-16.59.log.txt . This was rejected as a blocker. It's an annoyance, sure, but it is not a showstopper (if you hit it, you just reboot and try again), and it can be fixed with an update, since we don't see any way to hit this on a live boot or during installation.
Comment 27 Eric Blake 2012-11-28 16:09:45 EST
How will adding a new mount option work when upgrading an existing F17 installation to F18? I've been bitten by the drop to an emergency shell when I'm not fast enough entering my password, but it was on a machine that was incrementally upgraded rather than installed with F18 from scratch. Do we need to clone this bug to fedup to ensure that an incremental upgrade modifies /etc/fstab to add the option where appropriate as part of the upgrade?
Comment 28 Adam Williamson 2012-11-29 03:02:31 EST
Not really, because fedup itself isn't supposed to do that. The design is that fedup really just handles downloading packages and running them. It provides a mechanism by which packages can ship scripts to be run on upgrade: if you want a package to do something on upgrade, that package should ship a script to make it happen. Ask Will for details. I think there are just dracut hooks named upgrade-pre, upgrade and upgrade-post, and you hook into those (per https://ohjeezlinux.wordpress.com/2012/11/13/fedup-a-little-background/ ), but IMBW on the details. Of course, the other option is just not to do it. Historically we haven't guaranteed to make changes like this on upgrade.
Comment 29 Adam Williamson 2012-11-29 03:03:24 EST
eric: oh, and I don't think this bug is actually about adding mount options any more, anyway...see comment #24.
Comment 30 Kamil Páral 2012-11-29 05:26:27 EST
Adam, you are right that the current problem can probably be fixed with an update. The anaconda tasks are done and only the systemd bug remains. I still think it fails the criteria and it's too damn important, but I've been outvoted. But Eric is right about the upgrade issue. If we don't have the upgrade hook, we won't have the mount option in place, and that can't be fixed with an update (according to the description). I reported bug 881670 about it.
Comment 31 Kamil Páral 2012-12-04 10:00:36 EST
I have just realized something very substantial. How can systemd know it should wait for the root filesystem? It can't see /etc/fstab beforehand. The usual encrypted layout in anaconda is one big encrypted partition with a VG and several LVs on it. Whatever anaconda sets in /etc/fstab doesn't matter, because you can't access that information before you decrypt the partition and start the VG. I think that's the reason why people report this is still broken.

I see only two possible ways here:
1. Put this information into grub.cfg as well. Something like systemd.wait=uuid=UUID.
2. Communicate with plymouth. If plymouth knows it's encrypted, systemd should know as well. No manual tinkering with fstab and grub.cfg needed, everything automatic.

It seems to me that this issue was not well thought through, and it was a mistake to commit these changes before the infrastructure (fstab, grub, whatnot) was prepared for it.

Systemd guys, can you please tell us whether it is likely that this can be fixed with an update (systemd, dracut, plymouth)? If not, I'll re-propose this as an F18 Final blocker. I don't see any reason to break systemd boot for people when it's not absolutely necessary (and please acknowledge that it's not necessary in this case at all: detecting faulty hardware is a noble goal, but it does not warrant consequences as harsh as those it currently causes).

That brings me to a second question: are you willing to revert this functionality in systemd if we are not ready to fix it _fully_ (clean install and upgrade; encrypted home and/or root) in time for the Fedora 18 release?
Comment 32 Adam Williamson 2012-12-05 03:58:32 EST
erm, I'm not sure whether I'm misunderstanding something here, or Kamil is. Isn't the root partition specified on the cmdline nowadays, since it has to get mounted by dracut, not in fstab at all? afaics this bug isn't about / at all, it's about *other* encrypted system partitions. e.g. /usr if you have it separated out. those get mounted by systemd. / does not, which is why we have a separate dracut bug for the / case. my /proc/cmdline says "root=/dev/mapper/vg_adam-lv_root" .
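Adam's point is that dracut takes the root device from the kernel command line rather than from fstab. As a small illustration, here is a Python sketch that pulls root= out of a /proc/cmdline-style string. It assumes plain whitespace-separated key=value tokens with no quoting; the function name is hypothetical and this is not dracut's argument parser.

```python
from typing import Optional


def cmdline_root(cmdline: str) -> Optional[str]:
    """Return the value of root= from a kernel command line, or None.

    Simplified: splits on whitespace and ignores quoting rules.
    """
    root = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "root" and sep:
            root = value  # in this sketch, a later root= overrides an earlier one
    return root
```

Applied to a command line containing Adam's "root=/dev/mapper/vg_adam-lv_root", it returns /dev/mapper/vg_adam-lv_root, which dracut, not systemd's fstab handling, has to unlock and mount.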
Comment 33 Kamil Páral 2012-12-05 05:12:32 EST
I assumed this bug report was related to all partitions mounted at boot, including the root partition. Now that you have mentioned dracut, Adam, I have searched and found bug 868421 (which is not linked here; adding it to See Also). It seems to me that the dracut bug is just a duplicate of this bug, but I can't really judge here; I'm lacking a lot of information about this issue.

My current theory is:
1. systemd pushed some changes, causing all these issues
2. systemd asked anaconda to incorporate fstab adjustments to fix clean installs
3. systemd forgot to ask fedup to incorporate fstab adjustments to fix upgraded installs; I did that in bug 881670
4. systemd forgot to ask dracut to ignore the timeout when unlocking the root partition; Stephen reported the problem in bug 868421
(Ideally, those changes should have been done in the reverse order.)

I might be completely wrong; systemd guys, please correct me if I'm talking nonsense. Thank you.
Comment 34 Kamil Páral 2012-12-05 10:55:07 EST
I have talked to Harald; see bug 868421 comments 11 to 13. Adam, you seem to be right: dracut should handle unlocking the root partition (no timeout) and systemd should handle the rest (no timeout for encrypted devices). The role of fstab vs. crypttab in this is still a bit unclear.

With my new knowledge, I *think* everything in this bug since comment 9 is essentially just a duplicate of bug 868421. That would mean this bug can be closed, because the anaconda fstab-related changes are incorporated and all the bug reports were just symptoms of bug 868421.
Comment 35 Frank Murphy 2013-01-10 08:30:53 EST
> Could you please update Anaconda to add x-systemd.device-timeout=0 to the
> mount options of all file systems that reside on a crypto disk?

On an already installed system, does that go on the kernel line?
Comment 36 Fabrice Bellet 2013-02-15 13:45:21 EST
I noticed that the timeout has two causes on a fresh F18 install where filesystems are encrypted (based on the information in /run/systemd/generator):

1- The dev-mapper-luksXXX.device unit has a default timeout of 90s, because it's a device unit. It is bound to the systemd-cryptsetup@luksXXX.service unit. So even if the service has a timeout of 0, it will fail, because the device unit fails first.

2- The systemd-cryptsetup@luksXXX.service should not time out itself. The TimeoutSec=0 is maybe not needed, because the service is of type oneshot, and in this case the timeout is disabled by default. But the invocation of /usr/lib/systemd/systemd-cryptsetup will time out at 90s by default.

A solution to 1) could be for the generator to add a supplementary dev-mapper-luksXXX.device unit file containing "JobTimeoutSec=0".

A solution to 2) could be to add a "timeout=0" option to systemd-cryptsetup in the .service unit file.
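Fabrice's two proposed fixes can be pictured as the extra text a cryptsetup generator would emit for each volume. The Python below is a hypothetical illustration: the function name, the return shape, and the exact ExecStart argument order and quoting are assumptions. Only JobTimeoutSec=0 for the device unit, timeout=0 for systemd-cryptsetup, and the /usr/lib/systemd/systemd-cryptsetup path come from the comment; the real patches are the two attachments in the next comments.

```python
from typing import Tuple

SYSTEMD_CRYPTSETUP = "/usr/lib/systemd/systemd-cryptsetup"


def cryptsetup_fix_snippets(volume: str, source: str, password: str = "none",
                            options: str = "") -> Tuple[str, str]:
    """Return (device_unit_text, exec_start_line) for one crypttab entry.

    Fix 1: the dev-mapper-<volume>.device job gets no timeout.
    Fix 2: timeout=0 is passed so systemd-cryptsetup's prompt never expires.
    """
    # Fix 1: supplementary device unit content (file name/location assumed).
    device_unit = "[Unit]\nJobTimeoutSec=0\n"

    # Fix 2: append timeout=0 unless the crypttab options already set a timeout.
    opts = [o for o in options.split(",") if o]
    if not any(o.startswith("timeout=") for o in opts):
        opts.append("timeout=0")
    exec_start = "ExecStart={} attach '{}' '{}' '{}' '{}'".format(
        SYSTEMD_CRYPTSETUP, volume, source, password, ",".join(opts))
    return device_unit, exec_start
```

Keeping an explicit timeout= from crypttab, instead of always forcing 0, is a design choice of this sketch: it lets an administrator who wants a bounded prompt (for example on an unattended server) keep one.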
Comment 37 Fabrice Bellet 2013-02-15 13:46:46 EST
Created attachment 697944: Add a JobTimeoutSec=0 for the dev-mapper device unit file in the cryptsetup generator
Comment 38 Fabrice Bellet 2013-02-15 13:47:43 EST
Created attachment 697945: Pass timeout=0 to systemd-cryptsetup in the cryptsetup-generator
Comment 39 Kamil Páral 2013-02-18 05:56:39 EST
This should probably have been placed in bug 868421 (all of this is confusing, I know). I added bug 868421 comment 31 and asked Harald to look at this. Thanks, Fabrice.
Comment 40 Harald Hoyer 2013-03-01 09:16:56 EST
Sent my proposed patches to the upstream mailing list:
http://lists.freedesktop.org/archives/systemd-devel/2013-March/009254.html
http://lists.freedesktop.org/archives/systemd-devel/2013-March/009255.html
Comment 41 Stephen Tweedie 2013-03-01 09:36:31 EST
I'd love to test --- is there a scratch build with these patches already?
Comment 42 Lennart Poettering 2013-04-09 15:50:20 EDT
Hey, so let me get this right. Anaconda now writes "x-systemd.device-timeout=0" into fstab for all new installations, is that correct? So the remaining issue this bug is about is what to do about upgraded systems?

(And for those who asked why we cannot make systemd and plymouth "communicate": that of course is something we could do, in theory. However, this is much nastier than it sounds: we wouldn't know which of systemd's many timeouts to 'pause', i.e. which of the timeouts systemd manages are actually for resources that wait for user input (and this is unfixable, since we don't know before decrypting a block device which block device it will be). So our only option would be to pause *all* timeouts systemd manages. However, there can be quite a few of those, and many of them are really not that obvious. For example, we'd have to tell the kernel automounts about it, and other stuff. Now, if anaconda tells us via the mount option precisely which mount waits for user input, things are much easier, since this allows us to disable only the timeout in question and the stuff depending on it, and be done with it.)
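Lennart's argument is that a per-mount option lets systemd disable one timeout plus whatever depends on it, instead of pausing every timeout it manages. The Python below is a toy model of that scoping, assuming a plain "unit to its dependents" map; the unit names and the map structure are made up for illustration and do not reflect systemd's internal job engine.

```python
from collections import deque
from typing import Dict, List, Set


def units_without_timeout(start: str, dependents: Dict[str, List[str]]) -> Set[str]:
    """Collect `start` plus every unit that (transitively) depends on it.

    `dependents` maps a unit name to the units that depend on it.
    Breadth-first search; each unit is visited at most once.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        unit = queue.popleft()
        for dep in dependents.get(unit, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen
```

With a hypothetical map in which home.mount sits on dev-mapper-crypt.device and an unrelated srv.mount sits on dev-sdb1.device, starting from the encrypted device leaves srv.mount's timeout in force, so a dead disk behind it would still be detected.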
Comment 43 Frank Murphy 2013-04-10 04:51:39 EDT
I got a needinfo for this?, which just says needinfo.
Comment 44 Lennart Poettering 2013-04-10 08:58:01 EDT
Still looking for a reply on my comment #42...
Comment 45 Kamil Páral 2013-04-10 11:40:07 EDT
(In reply to comment #42)
> So the remaining issue this bug is about is what to do about upgraded
> systems?

I reported bug 881670 about it: you (or someone) can provide an upgrade hook for fedup that will adjust fstab accordingly, similar to what anaconda does.

Some of the comments might be confusing, because there are several issues interwoven and there was no clear reply coming from the systemd/dracut guys, so there was confusion all around. I'm glad you're beginning to untangle this mess.
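An upgrade-time fix along these lines would have to infer, on an already-installed system, which fstab entries sit on encrypted volumes. One possible heuristic, sketched in Python below, is to treat any fstab source of the form /dev/mapper/<name> whose <name> appears in the first column of /etc/crypttab as encrypted. The function names and the heuristic itself are assumptions; they do not describe an existing fedup hook, and entries referenced by UUID= or through LVM on top of LUKS (the common layout described in comment 31) would be missed.

```python
TIMEOUT_OPT = "x-systemd.device-timeout=0"
MAPPER_PREFIX = "/dev/mapper/"


def crypttab_names(crypttab: str) -> set:
    """Volume names (first column) from crypttab text, skipping comments."""
    names = set()
    for line in crypttab.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.add(line.split()[0])
    return names


def patch_fstab(fstab: str, crypttab: str) -> str:
    """Add TIMEOUT_OPT to fstab entries on /dev/mapper/<crypttab name>.

    Comments and unrelated lines are kept verbatim; a patched line is
    re-joined with single spaces. Running it twice is a no-op.
    """
    names = crypttab_names(crypttab)
    out = []
    for line in fstab.splitlines():
        fields = line.split()
        if (len(fields) >= 4 and fields[0].startswith(MAPPER_PREFIX)
                and fields[0][len(MAPPER_PREFIX):] in names):
            opts = fields[3].split(",")
            if TIMEOUT_OPT not in opts:
                fields[3] = ",".join(opts + [TIMEOUT_OPT])
                line = " ".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"
```

Fed the crypttab from comment 18 and an fstab whose LUKS entries still say only "defaults", it produces the option layout that a fresh Anaconda install writes, and it leaves the /boot entry untouched.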
Comment 46 Lennart Poettering 2013-04-10 16:55:01 EDT
Well, regarding the fedup issue, let's follow up in bug 881670 then. If this is all that's missing, then I guess I can close this bug. For everything else, please open independent new bugs, if there's still something left to fix. Thanks.