Bug 1462378
Summary: | systemd-234-3 boots to emergency mode on encrypted lvm system
---|---
Product: | [Fedora] Fedora
Reporter: | Kevin Fenzi <kevin>
Component: | systemd
Assignee: | systemd-maint
Status: | CLOSED RAWHIDE
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | unspecified
Priority: | unspecified
Version: | rawhide
CC: | harald, johannbg, jpokorny, lnykryn, msekleta, muadda, ssahani, s, systemd-maint, valdis.kletnieks, zbyszek
Keywords: | Reopened
Hardware: | Unspecified
OS: | Unspecified
Fixed In Version: | systemd-234-5.fc27
Last Closed: | 2017-07-31 20:20:03 UTC
Type: | Bug
Description
Kevin Fenzi
2017-06-16 23:21:59 UTC
Created attachment 1288483: journalctl from bad boot
Sorry about that, here's the proper full attachment.
I've untagged this from f27 for now so we can sort it out before it lands in rawhide.

    Jun 16 10:56:43 sheelba.scrye.com systemd[1]: Unnecessary job for dev-mapper-luks\x2d2a0e4949\x2d94d4\x2d45b2\x2d8427\x2dbc6937bc29fb.device was removed.

This suggests that this isn't build-system related, i.e. not the meson build's fault, but just a plain old regression. Possibly related: https://github.com/systemd/systemd/issues/1620.

I just went through the upstream issue list, and I don't see anything relevant. Possibly https://github.com/systemd/systemd/pull/5164, and a bunch of commits for the JobTimeout= setting, but nothing that jumps out. In particular, v233-509-g2d79a0bbb9 was included in that build, so it must be a different issue.

Hm... this does look a bit like the issue fixed by v233-509-g2d79a0bbb9. I noticed that you have an older systemd in the initramfs. Maybe there's some weird interaction when the state is passed during the root switch. I'm using current git on a bunch of machines with LVM (encrypted and not), and I don't see this, so it seems to be something specific to your configuration. I'll make a scratch build based on the latest git (we're getting ready to release v234, and there are some bugs left, but they are not related to this). Can you:

1. try to boot using this scratch build, making sure that the initramfs is rebuilt, and
2. paste your /etc/fstab?

OK. It still fails with that scratch build and a rebuilt initramfs.
;(

Here's /etc/fstab:

    #
    # /etc/fstab
    # Created by anaconda on Sun Nov 1 16:41:05 2015
    #
    # Accessible filesystems, by reference, are maintained under '/dev/disk'
    # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
    #
    /dev/mapper/fedora-root / ext4 defaults,discard,x-systemd.device-timeout=0 1 1
    UUID=91f05f18-b037-4e85-93dd-8e62284afe6d /boot ext4 defaults 1 2
    UUID=6077-066B /boot/efi vfat umask=0077,shortname=winnt 0 2
    /dev/mapper/fedora-home /home ext4 defaults,discard,x-systemd.device-timeout=0 1 2
    /dev/mapper/fedora-swap swap swap defaults,discard,x-systemd.device-timeout=0 0 0

Will attach journal output.

Created attachment 1293652: journalctl from 20170702 bad boot
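The fstab entries that set `x-systemd.device-timeout=0` (here `/`, `/home` and swap) are the ones this report turns on. A minimal Python sketch for listing such entries could look like the following. It assumes a plain whitespace-separated fstab, and `device_timeouts` is a hypothetical helper, not part of any existing tool. It also ignores fstab's octal escapes such as `\040` for spaces.

```python
def device_timeouts(fstab_text):
    """Return (device, mount point, value) for every fstab entry that sets
    x-systemd.device-timeout=. Blank lines and '#' comments are skipped."""
    found = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # simplification: ignores escapes such as \040
        if len(fields) < 4:
            continue  # not a complete entry (device, mount point, type, options)
        device, mount_point, options = fields[0], fields[1], fields[3]
        for option in options.split(","):
            key, sep, value = option.partition("=")
            if key == "x-systemd.device-timeout" and sep:
                found.append((device, mount_point, value))
    return found


# Entries copied from the fstab posted above (comment header omitted).
example = """\
/dev/mapper/fedora-root / ext4 defaults,discard,x-systemd.device-timeout=0 1 1
UUID=91f05f18-b037-4e85-93dd-8e62284afe6d /boot ext4 defaults 1 2
UUID=6077-066B /boot/efi vfat umask=0077,shortname=winnt 0 2
/dev/mapper/fedora-home /home ext4 defaults,discard,x-systemd.device-timeout=0 1 2
/dev/mapper/fedora-swap swap swap defaults,discard,x-systemd.device-timeout=0 0 0
"""

for device, mount_point, value in device_timeouts(example):
    print(f"{mount_point:<6} {device} -> {value}")
```

Swap entries put `swap` in the mount-point column, which is why the sketch returns the device alongside the mount point instead of keying the result by mount point alone.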
Thanks for the data. This turns out to be quite simple. I don't know why I didn't see it immediately. → https://github.com/systemd/systemd/pull/6264

Ah, so it was seeing the 'x-systemd.device-timeout=0' and immediately timing out because the device was encrypted and it hadn't gotten the passphrase yet? Anyhow, thanks for tracking it down. :)

I experienced this issue with the mentioned systemd-234-1.fc27. Thanks to the previous comments, I tried changing "x-systemd.device-timeout=0" for the affected /etc/fstab entries:
- "0s" -> did not help (I believe this got me into the same situation as the original poster)
- "30s" -> did help

Also, please take into account that /etc/fstab may be generated once and then carried along across distro versions etc. (I have never touched that file on the current installation; it states it was generated by anaconda on 2017-01-05.) I suspect there's still something fishy; perhaps the fix was incomplete? I never hit this issue before (prior to the update, I was using 233-6.fc27). Reopening seems justified.

Should be fixed properly now. (We got the patch merged upstream two days ago, but I couldn't build systemd in koji because of s390x issues.)

Thanks, Zbyszek, confirming -3 works well with x-systemd.timeout=0 (prompt response as usual, you rock). Note that -3 died gloriously on my laptop because it didn't have the x-systemd.timeout=0 in /etc/fstab. A full fix will have to deal with legacy /etc/fstab entries...

Correction... it's something else, I need to go digging more.

Valdis, by any chance, are you talking about the situation where the boot process gets stuck at the plymouth screen after the (presumably correct) LUKS password has been entered, and the only inputs that have any effect are:
- escape -> flips the screen into line mode, but there's no progress
- ctrl-alt-del -> reboots
- power button -> graceful shutdown

Rebooting helps, though.
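Kevin's reading of the original failure, earlier in this thread, is that the `0` in `x-systemd.device-timeout=0` was being treated as "time out immediately", so the encrypted volume was given up on before the passphrase had been entered; a later commenter found that `0s` failed the same way while `30s` worked. The Python sketch below illustrates the intended distinction under stated assumptions. It is not systemd's code and not what PR 6264 changes. `parse_timespan` and `device_wait_seconds` are hypothetical names, the parser handles only a single number with one of a few units, and "zero means wait indefinitely" is an assumption drawn from how these fstab entries are evidently meant to behave.

```python
import math

# Hypothetical, simplified subset of systemd's time-span syntax: one number plus
# an optional unit. systemd itself accepts more units and compound values such
# as "1min 30s"; this sketch does not.
UNIT_SECONDS = {"": 1.0, "s": 1.0, "sec": 1.0, "ms": 0.001, "min": 60.0, "h": 3600.0}


def parse_timespan(text):
    """Convert a single-unit time span such as "30s" or "2min" to seconds."""
    text = text.strip()
    if text == "infinity":
        return math.inf
    number = text.rstrip("abcdefghijklmnopqrstuvwxyz")
    unit = text[len(number):]
    if not number or unit not in UNIT_SECONDS:
        raise ValueError(f"unsupported time span: {text!r}")
    return float(number) * UNIT_SECONDS[unit]


def device_wait_seconds(option_value):
    """Hypothetical reading of x-systemd.device-timeout= in which any zero value
    ("0", "0s", ...) means "wait for the device indefinitely" instead of
    "give up at once"."""
    seconds = parse_timespan(option_value)
    return math.inf if seconds == 0 else seconds


for value in ["0", "0s", "30s", "2min"]:
    print(value, "->", device_wait_seconds(value))
# prints: 0 -> inf, 0s -> inf, 30s -> 30.0, 2min -> 120.0 (one per line)
```

Mapping zero to `math.inf` here, rather than to `0.0`, is exactly the difference between a boot that waits at the passphrase prompt and one that drops to emergency mode.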
I am occasionally seeing this with recent rawhide systemd/kernel; the last time, I entered the LUKS password some time after the prompt appeared, in case that increases the chance of reproducing it. I am not sure how to debug this issue; the journal has no record of the affected boot sessions at all.

re [comment 17]: I just reproduced it again; it seems it can be triggered reliably. Line mode from the boot says:

    [ OK ] Started Cryptography Setup for luks-...
    [ OK ] Reached target Encrypted Volumes.
    [ OK ] Reached target System Initialization.
    [ OK ] Reached target Basic System.
    [ OK ] Reached target Initrd Default Target.
    [ OK ] Started dracut initqueue hook.
           Starting File System Check on /dev/mapper/$HOSTNAME-root...
    [ OK ] Reached target Remote File Systems (pre).
    [ OK ] Reached target Remote File Systems.
    [ OK ] Started File System Check on /dev/mapper/$HOSTNAME-root.
           Mounting /sysroot...
    [ OK ] Mounted /sysroot.

And then nothing else happens.

re [comment 18]: Upon ctrl-alt-del, the first new line reads roughly "Stopped dracut cmdline hook". Does it ring any bells?

Harald, any idea? See comments #17-#19.

re [comment 17] and on: Reproduction seems reliable as long as the LUKS password is not entered immediately but after some delay (say 1 minute or more, which can actually be quite common if you are used to turning the machine on and jumping to some other real-life task before grub etc. proceeds). Actually, the very first line upon a manually triggered reboot from that waiting-for-Godot state reads something like:

> Stopping Password forward requests for Plymouth

So it seems like an integration issue of some kind, now casting some question marks on plymouth, but it's hard for me to tell.

    systemd-234-3.fc27.x86_64
    kernel-4.13.0-0.rc1.git4.1.fc27.x86_64
    plymouth-0.9.3-0.7.20160620git0e65b86c.fc26.x86_64

Created attachment 1305656: Screenshot from bad boot.
Created attachment 1305657: And a screenshot from a good boot.
Looking at the output of the good boot, it does a whole bunch of other stuff before it ever gets to "Started Show Plymouth Boot Screen". Maybe the systemd unit is firing *way* too early due to a busted After= definition?

I'm pretty sure this is a different issue from the original one, but let's continue to debug it here. Jan, Valdis, can you paste your fstab, crypttab, and /proc/cmdline?

/proc/cmdline:

    BOOT_IMAGE=/vmlinuz-4.13.0-rc2-next-20170725 root=/dev/mapper/turing--police-root ro rd.md=0 rd.dm=0 rd.lvm.lv=turing-police/00 console.keymap= vconsole.font=latarcyrheb-sun16 rd.luks.uuid=luks-665bb147-9e39-4003-b3ae-7be925f51a97 rd.lvm.lv=turing-police/swap rd.lvm.lv=turing-police/root quiet LANG=en_US.UTF-8 nouveau.modeset=0 rd.driver.blacklist=nouveau video=vesa:off

/etc/crypttab:

    # luks-8b10a1ba-0fe3-4f12-89ba-ec2119910adb UUID=8b10a1ba-0fe3-4f12-89ba-ec2119910adb none

Relevant part of /etc/fstab (omitting some cifs and nfs shares that don't automount):

    /dev/mapper/turing--police-root / ext4 defaults,x-systemd.device-timeout=0 1 1
    /dev/sda2 /boot ext4 defaults 1 2
    /dev/sda1 /boot/efi vfat umask=0077,shortname=winnt 0 0
    /dev/mapper/turing--police-home /home ext4 defaults,x-systemd.device-timeout=0 1 2
    /dev/mapper/turing--police-music /music ext4 defaults,x-systemd.device-timeout=0 1 2
    /dev/mapper/turing--police-catalogs /catalogs ext4 defaults,x-systemd.device-timeout=0 1 2
    /dev/mapper/turing--police-vm /vm ext4 defaults,x-systemd.device-timeout=0 1 2
    /dev/mapper/turing--police-00 /usr ext4 defaults,x-systemd.device-timeout=0 1 2
    /dev/mapper/turing--police-var /var ext4 defaults,x-systemd.device-timeout=0 1 2
    /dev/mapper/turing--police-local /usr/local ext4 defaults,x-systemd.device-timeout=0 1 2
    /dev/mapper/turing--police-01 /usr/share ext4 defaults,x-systemd.device-timeout=0 1 2
    /dev/mapper/turing--police-src /usr/src ext4 defaults,x-systemd.device-timeout=0 1 2
    /dev/mapper/turing--police-opt /opt ext4 defaults,x-systemd.device-timeout=0 1 2
    /dev/mapper/turing--police-swap swap swap defaults,x-systemd.device-timeout=0 0 0

I can confirm the plymouth issue described in comment #c17: if I let the password prompt expire, systemd tries to go to emergency mode, cannot because the root account is locked, and shows the password prompt again, at which point keystrokes have no effect. If I press ESC, a grey plymouth screen with some dots appears, and things stop working. If I disable plymouth with 'plymouth.enable=0' and go through the same steps, it boots properly.

(Re comment #c26: I'll try to set up a machine with LVM like that. Nowadays I default to btrfs, and this might be relevant. For anyone looking into this, https://github.com/systemd/systemd/issues/6381 has a bunch more reproducers.)

Should be fixed now ;)