Bug 919752
Summary: | boot process hangs until timeout | ||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Bruno Wolff III <bruno> | ||||||||||||||||||||||||
Component: | dracut | Assignee: | Harald Hoyer <harald> | ||||||||||||||||||||||||
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||||||||||||||||||||
Severity: | unspecified | Docs Contact: | |||||||||||||||||||||||||
Priority: | unspecified | ||||||||||||||||||||||||||
Version: | 19 | CC: | bruno, bugzilla, dracut-maint, harald, jonathan, kevin, robatino, selinux | ||||||||||||||||||||||||
Target Milestone: | --- | ||||||||||||||||||||||||||
Target Release: | --- | ||||||||||||||||||||||||||
Hardware: | Unspecified | ||||||||||||||||||||||||||
OS: | Unspecified | ||||||||||||||||||||||||||
Whiteboard: | |||||||||||||||||||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||||||||||||||||||
Doc Text: | Story Points: | --- | |||||||||||||||||||||||||
Clone Of: | Environment: | ||||||||||||||||||||||||||
Last Closed: | 2013-04-20 19:20:26 UTC | Type: | Bug | ||||||||||||||||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||||||||||||||||
Documentation: | --- | CRM: | |||||||||||||||||||||||||
Verified Versions: | Category: | --- | |||||||||||||||||||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||||||||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||||||||||||||||
Embargoed: | |||||||||||||||||||||||||||
Attachments: |
|
kernel-3.9.0-0.rc1.git1.2.fc19.x86_64 from nodebug repo boots for me. [root@tlondon ~]# uname -a; rpm -q dracut; Linux tlondon.localhost.org 3.9.0-0.rc1.git1.2.fc19.x86_64 #1 SMP Fri Mar 8 16:15:51 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux dracut-026-1.fc19.x86_64 [root@tlondon ~]# Created attachment 707686 [details]
journalctl -a output
I forgot that the dracut debugging wiki page was out of date. I reran the boot and saved journalctl -a output which has the important info. I also updated the wiki page to suggest journalctl -a instead of /init.log.
Created attachment 707687 [details]
fstab
Since not everyone is seeing this, I included my fstab info.
Created attachment 707688 [details]
mdadm.conf
Software raid setup.
Created attachment 707689 [details]
Actual raid setup
One odd thing is that I split md13 a while back in order to try to track down the cause of a bug introduced in F13 that hasn't been fixed yet. So there is a mismatch between mdadm.conf and the actual arrays right now. I don't know if that has any bearing, but it is unusual.
Mar 10 02:24:51 localhost kernel: Kernel command line: ro root=/dev/mapper/luks-9a976b86-8aaa-40d9-8039-89d710eac5c9 SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us radeon.agpmode=-1 noreplace-smp slub_debug=- rdshell rdinitdebug You are either missing "rd.auto=1" or "rd.luks.uuid=9a976b86-8aaa-40d9-8039-89d710eac5c9" on the kernel command line. Can you try either option? oh, and what is the output of: # lsinitrd /boot/initramfs-3.9.0-0.rc1.git1.2.fc19.i686.PAE.img # lsinitrd /boot/initramfs-3.9.0-0.rc1.git1.2.fc19.i686.PAE.img etc/crypttab # lsinitrd /boot/initramfs-3.9.0-0.rc1.git1.2.fc19.i686.PAE.img etc/mdadm.conf # lsinitrd /boot/initramfs-3.9.0-0.rc1.git1.2.fc19.i686.PAE.img 'etc/cmdline.d/*' And the output of: # grep hostonly /etc/dracut.conf /etc/dracut.conf.d/* /usr/lib/dracut/dracut.conf.d/* # grep hostonly /etc/dracut.conf /etc/dracut.conf.d/* /usr/lib/dracut/dracut.conf.d/* /etc/dracut.conf:#hostonly="yes" grep: /etc/dracut.conf.d/*: No such file or directory lsinitrd doesn't seem to work properly: # lsinitrd /boot/initramfs-3.9.0-0.rc1.git1.2.fc19.i686.PAE.img No <initramfs file> specified and the default image '/boot/initramfs-3.9.0-0.rc1.git0.5.fc19.i686.PAE.img}' cannot be accessed! Usage: lsinitrd [-s] [<initramfs file> [<filename>]] -h, --help print a help message and exit. -s, --size sort the contents of the initramfs by size. [root@bruno bruno]# ls /boot/initramfs-3.9.0-0.rc1.git1.2.fc19.i686.PAE.img /boot/initramfs-3.9.0-0.rc1.git1.2.fc19.i686.PAE.img Adding rd.auto=1 to the command line got the machine booting. The suggested raid device names weren't used which is causing a problem with mounting /boot again after the root pivot, since it is referenced by device name in /etc/fstab. I'll change that to use uuid or label either of which is a more modern approach than what I am currently doing. My kernel boot parameters were initially set several releases ago (with tweaks to avoid some hardware driver issues) and probably at that time rd.auto=1 wasn't being added by default. It might be worth noting that things don't work without this on the yum upgrade page for upgrading to f19. My mdadm.conf file does specify the raid devices and which devices they are supposed to end up on. Is rd.auto=1 supposed to bypass that? I found a page describing raid options for dracut and there was an option to not use the mdadm.conf file, but I didn't see any to tell it to use it. systemd upstream fixes: http://cgit.freedesktop.org/systemd/systemd/commit/?id=4d4fe926cb2ad1e624ddaea26f143fa93d867296 http://cgit.freedesktop.org/systemd/systemd/commit/?id=6bde0b3220e95a08cefb70846f73b2cf24b7734a (In reply to comment #13) > systemd upstream fixes: > > http://cgit.freedesktop.org/systemd/systemd/commit/ > ?id=4d4fe926cb2ad1e624ddaea26f143fa93d867296 > http://cgit.freedesktop.org/systemd/systemd/commit/ > ?id=6bde0b3220e95a08cefb70846f73b2cf24b7734a oops.. wrong bug :) nevertheless, I fixed systemd and dracut. dracut-026-19.git20130313 and systemd-198-4 should fix your issues and booting without "rd.auto" or "rd.luks.uuid" should be possible now. I tried the updates out, but I am now not able to successfully enter my luks password. When typing at the luks prompt I sometimes get asterisks, sometimes the character I typed and somtimes space greater than. Guessing, I'd say two processes are fighting over the input. I noticed that the new initramfs is smaller than I was getting before: -rw-------. 1 root root 34898514 Mar 10 11:11 initramfs-3.9.0-0.rc1.git1.2.fc19.i686.PAE.img -rw-------. 1 root root 34898868 Mar 11 00:36 initramfs-3.9.0-0.rc1.git2.2.fc19.i686.PAE.img -rw-------. 1 root root 9119685 Mar 13 07:51 initramfs-3.9.0-0.rc2.git0.3.fc19.i686.PAE.img When I ran the kernel script to rebuild the initramfs I got the following error: dracut-install: ERROR: failed to install '/var/tmp/initramfs.G03kNJ/etc/modules- load.d/apmd.conf'' for '/var/tmp/initramfs.G03kNJ/etc/modules-load.d/apmd.conf' This was with dracut-026-19.git20130313.fc19.i686 and systemd-198-4.fc19.i686. (In reply to comment #15) > I tried the updates out, but I am now not able to successfully enter my luks > password. When typing at the luks prompt I sometimes get asterisks, > sometimes the character I typed and somtimes space greater than. Guessing, > I'd say two processes are fighting over the input. sounds like a plymouth problem, which I might have fixed already... will debug use "rd.plymouth=0 plymouth.enable=0" for now > > I noticed that the new initramfs is smaller than I was getting before: > -rw-------. 1 root root 34898514 Mar 10 11:11 > initramfs-3.9.0-0.rc1.git1.2.fc19.i686.PAE.img > -rw-------. 1 root root 34898868 Mar 11 00:36 > initramfs-3.9.0-0.rc1.git2.2.fc19.i686.PAE.img > -rw-------. 1 root root 9119685 Mar 13 07:51 > initramfs-3.9.0-0.rc2.git0.3.fc19.i686.PAE.img Yes, this is https://fedoraproject.org/wiki/Features/DracutHostOnly > > When I ran the kernel script to rebuild the initramfs I got the following > error: > dracut-install: ERROR: failed to install > '/var/tmp/initramfs.G03kNJ/etc/modules- > load.d/apmd.conf'' for > '/var/tmp/initramfs.G03kNJ/etc/modules-load.d/apmd.conf' > > This was with dracut-026-19.git20130313.fc19.i686 and > systemd-198-4.fc19.i686. Can you attach the debug.txt of: # dracut -H test.img --debug &> debug.txt OK. I'll try again tonight. (I can't do reboots remotely and need to wait until I get home from work.) oh, and please try dracut-026-33.git20130313.fc19 (currently building) Created attachment 709797 [details]
dracut --debug for dracut-026-19.git20130313.fc19.i686
Sorry I didn't get this done a bit sooner. I didn't need to do a reboot for this test.
Created attachment 709809 [details]
dracut --debug for dracut-026-33.git20130313.fc19.i686
I tested rebuilding the initramfs with dracut-026-33.git20130313.fc19.i686 and am still seeing the error message referring to apmd.conf.
I'll try rebooting in a few hours.
Using the plymouth options didn't work. I got a dracut shell prompt at about the point where I would normally be asked for the luks password. (In reply to comment #20) > Created attachment 709809 [details] > dracut --debug for dracut-026-33.git20130313.fc19.i686 > > I tested rebuilding the initramfs with dracut-026-33.git20130313.fc19.i686 > and am still seeing the error message referring to apmd.conf. /etc/modules-load.d/apmd.conf seems to have the executable bit set and dracut is running "ldd" on it :-) Will fix "dracut-install" to ignore this specific ldd error message. # koji download-build --arch=$(arch) systemd-198-6.fc19 # koji download-build --arch=$(arch) dracut-026-48.git20130315.fc19 # rpm -Fvh systemd*.rpm dracut*.rpm # dracut -f --kver=<kernel_version> Created attachment 710785 [details]
journalctl -a output
That seems to be a lot better. The raid arrays were named as expected and all file systems ended up mounted.
I am still booting with the rd.plymouth=0 plymouth.enable=0 .
There were a couple of oddities. There was a timeout for one of the devices (perhaps the array used for swap) early in the boot, though it did get set up later.
Also I got asked for the password for each device (they all use the same pass phrase) when normally I wouldn't. I suspect this was just to using the boot parameters listed above, but figured I should mention it just in case.
(In reply to comment #24) > Created attachment 710785 [details] > journalctl -a output > > That seems to be a lot better. The raid arrays were named as expected and > all file systems ended up mounted. > > I am still booting with the rd.plymouth=0 plymouth.enable=0 . > > There were a couple of oddities. There was a timeout for one of the devices > (perhaps the array used for swap) early in the boot, though it did get set > up later. Hmm, strange. What is the layout of the /dev/disk/by-uuid/451045dc-5ca4-49a7-9e2d-95a78e6fabca luks-451045dc-5ca4-49a7-9e2d-95a78e6fabca device? > > Also I got asked for the password for each device (they all use the same > pass phrase) when normally I wouldn't. I suspect this was just to using the > boot parameters listed above, but figured I should mention it just in case. yeah, that was because "rd.plymouth=0 plymouth.enable=0" turns off plymouth. /dev/md14: UUID="451045dc-5ca4-49a7-9e2d-95a78e6fabca" TYPE="crypto_LUKS" /dev/mapper/luks-451045dc-5ca4-49a7-9e2d-95a78e6fabca: LABEL="home" UUID="cf157d80-cd85-469a-992c-20f3d617b061" TYPE="ext4" That device is my /home which is ext4 on top of luks on top of software raid 1. (In reply to comment #26) > /dev/md14: UUID="451045dc-5ca4-49a7-9e2d-95a78e6fabca" TYPE="crypto_LUKS" > /dev/mapper/luks-451045dc-5ca4-49a7-9e2d-95a78e6fabca: LABEL="home" > UUID="cf157d80-cd85-469a-992c-20f3d617b061" TYPE="ext4" > > That device is my /home which is ext4 on top of luks on top of software raid > 1. Ah, that is not needed for mounting the root filesystem and booting into it. Please add "rd.luks.uuid=9a976b86-8aaa-40d9-8039-89d710eac5c9" to the kernel command line, which should fix the timeout. Anaconda would have done this for you. This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle. Changing version to '19'. (As we did not run this process for some time, it could affect also pre-Fedora 19 development cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.) More information and reason for this action is here: https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19 That didn't help. I also tried rd.luks.uuid=451045dc-5ca4-49a7-9e2d-95a78e6fabca as that was the device not getting mounted. It also looks like /dev/md14 isn't getting started and if not, maybe that is why 451045dc-5ca4-49a7-9e2d-95a78e6fabca isn't getting found? Things have gotten worse today. The timeout is much longer. Perhaps this is related to fixing timeouts for encrypted devices. Unfortunately this timeout isn't waiting on a passphrase and isn't ever going to happen. And since it is /home which gets mounted later anyway, the previous situation was tolerable, but the current one is not. I didn't notice any raid options to be used for system / kernel similar to the luks options. Though since /dev/md14 is defined in /etc/mdadm.conf, I am not sure why it wouldn't get enabled early in the boot process. I now see that rd.luks.uuid doesn't add another luks device to try to open, but actually restricts the luks devices to open to just that one. I have encrypted swap as well as root, so that causes a problem. Specifying two rd.luks.uid parameters doesn't work, as only one gets used. I don't see any documentation on specifying more than one device with one parameter. I might just try guessing something, but I'm going to look at trying to force /dev/md14 to get started to see if that helps. After finding documentation that said I can use rd.luks.uid more than once, I double check my uids and found one was missing a character. Fixing that got this to work without waiting at the spot it was previously. It still seems odd that something was able to determine that /dev/md14 didn't need to get started (for early boot), but couldn't figure out that the luks file system on it didn't need to get started as well. (In reply to comment #31) > It still seems odd that something was able to determine that /dev/md14 > didn't need to get started (for early boot), but couldn't figure out that > the luks file system on it didn't need to get started as well. will fix that one dracut-027-18.git20130409.fc19 has been submitted as an update for Fedora 19. https://admin.fedoraproject.org/updates/dracut-027-18.git20130409.fc19 Thanks. I'll test that today and report back. I tested dracut-027-18.git20130409.fc20 and while it works with the two rd.luks.uuid parameters, it still doesn't work if I take them out. When I did, I got warnings about both my root and swap luks devices and the boot process stopped there. Package dracut-027-18.git20130409.fc19: * should fix your issue, * was pushed to the Fedora 19 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing dracut-027-18.git20130409.fc19' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2013-5216/dracut-027-18.git20130409.fc19 then log in and leave karma (feedback). I looked to see how the fix was supposed to work and saw that the intention was not to include unneeded crypttab entries in the initramfs copy of /etc/crypttab. I went and checked what I got and the initramfs created with dracut-027-18.git20130409.fc20 had no lines in its copy of /etc/crypttab. Where as an initramfs had a line for swap, root and home in its copy of /etc/crypttab. Making it look like there is some issue with the filtering. The following doesn't look like it is doing what was intented: if [[ $_hdev -ef $_dev ]] || [[ /dev/block/$_hdev -ef $_dev ]]; then On my system $_hdev has values of /dev/md12 and /dev/md13 and $_dev has values of UUID=451045dc-5ca4-49a7-9e2d-95a78e6fabca, UUID=9a976b86-8aaa-40d9-8039-89d710eac5c9 and UUID=987f98d1-0144-4bb4-a433-f7233784ca40. I do see some names that could be used for an -ef test, for example: ls -l /dev/disk/by-uuid/451045dc-5ca4-49a7-9e2d-95a78e6fabca lrwxrwxrwx. 1 root root 10 Apr 9 11:45 /dev/disk/by-uuid/451045dc-5ca4-49a7-9e2d-95a78e6fabca -> ../../md14 But the name format does not include 'UUID=', so that would need to get stripped off. The man page suggests that if isn't a path then it should be UUID=. So probably adding another compare to the first two will work. Created attachment 733458 [details]
modified to handle UUID= luks entries
I modified module-setup.sh to test for UUID= entries similar to path based entries. The modified file work correctly on my system. Only the two expected entries were in the initramfs crypttab file and the machine booted normally without using any rd.luks.uuid= kernel parameters. I just tested this one case.
Created attachment 733503 [details]
Proposed patch
(In reply to comment #41) > Created attachment 733503 [details] > Proposed patch ok, probably should be only UUID=, as the others are not supported anyway. Created attachment 733504 [details]
Proposed revised patch
dracut-027-19.git20130410.fc19 has been submitted as an update for Fedora 19. https://admin.fedoraproject.org/updates/dracut-027-19.git20130410.fc19 While looking at the patch I noticed that the line [[ $_mapper = \#* ]] && continue uses a single = comparison. Isn't == needed to get globbing? I tested dracut-027-19.git20130410.fc19 and things looked good. I'll test the f20 version later this morning. (I am running updates now.) (In reply to comment #46) > I tested dracut-027-19.git20130410.fc19 and things looked good. I'll test > the f20 version later this morning. (I am running updates now.) the push to testing has reset the karma.. care to +1 ? (In reply to comment #45) > While looking at the patch I noticed that the line > [[ $_mapper = \#* ]] && continue > uses a single = comparison. Isn't == needed to get globbing? no, works as well I have tested dracut-027-19.git20130410.fc20.i686 now as well and things look good. Thanks. (In reply to comment #47) > (In reply to comment #46) > > I tested dracut-027-19.git20130410.fc19 and things looked good. I'll test > > the f20 version later this morning. (I am running updates now.) > > the push to testing has reset the karma.. care to +1 ? ah.. no.. sorry, misread the log message.. all fine :) Yeah, I just checked and saw the karma stayed at +1. dracut-027-19.git20130410.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report. |
Created attachment 707602 [details] dmesg log Description of problem: Start with dracut-026-1.fc19.i686 the boot process doesn't complete for new initramfs images. I was able to boot an older initramfs image. The logging indicated that a test for udev settling was continually not passing and after a couple of minutes the process exited to an error shell. I captured a dmesg log, but it doesn't show the detailed logs. I looked for init.log but couldn't find it. The cmdline was: ro root=/dev/mapper/luks-9a976b86-8aaa-40d9-8039-89d710eac5c9 SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us radeon.agpmode=-1 noreplace-smp slub_debug=- rdshell rdinitdebug