Bug 919752 - boot process hangs until timeout
boot process hangs until timeout
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: dracut (Show other bugs)
19
Unspecified Unspecified
unspecified Severity unspecified
: ---
: ---
Assigned To: Harald Hoyer
Fedora Extras Quality Assurance
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2013-03-09 17:36 EST by Bruno Wolff III
Modified: 2013-04-20 15:20 EDT (History)
8 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-04-20 15:20:26 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments (Terms of Use)
dmesg log (42.84 KB, text/plain)
2013-03-09 17:36 EST, Bruno Wolff III
no flags Details
journalctl -a output (782.40 KB, text/plain)
2013-03-09 21:42 EST, Bruno Wolff III
no flags Details
fstab (749 bytes, text/plain)
2013-03-09 21:44 EST, Bruno Wolff III
no flags Details
mdadm.conf (459 bytes, text/plain)
2013-03-09 21:48 EST, Bruno Wolff III
no flags Details
Actual raid setup (383 bytes, text/plain)
2013-03-09 21:51 EST, Bruno Wolff III
no flags Details
dracut --debug for dracut-026-19.git20130313.fc19.i686 (8.89 MB, text/plain)
2013-03-13 17:17 EDT, Bruno Wolff III
no flags Details
dracut --debug for dracut-026-33.git20130313.fc19.i686 (8.89 MB, text/plain)
2013-03-13 17:27 EDT, Bruno Wolff III
no flags Details
journalctl -a output (3.00 MB, text/plain)
2013-03-15 15:10 EDT, Bruno Wolff III
no flags Details
modified to handle UUID= luks entries (3.04 KB, text/plain)
2013-04-09 23:29 EDT, Bruno Wolff III
no flags Details
Proposed patch (893 bytes, patch)
2013-04-10 03:37 EDT, Harald Hoyer
no flags Details | Diff
Proposed revised patch (666 bytes, patch)
2013-04-10 03:41 EDT, Harald Hoyer
no flags Details | Diff

  None (edit)
Description Bruno Wolff III 2013-03-09 17:36:37 EST
Created attachment 707602 [details]
dmesg log

Description of problem:
Start with dracut-026-1.fc19.i686 the boot process doesn't complete for new initramfs images. I was able to boot an older initramfs image.

The logging indicated that a test for udev settling was continually not passing and after a couple of minutes the process exited to an error shell.

I captured a dmesg log, but it doesn't show the detailed logs. I looked for init.log but couldn't find it.

The cmdline was:
ro root=/dev/mapper/luks-9a976b86-8aaa-40d9-8039-89d710eac5c9 SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us radeon.agpmode=-1 noreplace-smp slub_debug=- rdshell rdinitdebug
Comment 1 Tom London 2013-03-09 18:54:15 EST
kernel-3.9.0-0.rc1.git1.2.fc19.x86_64 from nodebug repo boots for me.

[root@tlondon ~]# uname -a; rpm -q dracut;
Linux tlondon.localhost.org 3.9.0-0.rc1.git1.2.fc19.x86_64 #1 SMP Fri Mar 8 16:15:51 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
dracut-026-1.fc19.x86_64
[root@tlondon ~]#
Comment 2 Bruno Wolff III 2013-03-09 21:42:08 EST
Created attachment 707686 [details]
journalctl -a output

I forgot that the dracut debugging wiki page was out of date. I reran the boot and saved journalctl -a output which has the important info. I also updated the wiki page to suggest journalctl -a instead of /init.log.
Comment 3 Bruno Wolff III 2013-03-09 21:44:26 EST
Created attachment 707687 [details]
fstab

Since not everyone is seeing this, I included my fstab info.
Comment 4 Bruno Wolff III 2013-03-09 21:48:48 EST
Created attachment 707688 [details]
mdadm.conf

Software raid setup.
Comment 5 Bruno Wolff III 2013-03-09 21:51:35 EST
Created attachment 707689 [details]
Actual raid setup

One odd thing is that I split md13 a while back in order to try to track down the cause of a bug introduced in F13 that hasn't been fixed yet. So there is a mismatch between mdadm.conf and the actual arrays right now. I don't know if that has any bearing, but it is unusual.
Comment 6 Harald Hoyer 2013-03-11 06:29:46 EDT
Mar 10 02:24:51 localhost kernel: Kernel command line: ro root=/dev/mapper/luks-9a976b86-8aaa-40d9-8039-89d710eac5c9 SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us radeon.agpmode=-1 noreplace-smp slub_debug=- rdshell rdinitdebug


You are either missing "rd.auto=1" or 
"rd.luks.uuid=9a976b86-8aaa-40d9-8039-89d710eac5c9"
on the kernel command line.

Can you try either option?
Comment 7 Harald Hoyer 2013-03-11 06:40:07 EDT
oh, and what is the output of:

# lsinitrd /boot/initramfs-3.9.0-0.rc1.git1.2.fc19.i686.PAE.img
# lsinitrd /boot/initramfs-3.9.0-0.rc1.git1.2.fc19.i686.PAE.img etc/crypttab
# lsinitrd /boot/initramfs-3.9.0-0.rc1.git1.2.fc19.i686.PAE.img etc/mdadm.conf
# lsinitrd /boot/initramfs-3.9.0-0.rc1.git1.2.fc19.i686.PAE.img 'etc/cmdline.d/*'
Comment 8 Harald Hoyer 2013-03-11 06:41:18 EDT
And the output of:

# grep hostonly /etc/dracut.conf /etc/dracut.conf.d/* /usr/lib/dracut/dracut.conf.d/*
Comment 9 Bruno Wolff III 2013-03-11 08:30:56 EDT
# grep hostonly /etc/dracut.conf /etc/dracut.conf.d/* /usr/lib/dracut/dracut.conf.d/*
/etc/dracut.conf:#hostonly="yes"
grep: /etc/dracut.conf.d/*: No such file or directory
Comment 10 Bruno Wolff III 2013-03-11 08:34:24 EDT
lsinitrd doesn't seem to work properly:
# lsinitrd /boot/initramfs-3.9.0-0.rc1.git1.2.fc19.i686.PAE.img
No <initramfs file> specified and the default image '/boot/initramfs-3.9.0-0.rc1.git0.5.fc19.i686.PAE.img}' cannot be accessed!

Usage: lsinitrd [-s] [<initramfs file> [<filename>]]

-h, --help     print a help message and exit.
-s, --size     sort the contents of the initramfs by size.

[root@bruno bruno]# ls /boot/initramfs-3.9.0-0.rc1.git1.2.fc19.i686.PAE.img
/boot/initramfs-3.9.0-0.rc1.git1.2.fc19.i686.PAE.img
Comment 11 Bruno Wolff III 2013-03-11 08:55:53 EDT
Adding rd.auto=1 to the command line got the machine booting. The suggested raid device names weren't used which is causing a problem with mounting /boot again after the root pivot, since it is referenced by device name in /etc/fstab. I'll change that to use uuid or label either of which is a more modern approach than what I am currently doing.

My kernel boot parameters were initially set several releases ago (with tweaks to avoid some hardware driver issues) and probably at that time rd.auto=1 wasn't being added by default. It might be worth noting that things don't work without this on the yum upgrade page for upgrading to f19.
Comment 12 Bruno Wolff III 2013-03-11 09:13:58 EDT
My mdadm.conf file does specify the raid devices and which devices they are supposed to end up on. Is rd.auto=1 supposed to bypass that? I found a page describing raid options for dracut and there was an option to not use the mdadm.conf file, but I didn't see any to tell it to use it.
Comment 14 Harald Hoyer 2013-03-13 07:01:06 EDT
(In reply to comment #13)
> systemd upstream fixes:
> 
> http://cgit.freedesktop.org/systemd/systemd/commit/
> ?id=4d4fe926cb2ad1e624ddaea26f143fa93d867296
> http://cgit.freedesktop.org/systemd/systemd/commit/
> ?id=6bde0b3220e95a08cefb70846f73b2cf24b7734a

oops.. wrong bug :)

nevertheless, I fixed systemd and dracut.

dracut-026-19.git20130313
and
systemd-198-4

should fix your issues and booting without "rd.auto" or "rd.luks.uuid" should be possible now.
Comment 15 Bruno Wolff III 2013-03-13 09:19:25 EDT
I tried the updates out, but I am now not able to successfully enter my luks password. When typing at the luks prompt I sometimes get asterisks, sometimes the character I typed and somtimes space greater than. Guessing, I'd say two processes are fighting over the input.

I noticed that the new initramfs is smaller than I was getting before:
-rw-------. 1 root root 34898514 Mar 10 11:11 initramfs-3.9.0-0.rc1.git1.2.fc19.i686.PAE.img
-rw-------. 1 root root 34898868 Mar 11 00:36 initramfs-3.9.0-0.rc1.git2.2.fc19.i686.PAE.img
-rw-------. 1 root root  9119685 Mar 13 07:51 initramfs-3.9.0-0.rc2.git0.3.fc19.i686.PAE.img

When I ran the kernel script to rebuild the initramfs I got the following error:
dracut-install: ERROR: failed to install '/var/tmp/initramfs.G03kNJ/etc/modules-
load.d/apmd.conf'' for '/var/tmp/initramfs.G03kNJ/etc/modules-load.d/apmd.conf'

This was with dracut-026-19.git20130313.fc19.i686 and systemd-198-4.fc19.i686.
Comment 16 Harald Hoyer 2013-03-13 10:14:05 EDT
(In reply to comment #15)
> I tried the updates out, but I am now not able to successfully enter my luks
> password. When typing at the luks prompt I sometimes get asterisks,
> sometimes the character I typed and somtimes space greater than. Guessing,
> I'd say two processes are fighting over the input.

sounds like a plymouth problem, which I might have fixed already... will debug

use "rd.plymouth=0 plymouth.enable=0" for now

> 
> I noticed that the new initramfs is smaller than I was getting before:
> -rw-------. 1 root root 34898514 Mar 10 11:11
> initramfs-3.9.0-0.rc1.git1.2.fc19.i686.PAE.img
> -rw-------. 1 root root 34898868 Mar 11 00:36
> initramfs-3.9.0-0.rc1.git2.2.fc19.i686.PAE.img
> -rw-------. 1 root root  9119685 Mar 13 07:51
> initramfs-3.9.0-0.rc2.git0.3.fc19.i686.PAE.img

Yes, this is https://fedoraproject.org/wiki/Features/DracutHostOnly
> 
> When I ran the kernel script to rebuild the initramfs I got the following
> error:
> dracut-install: ERROR: failed to install
> '/var/tmp/initramfs.G03kNJ/etc/modules-
> load.d/apmd.conf'' for
> '/var/tmp/initramfs.G03kNJ/etc/modules-load.d/apmd.conf'
> 
> This was with dracut-026-19.git20130313.fc19.i686 and
> systemd-198-4.fc19.i686.

Can you attach the debug.txt of:

# dracut -H test.img --debug &> debug.txt
Comment 17 Bruno Wolff III 2013-03-13 11:24:56 EDT
OK. I'll try again tonight. (I can't do reboots remotely and need to wait until I get home from work.)
Comment 18 Harald Hoyer 2013-03-13 12:58:01 EDT
oh, and please try dracut-026-33.git20130313.fc19 (currently building)
Comment 19 Bruno Wolff III 2013-03-13 17:17:02 EDT
Created attachment 709797 [details]
dracut --debug for dracut-026-19.git20130313.fc19.i686

Sorry I didn't get this done a bit sooner. I didn't need to do a reboot for this test.
Comment 20 Bruno Wolff III 2013-03-13 17:27:42 EDT
Created attachment 709809 [details]
dracut --debug for dracut-026-33.git20130313.fc19.i686

I tested rebuilding the initramfs with dracut-026-33.git20130313.fc19.i686 and am still seeing the error message referring to apmd.conf.

I'll try rebooting in a few hours.
Comment 21 Bruno Wolff III 2013-03-13 20:12:10 EDT
Using the plymouth options didn't work. I got a dracut shell prompt at about the point where I would normally be asked for the luks password.
Comment 22 Harald Hoyer 2013-03-14 05:58:24 EDT
(In reply to comment #20)
> Created attachment 709809 [details]
> dracut --debug for dracut-026-33.git20130313.fc19.i686
> 
> I tested rebuilding the initramfs with dracut-026-33.git20130313.fc19.i686
> and am still seeing the error message referring to apmd.conf.

/etc/modules-load.d/apmd.conf seems to have the executable bit set and dracut is running "ldd" on it :-)

Will fix "dracut-install" to ignore this specific ldd error message.
Comment 23 Harald Hoyer 2013-03-15 07:56:17 EDT
# koji download-build --arch=$(arch) systemd-198-6.fc19
# koji download-build --arch=$(arch) dracut-026-48.git20130315.fc19
# rpm -Fvh systemd*.rpm dracut*.rpm
# dracut -f --kver=<kernel_version>
Comment 24 Bruno Wolff III 2013-03-15 15:10:22 EDT
Created attachment 710785 [details]
journalctl -a output

That seems to be a lot better. The raid arrays were named as expected and all file systems ended up mounted.

I am still booting with the rd.plymouth=0 plymouth.enable=0 .

There were a couple of oddities. There was a timeout for one of the devices (perhaps the array used for swap) early in the boot, though it did get set up later.

Also I got asked for the password for each device (they all use the same pass phrase) when normally I wouldn't. I suspect this was just to using the boot parameters listed above, but figured I should mention it just in case.
Comment 25 Harald Hoyer 2013-03-18 05:43:10 EDT
(In reply to comment #24)
> Created attachment 710785 [details]
> journalctl -a output
> 
> That seems to be a lot better. The raid arrays were named as expected and
> all file systems ended up mounted.
> 
> I am still booting with the rd.plymouth=0 plymouth.enable=0 .
> 
> There were a couple of oddities. There was a timeout for one of the devices
> (perhaps the array used for swap) early in the boot, though it did get set
> up later.

Hmm, strange.

What is the layout of the 
/dev/disk/by-uuid/451045dc-5ca4-49a7-9e2d-95a78e6fabca
luks-451045dc-5ca4-49a7-9e2d-95a78e6fabca device?

> 
> Also I got asked for the password for each device (they all use the same
> pass phrase) when normally I wouldn't. I suspect this was just to using the
> boot parameters listed above, but figured I should mention it just in case.

yeah, that was because "rd.plymouth=0 plymouth.enable=0" turns off plymouth.
Comment 26 Bruno Wolff III 2013-03-18 22:28:33 EDT
/dev/md14: UUID="451045dc-5ca4-49a7-9e2d-95a78e6fabca" TYPE="crypto_LUKS"
/dev/mapper/luks-451045dc-5ca4-49a7-9e2d-95a78e6fabca: LABEL="home" UUID="cf157d80-cd85-469a-992c-20f3d617b061" TYPE="ext4"

That device is my /home which is ext4 on top of luks on top of software raid 1.
Comment 27 Harald Hoyer 2013-03-19 04:21:11 EDT
(In reply to comment #26)
> /dev/md14: UUID="451045dc-5ca4-49a7-9e2d-95a78e6fabca" TYPE="crypto_LUKS"
> /dev/mapper/luks-451045dc-5ca4-49a7-9e2d-95a78e6fabca: LABEL="home"
> UUID="cf157d80-cd85-469a-992c-20f3d617b061" TYPE="ext4"
> 
> That device is my /home which is ext4 on top of luks on top of software raid
> 1.

Ah, that is not needed for mounting the root filesystem and booting into it.

Please add "rd.luks.uuid=9a976b86-8aaa-40d9-8039-89d710eac5c9" to the kernel command line, which should fix the timeout.

Anaconda would have done this for you.
Comment 28 Fedora End Of Life 2013-04-03 15:22:10 EDT
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19
Comment 29 Bruno Wolff III 2013-04-09 00:36:14 EDT
That didn't help. I also tried rd.luks.uuid=451045dc-5ca4-49a7-9e2d-95a78e6fabca as that was the device not getting mounted. It also looks like /dev/md14 isn't getting started and if not, maybe that is why 451045dc-5ca4-49a7-9e2d-95a78e6fabca isn't getting found?

Things have gotten worse today. The timeout is much longer. Perhaps this is related to fixing timeouts for encrypted devices. Unfortunately this timeout isn't waiting on a passphrase and isn't ever going to happen. And since it is /home which gets mounted later anyway, the previous situation was tolerable, but the current one is not.

I didn't notice any raid options to be used for system / kernel similar to the luks options. Though since /dev/md14 is defined in /etc/mdadm.conf, I am not sure why it wouldn't get enabled early in the boot process.
Comment 30 Bruno Wolff III 2013-04-09 00:56:04 EDT
I now see that rd.luks.uuid doesn't add another luks device to try to open, but actually restricts the luks devices to open to just that one. I have encrypted swap as well as root, so that causes a problem. Specifying two rd.luks.uid parameters doesn't work, as only one gets used. I don't see any documentation on specifying more than one device with one parameter. I might just try guessing something, but I'm going to look at trying to force /dev/md14 to get started to see if that helps.
Comment 31 Bruno Wolff III 2013-04-09 01:41:21 EDT
After finding documentation that said I can use rd.luks.uid more than once, I double check my uids and found one was missing a character. Fixing that got this to work without waiting at the spot it was previously.

It still seems odd that something was able to determine that /dev/md14 didn't need to get started (for early boot), but couldn't figure out that the luks file system on it didn't need to get started as well.
Comment 32 Harald Hoyer 2013-04-09 04:11:59 EDT
(In reply to comment #31)
> It still seems odd that something was able to determine that /dev/md14
> didn't need to get started (for early boot), but couldn't figure out that
> the luks file system on it didn't need to get started as well.

will fix that one
Comment 33 Fedora Update System 2013-04-09 08:59:46 EDT
dracut-027-18.git20130409.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/dracut-027-18.git20130409.fc19
Comment 34 Bruno Wolff III 2013-04-09 10:17:34 EDT
Thanks. I'll test that today and report back.
Comment 35 Bruno Wolff III 2013-04-09 12:57:56 EDT
I tested dracut-027-18.git20130409.fc20 and while it works with the two rd.luks.uuid parameters, it still doesn't work if I take them out. When I did, I got warnings about both my root and swap luks devices and the boot process stopped there.
Comment 36 Fedora Update System 2013-04-09 13:02:17 EDT
Package dracut-027-18.git20130409.fc19:
* should fix your issue,
* was pushed to the Fedora 19 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing dracut-027-18.git20130409.fc19'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-5216/dracut-027-18.git20130409.fc19
then log in and leave karma (feedback).
Comment 37 Bruno Wolff III 2013-04-09 17:01:12 EDT
I looked to see how the fix was supposed to work and saw that the intention was not to include unneeded crypttab entries in the initramfs copy of /etc/crypttab. I went and checked what I got and the initramfs created with dracut-027-18.git20130409.fc20 had no lines in its copy of /etc/crypttab. Where as an initramfs had a line for swap, root and home in its copy of /etc/crypttab. Making it look like there is some issue with the filtering.
Comment 38 Bruno Wolff III 2013-04-09 18:12:05 EDT
The following doesn't look like it is doing what was intented:

if [[ $_hdev -ef $_dev ]] || [[ /dev/block/$_hdev -ef $_dev ]]; then

On my system $_hdev has values of /dev/md12 and /dev/md13 and $_dev has values of UUID=451045dc-5ca4-49a7-9e2d-95a78e6fabca, UUID=9a976b86-8aaa-40d9-8039-89d710eac5c9 and UUID=987f98d1-0144-4bb4-a433-f7233784ca40.
Comment 39 Bruno Wolff III 2013-04-09 18:27:27 EDT
I do see some names that could be used for an -ef test, for example:
ls -l /dev/disk/by-uuid/451045dc-5ca4-49a7-9e2d-95a78e6fabca
lrwxrwxrwx. 1 root root 10 Apr  9 11:45 /dev/disk/by-uuid/451045dc-5ca4-49a7-9e2d-95a78e6fabca -> ../../md14

But the name format does not include 'UUID=', so that would need to get stripped off. The man page suggests that if isn't a path then it should be UUID=. So probably adding another compare to the first two will work.
Comment 40 Bruno Wolff III 2013-04-09 23:29:01 EDT
Created attachment 733458 [details]
modified to handle UUID= luks entries

I modified module-setup.sh to test for UUID= entries similar to path based entries. The modified file work correctly on my system. Only the two expected entries were in the initramfs crypttab file and the machine booted normally without using any rd.luks.uuid= kernel parameters. I just tested this one case.
Comment 41 Harald Hoyer 2013-04-10 03:37:13 EDT
Created attachment 733503 [details]
Proposed patch
Comment 42 Harald Hoyer 2013-04-10 03:39:26 EDT
(In reply to comment #41)
> Created attachment 733503 [details]
> Proposed patch

ok, probably should be only UUID=, as the others are not supported anyway.
Comment 43 Harald Hoyer 2013-04-10 03:41:38 EDT
Created attachment 733504 [details]
Proposed revised patch
Comment 44 Fedora Update System 2013-04-10 04:41:29 EDT
dracut-027-19.git20130410.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/dracut-027-19.git20130410.fc19
Comment 45 Bruno Wolff III 2013-04-10 09:17:15 EDT
While looking at the patch I noticed that the line
[[ $_mapper = \#* ]] && continue
uses a single = comparison. Isn't == needed to get globbing?
Comment 46 Bruno Wolff III 2013-04-10 09:46:04 EDT
I tested dracut-027-19.git20130410.fc19 and things looked good. I'll test the f20 version later this morning. (I am running updates now.)
Comment 47 Harald Hoyer 2013-04-10 11:20:17 EDT
(In reply to comment #46)
> I tested dracut-027-19.git20130410.fc19 and things looked good. I'll test
> the f20 version later this morning. (I am running updates now.)

the push to testing has reset the karma.. care to +1 ?
Comment 48 Harald Hoyer 2013-04-10 11:20:48 EDT
(In reply to comment #45)
> While looking at the patch I noticed that the line
> [[ $_mapper = \#* ]] && continue
> uses a single = comparison. Isn't == needed to get globbing?

no, works as well
Comment 49 Bruno Wolff III 2013-04-10 11:22:35 EDT
I have tested dracut-027-19.git20130410.fc20.i686 now as well and things look good. Thanks.
Comment 50 Harald Hoyer 2013-04-10 11:22:44 EDT
(In reply to comment #47)
> (In reply to comment #46)
> > I tested dracut-027-19.git20130410.fc19 and things looked good. I'll test
> > the f20 version later this morning. (I am running updates now.)
> 
> the push to testing has reset the karma.. care to +1 ?

ah.. no.. sorry, misread the log message.. all fine :)
Comment 51 Bruno Wolff III 2013-04-10 11:28:57 EDT
Yeah, I just checked and saw the karma stayed at +1.
Comment 52 Fedora Update System 2013-04-20 15:20:30 EDT
dracut-027-19.git20130410.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Note You need to log in before you can comment on or make changes to this bug.