Bug 1815374 - rngd module introduced in dracut 050 breaks LUKS prompt/boot
Summary: rngd module introduced in dracut 050 breaks LUKS prompt/boot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: dracut
Version: 31
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: dracut-maint-list
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1816421
Depends On:
Blocks:
Reported: 2020-03-20 04:27 UTC by Dimitris
Modified: 2020-10-18 15:48 UTC
CC List: 23 users

Fixed In Version: dracut-050-61.git20200529.fc31
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-18 15:48:01 UTC
Type: Bug
Embargoed:


Attachments
working initrd manifest (233.36 KB, text/plain)
2020-03-21 18:04 UTC, Dimitris
regression initrd manifest (238.94 KB, text/plain)
2020-03-21 18:04 UTC, Dimitris
/tmp/lsinitrd.5.5.10.dracut049 (234.67 KB, text/plain)
2020-03-25 15:55 UTC, Dimitris
/tmp/lsinitrd.5.5.10.dracut050 (239.21 KB, text/plain)
2020-03-25 15:56 UTC, Dimitris

Description Dimitris 2020-03-20 04:27:43 UTC
1. Please describe the problem:

Booted with the newly installed 5.5.10 kernel from updates-testing. The graphical LUKS password prompt displays briefly, then both screens (laptop panel and external display via Dock+DisplayPort) go blank.

2. What is the Version-Release number of the kernel:

5.5.10-200.fc31.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Yes, I can still boot 5.5.9 and previous kernels

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

- Leave kernel command line unchanged in grub, includes quiet as per default.
- Screen goes blank soon after the LUKS password prompt displays.
- Deleting "quiet" from the grub kernel command line, OR booting 5.5.9, resolves the issue.


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

Haven't tested yet


6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Comment 1 Dimitris 2020-03-20 04:45:09 UTC
When this happens I can't even soft-reboot the machine and have to hard-reboot.  Unfortunately, I also can't get the journal from these failed boot attempts.

Another data point:  I can also work around this if I remove only "rhgb" from the kernel command line.  Smells like a display-related race condition.

The laptop is a ThinkPad T495, AMD Ryzen 3700U (amdgpu Vega graphics)

Comment 2 Dimitris 2020-03-20 14:35:53 UTC
Another quick data point: I can also reproduce this with the laptop undocked and no external display.

Comment 3 Dimitris 2020-03-21 17:50:10 UTC
I've reproduced this with an older kernel (5.5.8) so it appears to be a dracut regression.  Steps:

- Kernel 5.5.8 was originally installed under dracut 049-27.git20181204.fc31
- On March 18th, dracut was updated to 050-26.git20200316.fc31
- On March 19th, kernel was updated to 5.5.10.
- 5.5.10 is unbootable with the symptoms in the description, requires removal of "quiet" or "rhgb" to boot.
- I regenerated a new initramfs for 5.5.8 today. Booting with that also results in the same symptoms.
- 5.5.8 still works if I let it boot with its original image.

I noticed https://bugzilla.redhat.com/show_bug.cgi?id=1801315.  FWIW, I use LUKS but for an encrypted LVM which then contains root and swap, so the only luks entry is in /etc/crypttab, not in /etc/fstab.

Comment 4 Dimitris 2020-03-21 18:04:08 UTC
Created attachment 1672234
working initrd manifest

Comment 5 Dimitris 2020-03-21 18:04:42 UTC
Created attachment 1672235
regression initrd manifest

Comment 6 Hans de Goede 2020-03-23 10:03:21 UTC
Hi Dimitris,

Thank you for the bug report and thank you for figuring out that regenerating the initrd causes this, so that it is not a kernel issue.

I'm the Fedora plymouth maintainer, so I took an interest in this bug because I was worried it might be a plymouth issue. However, I have not yet pushed the recent Fedora 32 plymouth changes/fixes to Fedora 31, so plymouth in Fedora 31 has not changed since October, which rules plymouth out.

My next hunch was linux-firmware: a new linux-firmware package was built and pushed to updates-testing on March 16th, then pushed to stable on March 18th. But looking at your lsinitrd outputs for the initramfs, the firmware seems unchanged (based on file sizes), except that some files have become 0 bytes, e.g.:

usr/lib/firmware/amdgpu/bonaire_sdma1.bin

is 0 bytes, as are many more firmware files, and e.g. also

usr/lib/kbd/consolefonts/iso01-12x22.psfu.gz

is 0 bytes.

I wonder if this was a glitch during initrd generation (should not happen but you never know) can you regenerate the initrd with 050 and see if this happens again?  Hmm, you did say that you found out it was dracut (or at least the initrd) by regenerating the 5.5.9 initrd and then 5.5.9 would not boot either, right ?  So I guess that rules out a glitch.

I'm pretty sure this really is a weird dracut issue. But just to rule out that something else is broken, and that regenerating the initrd merely adds that other broken thing to it, can you try downgrading your dracut to the 049 release, regenerating the initrd, and seeing if that fixes things?

To downgrade, do:

1. "rpm -qa | grep dracut" and note which dracut packages you have installed
2. Go to: https://koji.fedoraproject.org/koji/buildinfo?buildID=1286558 and download the x86_64 version of the packages you have installed
3. From a dir with these packages run: "sudo rpm -Uvh --oldpackage dracut *.rpm"
4. Regenerate the initrd, see if the problem is fixed.

Comment 7 Hans de Goede 2020-03-23 10:05:00 UTC
Ugh, the command in step 3. should be:

sudo rpm -Uvh --oldpackage dracut*.rpm

Note no space between dracut and the '*' !
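
To see why the stray space matters, here is a minimal sketch of how the shell expands the two patterns. The package file names are placeholders created in a temporary directory, and `count_args` is a helper made up for this illustration.

```shell
# Hypothetical helper: report how many arguments a command would receive.
count_args() { echo "$#"; }

# Placeholder file names in a scratch directory; these are not real packages.
demo_dir="$(mktemp -d)"
touch "$demo_dir/dracut-049.rpm" "$demo_dir/dracut-network-049.rpm" "$demo_dir/unrelated.rpm"

# "dracut *.rpm": the literal word "dracut" plus every .rpm file in the directory.
with_space_count="$(cd "$demo_dir" && count_args dracut *.rpm)"

# "dracut*.rpm": only the files whose names start with "dracut".
no_space_count="$(cd "$demo_dir" && count_args dracut*.rpm)"

rm -rf "$demo_dir"

echo "with space:    $with_space_count arguments"
echo "without space: $no_space_count arguments"
```

With the space, rpm would be handed a bare "dracut" argument (which is not a package file) together with any unrelated .rpm files that happen to be in the directory.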

Comment 8 Hans de Goede 2020-03-23 10:32:42 UTC
One more observation: on my own F32 beta system, usr/lib/kbd/consolefonts/gr737c-8x16.psfu.gz and usr/lib/kbd/consolefonts/iso01-12x22.psfu.gz are 0 bytes, and they are also 0 bytes in both of the lsinitrd outputs attached here, which is weird. The 0 sized font files do not change between 049 and 050 (I tried downgrading on my own system).

The 0 sized firmwares however are only present in the attached lsinitrd output for the dracut-050 generated initrd. I'm not seeing this on my own system, but I do not have AMD graphics and the 0 sized files are in the amdgpu dir, which my initrd does not have.

So it seems like the 0 sized font-file and firmware issues might be unrelated. I still think the 0 sized font files are a bug too, and this could potentially have the same root cause, but let's focus on the firmware issue now as that seems to actually be causing issues for people.
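
A quick way to pull the zero-sized entries out of a saved lsinitrd listing is sketched below. It assumes the ls -l style columns seen in the attachments (size in the fifth field, path in the ninth). The function names and the sample lines are made up for this illustration.

```shell
# Hypothetical helper: print the paths of zero-byte regular files from an
# lsinitrd-style long listing read on stdin.
# Assumed columns: mode, links, owner, group, size, month, day, time, path.
zero_size_files() {
    awk '$1 ~ /^-/ && $5 == 0 { print $9 }'
}

# Illustrative listing lines (not copied from the attachments).
sample_listing() {
    cat <<'EOF'
-rw-r--r--   1 root     root            0 Mar 16 08:15 usr/lib/firmware/amdgpu/bonaire_sdma1.bin
-rw-r--r--   1 root     root         6476 Mar 16 08:15 kernel/x86/microcode/AuthenticAMD.bin
drwxr-xr-x   2 root     root            0 Mar 16 08:15 kernel/x86/microcode
EOF
}

# Directories also show size 0, so only lines whose mode starts with "-" are kept.
sample_listing | zero_size_files
```

On a real system the input would come from something like "lsinitrd IMAGE | zero_size_files", with IMAGE being the initramfs to inspect.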

Comment 9 Dimitris 2020-03-25 15:55:44 UTC
Created attachment 1673562
/tmp/lsinitrd.5.5.10.dracut049

Comment 10 Dimitris 2020-03-25 15:56:27 UTC
Created attachment 1673563
/tmp/lsinitrd.5.5.10.dracut050

Comment 11 Dimitris 2020-03-25 15:57:01 UTC
Couple of updates:

- The "remove quiet" workaround isn't very reliable: it works about 50% of the time, so reboots are a little more laborious.

- I repeated the experiment with kernel 5.5.10 and it's the same, 049 works when 050 doesn't:
  - Downgraded with `dnf distro-sync dracut --disablerepo updates`, so now on 049-27.git20181204.fc31.1.
  - Generated image: `dracut /boot/initramfs-5.5.10-200.fc31.x86_64.img.dracut049 5.5.10-200.fc31.x86_64`.
  - Editing the initrd entry at boot to point to the .dracut049 image works reliably (booted 3 times in a row).

Comment 12 Dimitris 2020-03-25 16:29:20 UTC
BTW the zero-sized amdgpu firmware files are the same between 049 and 050:

diff <(egrep ' 0 .+amdgpu' /tmp/lsinitrd.5.5.10.dracut049 |awk '{print $9}') <(egrep ' 0 .+amdgpu' /tmp/lsinitrd.5.5.10.dracut050 |awk '{print $9}')

shows no differences.

Comment 13 Dimitris 2020-03-25 17:11:34 UTC
Thanks for jumping in Hans!

The symptoms (the graphical/plymouth LUKS prompt displays very briefly, then drops back to a text console that does not respond to the passphrase), combined with the "workaround" of removing quiet also being a coin toss, smell to me like some race condition.  Unfortunately that's the extent of my understanding of dracut/the boot process.

BTW I've now installed kernel 5.5.11 from updates-testing and it's booting fine with the default initrd (dracut held back to 049).

I'd be happy to try dracut config options or other troubleshooting if anyone has ideas where to look.

Comment 14 Hans de Goede 2020-03-26 12:41:00 UTC
*** Bug 1816421 has been marked as a duplicate of this bug. ***

Comment 15 Roland Askew 2020-03-28 02:49:58 UTC
Hello,

I am encountering a similar problem in Fedora 31, x86_64 with kernel 5.5.10-200.fc31, but not 5.5.8-200.fc31.


With 5.5.10 the problem is intermittent: sometimes things work, but often the boot sequence hangs, apparently during a plymouth operation. The laptop screen cuts out, with a white screen flashing briefly and rhythmically every second. A secondary display shows the boot console and where it hangs.

No problems with kernel 5.5.8.

Haven't tried the latest rawhide version.

I am having this problem on a Toshiba SatPro P50-TA Laptop.

Originally reported in duplicate bug 1816421, closed when it appeared the problem disappeared, but as described above it is sometimes reoccurring.

The dnf transaction in which kernel 5.5.10 was installed had scripting errors related to EFI, but the kernel appears to have been installed correctly and the computer can boot, if only intermittently as described above.

----------------


I'm assuming this is a dracut problem as described above, and would like to downgrade and see if this fixes things, but I need clear instructions on how to do this.

My apologies, I am not savvy with low level operations outside common basics (such as using dnf/yum).

Comment 16 Dimitris 2020-03-28 03:06:03 UTC
Roland, you should be able to downgrade dracut with the dnf command from comment 11.  Full steps to be safe:

- Downgrade dracut; we're "lucky" in that 050 is available only from the updates repo, so:

  dnf distro-sync dracut --disablerepo updates

- Generate *alternate* initrd image, so we don't break anything.  Assuming your initrd images are in the default /boot location, and kernel 5.5.10, then as root (or sudo):

  dracut /boot/initramfs-5.5.10-200.fc31.x86_64.img.dracut049 5.5.10-200.fc31.x86_64

- Upgrade back to dracut 050 for now

  dnf distro-sync dracut

- Reboot, cursor up/down to select kernel 5.5.10 as needed in the grub menu, then `e` to edit the boot parameters.  Change the name of the initrd image to the special .dracut049 one you created above, then Ctrl-X to boot.
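
The naming used in these steps can be sketched as a small helper that only prints the dracut command instead of running it. The function name `alt_initramfs_path` is made up for this example, and the /boot/initramfs-VERSION.img layout is the default location assumed in the steps above.

```shell
# Hypothetical helper: build the path of an alternate initramfs image so the
# default image for that kernel is left untouched.
#   $1 = kernel version, $2 = suffix identifying the dracut version used
alt_initramfs_path() {
    printf '/boot/initramfs-%s.img.%s\n' "$1" "$2"
}

kver="5.5.10-200.fc31.x86_64"
img="$(alt_initramfs_path "$kver" dracut049)"

# Print the command for review; it would have to be run as root to take effect.
echo "dracut $img $kver"
```

Because the alternate image has a different file name, the default image for that kernel stays in place as a fallback if the alternate one does not boot.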

Please let us know how it goes.

Comment 17 Dimitris 2020-03-29 22:46:08 UTC
The problem seems related to the rngd module in my case:

An obvious difference between the manifests (lsinitrd.5.5.10.dracut049 and lsinitrd.5.5.10.dracut050 (see attachments)) is that 050 includes the rngd module, whereas 049 doesn't.  So I went ahead and upgraded back to 050, then generated one image under the default configuration and one with /etc/dracut.conf.d/omit-rngd.conf with contents:

# https://bugzilla.redhat.com/show_bug.cgi?id=1815374
omit_dracutmodules+="rngd"

The default-config image reproduces the problem, and the one generated under this modified config succeeded through multiple reboots.

Not sure how to debug further at this point, but at least this seems to remove the need to hold dracut back to 049.
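
To confirm that a regenerated image really lacks the module, the module list printed by "lsinitrd -m" can be checked with a small filter. This is a sketch that assumes the output layout seen elsewhere in this report (a "dracut modules:" header followed by one module per line, closed by a line of "=" characters). The function names and the shortened sample manifest are made up for illustration.

```shell
# Hypothetical helper: succeed if the named module appears in the
# "dracut modules:" section of an lsinitrd -m listing read on stdin.
has_dracut_module() {
    awk -v m="$1" '
        /^dracut modules:/ { in_list = 1; next }   # start of the module list
        /^=+$/             { in_list = 0 }         # separator line ends it
        in_list && $0 == m { found = 1 }
        END                { exit (found ? 0 : 1) }
    '
}

# Shortened, illustrative manifest (not a full lsinitrd output).
sample_manifest() {
    cat <<'EOF'
Version: dracut-050-26.git20200316.fc31

dracut modules:
bash
systemd
plymouth
crypt
========================================================================
EOF
}

if sample_manifest | has_dracut_module rngd; then
    echo "rngd is included"
else
    echo "rngd is omitted"
fi
```

Against a real image the input would be "sudo lsinitrd -m IMAGE | has_dracut_module rngd", with IMAGE being the initramfs that was just generated.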

Comment 18 Roland Askew 2020-03-30 00:02:30 UTC
Following from Comment 16:

I downgraded dracut as directed, and rebooted.

In general, I am still having occasional boot failures in kernel 5.5.10 as previously described. However these boot failures have become rare. 

Other notes:

Before downgrading, some of the NVidia nouveau firmware had 0-byte sized files, and still does (of course), so I suspect dracut as well.

Comment 19 Roland Askew 2020-03-30 00:05:09 UTC
Oh, and the boot failure still has the strobing / flashing screen problem, except it now shows the Toshiba logo on the main laptop screen between strobes, and continues to show the boot console on the second screen.

Comment 20 Roland Askew 2020-03-30 23:07:31 UTC
Followup to Comment 17:

I have upgraded back to dracut 050, but omitted the rngd module via the config, as described previously, to see if that removes the problem too.

I can confirm the modified config, with rngd module removed, is succeeding without the problem.

I am curious what the rngd module is for, and whether it replaced a different module, given that it was introduced in dracut 050 but was not in previous versions.

Is there another module we can safely use in place of rngd?

Comment 21 Hans de Goede 2020-04-03 08:32:06 UTC
Good job on figuring out that the rngd module is the cause here. I'm afraid I don't really know how to debug this further. I've pinged one of the dracut maintainers about this asking him to take a look.

Comment 22 Harald Hoyer 2020-04-03 08:41:33 UTC
(In reply to Dimitris from comment #17)
> The problem seems related to the rngd module in my case:
> 
> An obvious difference between the manifests (lsinitrd.5.5.10.dracut049 and
> lsinitrd.5.5.10.dracut050 (see attachments)) is that 050 includes the rngd
> module, whereas 049 doesn't.  So I went ahead and upgraded back to 050, then
> generated one image under the default configuration and one with
> /etc/dracut.conf.d/omit-rngd.conf with contents:
> 
> # https://bugzilla.redhat.com/show_bug.cgi?id=1815374
> omit_dracutmodules+="rngd"
> 
> The default-config image reproduces the problem, and the one generated under
> this modified config succeeded through multiple reboots.
> 
> Not sure how to debug further at this point, but at least this seems to
> remove the need to hold dracut back to 049.

Please retry with the correct config line:
omit_dracutmodules+=" rngd "

*NOTE* the spaces as mentioned in the man page. I know.. stupid..

Comment 23 Harald Hoyer 2020-04-03 08:57:40 UTC
(In reply to Dimitris from comment #12)
> BTW the zero-sized amdgpu firmware files are the same between 049 and 050:
> 
> diff <(egrep ' 0 .+amdgpu' /tmp/lsinitrd.5.5.10.dracut049 |awk '{print $9}')
> <(egrep ' 0 .+amdgpu' /tmp/lsinitrd.5.5.10.dracut050 |awk '{print $9}')
> 
> shows no differences.

❯ LANG=C hardlink -cvn /usr/lib/firmware/amdgpu/
Directories:           1
Objects:             337
Regular files:       336
Comparisons:          74
Would link:           74
Would save:      9424896

There are a lot of similar files. dracut uses hardlink to save space. So these zero sized files are actual hardlinks.
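
For readers unfamiliar with hard links, the sketch below shows what one looks like on disk: two names, one inode, one copy of the data. It uses throwaway files in a temporary directory and GNU stat. How lsinitrd reports sizes for such entries is taken from the comment above and is not demonstrated here.

```shell
# Scratch directory with one placeholder "firmware" file.
workdir="$(mktemp -d)"
printf 'identical firmware bytes\n' > "$workdir/a.bin"

# Create a second name for the same data instead of a second copy.
ln "$workdir/a.bin" "$workdir/b.bin"

# GNU stat: %h is the hard link count, %i is the inode number.
link_count="$(stat -c '%h' "$workdir/a.bin")"
inode_a="$(stat -c '%i' "$workdir/a.bin")"
inode_b="$(stat -c '%i' "$workdir/b.bin")"

rm -rf "$workdir"

echo "link count: $link_count"
if [ "$inode_a" = "$inode_b" ]; then
    echo "a.bin and b.bin share one inode, so the data exists only once"
fi
```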

Comment 24 Dimitris 2020-04-05 21:03:34 UTC
@Harald, I changed the config to use spaces:

omit_dracutmodules+=" rngd "

But it seems that rngd is omitted both with and without spaces.  My previous config was without spaces.  Comparing the two initrd manifests:

$ diff <(sudo lsinitrd -m /boot/initramfs-5.5.15-200.fc31.x86_64.img) <(sudo lsinitrd -m /boot/initramfs-5.5.15-200.fc31.x86_64.img.config_with_spaces)

finds no differences:


/boot/initramfs-5.5.15-200.fc31.x86_64.img (generated with omit_dracutmodules+="rngd") :

initrd in UEFI: : 39M
========================================================================
Early CPIO image
========================================================================
drwxr-xr-x   3 root     root            0 Mar 16 08:15 .
-rw-r--r--   1 root     root            2 Mar 16 08:15 early_cpio
drwxr-xr-x   3 root     root            0 Mar 16 08:15 kernel
drwxr-xr-x   3 root     root            0 Mar 16 08:15 kernel/x86
drwxr-xr-x   2 root     root            0 Mar 16 08:15 kernel/x86/microcode
-rw-r--r--   1 root     root         6476 Mar 16 08:15 kernel/x86/microcode/AuthenticAMD.bin
========================================================================
Version: dracut-050-26.git20200316.fc31

dracut modules:
bash
systemd
systemd-initrd
nss-softokn
i18n
network-manager
network
ifcfg
drm
plymouth
crypt
dm
kernel-modules
kernel-modules-extra
kernel-network-modules
lvm
resume
rootfs-block
terminfo
udev-rules
dracut-systemd
usrmount
base
fs-lib
shutdown
========================================================================


/boot/initramfs-5.5.15-200.fc31.x86_64.img.config_with_spaces (generated with omit_dracutmodules+=" rngd ") :

initrd in UEFI: : 39M
========================================================================
Early CPIO image
========================================================================
drwxr-xr-x   3 root     root            0 Mar 16 08:15 .
-rw-r--r--   1 root     root            2 Mar 16 08:15 early_cpio
drwxr-xr-x   3 root     root            0 Mar 16 08:15 kernel
drwxr-xr-x   3 root     root            0 Mar 16 08:15 kernel/x86
drwxr-xr-x   2 root     root            0 Mar 16 08:15 kernel/x86/microcode
-rw-r--r--   1 root     root         6476 Mar 16 08:15 kernel/x86/microcode/AuthenticAMD.bin
========================================================================
Version: dracut-050-26.git20200316.fc31

dracut modules:
bash
systemd
systemd-initrd
nss-softokn
i18n
network-manager
network
ifcfg
drm
plymouth
crypt
dm
kernel-modules
kernel-modules-extra
kernel-network-modules
lvm
resume
rootfs-block
terminfo
udev-rules
dracut-systemd
usrmount
base
fs-lib
shutdown
========================================================================

This is probably because rngd is the only module I'm omitting?  Anyway, I'm leaving the config to have spaces since that seems to be the canonical form.
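
That guess can be illustrated with a minimal sketch. It assumes the config snippets are read as shell fragments in order, so that each += appends to the same variable. The second snippet, omitting plymouth, is purely hypothetical, and `combined_omit_list` is a helper made up for this example.

```shell
# Hypothetical helper: source the given snippets in order with bash and
# print the resulting value of omit_dracutmodules.
combined_omit_list() {
    bash -c 'omit_dracutmodules=""
             for f in "$@"; do . "$f"; done
             printf "%s" "$omit_dracutmodules"' _ "$@"
}

confdir="$(mktemp -d)"

# Case 1: two drop-ins written without surrounding spaces.
printf '%s\n' 'omit_dracutmodules+="rngd"'     > "$confdir/10-a.conf"
printf '%s\n' 'omit_dracutmodules+="plymouth"' > "$confdir/20-b.conf"
no_spaces="$(combined_omit_list "$confdir/10-a.conf" "$confdir/20-b.conf")"

# Case 2: the same drop-ins with leading and trailing spaces.
printf '%s\n' 'omit_dracutmodules+=" rngd "'     > "$confdir/10-a.conf"
printf '%s\n' 'omit_dracutmodules+=" plymouth "' > "$confdir/20-b.conf"
with_spaces="$(combined_omit_list "$confdir/10-a.conf" "$confdir/20-b.conf")"

rm -rf "$confdir"

echo "without spaces: [$no_spaces]"
echo "with spaces:    [$with_spaces]"
```

Without spaces the two names run together into a single word that matches neither module. With only one entry there is nothing to run into, which would explain why both forms produced the same image here.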

Comment 25 Fedora Update System 2020-05-29 19:03:29 UTC
FEDORA-2020-03e14f6120 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2020-03e14f6120

Comment 26 Fedora Update System 2020-05-30 02:04:24 UTC
FEDORA-2020-03e14f6120 has been pushed to the Fedora 31 testing repository.
In short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-03e14f6120`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-03e14f6120

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 27 Dimitris 2020-05-31 18:49:21 UTC
I've tested the four combinations of 050-26/050-61 and with/without rngd.

The "cannot boot" issue I was facing actually seems to have been resolved before the new dracut, presumably by some other update.  No longer able to reproduce it with either 050-61 or 050-26.

The only lingering issue I see when rngd is included (as is the default), and across both of those dracut versions, is that if I choose to reboot the machine at the LUKS prompt, systemd-shutdown takes about a minute waiting for rngd before rebooting.  Rebooting at the same point without rngd in the initrd is "instant".

The only interesting update I can think of between the time I reported and troubleshot this and now is a single upgrade of systemd on April 2nd.  Here it is in case it helps find the root cause:

Transaction ID : 207
Begin time     : Thu 02 Apr 2020 09:31:25 AM PDT
Begin rpmdb    : 1823:e7742dbdc2e1a3a20074121661436d3df64d62f7
End time       : Thu 02 Apr 2020 09:31:32 AM PDT (7 seconds)
End rpmdb      : 1823:13f4833c50d917d7195be3ef10cf5c2bdab5025a
User           : D <d>
Return-Code    : Success
Releasever     : 31
Command Line   : upgrade --refresh
Packages Altered:
    Upgrade  coreutils-8.31-9.fc31.x86_64           @updates
    Upgraded coreutils-8.31-6.fc31.x86_64           @@System
    Upgrade  coreutils-common-8.31-9.fc31.x86_64    @updates
    Upgraded coreutils-common-8.31-6.fc31.x86_64    @@System
    Upgrade  daxctl-libs-68-1.fc31.x86_64           @updates
    Upgraded daxctl-libs-67-1.fc31.x86_64           @@System
    Upgrade  libhandy-0.0.13-1.fc31.x86_64          @updates
    Upgraded libhandy-0.0.11-1.fc31.x86_64          @@System
    Upgrade  lmdb-libs-0.9.24-1.fc31.x86_64         @updates
    Upgraded lmdb-libs-0.9.23-3.fc31.x86_64         @@System
    Upgrade  mutter-3.34.5-1.fc31.x86_64            @updates
    Upgraded mutter-3.34.4-2.fc31.x86_64            @@System
    Upgrade  ndctl-68-1.fc31.x86_64                 @updates
    Upgraded ndctl-67-1.fc31.x86_64                 @@System
    Upgrade  ndctl-libs-68-1.fc31.x86_64            @updates
    Upgraded ndctl-libs-67-1.fc31.x86_64            @@System
    Upgrade  python3-setools-4.2.2-2.fc31.x86_64    @updates
    Upgraded python3-setools-4.2.2-1.fc31.x86_64    @@System
    Upgrade  systemd-243.8-1.fc31.x86_64            @updates
    Upgraded systemd-243.7-1.fc31.x86_64            @@System
    Upgrade  systemd-container-243.8-1.fc31.x86_64  @updates
    Upgraded systemd-container-243.7-1.fc31.x86_64  @@System
    Upgrade  systemd-libs-243.8-1.fc31.x86_64       @updates
    Upgraded systemd-libs-243.7-1.fc31.x86_64       @@System
    Upgrade  systemd-pam-243.8-1.fc31.x86_64        @updates
    Upgraded systemd-pam-243.7-1.fc31.x86_64        @@System
    Upgrade  systemd-rpm-macros-243.8-1.fc31.noarch @updates
    Upgraded systemd-rpm-macros-243.7-1.fc31.noarch @@System
    Upgrade  systemd-udev-243.8-1.fc31.x86_64       @updates
    Upgraded systemd-udev-243.7-1.fc31.x86_64       @@System
Scriptlet output:
   1 Warning: The unit file, source configuration file or drop-ins of systemd-udevd.service changed on disk. Run 'systemctl daemon-reload' to reload units.

Comment 28 Fedora Update System 2020-10-18 15:48:01 UTC
FEDORA-2020-03e14f6120 has been pushed to the Fedora 31 stable repository.
If problem still persists, please make note of it in this bug report.

