Bug 1726617

Summary: clevis takes up to 20 minutes or more to unlock the root devices at boot
Product: Red Hat Enterprise Linux 8 Reporter: Renaud Métrich <rmetrich>
Component: dracutAssignee: Lukáš Nykrýn <lnykryn>
Status: CLOSED ERRATA QA Contact: qe-baseos-daemons
Severity: high Docs Contact:
Priority: high    
Version: 8.0CC: aklimov, cww, dapospis, dpal, dracut-maint-list, jpazdziora, jstodola, mbroz, mzeleny, onatalen, pkotvan, shivgupt, tjaros
Target Milestone: rcFlags: pm-rhel: mirror+
Target Release: 8.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: dracut-049-18.git20190725.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-11-05 22:27:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Proposed updated module introducing rngd none

Description Renaud Métrich 2019-07-03 09:46:40 UTC
Created attachment 1587014 [details]
Proposed updated module introducing rngd

Description of problem:

A customer experiences a high delay in random number generator initialization, causing clevis to take 20 minutes or more to unlock the root devices at boot.
This is seen on Dell PowerEdge R210 II.

A workaround is to boot with rd.debug which apparently produces entropy, causing a 3 minutes delay to occur anyway, but this is not really acceptable since rd.debug has impacts on the boot itself.


Version-Release number of selected component (if applicable):

clevis-dracut-11-2.el8.x86_64

How reproducible:

Always on customer's hardware.


Additional info:

A solution is to backport the "random.trust_cpu=on" fix to RHEL8 kernel and enable this tunable at boot, but this may have security implications.
The best solution I could find so far is to instead integrate the rngd service in the initramfs, which definitely helps booting within a couple of seconds:

Without rngd service:

random number generation initialization:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Jul 02 11:05:14.278284 localhost kernel: random: get_random_bytes called from start_kernel+0x36b/0x55b with crng_init=0
Jul 02 11:05:14.294944 localhost kernel: random: systemd: uninitialized urandom read (16 bytes read)
Jul 02 11:05:14.294958 localhost kernel: random: systemd: uninitialized urandom read (16 bytes read)
Jul 02 11:05:14.294972 localhost kernel: random: systemd: uninitialized urandom read (16 bytes read)
Jul 02 11:05:14.295006 localhost kernel: random: systemd: uninitialized urandom read (16 bytes read)
Jul 02 11:05:14.295013 localhost kernel: random: systemd: uninitialized urandom read (16 bytes read)
Jul 02 11:05:14.295019 localhost kernel: random: systemd: uninitialized urandom read (16 bytes read)
Jul 02 11:05:19.969756 localhost kernel: random: fast init done
Jul 02 11:30:45.266108 localhost kernel: random: crng init done
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

dracut-initqueue running (25 minutes) (includes clevis unlocking, used "set -x" in /usr/libexec/clevis-luks-askpass):
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Jul 02 11:05:23.800629 localhost dracut-initqueue[641]: RTNETLINK answers: Network is unreachable
[...]
Jul 02 11:30:50.169845 localhost dracut-initqueue[641]: Allowing activation with --readonly --sysinit.
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------


With rngd service:

random number generation initialization:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Jul 02 11:46:31.951370 localhost kernel: random: get_random_bytes called from start_kernel+0x36b/0x55b with crng_init=0
Jul 02 11:46:31.968649 localhost kernel: random: systemd: uninitialized urandom read (16 bytes read)
Jul 02 11:46:31.968665 localhost kernel: random: systemd: uninitialized urandom read (16 bytes read)
Jul 02 11:46:31.968679 localhost kernel: random: systemd: uninitialized urandom read (16 bytes read)
Jul 02 11:46:31.968713 localhost kernel: random: systemd: uninitialized urandom read (16 bytes read)
Jul 02 11:46:31.968719 localhost kernel: random: systemd: uninitialized urandom read (16 bytes read)
Jul 02 11:46:32.891188 localhost kernel: random: crng init done
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

dracut-initqueue running (7 seconds) (includes clevis unlocking, used "set -x" in /usr/libexec/clevis-luks-askpass):
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Jul 02 11:46:42.106159 localhost dracut-initqueue[647]: RTNETLINK answers: Network is unreachable
[...]
Jul 02 11:46:49.829477 localhost dracut-initqueue[647]: Allowing activation with --readonly --sysinit.
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Comment 1 Renaud Métrich 2019-07-03 09:50:26 UTC
For now, I propose to integrate the rngd service in the initramfs for Clevis only.
Maybe this could be done as a standard module instead if some other modules require random number generation.

Comment 2 Renaud Métrich 2019-07-12 09:38:38 UTC
Redirecting to dracut since this issue may happen with other decryption modules.
Please check my PR: https://github.com/dracutdevs/dracut/pull/600

Comment 10 Jan Stodola 2019-09-12 12:47:30 UTC
I wasn't able to reproduce it locally on RHEL-8.0.0 GA with dracut-049-10.git20190115.el8, so doing just a sanity verification.

rngd is installed in initrd created by dracut-049-26.git20190806.el8 and rngd is executed early during boot (before unlocking the LUKS device with rootfs):

Sep 12 08:36:30 localhost.localdomain rngd[205]: Initalizing available sources
Sep 12 08:36:30 localhost.localdomain rngd[205]: Initalizing entropy source hwrng
Sep 12 08:36:30 localhost.localdomain rngd[205]: Enabling RDSEED rng support
Sep 12 08:36:30 localhost.localdomain rngd[205]: Initalizing entropy source rdrand
Sep 12 08:36:30 localhost.localdomain systemd[1]: Started Setup Virtual Console.
Sep 12 08:36:30 localhost.localdomain systemd[1]: Starting dracut ask for additional cmdline parameters...
Sep 12 08:36:30 localhost.localdomain systemd[1]: Started dracut ask for additional cmdline parameters.
Sep 12 08:36:30 localhost.localdomain systemd[1]: Starting dracut cmdline hook...
Sep 12 08:36:30 localhost.localdomain dracut-cmdline[234]: dracut-8.0 (Ootpa) dracut-049-26.git20190806.el8
Sep 12 08:36:30 localhost.localdomain dracut-cmdline[234]: Using kernel command line parameters: rd.neednet=1 BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-80.el8>
Sep 12 08:36:31 localhost.localdomain rngd[205]: Enabling JITTER rng support
Sep 12 08:36:31 localhost.localdomain rngd[205]: Initalizing entropy source jitter
...
Sep 12 08:36:31 localhost.localdomain systemd[1]: Found device /dev/disk/by-uuid/92264f35-c964-4333-bb16-077b1822bc30.
Sep 12 08:36:31 localhost.localdomain systemd[1]: Starting Cryptography Setup for luks-92264f35-c964-4333-bb16-077b1822bc30...
...
Sep 12 08:36:31 localhost.localdomain dracut-initqueue[429]: Starting dhcp for interface enp1s0
Sep 12 08:36:31 localhost.localdomain dracut-initqueue[429]: dhcp: PREINIT enp1s0 up
Sep 12 08:36:31 localhost.localdomain dhclient[579]: Created duid "\000\004\267l\202:\351\353DO\241\343\370\005@\310\312\247".
Sep 12 08:36:31 localhost.localdomain dhclient[579]: DHCPDISCOVER on enp1s0 to 255.255.255.255 port 67 interval 4 (xid=0xd407f615)
Sep 12 08:36:31 localhost.localdomain dhclient[579]: DHCPREQUEST on enp1s0 to 255.255.255.255 port 67 (xid=0xd407f615)
Sep 12 08:36:31 localhost.localdomain dhclient[579]: DHCPOFFER from 192.168.122.1
Sep 12 08:36:31 localhost.localdomain dhclient[579]: DHCPACK from 192.168.122.1 (xid=0xd407f615)
Sep 12 08:36:31 localhost.localdomain dracut-initqueue[429]: dhcp: BOND setting enp1s0
Sep 12 08:36:32 localhost.localdomain kernel: ata1: SATA link down (SStatus 0 SControl 300)
Sep 12 08:36:32 localhost.localdomain kernel: ata3: SATA link down (SStatus 0 SControl 300)
Sep 12 08:36:32 localhost.localdomain kernel: ata4: SATA link down (SStatus 0 SControl 300)
Sep 12 08:36:32 localhost.localdomain kernel: ata2: SATA link down (SStatus 0 SControl 300)
Sep 12 08:36:32 localhost.localdomain kernel: ata5: SATA link down (SStatus 0 SControl 300)
Sep 12 08:36:32 localhost.localdomain kernel: ata6: SATA link down (SStatus 0 SControl 300)
Sep 12 08:36:32 localhost.localdomain kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0: link becomes ready
Sep 12 08:36:33 localhost.localdomain dhclient[579]: bound to 192.168.122.222 -- renewal in 1490 seconds.
Sep 12 08:36:34 localhost.localdomain dracut-initqueue[429]: RTNETLINK answers: File exists
Sep 12 08:36:34 localhost.localdomain systemd-cryptsetup[476]: Set cipher aes, mode xts-plain64, key size 512 bits for device /dev/disk/by-uuid/92264f35-c9>
Sep 12 08:36:36 localhost.localdomain systemd[1]: Received SIGRTMIN+20 from PID 436 (plymouthd).
Sep 12 08:36:37 localhost.localdomain kernel: device-mapper: crypt: xts(aes) using implementation "xts-aes-aesni"
Sep 12 08:36:37 localhost.localdomain systemd[1]: Started Cryptography Setup for luks-92264f35-c964-4333-bb16-077b1822bc30.
Sep 12 08:36:37 localhost.localdomain systemd[1]: Reached target Local Encrypted Volumes.
Sep 12 08:36:37 localhost.localdomain dracut-initqueue[429]: Scanning devices dm-0  for LVM logical volumes rhel00/root rhel00/swap
Sep 12 08:36:37 localhost.localdomain dracut-initqueue[429]: inactive '/dev/rhel00/swap' [1.00 GiB] inherit
Sep 12 08:36:37 localhost.localdomain dracut-initqueue[429]: inactive '/dev/rhel00/root' [<7.99 GiB] inherit
Sep 12 08:36:37 localhost.localdomain systemd[1]: Found device /dev/mapper/rhel00-root.

The prompt for entering the passphrase to unlock the encrypted device is visible for about 5 seconds, then it's unlocked automatically and the boot process continues.

Moving to VERIFIED.

Comment 13 errata-xmlrpc 2019-11-05 22:27:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3604

Comment 14 Sergio Correia 2020-01-09 15:11:55 UTC
*** Bug 1758643 has been marked as a duplicate of this bug. ***