1833335 – RHCOS not starting with TPM encryption enabled

Summary: RHCOS not starting with TPM encryption enabled

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	RHCOS
Sub Component:
Version:	4.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.5.0
Assignee:	Jonathan Lebon
QA Contact:	Michael Nguyen
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1842980

Reported:	2020-05-08 13:13 UTC by David Sanz
Modified:	2023-09-14 05:57 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Previously, using RHCOS encryption via TPM would sometimes cause boot failures due to lack of entropy. Now, RHCOS has access to more entropy in early boot by allowing the kernel to leverage CPU support for random number generation if available.
Clone Of:
Clones:	1842980
Environment:
Last Closed:	2020-07-13 17:36:10 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2020:2409	0	None	None	None	2020-07-13 17:36:28 UTC

Description David Sanz 2020-05-08 13:13:34 UTC

Description of problem:
Installing cluster using 4.5.0-0.nightly-2020-05-08-075620.

When any host is configured to enable TPM disk encryption, encryption succeeds on the bootstrap RHCOS version 44.81.202004250133-0, but after the host is automatically updated to 45.81.202005080327-0 (the version required by the MCO), it never comes up.


Attached:
 - console log for booting on 44.81.202004250133-0
 - journal log during upgrade from 44.81.202004250133-0 to 45.81.202005080327-0


Version-Release number of selected component (if applicable):
Installer: 4.5.0-0.nightly-2020-05-08-075620
Bootstrap RHCOS Version: 44.81.202004250133-0
MCO RHCOS Version: 45.81.202005080327-0

How reproducible:


Steps to Reproduce:
1. Create a manifest to enable TPM encryption
2. Install the cluster on bare metal
3. Watch the host installation
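
The manifest used in step 1 is not attached to this report. As a rough illustration only, the sketch below builds a MachineConfig of the shape described in the OpenShift 4.x documentation for TPM v2 disk encryption, which drops an empty JSON object into `/etc/clevis.json`. The function name `tpm2_machine_config`, the `<role>-tpm` naming, and the Ignition version are assumptions and may not match what the reporter actually used.

```python
import base64
import json


def tpm2_machine_config(role):
    """Build a MachineConfig dict that writes an empty /etc/clevis.json.

    Hypothetical helper: the field values follow the documented TPM v2
    example as recalled, not the manifest from this bug report.
    """
    # An empty JSON object followed by a newline, encoded as a data URL.
    clevis_json = b"{}\n"
    source = "data:text/plain;base64," + base64.b64encode(clevis_json).decode("ascii")
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfig",
        "metadata": {
            "name": f"{role}-tpm",  # assumed naming scheme
            "labels": {"machineconfiguration.openshift.io/role": role},
        },
        "spec": {
            "config": {
                "ignition": {"version": "2.2.0"},  # assumed spec version
                "storage": {
                    "files": [
                        {
                            "contents": {"source": source},
                            "filesystem": "root",
                            "mode": 420,  # decimal for octal 0644
                            "path": "/etc/clevis.json",
                        }
                    ]
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(tpm2_machine_config("worker"), indent=2))
```

The printed structure would still need to be saved in whatever format and location the installer expects for extra manifests.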

Actual results:
Installation is completed. The server boots the bootstrap RHCOS version and encrypts the disk, then reboots into the MCO RHCOS version, but that boot never finishes.

Expected results:
Server boots on MCO RHCOS version with disk encrypted

Additional info:

Comment 3 Jonathan Lebon 2020-05-08 13:49:57 UTC

Ahh OK, I see you have console logs for the bootstrap boot, so I'm guessing this is *not* on Packet, right?
Does this reproduce in a VM with a virtual TPM? Will try it out.

Comment 6 Jonathan Lebon 2020-05-08 18:15:38 UTC

OK, I'm fairly confident this is another entropy issue. The logs here are a telltale sign:

```
May 08 14:13:30 localhost systemd[1]: Starting dracut pre-mount hook...
May 08 14:13:30 localhost coreos-cryptfs[1275]: coreos-cryptfs: /dev/sdc4 is configured for Clevis pin 'tpm2'
May 08 14:13:30 localhost systemd[1]: Started dracut pre-mount hook.
May 08 14:14:56 localhost systemd-journald[664]: Missed 3 kernel messages
May 08 14:14:56 localhost kernel: random: crng init done
May 08 14:14:56 localhost kernel: random: 7 urandom warning(s) missed due to ratelimiting
May 08 14:14:58 localhost systemd[1]: dev-disk-by\x2dlabel-root.device: Job dev-disk-by\x2dlabel-root.device/start timed out.
May 08 14:14:58 localhost systemd[1]: Timed out waiting for device dev-disk-by\x2dlabel-root.device.
```

Note the `random: crng init done` message, which only shows up roughly 1m30s after the pre-mount hook starts.
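
For anyone who wants to check the gap on their own logs, here is a minimal sketch that measures it from journal lines in the short timestamp format shown above. The three sample lines are copied from the excerpt; the helper names and the `year` default are assumptions, since the journal's short format does not include a year.

```python
from datetime import datetime

# Lines copied from the journal excerpt in this comment.
JOURNAL_LINES = [
    "May 08 14:13:30 localhost systemd[1]: Starting dracut pre-mount hook...",
    "May 08 14:14:56 localhost kernel: random: crng init done",
    r"May 08 14:14:58 localhost systemd[1]: Timed out waiting for device dev-disk-by\x2dlabel-root.device.",
]


def timestamp_of(lines, needle, year=2020):
    """Return the timestamp of the first line containing `needle`."""
    for line in lines:
        if needle in line:
            # The first 15 characters hold "Mon DD HH:MM:SS"; the year is assumed.
            return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    raise ValueError(f"no line contains {needle!r}")


def seconds_between(lines, start_needle, end_needle):
    """Return the seconds elapsed between two matching journal lines."""
    start = timestamp_of(lines, start_needle)
    end = timestamp_of(lines, end_needle)
    return (end - start).total_seconds()


crng_delay = seconds_between(
    JOURNAL_LINES, "Starting dracut pre-mount hook", "crng init done"
)
timeout_margin = seconds_between(
    JOURNAL_LINES, "crng init done", "Timed out waiting for device"
)

if __name__ == "__main__":
    print(f"crng ready {crng_delay:.0f}s after the pre-mount hook started")
    print(f"root device wait timed out {timeout_margin:.0f}s after that")
```

On these lines the CRNG becomes ready only a couple of seconds before the root device job gives up, which fits the entropy explanation.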

One can easily reproduce this locally by provisioning a VM without a virtio-rng device and turning on TPM2 encryption. It does not reproduce with a virtio-rng device attached.

*** This bug has been marked as a duplicate of bug 1778762 ***

Comment 7 Jonathan Lebon 2020-05-14 14:22:19 UTC

Note we merged https://gitlab.cee.redhat.com/coreos/redhat-coreos/merge_requests/937 into master, which is targeting 4.6. Once that's in an ART build, we can sanity check that it helps with TPM testing on Packet and backport to 4.5 and 4.4.

Comment 8 Wei Sun 2020-05-18 09:49:25 UTC

Let's re-open this as a tracker for the OCP issue, so that it is not missing from the OCP 4.5 release blocker list. Once https://bugzilla.redhat.com/show_bug.cgi?id=1778762 is fixed, OpenShift QE will check this bug and verify it.

Comment 13 David Sanz 2020-05-20 13:19:15 UTC

Working on this.

Nodes fail to boot on 45.81.202005200134-0.

Only after adding the following parameter to the GRUB linux line does encryption complete and the host boot correctly: `random.trust_cpu=on`
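
To confirm whether a running host actually booted with that argument, the kernel command line can be inspected. The sketch below is one way to do that under stated assumptions: the helper names and the sample command line are hypothetical, and only the exact spelling `on` from this comment is recognized.

```python
def kernel_arg(cmdline, name):
    """Return the value of `name=value` on a kernel command line, or None.

    If the argument appears more than once, this sketch keeps the last one.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == name:
            value = val
    return value


def trusts_cpu_rng(cmdline):
    """Return True if the command line carries random.trust_cpu=on."""
    return kernel_arg(cmdline, "random.trust_cpu") == "on"


# Hypothetical command line for illustration; not taken from the affected host.
SAMPLE_CMDLINE = "BOOT_IMAGE=/vmlinuz root=/dev/mapper/root random.trust_cpu=on"

if __name__ == "__main__":
    print(trusts_cpu_rng(SAMPLE_CMDLINE))
```

On a live Linux system, the same check can be run against the contents of `/proc/cmdline` instead of the sample string.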

Comment 14 David Sanz 2020-05-22 13:50:34 UTC

I can confirm that 4.5.0-0.nightly-2020-05-22-111153, using a recent RHCOS version, can complete a TPM-enabled installation.

You can mark this bug as verified.

Comment 15 errata-xmlrpc 2020-07-13 17:36:10 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

Comment 16 Red Hat Bugzilla 2023-09-14 05:57:40 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days