Bug 1833335
| Summary: | RHCOS not starting with TPM encryption enabled | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | David Sanz <dsanzmor> | |
| Component: | RHCOS | Assignee: | Jonathan Lebon <jlebon> | |
| Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 4.5 | CC: | imcleod, jlebon, jligon, mnguyen, nstielau, walters, wsun, yufchang | |
| Target Milestone: | --- | Keywords: | Reopened | |
| Target Release: | 4.5.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Bug Fix | ||
| Doc Text: | Previously, using RHCOS encryption via TPM would sometimes cause boot failures due to lack of entropy. Now, RHCOS has access to more entropy in early boot by allowing the kernel to leverage CPU support for random number generation if available. | Story Points: | --- | |
| Clone Of: | ||||
| : | 1842980 (view as bug list) | Environment: | ||
| Last Closed: | 2020-07-13 17:36:10 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1842980 | |||
|
Description
David Sanz
2020-05-08 13:13:34 UTC
Ahh OK, I see you have console logs for the bootstrap boot, so I'm guessing this is *not* on Packet, right? Does this reproduce in a VM with a virtual TPM? Will try it out.

OK, I'm fairly confident this is another entropy issue. The logs here are a telltale sign:

```
May 08 14:13:30 localhost systemd[1]: Starting dracut pre-mount hook...
May 08 14:13:30 localhost coreos-cryptfs[1275]: coreos-cryptfs: /dev/sdc4 is configured for Clevis pin 'tpm2'
May 08 14:13:30 localhost systemd[1]: Started dracut pre-mount hook.
May 08 14:14:56 localhost systemd-journald[664]: Missed 3 kernel messages
May 08 14:14:56 localhost kernel: random: crng init done
May 08 14:14:56 localhost kernel: random: 7 urandom warning(s) missed due to ratelimiting
May 08 14:14:58 localhost systemd[1]: dev-disk-by\x2dlabel-root.device: Job dev-disk-by\x2dlabel-root.device/start timed out.
May 08 14:14:58 localhost systemd[1]: Timed out waiting for device dev-disk-by\x2dlabel-root.device.
```

Note the `random: crng init done` message, which happens a full 1m30s later. One can easily reproduce this locally by provisioning a VM without a virtio-rng device and turning on TPM2 encryption. It does not reproduce with a virtio-rng device attached.

*** This bug has been marked as a duplicate of bug 1778762 ***

Note we merged https://gitlab.cee.redhat.com/coreos/redhat-coreos/merge_requests/937 into master, which is targeting 4.6. Once that's in an ART build, we can sanity check that it helps with TPM testing on Packet and backport to 4.5 and 4.4.

Let's re-open this as a tracker for the OCP issue, so that it is not missing from the OCP 4.5 release blocker list. Once https://bugzilla.redhat.com/show_bug.cgi?id=1778762 is resolved, OpenShift QE will check this bug and verify it.

Working on this.
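The stall visible in the log excerpt above can be measured directly from the journal timestamps. The following is a minimal sketch, not part of the original report: the function names are hypothetical, the year is an assumption (the journal's short format omits it), and `%b` parsing assumes an English locale.

```python
from datetime import datetime

# Two lines copied from the log excerpt in this report.
LOG_LINES = [
    "May 08 14:13:30 localhost systemd[1]: Started dracut pre-mount hook.",
    "May 08 14:14:56 localhost kernel: random: crng init done",
]


def parse_timestamp(line: str, year: int = 2020) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' stamp; the year is an assumption."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def seconds_between(lines, start_marker: str, end_marker: str) -> float:
    """Seconds from the first line containing start_marker to the first containing end_marker."""
    start = next(l for l in lines if start_marker in l)
    end = next(l for l in lines if end_marker in l)
    return (parse_timestamp(end) - parse_timestamp(start)).total_seconds()


delay = seconds_between(LOG_LINES, "Started dracut pre-mount hook", "crng init done")
print(f"CRNG initialized {delay:.0f}s after the pre-mount hook")
```

For the two lines shown, this gives 86 seconds, which matches the roughly 1m30s wait described in the comment and lands just before the root device job times out at 14:14:58.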
Nodes fail to boot on 45.81.202005200134-0. Only after adding the following parameter to the grub linux line does encryption complete and the host boot correctly: `random.trust_cpu=on`

I can confirm that 4.5.0-0.nightly-2020-05-22-111153, using a recent RHCOS version, can complete a TPM-enabled installation. You can mark this bug as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
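When checking a node for the `random.trust_cpu=on` workaround mentioned above, one option is to inspect the kernel command line. This is a rough sketch under stated assumptions: the helper names are hypothetical, the sample command line is made up for illustration, and only the exact `on` spelling reported in this bug is recognized. A missing argument does not prove the feature is off, since a kernel can be built with CPU RNG trust enabled by default.

```python
from pathlib import Path


def trusts_cpu_rng(cmdline: str) -> bool:
    """Return True if the last random.trust_cpu= argument on the command line is 'on'."""
    value = None
    for arg in cmdline.split():
        if arg.startswith("random.trust_cpu="):
            # Later occurrences override earlier ones in this sketch.
            value = arg.split("=", 1)[1]
    return value == "on"


def running_kernel_trusts_cpu_rng(path: str = "/proc/cmdline") -> bool:
    """Apply the same check to the running kernel's command line (Linux only)."""
    return trusts_cpu_rng(Path(path).read_text())


# Hypothetical command line for illustration; not taken from the affected hosts.
sample = "BOOT_IMAGE=/vmlinuz ro random.trust_cpu=on"
print(trusts_cpu_rng(sample))
```

On a booted node, `running_kernel_trusts_cpu_rng()` reads `/proc/cmdline` and applies the same check.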