We're seeing rhcos-encrypt.service fail during Packet.net provisioning. Early analysis suggests a race condition between rhcos-encrypt.service and the opener service around UUIDs.
Scott, can you give https://storage.cloud.google.com/walters-scratch/rhcos-walters-luks-43.81.201911212229.0-metal.x86_64.raw.gz a try?
Got to an emergency shell again

# systemctl --failed
  UNIT                   LOAD   ACTIVE SUB    DESCRIPTION
● coreos-encrypt.service loaded failed failed CoreOS Firstboot encryption of ro>

:/# journalctl -b -u coreos-encrypt.service --no-pager
-- Logs begin at Fri 2019-11-22 21:00:33 UTC, end at Fri 2019-11-22 21:01:17 UTC. --
Nov 22 21:00:49 localhost systemd[1]: Starting CoreOS Firstboot encryption of root device...
Nov 22 21:00:49 localhost coreos-cryptfs[1655]: coreos-cryptfs: Fetching clevis config
Nov 22 21:00:49 localhost coreos-cryptfs[1655]: coreos-cryptfs: No Clevis config provided
Nov 22 21:00:49 localhost coreos-cryptfs[1655]: coreos-cryptfs: Detected bare metal system (virt none)
Nov 22 21:00:49 localhost coreos-cryptfs[1655]: coreos-cryptfs: Enabling TPM requirement by default
Nov 22 21:00:49 localhost coreos-cryptfs[1655]: coreos-cryptfs: detected pin=tpm2
Nov 22 21:00:49 localhost coreos-cryptfs[1655]: Token 0 is not in use.
Nov 22 21:00:49 localhost systemd[1]: coreos-encrypt.service: Main process exited, code=exited, status=1/FAILURE
Nov 22 21:00:49 localhost systemd[1]: coreos-encrypt.service: Failed with result 'exit-code'.
Nov 22 21:00:49 localhost systemd[1]: Failed to start CoreOS Firstboot encryption of root device.
Nov 22 21:00:49 localhost systemd[1]: coreos-encrypt.service: Triggering OnFailure= dependencies.
> Nov 22 21:00:49 localhost coreos-cryptfs[1655]: Token 0 is not in use.

Is the operative thing here...

:/# cryptsetup luksDump /dev/disk/by-partlabel/luks_root
LUKS header information
Version:        2
Epoch:          5
Metadata area:  16384 [bytes]
Keyslots area:  16744448 [bytes]
UUID:           00000000-0000-4000-a000-000000000002
Label:          (no label)
Subsystem:      (no subsystem)
Flags:          (no flags)

Data segments:
  0: crypt
        offset: 16777216 [bytes]
        length: (whole device)
        cipher: aes-cbc-essiv:sha256
        sector: 512 [bytes]

Keyslots:
  0: luks2
        Key:        256 bits
        Priority:   normal
        Cipher:     aes-cbc-essiv:sha256
        Cipher key: 256 bits
        PBKDF:      argon2i
        Time cost:  10
        Memory:     1048576
        Threads:    4
        Salt:       3a e4 b6 81 bd f2 7d d2 ee d0 ec ec 94 e8 58 6a 9d cf 45 55 23 76 ca 33 1a 91 75 5b 9b 6b d6 8b
        AF stripes: 4000
        AF hash:    sha256
        Area offset:32768 [bytes]
        Area length:131072 [bytes]
        Digest ID:  0
Tokens:
Digests:
  0: pbkdf2
        Hash:       sha256
        Iterations: 277107
        Salt:       e8 e6 d7 6e 57 a4 f6 46 fb d0 59 5b 80 2d 69 e0 f1 9b ce 58 9f b4 9d 83 9b 6a 64 74 bb ce 81 e0
        Digest:     93 a8 db 24 d7 76 ce 2d 8a 51 ce 61 17 3d 14 f1 3e b4 e3 a9 3c 44 1f ca 92 e4 02 e5 e3 0c 8e 3d
:/#

Interesting; the disk is encrypted, but with no tokens at all?
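To make the "no tokens" observation checkable from a script, here is a minimal sketch that counts entries in the Tokens section of luksDump text read from stdin. The helper name count_luks_tokens is hypothetical and not part of coreos-cryptfs. It assumes the LUKS2 dump layout shown in this report, where section headers start in column 0 and entries are indented lines of the form "N: type".

```shell
#!/bin/bash
# Hypothetical helper (not from coreos-cryptfs): count LUKS2 token entries
# in `cryptsetup luksDump` text supplied on stdin.
count_luks_tokens() {
    awk '
        # Entering the Tokens section.
        /^Tokens:/ { in_tokens = 1; next }
        # Any other line starting in column 0 is a new section header.
        /^[^ \t]/  { in_tokens = 0 }
        # Indented "N: type" lines inside the section are token entries.
        in_tokens && /^[ \t]+[0-9]+:/ { n++ }
        END { print n + 0 }
    '
}
```

Possible usage from the emergency shell: cryptsetup luksDump /dev/disk/by-partlabel/luks_root | count_luks_tokens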
Current attempt where I got serial console set up on first boot.

# systemctl --failed
  UNIT                   LOAD   ACTIVE SUB    DESCRIPTION
● coreos-encrypt.service loaded failed failed CoreOS Firstboot encryption of ro>

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

1 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

:/# journalctl -b --no-pager -u coreos-encrypt.service
-- Logs begin at Fri 2019-11-22 22:09:05 UTC, end at Fri 2019-11-22 22:10:30 UTC. --
Nov 22 22:09:20 localhost systemd[1]: Starting CoreOS Firstboot encryption of root device...
Nov 22 22:09:20 localhost coreos-cryptfs[1670]: coreos-cryptfs: Fetching clevis config
Nov 22 22:09:20 localhost coreos-cryptfs[1670]: coreos-cryptfs: No Clevis config provided
Nov 22 22:09:20 localhost coreos-cryptfs[1670]: coreos-cryptfs: Detected bare metal system (virt none)
Nov 22 22:09:20 localhost coreos-cryptfs[1670]: coreos-cryptfs: Enabling TPM requirement by default
Nov 22 22:09:20 localhost coreos-cryptfs[1670]: coreos-cryptfs: detected pin=tpm2
Nov 22 22:09:20 localhost coreos-cryptfs[1670]: coreos-cryptfs: Cleared token LUKS token on /dev/disk/by-partlabel/luks_root
Nov 22 22:09:20 localhost coreos-cryptfs[1670]: coreos-cryptfs: generating new key
Nov 22 22:09:38 localhost coreos-cryptfs[1670]: Reencryption will change: volume key, set cipher to aes-cbc-essiv:sha256.
Nov 22 22:09:38 localhost coreos-cryptfs[1670]: LUKS2 header backup of device /dev/disk/by-partlabel/luks_root created.
Nov 22 22:09:38 localhost coreos-cryptfs[1670]: New LUKS header for device /dev/disk/by-partlabel/luks_root created.
Nov 22 22:09:38 localhost coreos-cryptfs[1670]: Key slot 0 created.
Nov 22 22:09:38 localhost coreos-cryptfs[1670]: Setting LUKS2 offline reencrypt flag on device /dev/disk/by-partlabel/luks_root.
Nov 22 22:09:38 localhost coreos-cryptfs[1670]: Activating temporary device using old LUKS header.
Nov 22 22:09:38 localhost coreos-cryptfs[1670]: Activating temporary device using new LUKS header.
Nov 22 22:09:38 localhost coreos-cryptfs[1670]: Progress: 36.4%, ETA 00:08, 960 MiB written, speed 191.6 MiB/s
Nov 22 22:09:43 localhost coreos-cryptfs[1670]: Progress: 72.7%, ETA 00:03, 1920 MiB written, speed 191.8 MiB/s
Nov 22 22:09:46 localhost coreos-cryptfs[1670]: Finished, time 00:13.761, 2639 MiB written, speed 191.8 MiB/s
Nov 22 22:09:47 localhost coreos-cryptfs[1670]: LUKS2 header on device /dev/disk/by-partlabel/luks_root restored.
Nov 22 22:09:47 localhost coreos-cryptfs[1670]: A TPM2 device with the in-kernel resource manager is needed!
Nov 22 22:09:47 localhost systemd[1]: coreos-encrypt.service: Main process exited, code=exited, status=1/FAILURE
Nov 22 22:09:47 localhost systemd[1]: coreos-encrypt.service: Failed with result 'exit-code'.
Nov 22 22:09:47 localhost systemd[1]: Failed to start CoreOS Firstboot encryption of root device.
Nov 22 22:09:47 localhost systemd[1]: coreos-encrypt.service: Triggering OnFailure= dependencies.

:/# cryptsetup luksDump /dev/disk/by-partlabel/luks_root
LUKS header information
Version:        2
Epoch:          5
Metadata area:  16384 [bytes]
Keyslots area:  16744448 [bytes]
UUID:           00000000-0000-4000-a000-000000000002
Label:          (no label)
Subsystem:      (no subsystem)
Flags:          (no flags)

Data segments:
  0: crypt
        offset: 16777216 [bytes]
        length: (whole device)
        cipher: aes-cbc-essiv:sha256
        sector: 512 [bytes]

Keyslots:
  0: luks2
        Key:        256 bits
        Priority:   normal
        Cipher:     aes-cbc-essiv:sha256
        Cipher key: 256 bits
        PBKDF:      argon2i
        Time cost:  10
        Memory:     1048576
        Threads:    4
        Salt:       1d 90 aa 15 96 08 06 34 e0 07 95 5c 45 65 4f 12 ec 15 9d ca f5 52 e1 ed db 01 29 eb 5c a0 11 a2
        AF stripes: 4000
        AF hash:    sha256
        Area offset:32768 [bytes]
        Area length:131072 [bytes]
        Digest ID:  0
Tokens:
Digests:
  0: pbkdf2
        Hash:       sha256
        Iterations: 276523
        Salt:       ce fd 46 fb ab a9 94 97 9c 52 a5 37 59 79 49 1e 89 32 a8 b1 b6 96 f6 f4 60 76 a6 6a d1 89 9d 98
        Digest:     67 66 35 8d 34 0a 62 79 35 80 4e c9 03 33 d6 48 4c 1d 0e f9 32 8a d6 89 27 00 ad 33 be 63 43 ed
*** Bug 1777035 has been marked as a duplicate of this bug. ***
We should decide whether we want to support TPM 1.2, and if not, let's at least make the error more obvious.
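One way the error could be made more obvious is to check for the TPM 2.0 resource manager node before any re-encryption starts. This is only a sketch: the function names tpm_status and require_tpm2 and the message wording are hypothetical, not taken from coreos-cryptfs. It assumes the kernel exposes /dev/tpmrm0 only for TPM 2.0 chips, so a bare /dev/tpm0 without it most likely means a TPM 1.2 chip. The device directory is a parameter purely so the check can be exercised against a scratch directory.

```shell
#!/bin/bash
# Hypothetical helpers, not taken from coreos-cryptfs.

# Classify TPM availability from the device nodes under a directory
# (defaults to /dev). Prints one of: tpm2, tpm-no-rm, none.
tpm_status() {
    local devdir="${1:-/dev}"
    if [ -e "${devdir}/tpmrm0" ]; then
        # The in-kernel resource manager node exists: usable TPM 2.0.
        echo "tpm2"
    elif [ -e "${devdir}/tpm0" ]; then
        # A TPM exists but has no resource manager node (e.g. TPM 1.2).
        echo "tpm-no-rm"
    else
        echo "none"
    fi
}

# Return 0 when a TPM 2.0 resource manager is present; otherwise print a
# descriptive message to stderr and return 1.
require_tpm2() {
    local devdir="${1:-/dev}"
    case "$(tpm_status "${devdir}")" in
        tpm2)
            return 0
            ;;
        tpm-no-rm)
            echo "TPM found, but no TPM 2.0 resource manager (tpmrm0); a TPM 1.2 chip cannot be used with pin=tpm2" >&2
            return 1
            ;;
        *)
            echo "No TPM device found; pin=tpm2 cannot be used" >&2
            return 1
            ;;
    esac
}
```

Calling require_tpm2 ahead of the "generating new key" step would surface the unsupported hardware before the header is rewritten and restored.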
Realistically we should be supporting TPM 2.0 for clevis, but we still have to boot on hardware with a 1.2 chip.
*** Bug 1773108 has been marked as a duplicate of this bug. ***
Scott, any chance you have tried to re-test with the latest 4.4 boot images on the Packet hosts with TPM 1.2?
Yes, both 4.4 and 4.3 are installing fine on the previously used hardware on Packet.
Thanks Scott! Moving to VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581