
Bug 1819215

Summary: Cannot reboot into a tang encrypted disk
Product: OpenShift Container Platform
Component: Documentation
Version: 4.4
Target Release: 4.5.0
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: medium
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: If docs needed, set a value
Reporter: Michael Nguyen <mnguyen>
Assignee: Vikram Goyal <vigoyal>
QA Contact: Xiaoli Tian <xtian>
Docs Contact: Vikram Goyal <vigoyal>
CC: aos-bugs, bbreard, imcleod, jligon, jokerman, kalexand, miabbott, nstielau, smilner
Last Closed: 2021-02-09 21:57:53 UTC

Description Michael Nguyen 2020-03-31 13:42:02 UTC
Description of problem:
I can enable Tang on first boot (and verify that the disk is encrypted), but decrypting the disk fails after a reboot.

The first boot has the `rd.neednet=1` kernel argument, which is removed after the first boot. If I run `rpm-ostree kargs --append rd.neednet=1` before rebooting, the reboot works. This happens on rhcos-4.3 and rhcos-4.4.
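A minimal sketch of that workaround, assuming a POSIX shell on the RHCOS host. `missing_kargs` is a hypothetical helper (not part of RHCOS or rpm-ostree) that prints each required kernel argument absent from the current set, so `rpm-ostree kargs --append` runs only for those. It uses exact token matching, so `ip=dhcp` and a static `ip=...` count as different arguments.

```shell
#!/bin/sh
# Hypothetical helper: print each required kernel argument that is not
# already present (exact token match) in the given argument string.
# Usage: missing_kargs "CURRENT_KARGS" ARG...
#
# Example (run on the host before rebooting; `rpm-ostree kargs` with no
# options prints the current kernel arguments):
#   for arg in $(missing_kargs "$(rpm-ostree kargs)" rd.neednet=1); do
#       sudo rpm-ostree kargs --append="$arg"
#   done
missing_kargs() (
    set -f                      # do not glob-expand kernel arguments
    current=$1
    shift
    for want in "$@"; do
        found=0
        for have in $current; do
            if [ "$have" = "$want" ]; then
                found=1
                break
            fi
        done
        if [ "$found" -eq 0 ]; then
            printf '%s\n' "$want"
        fi
    done
)
```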


Version-Release number of selected component (if applicable):
rhcos-43.81.202001142154.0
rhcos-44.81.202003230949-0

How reproducible:
Always

Steps to Reproduce:
1. Enable Tang using an Ignition config containing the following snippet:

Ignition Snippet
-------------------------
"storage": {
  "files": [
    {
      "filesystem": "root",
      "path": "/etc/clevis.json",
      "contents": {
        "source": "data:text/plain;base64,<your base64 tang pin>"
      },
      "mode": 420
    }
  ]
}


Sample base64 Tang config
---------------------------
cat << EOF | base64 -w0
{
  "url": "http://10.0.2.2",
  "thp": "ABCDEFGHIJKLMNO"
}
EOF
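The pin above can be encoded and wrapped in the `data:text/plain;base64,...` URL used by the Ignition `contents.source` field with a few lines of shell. This is a sketch assuming GNU coreutils `base64`; `make_pin` and `pin_data_url` are hypothetical helper names, and the URL and thumbprint are the placeholders from the sample.

```shell
#!/bin/sh
# Hypothetical helpers: build the Clevis Tang pin JSON (same shape as the
# sample /etc/clevis.json) and wrap it in a base64 data URL for Ignition.

# Usage: make_pin TANG_URL THUMBPRINT
make_pin() {
    printf '{\n  "url": "%s",\n  "thp": "%s"\n}\n' "$1" "$2"
}

# Usage: pin_data_url TANG_URL THUMBPRINT
# base64 -w0 disables line wrapping so the URL stays on one line.
pin_data_url() {
    printf 'data:text/plain;base64,%s\n' "$(make_pin "$1" "$2" | base64 -w0)"
}

# Example: print the value to paste into "source" in the Ignition snippet.
#   pin_data_url "http://10.0.2.2" "ABCDEFGHIJKLMNO"
```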


2. Boot RHCOS with the Ignition config above.
3. After the system boots, verify that Tang encryption is working:
   `sudo cryptsetup luksDump /dev/disk/by-partlabel/luks_root`
4. Reboot.
5. Observe that the system drops into the emergency shell and never finishes booting into RHCOS.

Actual results:
The system drops into the emergency shell instead of booting into RHCOS.

Expected results:
The system reboots into RHCOS.

Additional info:
https://docs.openshift.com/container-platform/4.3/installing/install_config/installing-customizing.html

Comment 1 Ben Howard 2020-03-31 19:47:51 UTC
This is a known limitation; see step three in the docs:
https://docs.openshift.com/container-platform/4.3/installing/install_config/installing-customizing.html#installation-special-config-encrypt-disk-tang_installing-customizing

Can you verify that you have the `ip=` line?

Comment 2 Michael Nguyen 2020-04-01 13:31:22 UTC
Sorry, I completely missed that part of the documentation.

With only `ip=dhcp`, I don't get networking in the initramfs, and coreos-luks-open.service fails. The emergency shell output is below. If I run dhclient inside the emergency shell and then restart coreos-luks-open.service, the system boots into RHCOS.

Among kernel arguments alone, adding `rd.neednet=1` is the only thing that worked for me. I'm testing on libvirt, if that makes any difference.


Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.


:/# 
:/# systemctl --failed
  UNIT                     LOAD   ACTIVE SUB    DESCRIPTION                    
● coreos-luks-open.service loaded failed failed CoreOS LUKS Opener             

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

1 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
:/# cat /proc/cmdline 
BOOT_IMAGE=(hd0,gpt1)/ostree/rhcos-25bd477a16f8c9f6ef9dbbac1e2ebf96254988bf7bbb8dcf29a393ca13d6523c/vmlinuz-4.18.0-147.5.1.el8_1.x86_64 ip=dhcp rhcos.root=crypt_rootfs console=tty0 console=ttyS0,115200n8 ignition.platform.id=qemu rd.luks.options=discard ostree=/ostree/boot.1/rhcos/25bd477a16f8c9f6ef9dbbac1e2ebf96254988bf7bbb8dcf29a393ca13d6523c/0
:/# journalctl -b -u coreos-luks-open.service

:/# journalctl -b -u coreos-luks-open.service --no-pager
-- Logs begin at Wed 2020-04-01 13:15:22 UTC, end at Wed 2020-04-01 13:15:55 UTC. --
Apr 01 13:15:24 localhost systemd[1]: Starting CoreOS LUKS Opener...
Apr 01 13:15:24 localhost coreos-cryptfs[634]: base64: invalid input
Apr 01 13:15:24 localhost coreos-cryptfs[634]: coreos-cryptfs: /dev/vda4 is configured for Clevis pin 'tang'
Apr 01 13:15:24 localhost coreos-cryptfs[634]: coreos-cryptfs: Checking for default route.
Apr 01 13:15:24 localhost coreos-cryptfs[634]: coreos-cryptfs: Waiting for DNS resolver to appear.
Apr 01 13:15:55 localhost coreos-cryptfs[634]: coreos-cryptfs: failed to find /etc/resolv.conf
Apr 01 13:15:55 localhost systemd[1]: coreos-luks-open.service: Main process exited, code=exited, status=1/FAILURE
Apr 01 13:15:55 localhost systemd[1]: coreos-luks-open.service: Failed with result 'exit-code'.
Apr 01 13:15:55 localhost systemd[1]: Failed to start CoreOS LUKS Opener.
Apr 01 13:15:55 localhost systemd[1]: coreos-luks-open.service: Triggering OnFailure= dependencies.
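The journal shows coreos-cryptfs checking for a default route and then waiting about 30 seconds (13:15:24 to 13:15:55) for `/etc/resolv.conf` before giving up. The sketch below is a hypothetical illustration of that wait-with-timeout pattern, not the actual coreos-cryptfs code; the function name and timeout handling are assumptions. With no network configured in the initramfs, presumably nothing creates `/etc/resolv.conf`, so a loop like this runs out its timeout.

```shell
#!/bin/sh
# Hypothetical sketch of a "wait for a file, then give up" loop, similar in
# spirit to the coreos-cryptfs messages above. Not the real script.
# Usage: wait_for_file PATH TIMEOUT_SECONDS
wait_for_file() {
    path=$1
    limit=$2
    elapsed=0
    while [ ! -e "$path" ]; do
        if [ "$elapsed" -ge "$limit" ]; then
            # Mirrors the "failed to find ..." message in the journal.
            echo "failed to find $path" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```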

Comment 3 Micah Abbott 2020-04-06 17:10:17 UTC
(In reply to Michael Nguyen from comment #2)

> In terms of purely kargs, adding `rd.neednet=1` is the only thing that
> worked for me.  I'm testing on libvirt if that makes any difference.

Sounds like this might just be a docs issue; we probably want to instruct customers to provide both `ip=` and `rd.neednet=1`?

Mike, could you test this configuration with a static IP set via `ip=` together with `rd.neednet=1`? I'm curious how the two interact.
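For the configuration suggested above, a static address in dracut's `ip=<client-IP>:[<peer>]:<gateway-IP>:<netmask>:<hostname>:<interface>:none` form can be paired with `rd.neednet=1`. The helper below is hypothetical and the addresses in the test are placeholders; check the dracut.cmdline(7) man page for the full `ip=` syntax supported by your release.

```shell
#!/bin/sh
# Hypothetical helper: print kernel arguments for a static address in
# dracut's ip= format (peer field left empty, autoconf "none"), followed
# by rd.neednet=1 so the initramfs actually brings the network up.
# Usage: static_ip_kargs IP GATEWAY NETMASK HOSTNAME INTERFACE
static_ip_kargs() {
    printf 'ip=%s::%s:%s:%s:%s:none rd.neednet=1\n' "$1" "$2" "$3" "$4" "$5"
}
```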

Comment 5 Ben Howard 2020-04-14 15:36:25 UTC
For some background, `ip=...` only takes effect when `rd.neednet=1` is set. See [1]: without `rd.neednet=1`, `ip=...` is a no-op.

We need to get the docs updated. 

[1] https://github.com/dracutdevs/dracut/blob/RHEL-8/modules.d/35network-legacy/net-genrules.sh#L3
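To make the behavior described in [1] concrete, here is a simplified model of the decision, not the dracut code itself. It approximates dracut's boolean handling of `rd.neednet` (a value of `0`, `no`, or `off` is false, any other occurrence is true, and the last occurrence wins) and ignores other conditions, such as a network root, that can also enable networking in the initramfs. `network_requested` is a hypothetical name.

```shell
#!/bin/sh
# Simplified, hypothetical model: succeeds only if the kernel command line
# asks dracut to bring up the network via rd.neednet.
# Usage: network_requested "CMDLINE"
network_requested() (
    set -f                      # do not glob-expand kernel arguments
    neednet=0
    for arg in $1; do
        case $arg in
            rd.neednet=0|rd.neednet=no|rd.neednet=off) neednet=0 ;;
            rd.neednet|rd.neednet=*) neednet=1 ;;
        esac
    done
    [ "$neednet" -eq 1 ]
)

# Example: warn before rebooting if ip= would be ignored.
#   network_requested "$(cat /proc/cmdline)" || echo "ip= will be ignored"
```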

Comment 7 Micah Abbott 2020-04-15 14:50:45 UTC
Doc update - https://github.com/openshift/openshift-docs/pull/21177

Comment 8 Steve Milner 2020-04-27 18:55:18 UTC
Chris asked for confirmation in https://github.com/openshift/openshift-docs/pull/21177

Comment 9 Micah Abbott 2020-05-22 13:28:57 UTC
This was a Docs fix (see attached PR), so moving to the Docs team to shepherd the BZ along.