Bug 1928537

Summary: Cannot IPI with tang/tpm disk encryption
Product: OpenShift Container Platform Reporter: Yuval Kashtan <ykashtan>
Component: Installer Assignee: Beth White <beth.white>
Installer sub component: OpenShift on Bare Metal IPI QA Contact: Ori Michaeli <omichael>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: mstaeble, nstielau, rbartal, sdasu, tsedovic, wking
Version: 4.7 Keywords: Triaged
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1930106 (view as bug list) Environment:
Last Closed: 2021-07-27 22:44:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1930106    

Description Yuval Kashtan 2021-02-14 19:30:01 UTC

Version: ocp 4.7 nightly

$ openshift-install version
4.7

Platform: baremetal

Please specify: IPI

What happened?
I tried to enable Tang disk encryption according to https://github.com/openshift/openshift-docs/blob/enterprise-4.7/modules/installation-special-config-encrypt-disk-tang.adoc
i.e. with an Ignition 3.2 storage.luks entry.
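
For illustration, a minimal sketch of the kind of Ignition 3.2.0 config involved, built as a Python dict and dumped to JSON. The field names follow the Ignition 3.2.0 `storage.luks` schema; the function name, Tang URL, and thumbprint are hypothetical placeholders, and the device paths are assumptions rather than values taken from this report.

```python
import json


def tang_luks_ignition(tang_url, thumbprint):
    """Build a minimal Ignition 3.2.0 config with a Tang-bound LUKS root volume.

    Hypothetical helper for illustration only; the device paths are assumed.
    """
    return {
        "ignition": {"version": "3.2.0"},
        "storage": {
            # storage.luks was introduced in spec 3.2.0; spec 3.1.0 has no equivalent.
            "luks": [
                {
                    "name": "root",
                    "device": "/dev/disk/by-partlabel/root",
                    "clevis": {
                        "tang": [{"url": tang_url, "thumbprint": thumbprint}]
                    },
                    "wipeVolume": True,
                }
            ],
            # Filesystem created on top of the opened LUKS device.
            "filesystems": [
                {
                    "device": "/dev/mapper/root",
                    "format": "xfs",
                    "wipeFilesystem": True,
                    "label": "root",
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder values; substitute a real Tang server URL and key thumbprint.
    cfg = tang_luks_ignition("http://tang.example.com:7500", "PLACEHOLDER_THUMBPRINT")
    print(json.dumps(cfg, indent=2))
```

Any config with a non-empty `storage.luks` list like this one cannot be expressed in spec 3.1.0.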

This produces the following message from machine-config-server on the bootstrap VM:
```
E0212 12:52:47.137281       1 api.go:154] couldn't convert config for req: {master 0xc0008dc0c0}, error: unable to convert Ignition spec v3_2 config to v3_1: LUKS is not supported on 3.1
```

I think this is because of the following (from the same log):
```
I0212 12:52:51.138250       1 api.go:117] Pool master requested by address:"10.19.17.79:38840" User-Agent:"Go-http-client/1.1" Accept-Header: "application/vnd.coreos.ignition+json;version=3.1.0"
```
i.e. the masters explicitly request version=3.1.0, which does not include LUKS support.
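
The interaction between the two log lines can be sketched as follows. This is a simplified model under stated assumptions, not the machine-config-server code: `requested_ignition_version` and `serve_config` are hypothetical names, and the only rule modelled is the one the error message states (a config with LUKS entries cannot be converted down to 3.1).

```python
IGNITION_MEDIA_TYPE = "application/vnd.coreos.ignition+json"


def requested_ignition_version(accept_header):
    """Return the version parameter of an Ignition Accept header as an int tuple."""
    media_type, _, params = accept_header.partition(";")
    if media_type.strip() != IGNITION_MEDIA_TYPE:
        raise ValueError("not an Ignition media type: %r" % media_type)
    for param in params.split(";"):
        key, _, value = param.strip().partition("=")
        if key == "version":
            return tuple(int(part) for part in value.split("."))
    raise ValueError("Accept header carries no version parameter")


def serve_config(config, accept_header):
    """Hypothetical stand-in for the server-side down-conversion check."""
    requested = requested_ignition_version(accept_header)
    uses_luks = bool(config.get("storage", {}).get("luks"))
    # A client pinned below 3.2.0 cannot receive a config that uses storage.luks.
    if requested < (3, 2, 0) and uses_luks:
        raise ValueError(
            "unable to convert Ignition spec v3_2 config to v3_1: "
            "LUKS is not supported on 3.1"
        )
    return config


if __name__ == "__main__":
    config = {"ignition": {"version": "3.2.0"}, "storage": {"luks": [{"name": "root"}]}}
    header = "application/vnd.coreos.ignition+json;version=3.1.0"
    try:
        serve_config(config, header)
    except ValueError as err:
        print("request rejected:", err)
```

With the header changed to `version=3.2.0`, the same config passes the check in this model, which is the behaviour the proposed installer change aims for.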

IIUC this is set here: https://github.com/openshift/installer/blob/abd33c58e751122e38070e6e022560339579bc93/pkg/tfvars/baremetal/baremetal.go#L183

and is requesting 3.1 because that's the version it imports: https://github.com/openshift/installer/blob/release-4.7/pkg/tfvars/baremetal/baremetal.go#L12

I even went further and created a quick fix, which seems to work: https://github.com/yuvalk/installer/commit/c44717a4745c390fcf1bc83c6bcec27952edd6e6



BTW: when trying the old format https://github.com/openshift/openshift-docs/blob/enterprise-4.6/modules/installation-special-config-encrypt-disk-tang.adoc
i.e. with /etc/clevis.json,
RHCOS refuses to boot, apparently because RHCOS 4.7 (which ships ignition-2.9.0-2.rhaos4.7.git1d56dc8.el8.x86_64) rejects the old format and supports only the new one.



What did you expect to happen?

OCP to be installed with disks encrypted with Clevis and Tang.

Comment 1 Yuval Kashtan 2021-02-14 19:33:38 UTC
My naive attempt at a fix:
https://github.com/openshift/installer/pull/4653

Comment 4 W. Trevor King 2021-03-17 20:13:09 UTC
We're asking the following questions to evaluate whether or not this bug warrants blocking an upgrade edge from either the previous X.Y or X.Y.Z.  The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way.  Sample answers are provided to give more context and the UpgradeBlocker keyword has been added to this bug.  The expectation is that the assignee answers these questions.

Who is impacted?  If we have to block upgrade edges based on this issue, which edges would need blocking?
* example: Customers upgrading from 4.y.Z to 4.y+1.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
* example: All customers upgrading from 4.y.z to 4.y+1.z fail approximately 10% of the time

What is the impact?  Is it serious enough to warrant blocking edges?
* example: Up to 2 minute disruption in edge routing
* example: Up to 90 seconds of API downtime
* example: etcd loses quorum and you have to restore from backup

How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?
* example: Issue resolves itself after five minutes
* example: Admin uses oc to fix things
* example: Admin must SSH to hosts, restore from backups, or other non standard admin activities

Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)?
* example: No, it’s always been like this we just never noticed
* example: Yes, from 4.y.z to 4.y+1.z Or 4.y.z to 4.y.z+1

Comment 5 Yuval Kashtan 2021-03-17 20:28:04 UTC
> Who is impacted?
Any customer using disk encryption, or any other Ignition config feature listed here: https://github.com/coreos/ignition/blob/master/docs/migrating-configs.md#from-version-310-to-320
I don't know whether we have any data on how many such customers there are.

> What is the impact?
New nodes cannot be provisioned.
For existing (upgraded) clusters, that also means nodes cannot be replaced, so this can actually cause downtime and data loss.

> How involved is remediation ?
I can't think of any way to resolve this besides amending the installer.

> Is this a regression?
Yes, from any 4.6 to 4.7.0, up to the 4.7.z in which this is fixed (probably 4.7.4).

Note:
The disk encryption fix also needs https://bugzilla.redhat.com/show_bug.cgi?id=1934863 and https://bugzilla.redhat.com/show_bug.cgi?id=1934557

Comment 6 Yuval Kashtan 2021-03-18 13:44:16 UTC
I've removed the UpgradeBlocker keyword.
I was confused by all the related NBDE bugs and was obviously mistaken: the installer bug cannot block upgrades.
I'll add the keyword to the other, relevant bugs (such as the growfs one).

Comment 9 errata-xmlrpc 2021-07-27 22:44:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438