Bug 1928537 - Cannot IPI with tang/tpm disk encryption
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Beth White
QA Contact: Ori Michaeli
URL:
Whiteboard:
Depends On:
Blocks: 1930106
 
Reported: 2021-02-14 19:30 UTC by Yuval Kashtan
Modified: 2021-07-27 22:45 UTC
CC: 6 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Cloned To: 1930106
Environment:
Last Closed: 2021-07-27 22:44:28 UTC
Target Upstream Version:
Embargoed:




Links
- GitHub: openshift/installer pull 4653 (closed): Bug 1928537: bump ignition to v3_2, last updated 2021-02-21 07:28:30 UTC
- Red Hat Product Errata: RHSA-2021:2438, last updated 2021-07-27 22:45:11 UTC

Description Yuval Kashtan 2021-02-14 19:30:01 UTC

Version: ocp 4.7 nightly

$ openshift-install version
4.7

Platform: baremetal (IPI)

What happened?
I tried to enable Tang disk encryption, following https://github.com/openshift/openshift-docs/blob/enterprise-4.7/modules/installation-special-config-encrypt-disk-tang.adoc,
i.e. with an Ignition 3.2 storage.luks entry.
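For reference, a minimal Ignition v3.2 config of the kind that documentation describes might look like the following sketch; the Tang URL, thumbprint, and device path are placeholders for illustration, not values from this report:

```json
{
  "ignition": { "version": "3.2.0" },
  "storage": {
    "luks": [
      {
        "name": "root",
        "device": "/dev/disk/by-partlabel/root",
        "clevis": {
          "tang": [
            {
              "url": "http://tang.example.com:7500",
              "thumbprint": "REPLACE-WITH-TANG-THUMBPRINT"
            }
          ]
        },
        "wipeVolume": true
      }
    ]
  }
}
```

The storage.luks section is exactly the part that cannot be expressed in spec v3.1, which is what the conversion error below is complaining about.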

This gets me the following message from the machine-config-server on the bootstrap VM:
```
E0212 12:52:47.137281       1 api.go:154] couldn't convert config for req: {master 0xc0008dc0c0}, error: unable to convert Ignition spec v3_2 config to v3_1: LUKS is not supported on 3.1
```

I think this is because (from the same log):
```
I0212 12:52:51.138250       1 api.go:117] Pool master requested by address:"10.19.17.79:38840" User-Agent:"Go-http-client/1.1" Accept-Header: "application/vnd.coreos.ignition+json;version=3.1.0"
```
i.e. the masters explicitly request version=3.1.0, which does not support LUKS.
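The version negotiation above can be sketched in Go. `ignitionVersion` and `acceptHeader` are hypothetical names for illustration, not the installer's actual identifiers; only the header format comes from the log line above:

```go
package main

import "fmt"

// ignitionVersion mirrors what the 4.7 installer effectively hard-codes for
// baremetal machines (hypothetical constant; in the real installer the value
// follows from which ignition config package is imported).
const ignitionVersion = "3.1.0"

// acceptHeader builds the Accept header a node sends to the
// machine-config-server when fetching its rendered config.
func acceptHeader(version string) string {
	return fmt.Sprintf("application/vnd.coreos.ignition+json;version=%s", version)
}

func main() {
	// With 3.1.0 requested, the MCS must down-convert its rendered v3.2
	// config, which fails because LUKS does not exist in spec v3.1.
	fmt.Println(acceptHeader(ignitionVersion))
	// prints: application/vnd.coreos.ignition+json;version=3.1.0
}
```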

IIUC this is set here: https://github.com/openshift/installer/blob/abd33c58e751122e38070e6e022560339579bc93/pkg/tfvars/baremetal/baremetal.go#L183

and is requesting 3.1 because that's the version it imports: https://github.com/openshift/installer/blob/release-4.7/pkg/tfvars/baremetal/baremetal.go#L12

I even went further and created a quick fix, which seems to work: https://github.com/yuvalk/installer/commit/c44717a4745c390fcf1bc83c6bcec27952edd6e6
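Per the PR title ("bump ignition to v3_2"), the shape of such a fix is essentially an import bump in the baremetal tfvars package, roughly as follows; the exact import paths here are an assumption, not copied from the PR:

```diff
-	ignition "github.com/coreos/ignition/v2/config/v3_1/types"
+	ignition "github.com/coreos/ignition/v2/config/v3_2/types"
```

With that change, the generated pointer config declares "version": "3.2.0" and the masters request version=3.2.0 from the machine-config-server, so no down-conversion to 3.1 is needed.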



BTW: when trying the old format (https://github.com/openshift/openshift-docs/blob/enterprise-4.6/modules/installation-special-config-encrypt-disk-tang.adoc),
i.e. with /etc/clevis.json,
RHCOS refuses to boot. It seems that RHCOS 4.7 (which ships ignition-2.9.0-2.rhaos4.7.git1d56dc8.el8.x86_64) rejects the old format and supports only the new one.
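For comparison, that old 4.6-era format was a plain Clevis pin config written to /etc/clevis.json, along these lines (placeholder URL and thumbprint, for illustration only):

```json
{
  "url": "http://tang.example.com:7500",
  "thp": "REPLACE-WITH-TANG-THUMBPRINT"
}
```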



What did you expect to happen?

OCP to be installed with disks encrypted with Clevis and Tang.

Comment 1 Yuval Kashtan 2021-02-14 19:33:38 UTC
My naive fix attempt:
https://github.com/openshift/installer/pull/4653

Comment 4 W. Trevor King 2021-03-17 20:13:09 UTC
We're asking the following questions to evaluate whether or not this bug warrants blocking an upgrade edge from either the previous X.Y or X.Y.Z.  The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way.  Sample answers are provided to give more context and the UpgradeBlocker keyword has been added to this bug.  The expectation is that the assignee answers these questions.

Who is impacted?  If we have to block upgrade edges based on this issue, which edges would need blocking?
* example: Customers upgrading from 4.y.Z to 4.y+1.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
* example: All customers upgrading from 4.y.z to 4.y+1.z fail approximately 10% of the time

What is the impact?  Is it serious enough to warrant blocking edges?
* example: Up to 2 minute disruption in edge routing
* example: Up to 90 seconds of API downtime
* example: etcd loses quorum and you have to restore from backup

How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?
* example: Issue resolves itself after five minutes
* example: Admin uses oc to fix things
* example: Admin must SSH to hosts, restore from backups, or other non standard admin activities

Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not increase exposure)?
* example: No, it’s always been like this; we just never noticed
* example: Yes, from 4.y.z to 4.y+1.z Or 4.y.z to 4.y.z+1

Comment 5 Yuval Kashtan 2021-03-17 20:28:04 UTC
> Who is impacted?
any customer utilizing disk encryption, as well as any other Ignition config shown here: https://github.com/coreos/ignition/blob/master/docs/migrating-configs.md#from-version-310-to-320
I don't know if we have any data on how many such customers we have.

> What is the impact?
can't provision new nodes
for existing (upgraded) clusters, that also means nodes cannot be replaced, so this can actually cause downtime and data loss

> How involved is remediation ?
I can't think of any way to resolve this besides amending the installer

> Is this a regression?
Yes, from any 4.6 to 4.7.0 through the 4.7.z where this gets fixed (probably 4.7.4)

note:
disk encryption resolution also needs https://bugzilla.redhat.com/show_bug.cgi?id=1934863 and https://bugzilla.redhat.com/show_bug.cgi?id=1934557

Comment 6 Yuval Kashtan 2021-03-18 13:44:16 UTC
I've removed the UpgradeBlocker keyword,
as I was confused by all the related NBDE bugs
and obviously mistaken: the installer bug cannot block upgrades.
I'll add the keyword on the other, relevant bugs (such as the growfs one).

Comment 9 errata-xmlrpc 2021-07-27 22:44:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

