Bug 2038166 - Starting from Go 1.17 invalid certificates will render a cluster non-functional
Summary: Starting from Go 1.17 invalid certificates will render a cluster non-functional
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.10.0
Assignee: Pierre Prinetti
QA Contact: Itzik Brown
Depends On:
Reported: 2022-01-07 13:53 UTC by Pierre Prinetti
Modified: 2022-03-10 16:37 UTC

Fixed In Version:
Doc Type: Known Issue
Doc Text:
Cause: HTTPS certificates that rely only on the CommonName field are rejected by Go 1.17, and therefore by OpenShift 4.10.
Consequence: When a cluster is installed on, or upgraded to, v4.10 on an OpenStack infrastructure that exposes endpoints with invalid certificates, the cluster cannot perform operations on OpenStack and may cease to function.
Workaround (if any): Check for and replace invalid HTTPS certificates. The documentation provides a script that checks each certificate in the OpenStack catalog. To avoid disruption, replace the invalid certificates BEFORE installing or upgrading to OpenShift v4.10.
Result: Once the invalid certificates are replaced with certificates that list the server names or IPs in the Subject Alternative Names fields, the operators should become functional again. Restarts might be necessary.
Clone Of:
Last Closed: 2022-03-10 16:37:38 UTC
Target Upstream Version:

System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift installer pull 5544 | 0 | None | Merged | Bug 2038166: openstack: Document legacy HTTPS cert validation | 2022-01-25 08:51:40 UTC
Github | openshift installer pull 5576 | 0 | None | Merged | Bug 2038166: openstack: Fix invalid-https-certificate detection | 2022-01-25 09:34:25 UTC
Github | openshift openshift-docs pull 40785 | 0 | None | open | [BZ2038166] Add ShiftStack legacy certs script | 2022-01-25 09:34:28 UTC
Red Hat Product Errata | RHSA-2022:0056 | 0 | None | None | None | 2022-03-10 16:37:54 UTC

Internal Links: 2040345

Description Pierre Prinetti 2022-01-07 13:53:25 UTC
NOTE: In this context, "invalid certificate" is defined as an HTTPS certificate that does not list the current domain name in the SAN field. This happens in legacy certificates where the alternate name appeared in the CN field instead.


OpenStack-specific operators for OpenShift 4.10 are being rebased on Kubernetes 1.23, which requires Go 1.17.
However, Go 1.17 removes support for "invalid certificates": the runtime environment variable `GODEBUG=x509ignoreCN=0`, which re-enabled CommonName matching, is no longer honored (see https://go.dev/doc/go1.17).

The OCP base images have set that GODEBUG environment variable since Go 1.15 [1], which is why OpenShift has accepted "invalid certificates" until now.

This implies that starting from OpenShift 4.10 "invalid certificates" will not be trusted any more as they will fail verification.

When OpenStack exposes API endpoints with "invalid certificates", upgrading a cluster to 4.10 will render the OpenShift cluster non-functional, because the operators will refuse to communicate with those endpoints.
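
To illustrate the behaviour described above, here is a minimal sketch, not taken from any OpenShift operator. It assumes a Go 1.17 or later toolchain. The helper names `selfSigned` and `acceptsHost` and the host name `keystone.example.com` are hypothetical. It builds two throwaway self-signed certificates in memory, one that carries the host name only in the CN and one that also lists it as a SAN, and asks the standard library whether each is valid for that host.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// selfSigned returns a throwaway self-signed certificate (illustration only).
// The CommonName is always set; dnsNames, if any, populate the SAN extension.
func selfSigned(commonName string, dnsNames []string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: commonName},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     dnsNames,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// acceptsHost reports whether the certificate's names match host, according
// to the Go toolchain this program was built with. It checks names only, not
// the trust chain or the validity period.
func acceptsHost(cert *x509.Certificate, host string) bool {
	return cert.VerifyHostname(host) == nil
}

func main() {
	const host = "keystone.example.com" // hypothetical endpoint name

	legacy, err := selfSigned(host, nil) // host name in CN only
	if err != nil {
		panic(err)
	}
	modern, err := selfSigned(host, []string{host}) // host name in CN and SAN
	if err != nil {
		panic(err)
	}

	fmt.Println("CN-only certificate accepted:", acceptsHost(legacy, host))
	fmt.Println("SAN certificate accepted:    ", acceptsHost(modern, host))
	// The error explains why the CN-only certificate is rejected.
	fmt.Println("CN-only error:", legacy.VerifyHostname(host))
}
```

With a Go 1.17+ toolchain the CN-only certificate should be rejected and the SAN certificate accepted. This is the same name check that makes the operators refuse the endpoint.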

[1]: https://github.com/openshift/images/blob/1d2d8f002f4d1bde4e03269e0b0365fcd3d91cc3/base/Dockerfile.rhel#L24

Additional info:
* the originating control-plane bug: Bug 2031839
* draft enhancement proposal to monitor HTTPS certificates and set NoUpgrade on affected 4.9 clusters: https://github.com/openshift/enhancements/pull/980
* the patch to the Go programming language removing support for CommonName: https://cs.opensource.google/go/go/+/02ce4118219dc51a14680a0c5fa24cf6e73deeed:src/crypto/x509/verify.go;dlc=b211fe005860db3ceff5fd56af9951d6d1f44325

Comment 1 Pierre Prinetti 2022-01-07 14:05:41 UTC
I have marked this bug as Urgent because a decision WRT priority should be made urgently, but it should be mentioned that this decision could very well be WONTFIX.

Comment 2 Pierre Prinetti 2022-01-10 10:28:48 UTC
The team is currently considering three options:

1. move forward with the 1.23 rebases and accept the risk of breaking clusters, given that the release notes since 4.6 have stated that certificates must properly set SANs, even though this was not enforced until now
2. rush validation into 4.10 and backport it to 4.9
3. write an external tool that validates OpenStack certificates and warns the user about any "invalid certificates"

Comment 3 ShiftStack Bugwatcher 2022-01-13 07:04:16 UTC
Removing the Triaged keyword because:
* the priority assessment is missing

Comment 5 Pierre Prinetti 2022-01-18 10:01:25 UTC
The proposed patch documents the issue and provides a Bash script to validate the OpenStack infrastructure.

Comment 9 Itzik Brown 2022-01-25 09:18:52 UTC
Ran the script against a cluster whose certificates have no Subject Alternative Name; all the endpoints were marked as failed.
OSP RHOS-16.1-RHEL-8-20211126.n.1

Comment 12 Max Bridges 2022-02-14 19:11:46 UTC
Downstream docs change is merged.

Comment 14 errata-xmlrpc 2022-03-10 16:37:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

