NOTE: In this context, "invalid certificate" is defined as an HTTPS certificate that does not list the current domain name in the SAN (Subject Alternative Name) field. This happens with legacy certificates, where the host name appears only in the CN (Common Name) field.
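To make the definition above concrete, here is a minimal sketch of the distinction, assuming the certificate has already been decoded into the dictionary layout that Python's `ssl.SSLSocket.getpeercert()` returns. The function names and the three result labels are hypothetical and chosen for illustration only. The wildcard handling is deliberately simplified and is not a full implementation of the matching rules Go applies.

```python
def san_entries(cert):
    """Return the (type, value) pairs of the subjectAltName extension, if any."""
    return tuple(cert.get("subjectAltName", ()))


def common_names(cert):
    """Return every commonName found in the certificate subject."""
    return [value
            for rdn in cert.get("subject", ())
            for key, value in rdn
            if key == "commonName"]


def _dns_match(pattern, hostname):
    """Match a name against a hostname, allowing a leftmost '*' label."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern.startswith("*."):
        # The wildcard covers exactly one label: *.example.com matches
        # api.example.com but not example.com or a.b.example.com.
        _, _, rest = hostname.partition(".")
        return rest != "" and rest == pattern[2:]
    return pattern == hostname


def classify(cert, hostname):
    """Classify a certificate as 'ok', 'cn-only' or 'mismatch' for hostname."""
    for kind, value in san_entries(cert):
        if kind == "DNS" and _dns_match(value, hostname):
            return "ok"
        if kind == "IP Address" and value == hostname:
            return "ok"
    # Only the CN matches: this is the "invalid certificate" case of the NOTE.
    if any(_dns_match(cn, hostname) for cn in common_names(cert)):
        return "cn-only"
    return "mismatch"
```

A certificate classified as `cn-only` is the kind that was accepted only through the CN fallback and that fails verification once that fallback is gone.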
OpenStack-specific operators for OpenShift 4.10 are being rebased on Kubernetes 1.23. This requires using Go 1.17.
However, Go 1.17 removes support for "invalid certificates": the runtime environment variable `GODEBUG=x509ignoreCN=0` is no longer honored (see https://go.dev/doc/go1.17).
The OCP base images have set that GODEBUG environment variable since Go 1.15, which is why OpenShift has accepted "invalid certificates" until now.
This implies that starting from OpenShift 4.10 "invalid certificates" will not be trusted any more as they will fail verification.
When OpenStack exposes API endpoints with "invalid certificates", upgrading a cluster to 4.10 will render the OpenShift cluster non-functional, as operators will refuse to communicate with those API endpoints.
* the originating control-plane bug: Bug 2031839
* draft enhancement proposal to monitor HTTPS certificates and set NoUpgrade on affected 4.9 clusters: https://github.com/openshift/enhancements/pull/980
* the patch to the Go programming language removing support for CommonName: https://cs.opensource.google/go/go/+/02ce4118219dc51a14680a0c5fa24cf6e73deeed:src/crypto/x509/verify.go;dlc=b211fe005860db3ceff5fd56af9951d6d1f44325
I have marked this bug as Urgent because a decision WRT priority should be made urgently, but it should be mentioned that this decision could very well be WONTFIX.
The team is currently considering three options:
1. move forward with the 1.23 rebases and accept the risk of breaking clusters, given that the release notes since 4.6 have stated that certificates must properly set SANs, even though this was not enforced until now
2. rush validation into 4.10 and backport to 4.9
3. write an external tool that validates OpenStack certificates and warns the user in case of "invalid certificates"
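As a rough illustration of what option 3 could look like, the sketch below walks a list of endpoint URLs and reports whether each served certificate carries any SAN entry. It is not the tool the team has in mind. All function names are hypothetical, the endpoint URLs are assumed to be supplied by the caller (for example, copied from the OpenStack service catalog), and the CA bundle path is an assumption as well.

```python
import socket
import ssl
from urllib.parse import urlsplit


def endpoint_address(url):
    """Split an endpoint URL into (host, port), defaulting to 443 for HTTPS."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError(f"not an HTTPS endpoint: {url}")
    return parts.hostname, parts.port or 443


def fetch_cert(host, port, cafile=None, timeout=10.0):
    """Return the peer certificate as a dict, validating the chain only."""
    context = ssl.create_default_context(cafile=cafile)
    # Hostname checking is switched off on purpose: the point is to inspect
    # the SAN ourselves. The chain is still verified, because getpeercert()
    # returns an empty dict for unverified peers.
    context.check_hostname = False
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def has_san(cert):
    """True when the certificate carries at least one DNS or IP SAN entry."""
    return any(kind in ("DNS", "IP Address")
               for kind, _ in cert.get("subjectAltName", ()))


def report(urls, fetch=fetch_cert):
    """Map each endpoint URL to 'PASS' or 'FAIL' based on SAN presence."""
    results = {}
    for url in urls:
        host, port = endpoint_address(url)
        results[url] = "PASS" if has_san(fetch(host, port)) else "FAIL"
    return results
```

Calling `report(["https://keystone.example.com:13000/v3"])` would contact that (made-up) endpoint and return a dictionary keyed by URL. The `fetch` parameter exists so that the network step can be replaced, for instance in tests. This sketch only checks that a SAN is present; it does not check that the SAN matches the endpoint host name.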
Removing the Triaged keyword because:
* the priority assessment is missing
The proposed patch documents the issue and provides a Bash script to validate the OpenStack infrastructure.
Ran the script against a cluster whose certificates lack a Subject Alternative Name; all the endpoints were marked as failed.
Downstream docs change is merged.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.