Bug 2031839 - Starting from Go 1.17 invalid certificates will render a cluster dysfunctional
Summary: Starting from Go 1.17 invalid certificates will render a cluster dysfunctional
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: apiserver-auth
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.10.0
Assignee: Sergiusz Urbaniak
QA Contact: Xingxing Xia
URL:
Whiteboard:
Depends On:
Blocks: 2036650 2037274 2052467 2054256 2055494
Reported: 2021-12-13 14:28 UTC by Sergiusz Urbaniak
Modified: 2022-03-12 04:39 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2036650 (view as bug list)
Environment:
Last Closed: 2022-03-12 04:39:26 UTC
Target Upstream Version:
Embargoed:




Links
| System | ID | Private | Priority | Status | Summary | Last Updated |
|---|---|---|---|---|---|---|
| Github | openshift library-go pull 1269 | 0 | None | Merged | Bug 2031839: operator/metricscontroller: initial commit | 2022-01-12 15:49:38 UTC |
| Github | openshift library-go pull 1284 | 0 | None | Merged | Bug 2031839: metricscontroller: handle empty vector query result | 2022-01-12 15:49:42 UTC |
| Github | openshift library-go pull 1290 | 0 | None | Merged | Bug 2031839: pkg/operator/metricscontroller: remove legacy CN sync function | 2022-01-19 14:21:49 UTC |
| Red Hat Product Errata | RHSA-2022:0056 | 0 | None | None | None | 2022-03-12 04:39:46 UTC |

Description Sergiusz Urbaniak 2021-12-13 14:28:38 UTC
OpenShift 4.10 will be rebased onto Kubernetes 1.23, which requires Go 1.17.
However, Go 1.17 removes support for certificates that rely on the legacy Common Name (CN) field instead of Subject Alternative Names (SANs); see https://go.dev/doc/go1.17.
Specifically, the temporary `GODEBUG=x509ignoreCN=0` flag has been removed.
This implies that, starting with OpenShift 4.10, such invalid certificates will no longer be trusted because they will fail verification.

Example:

Given the following certificate:
```
Certificate:
    Data:
        ...
        Subject: CN=foo-domain.com
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage: 
                TLS Web Server Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
```

Verifying such a certificate against the `foo-domain.com` hostname fails with the following error in Go 1.17:
```
x509: certificate relies on legacy Common Name field, use SANs instead
```
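
As a minimal sketch of the failure described above (not taken from the OpenShift code base), the following Go program creates two throwaway self-signed certificates for `foo-domain.com`, one with only a CN and one that also carries a DNS SAN, and verifies the hostname against each. The helper names `newSelfSignedCert` and `reliesOnLegacyCN` are hypothetical and exist only for this illustration; everything else is standard library. Note that `VerifyHostname` checks names only, not the trust chain, which is sufficient to show the CN behavior.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"strings"
	"time"
)

// newSelfSignedCert (hypothetical helper) returns a parsed, self-signed
// serving certificate with the given Common Name and DNS SANs.
// Passing nil for dnsNames yields a CN-only certificate like the one above.
func newSelfSignedCert(cn string, dnsNames []string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		KeyUsage:              x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		BasicConstraintsValid: true, // with IsCA left false this encodes CA:FALSE
		DNSNames:              dnsNames,
	}
	// Template and parent are the same, so the certificate is self-signed.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// reliesOnLegacyCN (hypothetical helper) reports whether hostname
// verification fails with the legacy Common Name error.
func reliesOnLegacyCN(cert *x509.Certificate, host string) bool {
	err := cert.VerifyHostname(host)
	return err != nil && strings.Contains(err.Error(), "legacy Common Name")
}

func main() {
	const host = "foo-domain.com"

	cnOnly, err := newSelfSignedCert(host, nil)
	if err != nil {
		panic(err)
	}
	fmt.Println("CN only: ", cnOnly.VerifyHostname(host))

	withSAN, err := newSelfSignedCert(host, []string{host})
	if err != nil {
		panic(err)
	}
	fmt.Println("with SAN:", withSAN.VerifyHostname(host))
}
```

On Go 1.17 or later, the CN-only certificate should be rejected with the error quoted above, while the certificate carrying a matching DNS SAN should verify without error.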

Server certificates are verified during the TLS client handshake,
so a TLS (https) client that observes an invalid certificate will reject the connection attempt.

Certificates issued internally by the cluster are not affected.
However, custom certificates can be configured in various cases:
- custom serving certificates for kube-apiserver
- custom API webhooks
- custom aggregated API endpoints
- custom certificates for route endpoints
- certificates of external auth identity providers

If invalid custom certificates are configured, this will break connections to critical core parts of OpenShift and thus degrade the cluster.

Comment 1 Sergiusz Urbaniak 2021-12-13 14:30:49 UTC
An OEP has been submitted with more details about mitigations: https://github.com/openshift/enhancements/pull/980

Comment 4 Xingxing Xia 2021-12-22 10:31:15 UTC
Will work on testing this.

Comment 5 Sergiusz Urbaniak 2021-12-22 13:35:02 UTC
Sorry, this is not fixed yet. Only the initial parts are done, and it needs much more work. Setting back to ASSIGNED.

Comment 6 Sergiusz Urbaniak 2022-01-05 09:42:52 UTC
Note: this bugzilla covers the changes that make sense to merge in 4.10 and that need to be backported to 4.9.

Background: most of the changes do not make sense in OpenShift 4.10, as it is already based on Go 1.17.

Comment 10 Xingxing Xia 2022-01-13 15:26:32 UTC
I have read https://github.com/openshift/enhancements/pull/980/files and now understand that this is a release blocker because 4.9 must implement the related metrics and upgrade prevention. Sorry for allocating time to it so late :)
I checked the related PRs and the 4.9 PRs and understand that they intend to expose metrics when invalid CN-only (non-SAN) certs are used.

No test is needed for 4.10.

(
However, I tried to test the statement from comment 0 that verification of such a certificate against the `foo-domain.com` hostname will fail in Go 1.17, using the cert below, which uses a CN and no SAN.
Creating a custom apiserver cert (the openssl commands below are based on https://github.com/giantswarm/grumpy/blob/instance_migration/gen_certs.sh):
# CREATE THE PRIVATE KEY FOR OUR CUSTOM CA
openssl genrsa -out certs/ca.key 2048

# GENERATE A CA CERT WITH THE PRIVATE KEY
openssl req -new -x509 -key certs/ca.key -out certs/ca.crt -config certs/ca_config.txt

# CREATE THE PRIVATE KEY FOR OUR SERVER
openssl genrsa -out certs/apiserver.key 2048

# CREATE A CSR FROM THE CONFIGURATION FILE AND OUR PRIVATE KEY
SERVER_HOST=`oc whoami --show-server | grep -o 'api[^:]*'`
openssl req -new -key certs/apiserver.key -subj "/CN=$SERVER_HOST" -out apiserver.csr -config certs/grumpy_config.txt

# CREATE THE CERT SIGNING THE CSR WITH THE CA CREATED BEFORE
openssl x509 -req -in apiserver.csr -CA certs/ca.crt -CAkey certs/ca.key -CAcreateserial -out certs/apiserver.crt

oc create secret tls api-certs --cert=certs/apiserver.crt --key=certs/apiserver.key -n openshift-config

This apiserver.crt is a custom cert with a CN and no SAN, which will be invalid in 4.10:
$ oc version
...
Server Version: 4.10.0-0.nightly-2022-01-13-061145
Kubernetes Version: v1.23.0+50f645e

oc patch --type=merge apiserver/cluster -p "
spec:
  servingCerts:
    namedCertificates:
    - servingCertificate:
        name: api-certs
"

However, KAS rolled out new pods successfully and `oc get co` showed nothing abnormal, which is strange.
That said, when I checked the served cert via `echo | openssl s_client -connect api...:6443`, it was not the custom one above. This seems to mean that the custom cert that uses a CN and no SAN is not taking effect, i.e. it is invalid.
)

Comment 11 Sergiusz Urbaniak 2022-01-18 11:01:55 UTC
temporarily reassigning to remove code from 4.10/master.

Comment 13 Xingxing Xia 2022-01-21 00:55:01 UTC
Understood that 4.10 (master) does not need it given Go 1.17 already ensures it. Closing directly.

Comment 15 Xingxing Xia 2022-02-10 04:41:51 UTC
(In reply to Xingxing Xia from comment #10)
> Have read https://github.com/openshift/enhancements/pull/980/files , got to know this is a release blocker because 4.9 must implement related metrics and upgrade prevention, sorry for late allocating time on it :)
> Checked related PRs and 4.9 PRs, got to know they intend to expose metrics when invalid non-SAN CN certs are used.
> No test is needed for 4.10.
> But tried to test `Verification against the `foo-domain.com` hostname of such certificate will fail with the following error in Go 1.17` of comment 0 with below cert that uses CN and no SAN:
> Creating a customer apiserver cert (below openssl commands refer to https://github.com/giantswarm/grumpy/blob/instance_migration/gen_certs.sh):
> ...
> openssl req -new -key certs/apiserver.key -subj "/CN=$SERVER_HOST" -out apiserver.csr -config certs/grumpy_config.txt
> ...
> But found KAS can rollout with new pods, and oc get co does not show abnormal thing, strange.

The cert above actually had a SAN set in grumpy_config.txt, which is why I got the strange result. Today I am re-commenting with the correctly verified 4.10 result for a no-SAN cert: https://bugzilla.redhat.com/show_bug.cgi?id=2052467#c2

Comment 17 errata-xmlrpc 2022-03-12 04:39:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

