Bug 1797123 - Cluster-version operator loads proxy certs from the trustedCA source, and so is vulnerable to data-entry errors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: W. Trevor King
QA Contact: liujia
URL:
Whiteboard:
Duplicates: 1918816
Depends On:
Blocks:
Reported: 2020-01-31 23:57 UTC by W. Trevor King
Modified: 2021-01-21 19:36 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The cluster-version operator used to load trusted CAs from the ConfigMap referenced by the Proxy configuration's trustedCA property.
Consequence: The referenced ConfigMap is user-maintained, so a user who set corrupted certificates would break the cluster-version operator's access to the proxy.
Fix: The cluster-version operator now loads trusted CAs from openshift-config-managed/trusted-ca-bundle, which the network operator populates when the Proxy configuration's referenced trustedCA ConfigMap is valid.
Result: If a user corrupts the referenced trustedCA ConfigMap, the network operator will not copy the corrupted content into openshift-config-managed/trusted-ca-bundle, and the cluster-version operator will continue to connect to the proxy using the previous, valid certificates.
Clone Of:
Environment:
Last Closed: 2020-10-27 15:55:05 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-version-operator pull 311 | 0 | None | closed | Bug 1797123: pkg/cvo: Fetch proxy CA certs from openshift-config-managed/trusted-ca-bundle | 2021-02-14 21:31:19 UTC
Github | openshift cluster-version-operator pull 441 | 0 | None | closed | Bug 1797123: pkg/cvo: Separate ConfigMap informer for openshift-config-managed | 2021-02-14 21:31:20 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 15:55:31 UTC

Description W. Trevor King 2020-01-31 23:57:25 UTC
The API docs [1] recommend avoiding trustedCA (unless you happen to be the "proxy validator") and instead pulling the trust bundle from this managed namespace.  This protects from cluster-admin data-entry errors, because openshift-config-managed/trusted-ca-bundle will remain unchanged until the proxy validator sees a config in openshift-config/$trustedCA that it considers valid.  We should pivot the CVO to load the proxy certs from trusted-ca-bundle instead.

[1]: https://github.com/openshift/api/blob/f2a771e1a90ceb4e65f1ca2c8b11fc1ac6a66da8/config/v1/types_proxy.go#L44-L52

Comment 1 W. Trevor King 2020-02-01 00:06:36 UTC
Some discussion around the MCO making a similar pivot to the post-validator trusted-ca-bundle in bug 1784201.

Comment 2 W. Trevor King 2020-07-10 19:43:52 UTC
Lala and Ben were going to take a look, but don't seem to have had time yet (although Abinav approved it 3d ago [1]).  So this is waiting on review, which is unlikely to happen today.  Adding UpcomingSprint.

[1]: https://github.com/openshift/cluster-version-operator/pull/311#issuecomment-655024387

Comment 5 W. Trevor King 2020-08-01 05:33:57 UTC
ON_QA, but might get kicked back to me if verification fails, so adding UpcomingSprint.  Hopefully the last time for this bug :)

Comment 6 liujia 2020-08-04 08:22:55 UTC
Hi W. Trevor King

Could you help provide some background information for this bug? I'm not sure how I should check it. Some reproduction steps would also be welcome :)

Comment 7 W. Trevor King 2020-08-04 22:23:04 UTC
Two things that would be useful to check:

* Narrowly, for this bug:
  1. Install a cluster with Proxy configured appropriately, including a valid trustedCA.
  2. Edit the trustedCA-referenced ConfigMap to inject some garbage, non-PEM content.
  3. Before this change, that should break the CVO's attempts to connect to the configured upstream (Cincinnati) service, and also its ability to connect to external HTTPS signature stores.  With this change, the CVO will continue to connect to both locations using the previous, good trustedCA content.

     The network operator, which is in charge of copying the trustedCA-referenced ConfigMap into openshift-config-managed/trusted-ca-bundle, will ideally complain about the new, broken trustedCA-referenced ConfigMap.  If it doesn't, probably file a bug against them.

* More broadly, as a workaround for bug 1773419, you can:
  1. Set up a Cincinnati service using a TLS certificate signed by a non-standard CA.
  2. Configure Proxy with a trustedCA-referenced ConfigMap that includes the non-standard CA.
  3. Update your ClusterVersion.spec.upstream to point at your service.
  4. Before this change, that should break the CVO's attempts to connect to the configured upstream, because without httpsProxy set in Proxy, the CVO would ignore the configured trustedCA.  With this change, the CVO will connect to the configured upstream, trusting its non-standard-CA-signed certificate.

     The same handling applies to HTTPS signature fetches, but because the base URIs for those are not easily configurable, it's a bit harder to test.  If you feel so inclined, you could verify this angle by configuring networking to route all outgoing HTTPS traffic through a proxy with a non-standard-CA-signed certificate, and continue to leave httpsProxy unset.
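
The gating behavior in the narrow case above can be modeled locally, without a cluster. The following is a minimal sketch under stated assumptions, not the network operator's actual validation: it accepts a file if it has at least one BEGIN CERTIFICATE marker and a matching number of END CERTIFICATE markers, and only then copies it over a stand-in for the managed bundle. The function names looks_like_pem and publish_if_valid, and the file arguments, are hypothetical.

```shell
# Toy model of a validate-then-copy flow. The real network operator performs
# stricter checks than this marker count; this only illustrates the gating.

# looks_like_pem FILE: succeed if FILE is readable, has at least one BEGIN
# marker, and has as many END markers as BEGIN markers.
looks_like_pem() {
    [ -r "$1" ] || return 1
    begins=$(grep -c -e '-----BEGIN CERTIFICATE-----' "$1")
    ends=$(grep -c -e '-----END CERTIFICATE-----' "$1")
    [ "$begins" -gt 0 ] && [ "$begins" -eq "$ends" ]
}

# publish_if_valid SOURCE MANAGED: copy SOURCE over MANAGED only when SOURCE
# passes the check; otherwise leave MANAGED untouched and say so.
publish_if_valid() {
    if looks_like_pem "$1"; then
        cp "$1" "$2"
        echo "published"
    else
        echo "rejected; keeping previous bundle"
    fi
}
```

In this model, a consumer that only ever reads the MANAGED file keeps seeing the last good content after garbage is written to SOURCE, which is the property step 3 of the narrow case checks for the CVO.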

Comment 8 W. Trevor King 2020-08-04 22:25:20 UTC
Ah, left out in comment 7 for the narrow case: you'll want to set httpsProxy, at least for the "before this change" case, because without it the CVO will completely ignore the trustedCA-referenced ConfigMap.
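
Put together, comments 7 and 8 describe a small decision table for whether the CVO consults the trustedCA-derived certificates. The sketch below only restates that table as a shell function for use in test notes; it is not CVO code, and the function name cvo_honors_trusted_ca and its "before"/"after" labels are hypothetical.

```shell
# cvo_honors_trusted_ca BUILD HTTPS_PROXY
#   BUILD: "before" or "after" the change tracked in this bug.
#   HTTPS_PROXY: value of the Proxy spec.httpsProxy ("" when unset).
# Prints "yes" if, per comments 7 and 8, the CVO would consult the
# trustedCA-derived certificates, otherwise "no".
cvo_honors_trusted_ca() {
    case "$1" in
        after)
            # With the change, the certificates are used regardless of httpsProxy.
            echo yes ;;
        before)
            # Before the change, trustedCA was ignored unless httpsProxy was set.
            if [ -n "$2" ]; then echo yes; else echo no; fi ;;
        *)
            echo "unknown build: $1" >&2
            return 2 ;;
    esac
}
```

This is why the narrow case needs httpsProxy set to show a difference on older builds, while the broader case relies on httpsProxy being unset.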

Comment 14 W. Trevor King 2020-08-21 22:25:26 UTC
Not clear to me what went wrong in the latest verification attempt.  Will continue to look next sprint.

Comment 15 W. Trevor King 2020-08-22 00:04:29 UTC
Pointing a recent build at the Kube API as an "upstream" (because it's self-signed, and we can distinguish between "X.509 failure" and "not Cincy JSON" in the error message):

$ oc get -o jsonpath='{.status.desired.version}{"\n"}' clusterversion version
4.6.0-0.ci-2020-08-20-163422
$ oc get -o jsonpath='{.spec.trustedCA}{"\n"}' proxy cluster
map[name:]
$ yaml2json < "${KUBECONFIG}" | jq -r '.clusters[0].cluster.server'
https://api.ci-ln-5fqp6qt-f76d1.origin-ci-int-gce.dev.openshift.com:6443
$ oc patch clusterversion version --type json -p '[{"op": "add", "path": "/spec/channel", "value": "whatever"},{"op": "add", "path": "/spec/upstream", "value": "https://api.ci-ln-5fqp6qt-f76d1.origin-ci-int-gce.dev.openshift.com:6443"}]'
$ oc get -o json clusterversion version | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message' | sort
2020-08-21T22:11:03Z RetrievedUpdates=False RemoteFailed: Unable to retrieve available updates: Get "https://api.ci-ln-5fqp6qt-f76d1.origin-ci-int-gce.dev.openshift.com:6443?arch=amd64&channel=whatever&id=662fb0a6-04c7-4363-8c11-8c6fd9d2d5e8&version=4.6.0-0.ci-2020-08-20-163422": x509: certificate signed by unknown authority
2020-08-21T22:50:03Z Failing=False : 
2020-08-21T22:50:33Z Available=True : Done applying 4.6.0-0.ci-2020-08-20-163422
2020-08-21T22:50:33Z Progressing=False : Cluster version is 4.6.0-0.ci-2020-08-20-163422

So "x509: certificate signed by unknown authority".  Good.  Now fill in the Proxy trustedCA.

$ yaml2json < "${KUBECONFIG}" | jq -r '.clusters[0].cluster["certificate-authority-data"]' | base64 -d >ca-bundle.crt
$ head -n2 ca-bundle.crt
-----BEGIN CERTIFICATE-----
MIIDkjCCAnqgAwIBAgIIWm+b1NvzbOYwDQYJKoZIhvcNAQELBQAwJjEkMCIGA1UE
$ oc -n openshift-config create configmap user-ca-bundle --from-file=ca-bundle.crt
$ oc patch proxy cluster --type json -p '[{"op": "add", "path": "/spec/trustedCA/name", "value": "user-ca-bundle"}]'
$ sleep 20  # or whatever, waiting for the network operator to populate the trusted bundle
$ diff -u ca-bundle.crt <(oc -n openshift-config-managed get -o json configmap trusted-ca-bundle | jq -r '.data["ca-bundle.crt"]') | head -n7
--- ca-bundle.crt	2020-08-21 15:58:33.146533690 -0700
+++ /dev/fd/63	2020-08-21 16:18:40.399523102 -0700
@@ -98,3 +98,3754 @@
 1eDPBGGc2pxk2eshDeX4THjpzF+GWksGmYc+5Az6+Qd7ImYDKReFnbPQz3OIDcq+
 egBKR65U
 -----END CERTIFICATE-----
+# ACCVRAIZ1

But I was still seeing the "certificate signed by unknown authority".  Poking around locally, I think the issue is cvo#441 (fixup PR).  With that in place, the flow above results in:

$ oc get -o json clusterversion version | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message' | sort
2020-08-21T22:11:03Z RetrievedUpdates=False ResponseFailed: Unable to retrieve available updates: unexpected HTTP status: 403 Forbidden
...

Which makes sense, because the CVO making an upstream request is hitting the Kube API server, which says "who are you?".  Anyhow, it shows that the CVO trusts the X.509 cert guarding the Kube API endpoint.
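
The distinction used above, between an X.509 failure and a later HTTP-level failure, can be scripted against the RetrievedUpdates message. This is a minimal sketch that matches only the two message fragments quoted in this comment; the function name classify_retrieved_updates and its output labels are hypothetical, and other CVO messages fall through to "other".

```shell
# classify_retrieved_updates MESSAGE
# Prints "tls-untrusted" when MESSAGE contains the X.509 unknown-authority
# error, "tls-trusted-http-error" when it reports an unexpected HTTP status
# (a status can only be received after the TLS handshake succeeded), and
# "other" for anything else.
classify_retrieved_updates() {
    case "$1" in
        *"x509: certificate signed by unknown authority"*)
            echo tls-untrusted ;;
        *"unexpected HTTP status"*)
            echo tls-trusted-http-error ;;
        *)
            echo other ;;
    esac
}
```

The message argument could come from the jq pipeline shown above, for example by selecting the condition whose type is RetrievedUpdates.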

Comment 17 liujia 2020-08-27 02:44:40 UTC
Verified on 4.6.0-0.nightly-2020-08-25-222652; it works well for both scenarios.

Comment 19 errata-xmlrpc 2020-10-27 15:55:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Comment 20 W. Trevor King 2021-01-21 19:36:43 UTC
*** Bug 1918816 has been marked as a duplicate of this bug. ***

