Bug 1797123
Summary: | Cluster-version operator loads proxy certs from the trustedCA source, and so is vulnerable to data-entry errors | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | W. Trevor King <wking> |
Component: | Cluster Version Operator | Assignee: | W. Trevor King <wking> |
Status: | CLOSED ERRATA | QA Contact: | liujia <jiajliu> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4.4 | CC: | anowak, aos-bugs, jokerman |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | 4.6.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: The cluster-version operator used to load trusted CAs from the ConfigMap referenced by the Proxy configuration's trustedCA property.
Consequence: Because the referenced ConfigMap is user-maintained, a user setting corrupted certificates could break the cluster-version operator's access to the proxy.
Fix: The cluster-version operator now loads trusted CAs from openshift-config-managed/trusted-ca-bundle, which the network operator populates only when the Proxy configuration's referenced trustedCA ConfigMap is valid.
Result: If a user corrupts the referenced trustedCA ConfigMap, the network operator will not copy the corrupted content into openshift-config-managed/trusted-ca-bundle, and the cluster-version operator will continue to connect to the proxy using the previous, valid certificates.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2020-10-27 15:55:05 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
W. Trevor King
2020-01-31 23:57:25 UTC
Some discussion around the MCO making a similar pivot to the post-validator trusted-ca-bundle in bug 1784201.

Lala and Ben were going to take a look, but don't seem to have had time yet (although Abinav approved it 3d ago [1]). So waiting on review, which is unlikely to happen today. Adding UpcomingSprint.

[1]: https://github.com/openshift/cluster-version-operator/pull/311#issuecomment-655024387

ON_QA, but might get kicked back to me if verification fails, so adding UpcomingSprint. Hopefully the last time for this bug :)

Hi W. Trevor King, could you provide some background information for this bug? I'm not sure how I should check it. Reproduction steps would also be welcome :)

Two things that would be useful to check:

* Narrowly, for this bug:
  1. Install a cluster with Proxy configured appropriately, including a valid trustedCA.
  2. Edit the trustedCA-referenced ConfigMap to inject some garbage, non-PEM content.
  3. Before this change, that should break the CVO's attempts to connect to the configured upstream (Cincinnati) service, and also its ability to connect to external HTTPS signature stores. With this change, the CVO will continue to connect to both locations using the previous, good trustedCA content. The network operator, which is in charge of copying the trustedCA-referenced ConfigMap into openshift-config-managed/trusted-ca-bundle, will ideally complain about the new, broken trustedCA-referenced ConfigMap. If it doesn't, probably file a bug against them.

* More broadly, as a workaround for bug 1773419, you can:
  1. Set up a Cincinnati service using a TLS certificate signed by a non-standard CA.
  2. Configure Proxy with a trustedCA-referenced ConfigMap that includes the non-standard CA.
  3. Update your ClusterVersion.spec.upstream to point at your service.
  4. Before this change, that should break the CVO's attempts to connect to the configured upstream, because without httpsProxy set in Proxy, the CVO would ignore the configured trustedCA.
With this change, the CVO will connect to the configured upstream, trusting its non-standard-CA-signed certificate. The same handling applies to HTTPS signature fetches, but because the base URIs for those are not easily configurable, it's a bit harder to test. If you feel so inclined, you could verify this angle by configuring networking to route all outgoing HTTPS traffic through a proxy with a non-standard-CA-signed certificate, while continuing to leave httpsProxy unset.

Ah, left out of comment 7 for the narrow case: you'll want to set httpsProxy, at least for the "before this change" case, because without it the CVO will completely ignore the trustedCA-referenced ConfigMap.

Not clear to me what went wrong in the latest verification attempt. Will continue to look next sprint.

Pointing a recent build at the Kube API as an "upstream" (because it's self-signed, and we can distinguish between "X.509 failure" and "not Cincy JSON" in the error message):

$ oc get -o jsonpath='{.status.desired.version}{"\n"}' clusterversion version
4.6.0-0.ci-2020-08-20-163422
$ oc get -o jsonpath='{.spec.trustedCA}{"\n"}' proxy cluster
map[name:]
$ yaml2json < "${KUBECONFIG}" | jq -r '.clusters[0].cluster.server'
https://api.ci-ln-5fqp6qt-f76d1.origin-ci-int-gce.dev.openshift.com:6443
$ oc patch clusterversion version --type json -p '[{"op": "add", "path": "/spec/channel", "value": "whatever"},{"op": "add", "path": "/spec/upstream", "value": "https://api.ci-ln-5fqp6qt-f76d1.origin-ci-int-gce.dev.openshift.com:6443"}]'
$ oc get -o json clusterversion version | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message' | sort
2020-08-21T22:11:03Z RetrievedUpdates=False RemoteFailed: Unable to retrieve available updates: Get "https://api.ci-ln-5fqp6qt-f76d1.origin-ci-int-gce.dev.openshift.com:6443?arch=amd64&channel=whatever&id=662fb0a6-04c7-4363-8c11-8c6fd9d2d5e8&version=4.6.0-0.ci-2020-08-20-163422": x509: certificate signed by unknown authority
2020-08-21T22:50:03Z Failing=False :
2020-08-21T22:50:33Z Available=True : Done applying 4.6.0-0.ci-2020-08-20-163422
2020-08-21T22:50:33Z Progressing=False : Cluster version is 4.6.0-0.ci-2020-08-20-163422

So "x509: certificate signed by unknown authority". Good. Now fill in the Proxy trustedCA.

$ yaml2json < "${KUBECONFIG}" | jq -r '.clusters[0].cluster["certificate-authority-data"]' | base64 -d >ca-bundle.crt
$ head -n2 ca-bundle.crt
-----BEGIN CERTIFICATE-----
MIIDkjCCAnqgAwIBAgIIWm+b1NvzbOYwDQYJKoZIhvcNAQELBQAwJjEkMCIGA1UE
$ oc -n openshift-config create configmap user-ca-bundle --from-file=ca-bundle.crt
$ oc patch proxy cluster --type json -p '[{"op": "add", "path": "/spec/trustedCA/name", "value": "user-ca-bundle"}]'
$ sleep 20  # or whatever, waiting for the network operator to populate the trusted bundle
$ diff -u ca-bundle.crt <(oc -n openshift-config-managed get -o json configmap trusted-ca-bundle | jq -r '.data["ca-bundle.crt"]') | head -n7
--- ca-bundle.crt	2020-08-21 15:58:33.146533690 -0700
+++ /dev/fd/63	2020-08-21 16:18:40.399523102 -0700
@@ -98,3 +98,3754 @@
 1eDPBGGc2pxk2eshDeX4THjpzF+GWksGmYc+5Az6+Qd7ImYDKReFnbPQz3OIDcq+
 egBKR65U
 -----END CERTIFICATE-----
+# ACCVRAIZ1

But I was still seeing the "certificate signed by unknown authority". Poking around locally, I think the issue is cvo#441 (fixup PR). With that in place, the flow above results in:

$ oc get -o json clusterversion version | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message' | sort
2020-08-21T22:11:03Z RetrievedUpdates=False ResponseFailed: Unable to retrieve available updates: unexpected HTTP status: 403 Forbidden
...

Which makes sense, because the CVO making an upstream request is hitting the Kube API server, which says "who are you?". Anyhow, it shows that the CVO trusts the X.509 cert guarding the Kube API endpoint.

Verified on 4.6.0-0.nightly-2020-08-25-222652; it works well for both scenarios.
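The jq one-liner used throughout the verification above flattens ClusterVersion status conditions into sortable one-per-line strings. For reference, here is the same extraction sketched in Python (the sample data is abbreviated from the transcript above; jq treats a missing `.reason` as null, which concatenates as an empty string, so `get(..., "")` mirrors that):

```python
import json

def format_conditions(clusterversion_json: str) -> list:
    """Flatten ClusterVersion conditions the way the jq filter does:
    lastTransitionTime + " " + type + "=" + status + " " + reason + ": " + message,
    then sort (oldest transition first)."""
    cv = json.loads(clusterversion_json)
    lines = [
        f'{c["lastTransitionTime"]} {c["type"]}={c["status"]} '
        f'{c.get("reason", "")}: {c.get("message", "")}'
        for c in cv["status"]["conditions"]
    ]
    return sorted(lines)

# Abbreviated sample mirroring the transcript above.
sample = json.dumps({"status": {"conditions": [
    {"lastTransitionTime": "2020-08-21T22:50:33Z", "type": "Available",
     "status": "True", "message": "Done applying 4.6.0-0.ci-2020-08-20-163422"},
    {"lastTransitionTime": "2020-08-21T22:11:03Z", "type": "RetrievedUpdates",
     "status": "False", "reason": "RemoteFailed",
     "message": "Unable to retrieve available updates: x509: certificate signed by unknown authority"},
]}})

for line in format_conditions(sample):
    print(line)
```

Sorting on the full line effectively orders by lastTransitionTime, which is why the failed RetrievedUpdates condition shows up first in the transcript.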
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

*** Bug 1918816 has been marked as a duplicate of this bug. ***
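The fix verified above hinges on the network operator acting as a validation gate: it only copies the user's trustedCA ConfigMap into openshift-config-managed/trusted-ca-bundle when the content parses as PEM, so the CVO keeps the last-known-good bundle otherwise. As a rough illustration of that gate (the real implementation is Go inside the network operator; the function names and the validation rule here are simplified assumptions, not the operator's actual code):

```python
import base64
import re

# Matches one PEM certificate block and captures its base64 body.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----\n(.+?)\n-----END CERTIFICATE-----",
    re.DOTALL,
)

def valid_pem_bundle(text: str) -> bool:
    """True if text contains at least one PEM certificate block whose
    body is well-formed base64.  A simplified stand-in for the network
    operator's real certificate validation."""
    blocks = PEM_BLOCK.findall(text)
    if not blocks:
        return False
    for body in blocks:
        try:
            base64.b64decode("".join(body.split()), validate=True)
        except Exception:
            return False
    return True

def sync_trusted_bundle(managed: dict, user_bundle: str) -> dict:
    """Copy the user bundle into the managed ConfigMap data only when it
    is valid; on garbage input the previous good content is kept, which
    is why the CVO kept connecting through the proxy in this bug's fix."""
    if valid_pem_bundle(user_bundle):
        managed["ca-bundle.crt"] = user_bundle
    return managed
```

This is only meant to show why step 2 of the narrow reproduction (injecting garbage, non-PEM content) no longer breaks the CVO: the corrupted bundle never reaches the managed ConfigMap the CVO now reads from.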