Bug 1773419 - OpenShift Cluster Version Operator doesn't correctly mount SSL certificates from host preventing cluster version update in MITM scenario
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.2.z
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: W. Trevor King
QA Contact: liujia
URL:
Whiteboard:
Duplicates: 1833041 1889006
Depends On:
Blocks: 1833041
Reported: 2019-11-18 06:13 UTC by Joel Pearson
Modified: 2020-10-19 22:15 UTC (History)
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1833041 (view as bug list)
Environment:
Last Closed: 2020-09-13 05:00:18 UTC
Target Upstream Version:


Links
System ID Priority Status Summary Last Updated
Github openshift enhancements pull 325 None open Bug 1773419: enhancements/update/cluster-version-operator-x509-upstream-trust: Propose a new enhancement 2020-10-26 18:10:48 UTC
Red Hat Knowledge Base (Solution) 4860481 None None None 2020-08-10 13:28:07 UTC

Description Joel Pearson 2019-11-18 06:13:00 UTC
Description of problem:
The "OpenShift Cluster Version Operator (CVO)" doesn't mount the SSL certificates from the host (masters) correctly. This means that if you are using an MITM proxy, checking for new cluster versions fails with: "x509: certificate signed by unknown authority"

I looked into it more deeply and realized that the CVO only mounts /etc/ssl/certs from the host. However, the certificate bundle there is a symlink on the host: /etc/ssl/certs/ca-bundle.crt -> /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem

So it just ends up using the certificates bundled with the CVO, instead of any customized certificates from the host (master).

I updated the cluster-version-operator and also mounted in /etc/pki as readonly and then it could download version updates correctly.
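
The symlink behaviour described above can be reproduced without a cluster. The following is a minimal sketch, not the CVO's actual mount logic: it builds a fake "host" tree and a fake "container image" tree in a temporary directory, simulates a host mount by replacing a container directory with a copy of the host's (symlinks preserved), and then reads the bundle through the symlink. All helper names (build_tree, simulate_mount, bundle_seen_in_container) are hypothetical, and the link is made relative (on a real host it is absolute) so the simulation never touches the real /etc/pki.

```python
import os
import shutil
import tempfile

LINK = "etc/ssl/certs/ca-bundle.crt"
TARGET = "etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem"


def build_tree(root, bundle_text):
    """Create a bundle file under etc/pki and a symlink to it under etc/ssl/certs."""
    target = os.path.join(root, TARGET)
    link = os.path.join(root, LINK)
    os.makedirs(os.path.dirname(target))
    os.makedirs(os.path.dirname(link))
    with open(target, "w") as f:
        f.write(bundle_text)
    # Relative link (assumption for the simulation) so it resolves inside `root`.
    os.symlink(os.path.relpath(target, os.path.dirname(link)), link)


def simulate_mount(host_root, container_root, rel_dir):
    """Shadow a container directory with the host's copy, keeping symlinks as symlinks."""
    dst = os.path.join(container_root, rel_dir)
    if os.path.lexists(dst):
        shutil.rmtree(dst)
    shutil.copytree(os.path.join(host_root, rel_dir), dst, symlinks=True)


def bundle_seen_in_container(mounts):
    """Return the bundle content the container reads via ca-bundle.crt."""
    with tempfile.TemporaryDirectory() as tmp:
        host = os.path.join(tmp, "host")
        container = os.path.join(tmp, "container")
        build_tree(host, "host bundle with custom CA")
        build_tree(container, "image bundle")
        for rel_dir in mounts:
            simulate_mount(host, container, rel_dir)
        with open(os.path.join(container, LINK)) as f:
            return f.read()


if __name__ == "__main__":
    print(bundle_seen_in_container(["etc/ssl/certs"]))
    print(bundle_seen_in_container(["etc/ssl/certs", "etc/pki"]))
```

With only etc/ssl/certs mounted, the symlink resolves against the container's own etc/pki, so the container reads the bundle shipped in the image. Mounting etc/pki as well makes the same link resolve to the host bundle.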

How reproducible:

Prerequisites:
Internet access through an MITM transparent proxy (an explicit proxy probably has the same problem)

Steps to Reproduce:
1. Prepare an env
2. In the install-config.yaml configure your MITM CA cert in "additionalTrustBundle"
3. In the "Proxy" config (can be done in the manifest before the cluster is deployed) ensure spec.trustedCA.name = "user-ca-bundle"
4. Version updates in the "Cluster Settings" part of the console will say that "Update Status" is "Failing" or "Failed" or something to that effect.

Actual results:

"Update Status" says Failed and it won't let you upgrade your cluster.

Expected results:

"Update Status" says "Up to date" or offers an upgrade.

Comment 1 W. Trevor King 2019-11-18 06:27:38 UTC
[1] is in this space.  At the moment, yeah, CVO isn't going to like transparent proxies.  It will work with an explicit proxy though, as long as httpsProxy is set [2,3].

[1]: https://github.com/openshift/enhancements/pull/115
[2]: https://github.com/openshift/installer/pull/2658#issuecomment-553656222
[3]: https://github.com/openshift/cluster-version-operator/blob/8240a9b3711fa6938129d06ee8c6957a8f3b6464/pkg/cvo/availableupdates.go#L226

Comment 2 Scott Dodson 2019-11-18 14:15:55 UTC
Related'ish https://bugzilla.redhat.com/show_bug.cgi?id=1771564

Comment 3 Joel Pearson 2019-11-18 22:10:56 UTC
Trevor, I presume that an explicit proxy that is also doing MITM will have the exact same problem? I'm increasingly seeing MITM pop up in explicit proxies in the clients that I see.

Comment 4 W. Trevor King 2019-11-18 22:35:22 UTC
Scott, Ben, and I were sorting out nomenclature out of band, and settled on:

* Transparent proxies pass through requests unmodified.  Non-transparent may modify requests (e.g. by decrypting and re-encrypting them).  Probably a grey scale here; e.g. adding a Via header [1] would still be pretty transparent.
* Explicit proxies are configured by explicitly pointing the application at a proxy (e.g. via the proxy config object [2]).  Implicit proxies are set up via network settings to route packets to the proxy regardless of app/host configuration.

We should have good support since 4.2 for explicit proxies, transparent or non-transparent.  You don't need anything special for transparent, implicit proxies, so that's fine too.  There are problems like this bug for non-transparent, implicit proxies that expect to be able to decrypt outgoing traffic, because we don't always inject the additional trust needed into the OpenShift component, and the component complains about not trusting the proxy on the other side of the TLS connection.  Explicitly configuring your proxy [2] should be a reasonable stopgap while we work the kinks out of the non-transparent, implicit case.
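
The two axes above (explicit vs. implicit, transparent vs. non-transparent) and the support status stated for each combination can be written down as a small lookup. This is only a restatement of the comment in code form; the ProxySetup type and expected_support function are hypothetical names, not part of any OpenShift API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxySetup:
    explicit: bool     # application is pointed at the proxy, e.g. via the Proxy config object
    transparent: bool  # proxy passes requests through unmodified


def expected_support(setup):
    """Summarize the support status described in this comment for a proxy setup."""
    if setup.explicit:
        # Explicit proxies, transparent or not, are described as supported since 4.2.
        return "supported since 4.2"
    if setup.transparent:
        # Transparent, implicit proxies need nothing special.
        return "nothing special needed"
    # Non-transparent, implicit proxies are the case this bug is about.
    return "problematic: component may not trust the proxy's certificate"


if __name__ == "__main__":
    for explicit in (True, False):
        for transparent in (True, False):
            setup = ProxySetup(explicit=explicit, transparent=transparent)
            print(setup, "->", expected_support(setup))
```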

[1]: https://tools.ietf.org/html/rfc7230#section-5.7.1
[2]: https://github.com/openshift/api/blob/d0b31d707c464221d1eb24846b0d0bbe57040102/config/v1/types_proxy.go#L11-L66

Comment 5 Scott Dodson 2019-12-12 15:45:52 UTC
Docs bugs created requesting workaround documentation, moving to 4.4 to be implemented as part of https://github.com/openshift/enhancements/pull/115

Comment 7 Scott Dodson 2020-04-21 17:26:27 UTC
We've not seen significant customer activity around this pattern, so we're deferring this for now. Will reconsider if this becomes a more widespread use case.

Comment 9 W. Trevor King 2020-05-07 23:23:30 UTC
*** Bug 1833041 has been marked as a duplicate of this bug. ***

Comment 10 Joel Pearson 2020-05-08 05:30:23 UTC
For what it's worth, it doesn't look like the cluster version operator gets proxy settings added either, at least when you configure an explicit proxy after installation. I haven't tried adding a proxy during installation, only the certificates in "additionalTrustBundle", as I mentioned when I opened this bug.

My use case was adding an explicit proxy instead of the transparent proxy that was in place already, as my transparent proxy is a little bit flakey at the moment.

Comment 11 W. Trevor King 2020-05-08 05:33:33 UTC
> My use case was adding an explicit proxy instead of the transparent proxy...

Explicit proxy should be fine if you have something in the httpsProxy property (more on this in comment 1).  If that's not working for you, open a new bug.  This bug is, as I understand it, about the case where httpsProxy is not set but trustedCA is.
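
The scoping above (httpsProxy not set, trustedCA set) can be expressed as a quick check against the spec of a Proxy config object. This is a sketch under the assumption that the spec has already been loaded into a plain dict; in_scope_for_this_bug is a hypothetical helper name, and only the httpsProxy and trustedCA.name fields mentioned in this bug are inspected.

```python
def in_scope_for_this_bug(proxy_spec):
    """Return True when trustedCA is set but httpsProxy is not."""
    https_proxy = (proxy_spec.get("httpsProxy") or "").strip()
    trusted_ca = ((proxy_spec.get("trustedCA") or {}).get("name") or "").strip()
    return bool(trusted_ca) and not https_proxy


if __name__ == "__main__":
    # Only a trust bundle is configured: the case this bug tracks.
    print(in_scope_for_this_bug({"trustedCA": {"name": "user-ca-bundle"}}))
    # An explicit HTTPS proxy is configured as well (placeholder URL):
    # per this comment, that should go in a new bug if it fails.
    print(in_scope_for_this_bug({
        "httpsProxy": "http://proxy.example.com:3128",
        "trustedCA": {"name": "user-ca-bundle"},
    }))
```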

Comment 13 W. Trevor King 2020-05-15 05:53:42 UTC
Enhancement up.  Moving back to NEW, because this bug should certainly not get swept into MODIFIED if/when the enhancement PR lands.  We'd still need to actually implement it ;).

Comment 15 Lalatendu Mohanty 2020-05-19 18:35:28 UTC
We have higher priority bugs planned for 4.5.0. Also this is now tracked in https://issues.redhat.com/browse/OTA-211 which is planned for 4.6.0. Hence moving this to 4.6.0.

Comment 16 Lalatendu Mohanty 2020-05-19 18:56:28 UTC
Changing the priority to match the priority of https://issues.redhat.com/browse/OTA-211

Comment 17 W. Trevor King 2020-06-21 14:17:38 UTC
Haven't had time to get back around to this.  Adding UpcomingSprint

Comment 19 W. Trevor King 2020-07-10 19:41:58 UTC
https://github.com/openshift/cluster-version-operator/pull/311 (for bug 1797123) will provide a safety valve here.  Lala and Ben were going to take a look, but don't seem to have had time yet (although Abinav approved it 3d ago [1]).  If/when that lands, we can revisit whether we need more than the safety valve for this bug.  Until then, the UpcomingSprint Lala just added makes sense.

[1]: https://github.com/openshift/cluster-version-operator/pull/311#issuecomment-655024387

Comment 21 W. Trevor King 2020-08-01 05:21:04 UTC
#311 (discussed in comment 19) landed today, which means that folks using 4.6 nightlies (once we promote the next one) and later 4.6 RCs and eventually GA releases will be able to work around this by adding their Cincinnati/update-recommendation-service CA to the Proxy config object's trustedCA bundle.  Once we test that out in bug 1797123 and get some feedback on whether it's a workable stopgap, we may backport it to earlier releases.
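
For the workaround above, the CA has to end up in the PEM bundle that the Proxy config object's trustedCA points at. Below is a minimal sketch of appending one or more CA certificates to an existing PEM bundle without duplicating entries. It is plain text handling, assumes both inputs are PEM text, and does not validate the certificates; append_ca and the regular expression are illustrative names, not part of any OpenShift tooling.

```python
import re

# Matches one PEM-encoded certificate block, including its BEGIN/END markers.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def _key(block):
    """Whitespace-insensitive key used to detect duplicate blocks."""
    return "".join(block.split())


def append_ca(bundle_pem, extra_ca_pem):
    """Return the bundle with certificates from extra_ca_pem appended, skipping duplicates."""
    merged = PEM_BLOCK.findall(bundle_pem)
    seen = {_key(block) for block in merged}
    for block in PEM_BLOCK.findall(extra_ca_pem):
        if _key(block) not in seen:
            seen.add(_key(block))
            merged.append(block)
    return "\n".join(merged) + "\n" if merged else ""


if __name__ == "__main__":
    # Placeholder bodies only; real certificates are base64-encoded DER.
    existing = "-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n"
    update_service_ca = "-----BEGIN CERTIFICATE-----\nBBBB\n-----END CERTIFICATE-----\n"
    print(append_ca(existing, update_service_ca))
```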

Comment 22 W. Trevor King 2020-08-21 22:24:23 UTC
Still working through verification for bug 1797123.

Comment 23 W. Trevor King 2020-08-27 03:26:53 UTC
@liujia verified the bug 1797123 changes as a stopgap workaround for this issue [1], as described in comment 19 and [2].  This bug will remain open to track whether and how we implement a more targeted MitM fix, based on enhancement#325 as mentioned in comment 13 or some other effort.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1797123#c17
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1797123#c7

Comment 24 W. Trevor King 2020-08-28 22:45:38 UTC
4.6 mitigation is in place.  Punting future work to 4.7.

Comment 25 W. Trevor King 2020-09-13 05:00:18 UTC
To avoid having to tag this UpcomingSprint each sprint, I'm just going to close this DEFERRED.  We can re-open if folks are not satisfied with the stopgap workaround once they've had time to kick that around.

Comment 26 W. Trevor King 2020-10-16 20:00:52 UTC
*** Bug 1889006 has been marked as a duplicate of this bug. ***

Comment 27 Kai-Uwe Rommel 2020-10-18 19:01:01 UTC
I read through the case, and it looks like a lot of time has been spent discussing this, thinking it over, and deciding whether it really needs fixing and when to fix it ... but the reporter has already described in quite some detail that the bug can _easily_ be fixed. I don't understand why it has not simply been done already, as it looks like the fix would have required less time than all the time spent discussing it ... :-(

@Joel Pearson, could you clarify "I updated the cluster-version-operator and also mounted in /etc/pki as readonly and then it could download version updates correctly" a little bit? I could use this. I mean did you really just add another mount into the container? Where exactly (on which level) did you modify the deployment? Thanks!

Comment 28 Joel Pearson 2020-10-19 22:15:51 UTC
Hi Kai-Uwe,

I did this for a particular client which I don't have access to at the moment and they're using explicit proxies now as I had reliability issues with their transparent setup (nothing to do with OpenShift).
But from memory, I went in and modified the cluster-version-operator kubernetes deployment yaml.  I think there was already a reference to /etc/ssl, so I just added another mount using the same mount type (hostPath I think?).  But basically, just look for /etc/ssl and duplicate all volume references that it uses but use /etc/pki instead.

Good Luck.

Thanks,

Joel
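
The "look for /etc/ssl and duplicate all volume references" step can be sketched as a transformation on the deployment once it has been loaded into a dict. This is an illustration of that description only, not a tested patch for the real cluster-version-operator deployment: the function name, the new volume name "etc-pki", and the sample volume and container names in the usage example are all assumptions.

```python
import copy


def add_etc_pki_mount(deployment, new_path="/etc/pki", new_name="etc-pki"):
    """Return a copy of the deployment with the /etc/ssl hostPath volume duplicated for new_path."""
    patched = copy.deepcopy(deployment)
    pod_spec = patched["spec"]["template"]["spec"]

    # Find the existing hostPath volume that refers to /etc/ssl (e.g. /etc/ssl/certs).
    source = next(
        (v for v in pod_spec.get("volumes", [])
         if v.get("hostPath", {}).get("path", "").startswith("/etc/ssl")),
        None,
    )
    if source is None:
        raise ValueError("no hostPath volume under /etc/ssl found")

    # Duplicate the volume, pointing the copy at new_path.
    new_volume = copy.deepcopy(source)
    new_volume["name"] = new_name
    new_volume["hostPath"]["path"] = new_path
    pod_spec["volumes"].append(new_volume)

    # Duplicate every volumeMount that uses the original volume, mounted read-only.
    for container in pod_spec.get("containers", []):
        for mount in list(container.get("volumeMounts", [])):
            if mount.get("name") == source["name"]:
                new_mount = copy.deepcopy(mount)
                new_mount.update(name=new_name, mountPath=new_path, readOnly=True)
                container["volumeMounts"].append(new_mount)
    return patched


if __name__ == "__main__":
    sample = {"spec": {"template": {"spec": {
        "volumes": [{"name": "etc-ssl-certs", "hostPath": {"path": "/etc/ssl/certs"}}],
        "containers": [{"name": "cvo", "volumeMounts": [
            {"name": "etc-ssl-certs", "mountPath": "/etc/ssl/certs", "readOnly": True}]}],
    }}}}
    print(add_etc_pki_mount(sample)["spec"]["template"]["spec"])
```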

