Description of problem:
The "OpenShift Cluster Version Operator (CVO)" doesn't mount the SSL certificates from the host (masters) correctly. This means that if you are using a MITM proxy, checking for new cluster versions fails with: "x509: certificate signed by unknown authority"
I looked into it more deeply and realized that the CVO only mounts /etc/ssl/certs from the host; however, the certificate bundle is a symlink on the host: /etc/ssl/certs/ca-bundle.crt -> /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
So it just ends up using the certificates bundled with the CVO image, instead of any customized certificates from the host (master).
I updated the cluster-version-operator deployment to also mount /etc/pki read-only, and then it could download version updates correctly.
MITM internet access via a transparent proxy (an explicit proxy probably has the same problem)
Steps to Reproduce:
1. Prepare an env
2. In the install-config.yaml configure your MITM CA cert in "additionalTrustBundle"
3. In the "Proxy" config (can be done in the manifest before the cluster is deployed) ensure spec.trustedCA.name = "user-ca-bundle"
4. Version updates in the "Cluster Settings" part of the console will say that "Update Status" is "Failing" or "Failed" or something to that effect.
Actual results:
"Update Status" says Failed and it won't let you upgrade your cluster.

Expected results:
"Update Status" says "Up to date" or offers an upgrade.
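For reference, the configuration in steps 2 and 3 looks roughly like this (the certificate PEM is a placeholder, and the exact surrounding fields are omitted):

```yaml
# install-config.yaml (step 2): add the MITM CA to additionalTrustBundle
additionalTrustBundle: |
  -----BEGIN CERTIFICATE-----
  <your MITM CA certificate PEM here>
  -----END CERTIFICATE-----
---
# Proxy config (step 3, can be set in the manifests before the cluster
# is deployed): point spec.trustedCA at the generated user-ca-bundle
apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  name: cluster
spec:
  trustedCA:
    name: user-ca-bundle
```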
At the moment, yeah, CVO isn't going to like transparent proxies. It will work with an explicit proxy though, as long as httpsProxy is set [2,3].
Trevor, I presume that an explicit proxy that is also doing MITM will have the exact same problem? I'm increasingly seeing MITM pop up in explicit proxies in the clients that I see.
Scott, Ben, and I were sorting out nomenclature out of band, and settled on:
* Transparent proxies pass through requests unmodified. Non-transparent proxies may modify requests (e.g. by decrypting and re-encrypting them). There's probably a grey scale here; e.g. adding a Via header would still be pretty transparent.
* Explicit proxies are configured by explicitly pointing the application at a proxy (e.g. via the proxy config object). Implicit proxies are set up via network settings to route packets to the proxy regardless of app/host configuration.
We should have good support since 4.2 for explicit proxies, transparent or non-transparent. You don't need anything special for transparent, implicit proxies, so that's fine too. The problems, like this bug, are with non-transparent, implicit proxies that expect to be able to decrypt outgoing traffic: we don't always inject the additional trust needed into the OpenShift component, and the component complains about not trusting the proxy on the other side of the TLS connection. Explicitly configuring your proxy should be a reasonable stopgap while we work the kinks out of the non-transparent, implicit case.
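Concretely, the "explicitly configure your proxy" stopgap means setting httpsProxy on the cluster Proxy object, something along these lines (the proxy address is illustrative):

```yaml
apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  name: cluster
spec:
  httpsProxy: http://proxy.example.com:3128  # illustrative; your explicit proxy here
  trustedCA:
    name: user-ca-bundle  # still needed if the proxy re-signs TLS (MITM)
```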
Docs bugs created requesting workaround documentation, moving to 4.4 to be implemented as part of https://github.com/openshift/enhancements/pull/115
We've not seen significant customer activity around this pattern so we're deferring this for now. Will reconsider if this becomes a more widespread usecase.
*** Bug 1833041 has been marked as a duplicate of this bug. ***
For what it's worth, it doesn't look like the cluster-version operator gets proxy settings added either, at least if you configure an explicit proxy after installation. I haven't tried adding a proxy during installation; I only set the certificates in "additionalTrustBundle", as I mentioned when I opened this bug.
My use case was adding an explicit proxy to replace the transparent proxy that was already in place, as my transparent proxy is a bit flaky at the moment.
> My use case was adding an explicit proxy instead of the transparent proxy...
Explicit proxy should be fine if you have something in the httpsProxy property (more on this in comment 1). If that's not working for you, open a new bug. This bug is, as I understand it, about the case where httpsProxy is not set but trustedCA is.
Enhancement up. Moving back to NEW, because this bug should certainly not get swept into MODIFIED if/when the enhancement PR lands. We'd still need to actually implement it ;).
We have higher priority bugs planned for 4.5.0. Also this is now tracked in https://issues.redhat.com/browse/OTA-211 which is planned for 4.6.0. Hence moving this to 4.6.0.
Changing priority to match https://issues.redhat.com/browse/OTA-211 priority
Haven't had time to get back around to this. Adding UpcomingSprint
https://github.com/openshift/cluster-version-operator/pull/311 (for bug 1797123) will provide a safety valve here. Lala and Ben were going to take a look, but don't seem to have had time yet (although Abinav approved it 3d ago). If/when that lands, we can revisit whether we need more than the safety valve for this bug. Until then, the UpcomingSprint Lala just added makes sense.
#311 (discussed in comment 19) landed today, which means that folks using 4.6 nightlies (once we promote the next one), later 4.6 RCs, and eventually GA releases will be able to work around this by adding their Cincinnati/update-recommendation-service CA to the Proxy config object's trustedCA bundle. Once we test that out in bug 1797123 and get some feedback on whether it's a workable stopgap, we may backport it to earlier releases.
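In case it helps anyone trying the workaround: the trustedCA bundle mentioned above is the ConfigMap the Proxy object points at. Adding the update-service CA to it would look roughly like this (names follow the usual user-ca-bundle convention; the PEM is a placeholder):

```yaml
# ConfigMap referenced by the Proxy object's spec.trustedCA.name,
# carrying the Cincinnati/update-recommendation-service CA.
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-ca-bundle
  namespace: openshift-config
data:
  ca-bundle.crt: |
    -----BEGIN CERTIFICATE-----
    <Cincinnati / update-recommendation-service CA here>
    -----END CERTIFICATE-----
```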
Still working through verification for bug 1797123.
@liujia verified the bug 1797123 changes as a stopgap workaround for this issue, as described in comment 19. This bug will remain open to track whether and how we implement a more targeted MitM fix, based on enhancement#325 (as mentioned in comment 13) or some other effort.
4.6 mitigation is in place. Punting future work to 4.7.
To avoid having to tag this UpcomingSprint each sprint, I'm just going to close this DEFERRED. We can re-open if folks are not satisfied with the stopgap workaround once they've had time to kick that around.
*** Bug 1889006 has been marked as a duplicate of this bug. ***
I read through the case, and it looks like a lot of time has been spent discussing this, thinking it over, and deciding whether it really needs fixing and when... but the reporter has already described in quite some detail that the bug can _easily_ be fixed. I don't understand why it has not simply been done already; it looks like the fix would have taken less time than all the discussion... :-(
@Joel Pearson, could you clarify "I updated the cluster-version-operator and also mounted in /etc/pki as readonly and then it could download version updates correctly" a little bit? I could use this. I mean did you really just add another mount into the container? Where exactly (on which level) did you modify the deployment? Thanks!
I did this for a particular client whose cluster I don't have access to at the moment; they're using explicit proxies now, as I had reliability issues with their transparent setup (nothing to do with OpenShift).
But from memory, I went in and edited the cluster-version-operator Kubernetes Deployment YAML. I think there was already a reference to /etc/ssl, so I just added another mount using the same mount type (hostPath, I think). Basically: look for /etc/ssl and duplicate all the volume references it uses, but with /etc/pki instead.
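From memory it was something along these lines; treat this as a sketch, since the real volume names and exact mount paths in the CVO Deployment may differ from what I show here:

```yaml
# Excerpt of the cluster-version-operator Deployment, with the existing
# /etc/ssl hostPath volume duplicated for /etc/pki (volume names are
# illustrative; mirror whatever your deployment actually calls them).
spec:
  template:
    spec:
      containers:
      - name: cluster-version-operator
        volumeMounts:
        - name: etc-ssl-certs
          mountPath: /etc/ssl/certs
          readOnly: true
        - name: etc-pki            # the added mount
          mountPath: /etc/pki
          readOnly: true
      volumes:
      - name: etc-ssl-certs
        hostPath:
          path: /etc/ssl/certs
      - name: etc-pki              # the added volume
        hostPath:
          path: /etc/pki
```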