Description of problem:
The cluster network operator is rejecting an HTTPS readiness endpoint with a chain of trust rooted in an organizational CA configured with trustedCA.

Version-Release number of selected component (if applicable):
OCP 4.2.16

How reproducible:
Happened in test and production clusters

Steps to Reproduce:
1. Configure MITM proxy, trustedCA, and both HTTP and HTTPS readinessEndpoints (HTTP may not be needed to reproduce this)
2. View Cluster Network Operator logs

Actual results:
  - lastTransitionTime: "2020-02-06T08:31:27Z"
    message: 'The configuration is invalid for proxy ''cluster'' (readinessEndpoint probe failed for endpoint ''https://www.google.com'': endpoint probe failed for endpoint ''https://www.google.com'' using proxy ''http://proxy.example.com:8080'': Get https://www.google.com: x509: certificate signed by unknown authority). Use ''oc edit proxy.config.openshift.io cluster'' to fix.'
    reason: InvalidProxyConfig
    status: "True"
    type: Degraded

Expected results:
HTTPS readiness endpoint should pass validation.

Additional info:
Confirmed chain of trust by doing an `oc rsh` to the network-operator pod, creating /tmp/ca-bundle.crt from the contents of the CM referenced by proxy/cluster trustedCA, and doing:

  https_proxy=http://proxy.example.com:8080 curl https://www.google.com/ --cacert /tmp/ca-bundle.crt

Curl reported no errors.
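For anyone who wants to repeat the curl check from Go, here is a minimal sketch of a client that goes through the HTTP proxy and trusts only the supplied CA bundle. The function name `newProbeClient` and the 10-second timeout are assumptions made for illustration; this is not the operator's actual probe code.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"net/http"
	"net/url"
	"time"
)

// newProbeClient is a hypothetical helper (not from the operator source).
// It returns an HTTP client that sends requests through proxyAddr and
// verifies server certificates against the CAs in caPEM only, which is
// roughly what `https_proxy=... curl --cacert ...` does.
func newProbeClient(proxyAddr string, caPEM []byte) (*http.Client, error) {
	proxyURL, err := url.Parse(proxyAddr)
	if err != nil {
		return nil, err
	}

	// Build a pool containing just the bundle's certificates.
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no certificates found in CA bundle")
	}

	return &http.Client{
		Timeout: 10 * time.Second, // assumed value, chosen for illustration
		Transport: &http.Transport{
			Proxy:           http.ProxyURL(proxyURL),
			TLSClientConfig: &tls.Config{RootCAs: pool},
		},
	}, nil
}
```

With the bundle read from /tmp/ca-bundle.crt and the proxy address from this report, a `Get` on the HTTPS endpoint through the returned client should succeed if the MITM proxy's certificate chains to the organizational CA, matching what curl showed.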
I can confirm the same issue in 4.2.26.
The issue is here: https://github.com/openshift/cluster-network-operator/blob/d69bd9eff18d142e33bfd380273edf386c30f1e5/pkg/controller/proxyconfig/validation.go#L246-L252 I believe the `proxy.Scheme == schemeHTTPS` needs to be changed to `proxy.Scheme == schemeHTTPS || endpoint.Scheme == schemeHTTPS`. A MITM proxy will send back a certificate to the network operator performing a probe even if the proxy Scheme is HTTP. The presence of TLS is based on the endpoint Scheme. Ben or Daneyon, does this seem like the right fix?
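To make the proposed condition concrete, here is a small standalone sketch. Only `schemeHTTPS`, `proxy.Scheme` and `endpoint.Scheme` come from the comment above; the function names `needsTrustedCA` and `mustParse` are made up for illustration and the surrounding validation code is not reproduced.

```go
package main

import "net/url"

// schemeHTTPS mirrors the constant name referenced in validation.go.
const schemeHTTPS = "https"

// needsTrustedCA is a hypothetical helper showing the proposed check:
// the trusted CA bundle is needed when either the proxy itself or the
// readiness endpoint is reached over TLS. A MITM proxy re-signs traffic
// for HTTPS endpoints even when the proxy URL is plain HTTP.
func needsTrustedCA(proxy, endpoint *url.URL) bool {
	return proxy.Scheme == schemeHTTPS || endpoint.Scheme == schemeHTTPS
}

// mustParse is an illustration-only helper that panics on a bad URL.
func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return u
}
```

With the values from this report (proxy `http://proxy.example.com:8080`, endpoint `https://www.google.com`), the original proxy-only check evaluates to false, while the combined check evaluates to true.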
I pushed https://github.com/openshift/cluster-network-operator/pull/613 to fix the issue.
Waiting for the associated PR to merge. This should be considered as a candidate for backport.
*** Bug 1791948 has been marked as a duplicate of this bug. ***
Retargeting to 4.6. The SDN team will handle the backport.
Tagged UpcomingSprint as multiple CI jobs failed after the PR was tagged /lgtm.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196