+++ This bug was initially created as a clone of Bug #1798887 +++
Description of problem:
The Cluster Network Operator rejects an HTTPS readiness endpoint whose certificate chain is rooted in an organizational CA configured through the proxy's trustedCA.
Version-Release number of selected component (if applicable):
OCP 4.2.16
How reproducible:
Reproduced in both test and production clusters.
Steps to Reproduce:
1. Configure a MITM proxy, trustedCA, and both HTTP and HTTPS readinessEndpoints (the HTTP endpoint may not be needed to reproduce this).
2. View the Cluster Network Operator logs.
Actual results:
- lastTransitionTime: "2020-02-06T08:31:27Z"
  message: 'The configuration is invalid for proxy ''cluster'' (readinessEndpoint
    probe failed for endpoint ''https://www.google.com'': endpoint probe failed for
    endpoint ''https://www.google.com'' using proxy ''http://proxy.example.com:8080'':
    Get https://www.google.com: x509: certificate signed by unknown authority). Use
    ''oc edit proxy.config.openshift.io cluster'' to fix.'
  reason: InvalidProxyConfig
  status: "True"
  type: Degraded
Expected results:
HTTPS readiness endpoint should pass validation.
Additional info:
Confirmed the chain of trust by running `oc rsh` into the network-operator pod, creating /tmp/ca-bundle.crt from the contents of the ConfigMap referenced by the trustedCA field of proxy/cluster, and running:
https_proxy=http://proxy.example.com:8080 curl https://www.google.com/ --cacert /tmp/ca-bundle.crt
Curl reported no errors.
--- Additional comment from Robert Bost on 2020-04-07 20:36:42 UTC ---
I can confirm the same issue in 4.2.26.
--- Additional comment from Robert Bost on 2020-04-07 23:15:03 UTC ---
The issue is here:
https://github.com/openshift/cluster-network-operator/blob/d69bd9eff18d142e33bfd380273edf386c30f1e5/pkg/controller/proxyconfig/validation.go#L246-L252
I believe `proxy.Scheme == schemeHTTPS` needs to be changed to `proxy.Scheme == schemeHTTPS || endpoint.Scheme == schemeHTTPS`. A MITM proxy sends a certificate back to the network operator performing the probe even when the proxy scheme is HTTP, so whether TLS is involved depends on the endpoint scheme, not only on the proxy scheme.
Ben or Daneyon, does this seem like the right fix?
I’m adding UpcomingSprint because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.