Bug 1821956

Summary: [4.4] readinessEndpoint not using trustedCA for trust validation
Product: OpenShift Container Platform Reporter: Robert Bost <rbost>
Component: Networking Assignee: Daneyon Hansen <dhansen>
Networking sub component: openshift-sdn QA Contact: zhaozhanqi <zzhao>
Status: CLOSED EOL Docs Contact:
Severity: low    
Priority: low CC: aconstan, amcdermo, anbhat, bbennett, ChetRHosey, dhansen, mmasters, openshift-bugzilla-robot, rbost, vlaad, zzhao
Version: 4.3.z Keywords: Reopened
Target Milestone: ---   
Target Release: 4.4.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1798887 Environment:
Last Closed: 2021-03-09 13:57:44 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1849154    
Bug Blocks: 1855366    

Description Robert Bost 2020-04-07 23:27:47 UTC
+++ This bug was initially created as a clone of Bug #1798887 +++

Description of problem:

The cluster network operator is rejecting an HTTPS readiness endpoint with a chain of trust rooted in an organizational CA configured with trustedCA.


Version-Release number of selected component (if applicable):

OCP 4.2.16


How reproducible:

Happened in test and production clusters


Steps to Reproduce:
1. Configure MITM proxy, trustedCA, and both HTTP and HTTPS readinessEndpoints (HTTP may not be needed to reproduce this)
2. View Cluster Network Operator logs

Actual results:

    - lastTransitionTime: "2020-02-06T08:31:27Z"
      message: 'The configuration is invalid for proxy ''cluster'' (readinessEndpoint
        probe failed for endpoint ''https://www.google.com'': endpoint probe failed for
        endpoint ''https://www.google.com'' using proxy ''http://proxy.example.com:8080'':
        Get https://www.google.com: x509: certificate signed by unknown authority). Use
        ''oc edit proxy.config.openshift.io cluster'' to fix.'
      reason: InvalidProxyConfig
      status: "True"
      type: Degraded


Expected results:

HTTPS readiness endpoint should pass validation.

Additional info:

Confirmed chain of trust by doing an `oc rsh` to the network-operator pod, creating /tmp/ca-bundle.crt from the contents of the CM referenced by proxy/cluster trustedCA, and doing:

    https_proxy=http://proxy.example.com:8080 curl https://www.google.com/ --cacert /tmp/ca-bundle.crt

Curl reported no errors.
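
For comparison with the curl check above, here is a minimal Go sketch of the same probe: an HTTPS GET sent through an HTTP proxy and validated only against the CA bundle file. It is not the operator's code. The helper name `newProbeClient` and the 10-second timeout are assumptions for illustration; the proxy address, endpoint, and bundle path are the ones used in this report.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"net/url"
	"os"
	"time"
)

// newProbeClient (hypothetical helper) builds an HTTP client that sends
// requests through proxyAddr and trusts only the CAs found in caPEM,
// mirroring `curl --cacert`.
func newProbeClient(proxyAddr string, caPEM []byte) (*http.Client, error) {
	proxyURL, err := url.Parse(proxyAddr)
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no certificates found in CA bundle")
	}
	return &http.Client{
		Timeout: 10 * time.Second, // assumed value, not taken from the operator
		Transport: &http.Transport{
			Proxy:           http.ProxyURL(proxyURL),
			TLSClientConfig: &tls.Config{RootCAs: pool},
		},
	}, nil
}

func main() {
	caPEM, err := os.ReadFile("/tmp/ca-bundle.crt")
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading CA bundle:", err)
		os.Exit(1)
	}
	client, err := newProbeClient("http://proxy.example.com:8080", caPEM)
	if err != nil {
		fmt.Fprintln(os.Stderr, "building client:", err)
		os.Exit(1)
	}
	resp, err := client.Get("https://www.google.com/")
	if err != nil {
		// Without the bundle in RootCAs, a MITM proxy would surface here as
		// "x509: certificate signed by unknown authority".
		fmt.Fprintln(os.Stderr, "probe failed:", err)
		os.Exit(1)
	}
	resp.Body.Close()
	fmt.Println("probe status:", resp.Status)
}
```

Note that the proxy URL is plain HTTP, yet the TLS handshake with the endpoint still needs the organizational CA, because the MITM proxy presents its own certificate for the HTTPS endpoint.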

--- Additional comment from Robert Bost on 2020-04-07 20:36:42 UTC ---

I can confirm the same issue in 4.2.26.

--- Additional comment from Robert Bost on 2020-04-07 23:15:03 UTC ---

The issue is here:

  https://github.com/openshift/cluster-network-operator/blob/d69bd9eff18d142e33bfd380273edf386c30f1e5/pkg/controller/proxyconfig/validation.go#L246-L252

I believe the `proxy.Scheme == schemeHTTPS` needs to be changed to `proxy.Scheme == schemeHTTPS || endpoint.Scheme == schemeHTTPS`. A MITM proxy will send back a certificate to the network operator performing a probe even if the proxy Scheme is HTTP. The presence of TLS is based on the endpoint Scheme. 

Ben or Daneyon, does this seem like the right fix?

Comment 1 Daneyon Hansen 2020-05-04 23:06:06 UTC
I pushed a fix for https://bugzilla.redhat.com/show_bug.cgi?id=1798887 and will cherry-pick to this PR when merged.

Comment 3 Daneyon Hansen 2020-05-27 16:37:27 UTC
> We may need this backported to 4.3.z and 4.2.z as well?

Aniket, yes.

Comment 4 Miciah Dashiel Butler Masters 2020-05-28 14:55:18 UTC
4.2 goes out of support when 4.5 is released, so we shouldn't need to backport to 4.2:  https://access.redhat.com/support/policy/updates/openshift#dates

Comment 5 Daneyon Hansen 2020-06-19 17:17:50 UTC
Tagged UpcomingSprint as multiple CI jobs failed after the PR of the dependent bug was tagged /lgtm.

Comment 6 Miciah Dashiel Butler Masters 2020-07-09 16:23:20 UTC
*** Bug 1855356 has been marked as a duplicate of this bug. ***

Comment 7 Daneyon Hansen 2020-07-10 21:46:59 UTC
I’m adding UpcomingSprint because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 8 Daneyon Hansen 2020-07-30 15:28:53 UTC
I’m adding UpcomingSprint because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 10 Ben Bennett 2020-12-18 19:04:19 UTC
This is fixed in 4.6.