Bug 2028061 - Configuring mTLS on default Ingress breaks ingress canary check & console health checks
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ryan Fredette
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-12-01 12:19 UTC by Nirupma Kashyap
Modified: 2023-03-09 01:09 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-03-09 01:09:30 UTC
Target Upstream Version:
Embargoed:


Attachments
KCS Link : https://access.redhat.com/solutions/6551251 (43 bytes, text/plain)
2021-12-21 10:40 UTC, Nirupma Kashyap


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 6551251 0 None None None 2021-12-21 10:47:38 UTC

Description Nirupma Kashyap 2021-12-01 12:19:26 UTC
Description of problem:
Configuring mTLS on the default IngressController breaks the ingress canary check and the console health checks, which in turn puts the ingress and console cluster operators into a Degraded state.

OpenShift release version:
OCP-4.9.5

Cluster Platform:
UPI on Baremetal (Disconnected cluster)

How reproducible:
Configure mutual TLS (mTLS) on the default IngressController as described in the documentation (https://docs.openshift.com/container-platform/4.9/networking/ingress-operator.html#nw-mutual-tls-auth_configuring-ingress).

Steps to Reproduce (in detail):
1. Create a config map in the openshift-config namespace.
2. Edit the IngressController resource in the openshift-ingress-operator project.
3. Add the spec.clientTLS field and subfields to configure mutual TLS:
~~~
  apiVersion: operator.openshift.io/v1
  kind: IngressController
  metadata:
    name: default
    namespace: openshift-ingress-operator
  spec:
    clientTLS:
      clientCertificatePolicy: Required
      clientCA:
        name: router-ca-certs-default
      allowedSubjectPatterns:
      - "^/CN=example.com/ST=NC/C=US/O=Security/OU=OpenShift$"
~~~
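For reference, a minimal sketch of how an allowedSubjectPatterns entry behaves as a regular expression. It assumes the certificate subject is rendered in the slash-separated form that the pattern implies, and it uses Python's re module as a stand-in for whatever regex engine the router actually uses. The helper name subject_allowed is hypothetical.

```python
import re
from typing import Iterable

# Pattern copied from the allowedSubjectPatterns entry above.
PATTERN = r"^/CN=example.com/ST=NC/C=US/O=Security/OU=OpenShift$"


def subject_allowed(subject: str, patterns: Iterable[str]) -> bool:
    """Return True if the subject matches at least one pattern (hypothetical helper)."""
    return any(re.search(p, subject) for p in patterns)


# Exact subject: matches.
print(subject_allowed("/CN=example.com/ST=NC/C=US/O=Security/OU=OpenShift", [PATTERN]))  # True
# The unescaped "." matches any character, so this subject matches too.
print(subject_allowed("/CN=exampleXcom/ST=NC/C=US/O=Security/OU=OpenShift", [PATTERN]))  # True
# A different CN does not match because the pattern is anchored with ^ and $.
print(subject_allowed("/CN=other.example.org/ST=NC/C=US/O=Security/OU=OpenShift", [PATTERN]))  # False
```

Writing the dot as "\." in the pattern would restrict it to a literal period. None of this affects the failure reported here, which occurs because the health checks present no client certificate at all.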
Actual results:
Setting up mTLS using the documented steps breaks the canary and console health checks. Because clientCertificatePolicy is set to Required, the router demands a client certificate, which these health checks do not present, so they fail and the Ingress and Console operators become Degraded.

Expected results:
mTLS setup should work properly without degrading the Ingress and Console operators.

Impact of the problem:
Unstable cluster with the Ingress and Console operators in a Degraded state.

Additional info:
The error messages are as follows:
The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)

// Canary checks fail because a TLS client certificate is required.
2021-11-19T17:17:58.237Z    ERROR    operator.canary_controller    wait/wait.go:155    error performing canary route check    {"error": "error sending canary HTTP request to \"canary-openshift-ingress-canary.apps.bruce.openshift.local\": Get \"https://canary-openshift-ingress-canary.apps.bruce.openshift.local\": remote error: tls: certificate required"}

// Console operator:
RouteHealthDegraded: failed to GET route (https://console-openshift-console.apps.bruce.openshift.local): Get "https://console-openshift-console.apps.bruce.openshift.local": remote error: tls: certificate required



Comment 2 Andrew McDermott 2021-12-02 18:37:41 UTC
Using the openshift documentation and creating a configmap for the clientCA (i.e., "router-ca-certs-default") specified here:

  apiVersion: operator.openshift.io/v1
  kind: IngressController
  metadata:
    name: default
    namespace: openshift-ingress-operator
  spec:
    clientTLS:
      clientCertificatePolicy: Required
      clientCA:
        name: router-ca-certs-default

I see the same issue; both console and ingress go degraded:

console                                    4.9.0-0.nightly-2021-12-01-185844   False       False         False      22m     RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.ci-ln-l2gfksk-72292.origin-ci-int-gce.dev.rhcloud.com): Get "https://console-openshift-console.apps.ci-ln-l2gfksk-72292.origin-ci-int-gce.dev.rhcloud.com": remote error: tls: certificate required
csi-snapshot-controller                    4.9.0-0.nightly-2021-12-01-185844   True        False         False      94m     
dns                                        4.9.0-0.nightly-2021-12-01-185844   True        False         False      93m     
etcd                                       4.9.0-0.nightly-2021-12-01-185844   True        False         False      93m     
image-registry                             4.9.0-0.nightly-2021-12-01-185844   True        False         False      88m     
ingress                                    4.9.0-0.nightly-2021-12-01-185844   True        False         True       4m13s   The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
insights                                   4.9.0-0.nightly-2021-12-01-185844   True        False         False      87m     
 

If I revert the change so that clientTLS has:

  spec:
    clientTLS:
      clientCertificatePolicy: ""
      clientCA:
        name: ""

then the console and ingress are no longer degraded.

Marking this as blocker+; we will investigate the must-gather to see if this is a configuration issue.

Comment 4 Miciah Dashiel Butler Masters 2021-12-07 17:24:54 UTC
Setting blocker- as this is not a regression or upgrade issue but rather a caveat in certain configurations involving a new but already shipped feature.  

We can make this configuration work by doing the following:

* Add an additional canary route that uses passthrough and use this route when the default ingresscontroller requires client certificates.

* Add logic in the console operator’s health check to report healthy if the health probe gets a “tls: certificate required” error.
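
The two ideas above can be sketched as follows. This is an illustration only, not the operators' actual code (which is written in Go): the route names and function names are hypothetical, and the only string taken from this report is the "tls: certificate required" error text.

```python
from typing import Optional

# Error text as it appears in the canary and console logs in this report.
CERT_REQUIRED_MARKER = "tls: certificate required"


def choose_canary_route(client_certificate_policy: str) -> str:
    """Pick which canary route to probe. Both route names are hypothetical."""
    if client_certificate_policy == "Required":
        # Assumption: on a passthrough route the router does not terminate TLS,
        # so it would not demand a client certificate from the probe.
        return "canary-passthrough"
    return "canary"


def route_probe_healthy(error: Optional[str]) -> bool:
    """Treat a probe as healthy if it succeeded, or if it was rejected
    only because no client certificate was presented."""
    if error is None:
        return True
    return CERT_REQUIRED_MARKER in error


console_error = (
    'Get "https://console-openshift-console.apps.bruce.openshift.local": '
    "remote error: tls: certificate required"
)
print(choose_canary_route("Required"))             # canary-passthrough
print(route_probe_healthy(console_error))          # True
print(route_probe_healthy("connection refused"))   # False
```

Matching on the error text is fragile; a real implementation would more likely inspect the TLS alert itself, but the string check is enough to show the intended decision.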

Comment 5 Nirupma Kashyap 2021-12-21 10:40:47 UTC
Created attachment 1847178
KCS Link : https://access.redhat.com/solutions/6551251

Comment 7 Miciah Dashiel Butler Masters 2022-01-24 18:13:59 UTC
Moving off of 4.10.0.  We'll work on this in the next release.  Meanwhile, users should not configure the default ingresscontroller to require client certificates.
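
Until a fix lands, a pre-apply check along these lines could flag the problematic configuration. It is a sketch that assumes the IngressController manifest has already been loaded into a plain dictionary; the function name is hypothetical, and the field names are the ones used in the manifests earlier in this report.

```python
def requires_client_certs_on_default(ingress_controller: dict) -> bool:
    """Return True if the manifest is the default IngressController and
    sets clientCertificatePolicy to Required (hypothetical helper)."""
    metadata = ingress_controller.get("metadata") or {}
    spec = ingress_controller.get("spec") or {}
    client_tls = spec.get("clientTLS") or {}
    return (
        metadata.get("name") == "default"
        and metadata.get("namespace") == "openshift-ingress-operator"
        and client_tls.get("clientCertificatePolicy") == "Required"
    )


manifest = {
    "apiVersion": "operator.openshift.io/v1",
    "kind": "IngressController",
    "metadata": {"name": "default", "namespace": "openshift-ingress-operator"},
    "spec": {
        "clientTLS": {
            "clientCertificatePolicy": "Required",
            "clientCA": {"name": "router-ca-certs-default"},
        }
    },
}
print(requires_client_certs_on_default(manifest))  # True
```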

Comment 9 Nirupma Kashyap 2022-07-12 14:23:14 UTC
Hello Team,

Can we get some traction on this, please? The customer is looking for an update. Which release is this fix targeted for?

Regards,
Nirupma

Comment 19 sakshi 2023-02-24 06:36:55 UTC
Hi Team, 

The customer mentioned that this bug has been open for over a year and wants to know whether there is a timeline for when it will be fixed.

Comment 21 Shiftzilla 2023-03-09 01:09:30 UTC
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-9037

