Bug 1740258 - After adding second ingresscontroller produces TLS handshake error coming from prometheus.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.1.z
Assignee: Dan Mace
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On: 1724498
Blocks:
 
Reported: 2019-08-12 14:27 UTC by Ryan Howe
Modified: 2019-10-03 08:58 UTC
5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-25 07:27:53 UTC
Target Upstream Version:




Links

| System | ID | Priority | Status | Summary | Last Updated |
|---|---|---|---|---|---|
| Github | openshift cluster-ingress-operator pull 290 | None | closed | Bug 1740258: fix prometheus integration for multiple ingresscontrollers | 2020-07-30 12:40:28 UTC |
| Github | openshift origin pull 23655 | None | closed | Bug 1740258: e2e: test ingress controller prometheus integration | 2020-07-30 12:40:27 UTC |
| Red Hat Product Errata | RHBA-2019:2820 | None | None | None | 2019-09-25 07:28:02 UTC |

Description Ryan Howe 2019-08-12 14:27:26 UTC
Description of problem:

When a second IngressController object is added to the cluster, the logs of all routers show the following error, which originates from the Prometheus pod IPs:

```
http: TLS handshake error from 10.131.0.218:59612: remote error: tls: bad certificate
```

Version-Release number of selected component (if applicable):
4.1.9 

How reproducible:
100%

Steps to Reproduce:
1. Add a second ingresscontroller to the cluster.

```
# oc create -n openshift-ingress-operator -f - <<EOF
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  finalizers:
  - ingresscontroller.operator.openshift.io/finalizer-ingresscontroller
  generation: 1
  name: test
  selfLink: /apis/operator.openshift.io/v1/namespaces/openshift-ingress-operator/ingresscontrollers/test
spec:
  domain: apps2.example.ocp.com
  replicas: 1
EOF
```


2. Review the router logs.

```
# oc -n openshift-ingress logs router-test-xxx
# oc -n openshift-ingress logs router-default-xxxx
```


Actual results:

LOGS:

```
I0812 14:19:41.952951       1 logs.go:49] http: TLS handshake error from 10.128.2.204:54680: remote error: tls: bad certificate
I0812 14:19:41.953043       1 logs.go:49] http: TLS handshake error from 10.131.0.225:36654: remote error: tls: bad certificate
```


Prometheus is unable to scrape router metrics.
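The description attributes these errors to the Prometheus pod IPs. A minimal sketch for checking that claim: the function below (a hypothetical helper, not part of `oc`) reads router log lines on stdin and prints the distinct client IPv4 addresses found in `TLS handshake error from <ip>:<port>` lines, which can then be compared with the IP column of the Prometheus pods. It assumes the cluster monitoring Prometheus pods run in `openshift-monitoring`, the OpenShift 4 default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the distinct client IPs (IPv4 only) that appear
# in "TLS handshake error from <ip>:<port>:" router log lines read from stdin.
tls_error_sources() {
  # Keep only matching lines (-n plus /p) and reduce each one to its IP field.
  sed -n 's/.*TLS handshake error from \([0-9.]*\):[0-9]*:.*/\1/p' |
    LC_ALL=C sort -u   # deduplicate with a locale-independent ordering
}

# Usage (assumes Prometheus runs in openshift-monitoring):
#   oc -n openshift-ingress logs router-default-xxxx | tls_error_sources
#   oc -n openshift-monitoring get pods -o wide   # compare with the IP column
```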


Expected results:

A second ingress controller can be added without breaking Prometheus metrics collection.

Comment 1 Hongan Li 2019-08-13 09:14:19 UTC
This issue has been fixed in 4.2 by PR https://github.com/openshift/cluster-ingress-operator/pull/242 and has the same root cause as https://bugzilla.redhat.com/show_bug.cgi?id=1724498.

Comment 5 Dan Mace 2019-08-15 15:35:42 UTC
(In reply to Hongan Li from comment #3)
> workaround is updating the selector in servicemonitor resource for each
> ingresscontroller, for example:
> 
> ### update servicemonitor for default ingresscontroller
> $ oc get servicemonitor router-default -o yaml -n openshift-ingress
> <---snip--->
> spec:
> <---snip--->
>   selector: {}
> 
> $ oc edit servicemonitor router-default -n openshift-ingress
>   selector:
>     matchLabels:
>       ingresscontroller.operator.openshift.io/owning-ingresscontroller: default
> 
> 
> ### update servicemonitor for test ingresscontroller
> $ oc edit servicemonitor router-test -n openshift-ingress
>   selector:
>     matchLabels:
>       ingresscontroller.operator.openshift.io/owning-ingresscontroller: test

Just to be clear, while this is a possible solution in the context of a formal support exception, we don't have an exception yet, and manually editing this resource IS NOT SUPPORTED. Doing so could make the cluster unsupported or unable to be upgraded.

Please DO NOT execute this patch in a production cluster for which support is expected.
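Why the quoted workaround narrows the scrape targets can be illustrated with Kubernetes label-selector semantics: an empty selector (`selector: {}`) matches every object, so each router's ServiceMonitor selects every Service in `openshift-ingress`, including the other router's, while `matchLabels` keeps only objects carrying every listed `key=value` pair. The bash function below is a simplified, hypothetical model of `matchLabels` only (no `matchExpressions`), operating on comma-separated `key=value` strings; the label key comes from the workaround, and the assumption that each router's Services carry it is inferred from that workaround. It evaluates labels locally and does not touch the cluster, so it is not a way of applying the unsupported edit.

```shell
#!/usr/bin/env bash
# Hypothetical model of matchLabels semantics, for illustration only.
# $1 = selector, $2 = object labels; both are comma-separated key=value lists.
# Returns 0 when every key=value in the selector is present in the labels.
selector_matches() {
  local selector="$1" labels=",$2," pair
  local -a pairs
  # An empty selector, as in "selector: {}", matches every object.
  [ -z "$selector" ] && return 0
  IFS=',' read -ra pairs <<< "$selector"
  for pair in "${pairs[@]}"; do
    case "$labels" in
      *",$pair,"*) ;;   # required key=value is present
      *) return 1 ;;    # key missing or value differs
    esac
  done
  return 0
}

# With "selector: {}" both owners match; with matchLabels for "default",
# only the Service owned by the default ingresscontroller matches.
key=ingresscontroller.operator.openshift.io/owning-ingresscontroller
for owner in default test; do
  selector_matches "" "$key=$owner" && echo "selector {} matches $owner"
  selector_matches "$key=default" "$key=$owner" && echo "matchLabels default matches $owner"
done
```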

Comment 7 Hongan Li 2019-09-20 05:26:04 UTC
Verified with 4.1.17; the issue has been fixed.

```
$ oc -n openshift-ingress-operator get ingresscontroller
NAME      AGE
default   74m
test      3m50s

$ oc -n openshift-ingress logs router-test-6b4ddc8b47-bnxcx | grep -i error
```
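The verification above checks a single router, while the original report saw the error in all of them. A minimal sketch for checking every router pod, assuming `oc` is logged in with access to `openshift-ingress` and that router pod names start with `router-` (as in the names above); `check_router_tls_errors` is a hypothetical helper. `oc logs` only returns the current container's log, so errors logged before a pod was restarted or rolled out will not be counted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "TLS handshake error" lines in the current log
# of every router pod. Returns 1 if any pod logged at least one such error.
check_router_tls_errors() {
  local ns=openshift-ingress pod count status=0
  for pod in $(oc -n "$ns" get pods -o name | grep '^pod/router-'); do
    # grep -c prints the number of matching lines (0 when there are none).
    count=$(oc -n "$ns" logs "$pod" | grep -c 'TLS handshake error')
    echo "$pod: $count TLS handshake error(s)"
    [ "$count" -eq 0 ] || status=1
  done
  return "$status"
}

# Usage: check_router_tls_errors && echo "no TLS handshake errors found"
```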

Comment 9 errata-xmlrpc 2019-09-25 07:27:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2820

