1740258 – After adding second ingresscontroller produces TLS handshake error coming from prometheus.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.1.z
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.1.z
Assignee:	Dan Mace
QA Contact:	Hongan Li
Docs Contact:
URL:
Whiteboard:
Depends On:	1724498
Blocks:

Reported:	2019-08-12 14:27 UTC by Ryan Howe
Modified:	2024-10-01 16:19 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-09-25 07:27:53 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift cluster-ingress-operator pull 290	None	closed	Bug 1740258: fix prometheus integration for multiple ingresscontrollers	2021-01-18 16:38:50 UTC
Github	openshift origin pull 23655	None	closed	Bug 1740258: e2e: test ingress controller prometheus integration	2021-01-18 16:38:50 UTC
Red Hat Product Errata	RHBA-2019:2820	None	None	None	2019-09-25 07:28:02 UTC

Description Ryan Howe 2019-08-12 14:27:26 UTC

Description of problem:

When a second ingresscontroller object is added to the cluster, the logs of all routers show the following error, which originates from the Prometheus pod IPs.

```
http: TLS handshake error from 10.131.0.218:59612: remote error: tls: bad certificate
```

Version-Release number of selected component (if applicable):
4.1.9 

How reproducible:
100%

Steps to Reproduce:
1. Add a second ingresscontroller to the cluster.

# oc create -n openshift-ingress-operator -f - <<EOF
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  finalizers:
  - ingresscontroller.operator.openshift.io/finalizer-ingresscontroller
  generation: 1
  name: test
  selfLink: /apis/operator.openshift.io/v1/namespaces/openshift-ingress-operator/ingresscontrollers/test
spec:
  domain: apps2.example.ocp.com
  replicas: 1
EOF


2. Review the router logs.

# oc -n openshift-ingress logs router-test-xxx
# oc -n openshift-ingress logs router-default-xxxx


Actual results:

LOGS:

I0812 14:19:41.952951       1 logs.go:49] http: TLS handshake error from 10.128.2.204:54680: remote error: tls: bad certificate
I0812 14:19:41.953043       1 logs.go:49] http: TLS handshake error from 10.131.0.225:36654: remote error: tls: bad certificate


Prometheus is unable to scrape router metrics.
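
To confirm that the handshake errors really come from the Prometheus pods, the source IPs can be tallied from the router log and compared with the pod IPs listed by `oc get pods -n openshift-monitoring -o wide`. The following is a minimal sketch, not part of the original report: the function name `handshake_error_sources` is hypothetical, and the pattern assumes IPv4 addresses and the exact log format shown above.

```python
import re
from collections import Counter

# Matches router log lines such as:
#   ... http: TLS handshake error from 10.128.2.204:54680: remote error: tls: bad certificate
# Assumption: IPv4 source addresses only.
PATTERN = re.compile(
    r"TLS handshake error from (\d{1,3}(?:\.\d{1,3}){3}):\d+: "
    r"remote error: tls: bad certificate"
)

def handshake_error_sources(lines):
    """Count 'bad certificate' handshake errors per source IP."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return dict(counts)

# The two sample lines from the "Actual results" section.
sample = [
    "I0812 14:19:41.952951       1 logs.go:49] http: TLS handshake error from 10.128.2.204:54680: remote error: tls: bad certificate",
    "I0812 14:19:41.953043       1 logs.go:49] http: TLS handshake error from 10.131.0.225:36654: remote error: tls: bad certificate",
]
print(handshake_error_sources(sample))
```

In practice the input would be the saved output of `oc logs` for each router pod, read line by line.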


Expected results:

A second ingresscontroller can be added without breaking Prometheus metrics.

Comment 1 Hongan Li 2019-08-13 09:14:19 UTC

This issue has been fixed in 4.2 by PR: https://github.com/openshift/cluster-ingress-operator/pull/242

It has the same root cause as https://bugzilla.redhat.com/show_bug.cgi?id=1724498

Comment 5 Dan Mace 2019-08-15 15:35:42 UTC

(In reply to Hongan Li from comment #3)
> workaround is updating the selector in servicemonitor resource for each
> ingresscontroller, for example:
> 
> ### update servicemonitor for default ingresscontroller
> $ oc get servicemonitor router-default -o yaml -n openshift-ingress
> <---snip--->
> spec:
> <---snip--->
>   selector: {}
> 
> $ oc edit servicemonitor router-default -n openshift-ingress
>   selector:
>     matchLabels:
>       ingresscontroller.operator.openshift.io/owning-ingresscontroller: default
> 
> 
> ### update servicemonitor for test ingresscontroller
> $ oc edit servicemonitor router-test -n openshift-ingress
>   selector:
>     matchLabels:
>       ingresscontroller.operator.openshift.io/owning-ingresscontroller: test

Just to be clear, while this is a possible solution in the context of a formal support exception, we don't have an exception yet, and manually editing this resource IS NOT SUPPORTED. Doing so could make the cluster unsupported or unable to be upgraded.

Please DO NOT execute this patch in a production cluster for which support is expected.
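
For readers following the quoted workaround: in Kubernetes, an empty label selector (`selector: {}`) matches every object in scope, whereas a `matchLabels` selector matches only objects that carry all the listed labels. The sketch below illustrates that difference only. It is an illustration under assumptions, not the operator's code: the service names and the helper functions `matches` and `selected` are hypothetical, and only `matchLabels` (not `matchExpressions`) is modelled.

```python
OWNER = "ingresscontroller.operator.openshift.io/owning-ingresscontroller"

# Hypothetical router services and their labels, one per ingresscontroller.
services = {
    "router-internal-default": {OWNER: "default"},
    "router-internal-test": {OWNER: "test"},
}

def matches(selector, labels):
    """Return True if labels satisfy a matchLabels-only selector.

    An empty selector has no requirements, so it matches everything.
    """
    required = selector.get("matchLabels", {})
    return all(labels.get(key) == value for key, value in required.items())

def selected(selector):
    """Return the sorted names of the services a selector picks up."""
    return sorted(name for name, labels in services.items()
                  if matches(selector, labels))

# Empty selector: both routers' services are selected.
print(selected({}))
# Scoped selector: only the default router's service is selected.
print(selected({"matchLabels": {OWNER: "default"}}))
```

With the empty selector, each servicemonitor would select the services of both routers, which is consistent with every router logging handshake errors once a second ingresscontroller exists. Scoping the selector by the owning-ingresscontroller label limits each servicemonitor to its own router.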

Comment 7 Hongan Li 2019-09-20 05:26:04 UTC

Verified with 4.1.17; the issue has been fixed. The router log no longer contains any error lines:

$ oc -n openshift-ingress-operator get ingresscontroller
NAME      AGE
default   74m
test      3m50s

$ oc -n openshift-ingress logs router-test-6b4ddc8b47-bnxcx | grep -i error

Comment 9 errata-xmlrpc 2019-09-25 07:27:53 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2820
