Description of problem:
When a second ingresscontroller object is added to the cluster, the logs of all routers show the following error, which comes from the Prometheus pod IPs.

```
http: TLS handshake error from 10.131.0.218:59612: remote error: tls: bad certificate
```

Version-Release number of selected component (if applicable):
4.1.9

How reproducible:
100%

Steps to Reproduce:
1. Add a second ingresscontroller to the cluster.

# oc create -n openshift-ingress-operator -f - <<EOF
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  finalizers:
  - ingresscontroller.operator.openshift.io/finalizer-ingresscontroller
  generation: 1
  name: test
  selfLink: /apis/operator.openshift.io/v1/namespaces/openshift-ingress-operator/ingresscontrollers/test
spec:
  domain: apps2.example.ocp.com
  replicas: 1
EOF

2. Review the logs.

# oc logs router-test-xxx
# oc logs router-default-xxxx

Actual results:
LOGS:
I0812 14:19:41.952951 1 logs.go:49] http: TLS handshake error from 10.128.2.204:54680: remote error: tls: bad certificate
I0812 14:19:41.953043 1 logs.go:49] http: TLS handshake error from 10.131.0.225:36654: remote error: tls: bad certificate

Prometheus is unable to get router metrics.

Expected results:
Able to add a second ingress controller without breaking Prometheus metrics.
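To confirm which clients the handshake errors come from, the source IPs can be pulled out of the router logs and compared with the Prometheus pod IPs. The following is a minimal sketch, assuming log lines in the format shown under "Actual results"; `handshake_error_ips` is a hypothetical helper name, not an `oc` subcommand.

```shell
# Hypothetical helper (not part of oc): read router log lines on stdin and
# print each unique source IP that appears in a
# "TLS handshake error ... bad certificate" line, one IP per line.
handshake_error_ips() {
  sed -n 's/.*TLS handshake error from \([0-9.][0-9.]*\):[0-9]*: remote error: tls: bad certificate.*/\1/p' \
    | LC_ALL=C sort -u
}
```

For example, `oc logs router-test-xxx | handshake_error_ips` would print one IP per line, which can then be checked against the IP column of `oc get pods -o wide` for the Prometheus pods.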
This issue has been fixed in 4.2 by PR https://github.com/openshift/cluster-ingress-operator/pull/242 and has the same root cause as https://bugzilla.redhat.com/show_bug.cgi?id=1724498
(In reply to Hongan Li from comment #3)
> workaround is updating the selector in servicemonitor resource for each
> ingresscontroller, for example:
>
> ### update servicemonitor for default ingresscontroller
> $ oc get servicemonitor router-default -o yaml -n openshift-ingress
> <---snip--->
> spec:
> <---snip--->
>   selector: {}
>
> $ oc edit servicemonitor router-default -n openshift-ingress
>   selector:
>     matchLabels:
>       ingresscontroller.operator.openshift.io/owning-ingresscontroller: default
>
>
> ### update servicemonitor for test ingresscontroller
> $ oc edit servicemonitor router-test -n openshift-ingress
>   selector:
>     matchLabels:
>       ingresscontroller.operator.openshift.io/owning-ingresscontroller: test

Just to be clear, while this is a possible solution in the context of a formal support exception, we don't have an exception yet, and manually editing this resource IS NOT SUPPORTED. Doing so could make the cluster unsupported or unable to be upgraded.

Please DO NOT execute this patch in a production cluster for which support is expected.
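For readers wondering why the selector matters: in Kubernetes, an empty label selector matches every object, so a ServiceMonitor with `selector: {}` would presumably select the metrics service of every router in the namespace rather than only its own. The snippet below is only an illustrative sketch of that matching rule in shell. `selector_matches` and the comma-separated `key=value` encoding are hypothetical and are not part of `oc` or the operator; it touches no cluster resources.

```shell
# Hypothetical illustration of matchLabels semantics (not an oc feature).
# Usage: selector_matches SELECTOR LABELS
# SELECTOR and LABELS are comma-separated key=value lists.
# Returns 0 when every selector pair is present in LABELS.
# An empty SELECTOR has no pairs to check, so it matches any LABELS.
selector_matches() {
  local selector=$1 labels=$2 pair
  local IFS=,
  for pair in $selector; do
    case ",$labels," in
      *",$pair,"*) ;;          # this pair is present, keep checking
      *) return 1 ;;           # a required pair is missing
    esac
  done
  return 0
}
```

With the label key from the quoted workaround, `selector_matches "" "ingresscontroller.operator.openshift.io/owning-ingresscontroller=test"` succeeds (the empty selector matches the test router's labels too), while a selector naming `test` does not match labels naming `default`.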
Verified with 4.1.17 and the issue has been fixed.

$ oc -n openshift-ingress-operator get ingresscontroller
NAME      AGE
default   74m
test      3m50s

$ oc -n openshift-ingress logs router-test-6b4ddc8b47-bnxcx | grep -i error
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2820