Bug 1687640

Summary: router metrics with monitoring integration does not work
Product: OpenShift Container Platform Reporter: Hongan Li <hongli>
Component: Networking Assignee: Dan Mace <dmace>
Networking sub component: router QA Contact: Hongan Li <hongli>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: aos-bugs, juzhao
Version: 4.1.0   
Target Milestone: ---   
Target Release: 4.1.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The REST scheme/mapper of the kube client is cached before the ServiceMonitor CRD is registered.
Consequence: ServiceMonitor creation for router metrics fails because the client returns a GVK error.
Fix: The kube client is regenerated when a NoMatch error is returned for ServiceMonitor.
Result: Router metrics are reported correctly.
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-06-04 10:45:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
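
The cause and fix given in the Doc Text can be illustrated with a small, self-contained sketch. This is not the operator's code (the operator is written in Go and the actual change is in the pull request linked in the comments). Every name here (NoMatchError, FakeCluster, Client, create_with_refresh) is hypothetical and only models the pattern: a client caches the set of known kinds when it is built, so a kind registered later is rejected until the client is rebuilt.

```python
class NoMatchError(Exception):
    """Stand-in for the 'no match for kind' error raised by a stale mapper."""


class FakeCluster:
    """Hypothetical API server: tracks registered kinds and created objects."""

    def __init__(self):
        self.kinds = {"Deployment"}
        self.objects = []


class Client:
    """Hypothetical client that snapshots the known kinds when it is built."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.known_kinds = set(cluster.kinds)  # cached once, never refreshed

    def create(self, kind, name):
        if kind not in self.known_kinds:
            raise NoMatchError(f"no matches for kind {kind!r}")
        self.cluster.objects.append((kind, name))


def create_with_refresh(cluster, client, kind, name):
    """Create an object; on NoMatch, rebuild the client once and retry.

    Returns the client that callers should use from now on.
    """
    try:
        client.create(kind, name)
        return client
    except NoMatchError:
        fresh = Client(cluster)   # rebuilding re-reads the registered kinds
        fresh.create(kind, name)  # still raises if the kind is truly absent
        return fresh


cluster = FakeCluster()
stale = Client(cluster)              # built before the CRD exists
cluster.kinds.add("ServiceMonitor")  # the CRD is registered later
client = create_with_refresh(cluster, stale, "ServiceMonitor", "router-default")
```

The retry happens only once, so a kind that is genuinely missing still surfaces as an error instead of looping.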

Description Hongan Li 2019-03-12 02:46:29 UTC
Description of problem:
Router metrics with monitoring integration do not work.
There are no ServiceMonitor resources in the openshift-ingress namespace.
No router metrics appear in the Prometheus UI.

Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-06-074438   True        False         19h     Cluster version is 4.0.0-0.nightly-2019-03-06-074438


How reproducible:
always

Steps to Reproduce:
1. Install a 4.0 cluster on AWS.
2. Run: oc get servicemonitor -n openshift-ingress
3. Log in to the Prometheus UI and go to the Status > Targets page to check the router targets.
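
The check in step 3 could also be done without the UI. The sketch below is only an illustration: it assumes the JSON shape returned by the Prometheus HTTP API targets endpoint (/api/v1/targets, with a data.activeTargets list) and assumes that scraped targets carry a namespace label. The router_targets helper and the SAMPLE payload are hypothetical, and the sample values are made up for the example, not taken from this cluster.

```python
import json


def router_targets(targets_response, namespace="openshift-ingress"):
    """Return (scrapeUrl, health) for active targets in the given namespace.

    targets_response is the JSON text of a Prometheus targets API response.
    """
    data = json.loads(targets_response)
    active = data.get("data", {}).get("activeTargets", [])
    return [
        (t.get("scrapeUrl"), t.get("health"))
        for t in active
        if t.get("labels", {}).get("namespace") == namespace
    ]


# Made-up payload for illustration only.
SAMPLE = json.dumps({
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"namespace": "openshift-ingress", "job": "example-router"},
                "scrapeUrl": "https://10.0.0.1:1936/metrics",
                "health": "up",
            },
            {
                "labels": {"namespace": "other-namespace", "job": "example-other"},
                "scrapeUrl": "https://10.0.0.2:8443/metrics",
                "health": "up",
            },
        ],
        "droppedTargets": [],
    },
})

found = router_targets(SAMPLE)
```

An empty list for the openshift-ingress namespace would correspond to the failure reported here: no router targets are being scraped.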

Actual results:
There are no ServiceMonitor resources in the openshift-ingress namespace.
No router metrics appear in the Prometheus UI.

Expected results:
The user should be able to view the router metrics in the Prometheus UI.

Additional info:
This worked after https://github.com/openshift/cluster-ingress-operator/pull/108 was merged, but it does not work in the latest build.

Comment 2 Ravi Sankar 2019-03-19 22:33:37 UTC
Fixed by https://github.com/openshift/cluster-ingress-operator/pull/166

Comment 3 Hongan Li 2019-03-25 02:19:42 UTC
Verified with 4.0.0-0.nightly-2019-03-23-222829; the issue has been fixed. Router metrics are now shown in the Prometheus UI.


$ oc get servicemonitors.monitoring.coreos.com/router-default -n openshift-ingress -o yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  creationTimestamp: 2019-03-25T00:29:31Z
  generation: 1
  name: router-default
  namespace: openshift-ingress
  ownerReferences:
  - apiVersion: apps/v1
    controller: true
    kind: Deployment
    name: router-default
    uid: 06c11402-4e95-11e9-b378-125857bf72fa
  resourceVersion: "10253"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/openshift-ingress/servicemonitors/router-default
  uid: 0e856e06-4e95-11e9-b378-125857bf72fa
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    path: /metrics
    port: metrics
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
      serverName: router-internal-default.openshift-ingress.svc
  namespaceSelector:
    matchNames:
    - openshift-ingress
  selector: {}

Comment 6 errata-xmlrpc 2019-06-04 10:45:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758