Bug 1687640

Summary:	router metrics with monitoring integration does not work
Product:	OpenShift Container Platform	Reporter:	Hongan Li <hongli>
Component:	Networking	Assignee:	Dan Mace <dmace>
Networking sub component:	router	QA Contact:	Hongan Li <hongli>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	high
Priority:	high	CC:	aos-bugs, juzhao
Version:	4.1.0
Target Milestone:	---
Target Release:	4.1.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: The kube client's REST scheme/mapper is cached before the ServiceMonitor CRD is registered. Consequence: ServiceMonitor creation for router metrics fails because the client returns a GVK error. Fix: Regenerate the kube client when a NoMatch error is returned for ServiceMonitor. Result: Router metrics are reported correctly.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-06-04 10:45:33 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
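
The cause and fix summarized in the Doc Text can be illustrated with a small, self-contained simulation. This is only a sketch of the general pattern (retry with a freshly built client when the cached mapping does not know the kind); it is not the operator's actual code. The names FakeCluster, CachedClient, NoMatchError, and ensure_service_monitor are hypothetical and do not come from the fix or from any Kubernetes library.

```python
class NoMatchError(Exception):
    """Hypothetical error: the client cannot map a kind to a REST resource."""


class FakeCluster:
    """Hypothetical stand-in for the API server's set of registered kinds."""

    def __init__(self):
        self.kinds = {"Deployment", "Service"}

    def register_crd(self, kind):
        # Simulates a CRD (e.g. ServiceMonitor) being registered later.
        self.kinds.add(kind)


class CachedClient:
    """Client that snapshots the known kinds once, when it is built."""

    def __init__(self, cluster):
        self._known_kinds = frozenset(cluster.kinds)  # cached mapping
        self.created = []

    def create(self, kind, name):
        if kind not in self._known_kinds:
            raise NoMatchError(f"no matches for kind {kind!r}")
        self.created.append((kind, name))


def ensure_service_monitor(client, cluster, name):
    """Create a ServiceMonitor, rebuilding the client once on NoMatchError.

    Returns the client that succeeded. If the kind is still unknown after
    the rebuild, NoMatchError propagates so the caller can retry later.
    """
    try:
        client.create("ServiceMonitor", name)
    except NoMatchError:
        client = CachedClient(cluster)  # refresh the cached mapping
        client.create("ServiceMonitor", name)
    return client


# Usage: the client is built before the CRD exists, as in this bug.
cluster = FakeCluster()
client = CachedClient(cluster)
cluster.register_crd("ServiceMonitor")
client = ensure_service_monitor(client, cluster, "router-default")
```

In this sketch the stale client keeps failing no matter how long the CRD has been registered, which matches the reported symptom of no ServiceMonitor ever appearing in openshift-ingress. Rebuilding only on a NoMatch error keeps the common path cheap.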

Description Hongan Li 2019-03-12 02:46:29 UTC

Description of problem:
Router metrics with monitoring integration do not work.
There are no ServiceMonitor resources in the openshift-ingress namespace.
No router metrics appear in the Prometheus UI.

Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-06-074438   True        False         19h     Cluster version is 4.0.0-0.nightly-2019-03-06-074438


How reproducible:
always

Steps to Reproduce:
1. Install a 4.0 cluster on AWS.
2. oc get servicemonitor -n openshift-ingress
3. Log in to the Prometheus UI and go to the Status > Targets page to check the router targets.

Actual results:
There are no ServiceMonitor resources in the openshift-ingress namespace.
No router metrics appear in the Prometheus UI.

Expected results:
The user should be able to view the router metrics in the Prometheus UI.

Additional info:
It was working after https://github.com/openshift/cluster-ingress-operator/pull/108 was merged, but it does not work in the latest build.

Comment 2 Ravi Sankar 2019-03-19 22:33:37 UTC

Fixed by https://github.com/openshift/cluster-ingress-operator/pull/166

Comment 3 Hongan Li 2019-03-25 02:19:42 UTC

Verified with 4.0.0-0.nightly-2019-03-23-222829; the issue has been fixed. The router metrics are now shown in the Prometheus UI.


$ oc get servicemonitors.monitoring.coreos.com/router-default -n openshift-ingress -o yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  creationTimestamp: 2019-03-25T00:29:31Z
  generation: 1
  name: router-default
  namespace: openshift-ingress
  ownerReferences:
  - apiVersion: apps/v1
    controller: true
    kind: Deployment
    name: router-default
    uid: 06c11402-4e95-11e9-b378-125857bf72fa
  resourceVersion: "10253"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/openshift-ingress/servicemonitors/router-default
  uid: 0e856e06-4e95-11e9-b378-125857bf72fa
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    path: /metrics
    port: metrics
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
      serverName: router-internal-default.openshift-ingress.svc
  namespaceSelector:
    matchNames:
    - openshift-ingress
  selector: {}

Comment 6 errata-xmlrpc 2019-06-04 10:45:33 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758