Bug 1549021

Summary: No servicecatalog metrics in prometheus console
Product: OpenShift Container Platform
Reporter: Zhang Cheng <chezhang>
Component: Service Catalog
Assignee: Jay Boyd <jaboyd>
Status: CLOSED ERRATA
QA Contact: Zhang Cheng <chezhang>
Severity: medium
Priority: medium
Version: 3.9.0
CC: aos-bugs, chezhang, jaboyd, jiazha, jmatthew, zitang
Target Release: 3.10.0   
Hardware: Unspecified   
OS: Unspecified   
Doc Type: Enhancement
Doc Text:
The Service Catalog controller now exposes metrics for Prometheus to scrape. These metrics enable monitoring of the Service Catalog.
Last Closed: 2018-07-30 19:09:51 UTC
Type: Bug

Comment 1 Jay Boyd 2018-02-26 17:45:00 UTC
The controller-manager pod does not have the Prometheus scrape annotation:

$ oc edit pod controller-manager-vzfjb -n kube-service-catalog

apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/scc: restricted
  creationTimestamp: 2018-02-26T07:18:08Z
  generateName: controller-manager-

I don't believe adding the annotation to the DaemonSet propagates it down to the pod level.

I believe that if you add the scrape annotation to the controller-manager pod, Prometheus will start scraping it. I'm unable to connect to Prometheus to verify, though: I used https://prometheus-openshift-metrics.apps.0226-g87.qe.rhcloud.com/ but couldn't authenticate (what user ID/password do you use?).

Note that the configuration will change (we won't use the annotation any more) once https://github.com/openshift/origin/pull/18694 merges.
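For context, the prometheus.io/scrape annotation means nothing to Kubernetes itself; it only has an effect if the Prometheus server's configuration keeps pods that carry it. A minimal sketch of such a job, assuming Prometheus uses the widely copied annotation-based pod-discovery pattern (the job name is hypothetical, and this is not taken from the OpenShift metrics deployment in this cluster):

```yaml
# Hypothetical scrape job: discover every pod through the Kubernetes API
# and keep only those annotated with prometheus.io/scrape: "true".
scrape_configs:
  - job_name: kubernetes-pods            # illustrative name only
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Drop any pod whose scrape annotation is not exactly "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # If a prometheus.io/path annotation is set, use it as the metrics path.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
```

Because discovery here is at the pod level, the annotation has to be present on the pod object, not only on the DaemonSet that owns it.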

Comment 2 Zhang Cheng 2018-02-27 05:30:56 UTC
(In reply to Jay Boyd from comment #1)
> The controller-manager pod does not have the prometheus scrape attribute:
> 
> $ oc edit pod controller-manager-vzfjb -n kube-service-catalog
> 
> apiVersion: v1
> kind: Pod
> metadata:
>   annotations:
>     openshift.io/scc: restricted
>   creationTimestamp: 2018-02-26T07:18:08Z
>   generateName: controller-manager-
> 
> I don't believe adding the annotation to the daemonset propagates the
> attribute down to the pod level.
That was my mistake; the annotation belongs in the DaemonSet's pod template:
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: controller-manager
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"

The pod is then deployed with:
- apiVersion: v1
  kind: Pod
  metadata:
    annotations:
      openshift.io/scc: restricted
      prometheus.io/scrape: "true"
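One way to make the template change above without hand-editing is a strategic-merge patch that touches only the pod template. A sketch, assuming the DaemonSet is named controller-manager in kube-service-catalog (as the pod name suggests) and the patch is saved to a hypothetical file patch.yaml:

```yaml
# patch.yaml (hypothetical file name): strategic-merge patch for the
# controller-manager DaemonSet. Only spec.template.metadata is changed,
# because that metadata is what gets copied onto each pod it creates.
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
```

It could be applied with `oc patch daemonset controller-manager -n kube-service-catalog -p "$(cat patch.yaml)"`. Depending on the DaemonSet's updateStrategy, existing pods may need to be deleted before replacements carrying the new annotation appear.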

> I believe that if you add the scrape annotation to the controller manager
> pod Prometheus will start scraping it.  I'm unable to connect to Prometheus
> to verify though - I used
> https://prometheus-openshift-metrics.apps.0226-g87.qe.rhcloud.com/ but I'm
> unable to successfully authenticate (what userid/password do you use?).  
I use chezhang/redhat to log in to the Prometheus console (you can also use another user, but it needs the cluster-admin role).
I double-checked today and still cannot get the related metrics in the Prometheus console, even after adding prometheus.io/scrape: "true" to the controller-manager pod.

> Note that the configuration will change (we won't use the annotation any
> more) once https://github.com/openshift/origin/pull/18694 merges.
I will confirm again after the PR merges.

Comment 3 Jay Boyd 2018-02-27 14:49:14 UTC
I tried to review your environment this morning, but it looks like it was reset. Do you want to debug this or put it on hold until we have the new configuration?

Comment 7 Zhang Cheng 2018-03-02 02:47:05 UTC
@Jay
Thanks for your quick response.

Comment 8 Zhang Cheng 2018-03-07 07:53:08 UTC
@Jay

I noticed PR https://github.com/openshift/origin/pull/18694 targets the openshift:master branch (I think master is for 3.10 at present), but there is no PR for the release-3.9 branch.

Mar 7 is the code freeze date; will this bug be fixed in OCP 3.9?

Comment 9 Jay Boyd 2018-03-07 14:23:03 UTC
This is the PR for 3.9:  https://github.com/openshift/origin/pull/18815

I'm hoping to get it in, but the merge queue is really slow.

Comment 10 Jay Boyd 2018-03-07 21:31:19 UTC
This fix was pulled from 3.9 at the last minute. It needs review because it exposes metrics over unauthenticated HTTP.
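The concern is that an annotation-discovered target is typically scraped over plain HTTP with no credentials. For comparison, a Prometheus 2.x job that scrapes over HTTPS and presents its service-account token looks roughly like the sketch below. This is an assumption about how authenticated scraping could be set up, not a description of what the Service Catalog PRs actually did; the job name and the controller-manager service name are hypothetical.

```yaml
scrape_configs:
  - job_name: service-catalog-controller   # hypothetical job name
    scheme: https
    # Trust the cluster CA and authenticate with Prometheus's own
    # service-account token (standard in-pod service-account paths).
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [kube-service-catalog]
    relabel_configs:
      # Keep only endpoints behind a hypothetical "controller-manager" service.
      - source_labels: [__meta_kubernetes_service_name]
        action: keep
        regex: controller-manager
```

With this shape, the metrics endpoint can require an authenticated, authorized caller instead of answering anyone who can reach the pod.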

Comment 11 Zhang Cheng 2018-03-08 01:52:21 UTC
Changing version to 3.9.0 since the issue was found during 3.9.0 testing.

Comment 12 Zhang Cheng 2018-04-09 07:37:58 UTC
Jay,

What is the status of this bug? Do you still want to provide the fix in 3.9.z?
Should the Target Release be changed to 3.10?

Comment 13 Jay Boyd 2018-04-10 00:14:06 UTC
Yes, target is 3.10.  
Ansible Installer:  https://github.com/openshift/openshift-ansible/pull/7681 is merged.
Cluster Up: https://github.com/openshift/origin/pull/19286

Comment 14 Jay Boyd 2018-04-13 12:11:52 UTC
finally merged.

Comment 15 Zhang Cheng 2018-04-23 09:34:54 UTC
Changing status to ON_QA since the image is ready for testing.

Comment 16 Zhang Cheng 2018-04-23 09:36:08 UTC
Verified and passed with:
# service-catalog --version
v3.10.0-0.27.0;Upstream:v0.1.13

Service Catalog metrics are now available in both the Prometheus console and the backend.

Comment 18 errata-xmlrpc 2018-07-30 19:09:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816