Bug 1588010 - Prometheus can't access router metrics
Summary: Prometheus can't access router metrics
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 3.9.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.9.z
Assignee: Simon Pasquier
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On: 1565095
Blocks: 1619998
 
Reported: 2018-06-06 13:13 UTC by Simon Pasquier
Modified: 2018-08-29 14:43 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: the Prometheus service account doesn't have the required permissions to access the metrics endpoint of the router. Consequence: Prometheus fails to scrape the router's metrics. Fix: the Prometheus service account is granted an additional role with permissions to access the metrics endpoint. Result: Prometheus can pull metrics from the router.
Clone Of: 1565095
Clones: 1619998
Environment:
Last Closed: 2018-08-29 14:42:31 UTC
Target Upstream Version:
Embargoed:


Attachments
"no route to host" error for router targets (133.62 KB, image/png)
2018-08-22 06:27 UTC, Junqi Zhao
router prometheus output (67.65 KB, text/plain)
2018-08-22 07:54 UTC, Junqi Zhao


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:2549 0 None None None 2018-08-29 14:43:15 UTC

Comment 1 Simon Pasquier 2018-06-06 13:15:42 UTC
Upstream PR: https://github.com/openshift/openshift-ansible/pull/8596

Comment 3 Junqi Zhao 2018-08-22 06:26:44 UTC
Since Bug 1589023, the openshift-router target shows "no route to host"; see the attached picture.

Changing back to MODIFIED.

openshift-ansible-3.9.41-1.git.0.4c55974.el7.noarch

# openshift version
openshift v3.9.41
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.16

Comment 4 Junqi Zhao 2018-08-22 06:27:20 UTC
Created attachment 1477792
"no route to host" error for router targets

Comment 5 Junqi Zhao 2018-08-22 07:10:11 UTC
@Simon
From the attached picture, the openshift-router target is
http://172.16.120.91:1936/metrics.
Should it use the https protocol instead, i.e.
https://172.16.120.91:1936/metrics?

Since 3.10, I see the router target using the https protocol.

Comment 6 Junqi Zhao 2018-08-22 07:52:35 UTC
Please ignore Comment 4 and Comment 5; that is a separate issue. I think we can close this defect, for the following reason.

# get token
token=`oc sa get-token prometheus -n openshift-metrics`

Then run oc rsh {router-pod} and, using the token from the previous step, run the command

curl -k -H "Authorization: Bearer $token" http://{router_ip}:1936/metrics
We get the Prometheus output; see the attached file.
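
The manual check in this comment could be scripted roughly as follows. This is only a sketch under assumptions: ROUTER_POD and ROUTER_IP are placeholder environment variables that are not part of this bug report, the helper names metrics_url and check_router_metrics are made up for illustration, and curl is assumed to be available inside the router pod, as the comment implies.

```shell
#!/bin/sh
# Sketch of the manual verification described in this comment.
# ROUTER_POD and ROUTER_IP are placeholders (assumptions), not values from this bug.

# Build the router metrics URL from scheme, host and port.
metrics_url() {
    printf '%s://%s:%s/metrics' "$1" "$2" "$3"
}

# Fetch the router metrics from inside the router pod, authenticating with
# the token of the prometheus service account.
check_router_metrics() {
    router_pod=$1
    router_ip=$2
    token=$(oc sa get-token prometheus -n openshift-metrics) || return 1
    # The token is expanded locally and passed to curl running in the pod.
    oc rsh "$router_pod" curl -s -H "Authorization: Bearer $token" \
        "$(metrics_url http "$router_ip" 1936)"
}

# Only contact the cluster when both placeholders are supplied.
if [ -n "${ROUTER_POD:-}" ] && [ -n "${ROUTER_IP:-}" ]; then
    check_router_metrics "$ROUTER_POD" "$ROUTER_IP"
fi
```

If the service account has the required permissions, the call should print the router's metrics in the Prometheus text format; an authorization error would suggest the role binding is still missing.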

Comment 7 Junqi Zhao 2018-08-22 07:54:03 UTC
Created attachment 1477811
router prometheus output

Comment 8 Simon Pasquier 2018-08-22 08:56:03 UTC
According to the previous attachment, it is a multinode cluster and it is probably a firewall issue as described in https://bugzilla.redhat.com/show_bug.cgi?id=1552235

Comment 9 Junqi Zhao 2018-08-22 10:55:46 UTC
Per Comment 6 through Comment 8, the problem described in this defect is fixed; setting to VERIFIED.

Comment 11 errata-xmlrpc 2018-08-29 14:42:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2549

