Bug 1588010

Summary: Prometheus can't access router metrics
Product: OpenShift Container Platform
Component: Monitoring
Version: 3.9.0
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: ---
Target Release: 3.9.z
Hardware: All
OS: Linux
Reporter: Simon Pasquier <spasquie>
Assignee: Simon Pasquier <spasquie>
QA Contact: Junqi Zhao <juzhao>
Docs Contact:
CC: aos-bugs, juzhao, oourfali, pep, spasquie
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The Prometheus service account doesn't have the required permissions to access the metrics endpoint of the router.
Consequence: Prometheus fails to scrape the router's metrics.
Fix: The Prometheus service account is granted an additional role with permissions to access the metrics endpoint.
Result: Prometheus can pull metrics from the router.
Story Points: ---
Clone Of: 1565095
Environment:
Last Closed: 2018-08-29 14:42:31 UTC
Type: Bug
Bug Depends On: 1565095    
Bug Blocks: 1619998    
Attachments:
Description                                    Flags
"no route to host" error for router targets    none
router prometheus output                       none

Comment 1 Simon Pasquier 2018-06-06 13:15:42 UTC
Upstream PR: https://github.com/openshift/openshift-ansible/pull/8596

Comment 3 Junqi Zhao 2018-08-22 06:26:44 UTC
Since Bug 1589023, the openshift-router target shows "no route to host"; see the attached picture.

Changing the status back to MODIFIED.

openshift-ansible-3.9.41-1.git.0.4c55974.el7.noarch

# openshift version
openshift v3.9.41
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.16

Comment 4 Junqi Zhao 2018-08-22 06:27:20 UTC
Created attachment 1477792
"no route to host" error for router targets

Comment 5 Junqi Zhao 2018-08-22 07:10:11 UTC
@Simon
In the attached picture, the openshift-router target is
http://172.16.120.91:1936/metrics
Should it use the https protocol instead (https://172.16.120.91:1936/metrics)?

Since 3.10, I see the router target using the https protocol.

Comment 6 Junqi Zhao 2018-08-22 07:52:35 UTC
Please ignore Comment 4 and Comment 5; that is a separate issue. I think we can close this defect, for the following reason.

# get the token
token=`oc sa get-token prometheus -n openshift-metrics`

Then run oc rsh {router-pod} and, using the token from the previous step, run:

curl -k -H "Authorization: Bearer $token" http://{router_ip}:1936/metrics

This returns the Prometheus output; see the attached file.
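
The manual check above could be wrapped in a small script along these lines. This is only a sketch: the function names, the use of oc exec in place of an interactive oc rsh session, and the "default" namespace for the router pod are assumptions for illustration, not something stated in this bug. The token command, port 1936 and the /metrics path are taken from the steps above.

```shell
#!/bin/sh
# Sketch of the verification steps above. Helper names are hypothetical.

# Build the router metrics URL (port and path as used in the comment above).
metrics_url() {
    printf 'http://%s:1936/metrics\n' "$1"
}

# Build the bearer-token Authorization header.
auth_header() {
    printf 'Authorization: Bearer %s\n' "$1"
}

# Fetch the router metrics from inside the router pod using the
# Prometheus service account token.
#   $1 = router pod name
#   $2 = router IP
#   $3 = router pod namespace (ASSUMPTION: defaults to "default")
check_router_metrics() {
    router_pod=$1
    router_ip=$2
    namespace=${3:-default}

    # Same token command as in the manual steps.
    token=$(oc sa get-token prometheus -n openshift-metrics) || return 1

    # -f makes curl exit non-zero on an HTTP error such as 403,
    # so a missing permission shows up as a failed command.
    oc exec "$router_pod" -n "$namespace" -- \
        curl -sf -H "$(auth_header "$token")" "$(metrics_url "$router_ip")"
}
```

A call such as check_router_metrics {router-pod} {router_ip} should print the metrics when the service account has the required permissions. Note that the token is passed on the curl command line, so it is visible in the process list inside the pod while the command runs.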

Comment 7 Junqi Zhao 2018-08-22 07:54:03 UTC
Created attachment 1477811
router prometheus output

Comment 8 Simon Pasquier 2018-08-22 08:56:03 UTC
According to the previous attachment, it is a multinode cluster and it is probably a firewall issue as described in https://bugzilla.redhat.com/show_bug.cgi?id=1552235

Comment 9 Junqi Zhao 2018-08-22 10:55:46 UTC
Per Comment 6 through Comment 8, the problem described in this defect is fixed. Setting the status to VERIFIED.

Comment 11 errata-xmlrpc 2018-08-29 14:42:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2549