Bug 1565095
| Summary: | Prometheus can't access router metrics | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Josep 'Pep' Turro Mauri <pep> |
| Component: | Monitoring | Assignee: | Simon Pasquier <spasquie> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.9.0 | CC: | aos-bugs, knakayam, oourfali, spasquie, vlaad |
| Target Milestone: | --- | | |
| Target Release: | 3.10.0 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Cause: the Prometheus service account doesn't have the required permissions to access the metrics endpoint of the router. Consequence: Prometheus fails to scrape the router's metrics. Fix: the Prometheus service account is granted an additional role with permissions to access the metrics endpoint. Result: Prometheus can pull metrics from the router. | Story Points: | --- |
| Clone Of: | | | |
| : | 1588010 | Environment: | |
| Last Closed: | 2018-10-08 12:44:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1588010, 1619998 | | |
| Attachments: | | | |
Description
Josep 'Pep' Turro Mauri
2018-04-09 11:26:00 UTC
The upstream bug is fixed on master (upcoming 3.10).

Doc is LGTM.

@Oved The Target Release is set to 3.11; I think it should be 3.10.

We need new Prometheus images to test this defect; the following configuration is not in /etc/prometheus/prometheus.yml of the prometheus container:
# Scrape config for the router
- job_name: 'openshift-router'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
    server_name: router.default.svc
  bearer_token_file: /var/run/secrets/kubernetes.io/scraper/token
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - default
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: router;1936-tcp
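
For anyone reading the relabel rule in the snippet above, here is a rough Python sketch of how a `keep` action selects targets. It relies on two documented Prometheus behaviours: source label values are joined with the default `;` separator, and the regex is anchored at both ends. The `keep` helper and the sample label sets (`router_stats`, `router_http`, `registry`) are hypothetical illustrations and were not taken from a real cluster.

```python
import re

# Label names and regex copied from the scrape config above.
SOURCE_LABELS = [
    "__meta_kubernetes_service_name",
    "__meta_kubernetes_endpoint_port_name",
]
REGEX = "router;1936-tcp"


def keep(labels, source_labels=SOURCE_LABELS, regex=REGEX, separator=";"):
    """Return True if a discovered target survives the `keep` rule.

    Hypothetical helper: joins the source label values with the separator
    and requires the regex to match the whole joined string.
    """
    joined = separator.join(labels.get(name, "") for name in source_labels)
    return re.fullmatch(regex, joined) is not None


# Made-up endpoint label sets for illustration only.
router_stats = {
    "__meta_kubernetes_service_name": "router",
    "__meta_kubernetes_endpoint_port_name": "1936-tcp",
}
router_http = {
    "__meta_kubernetes_service_name": "router",
    "__meta_kubernetes_endpoint_port_name": "80-tcp",
}
registry = {
    "__meta_kubernetes_service_name": "docker-registry",
    "__meta_kubernetes_endpoint_port_name": "1936-tcp",
}

for name, labels in [("router_stats", router_stats),
                     ("router_http", router_http),
                     ("registry", registry)]:
    print(name, "kept" if keep(labels) else "dropped")
```

Under these assumptions, only the endpoint of the `router` service whose port is named `1936-tcp` stays as a scrape target; every other endpoint in the `default` namespace is dropped.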
Right, there's a difference between the upstream issue, which was focused on 'oc cluster up' + the example Prometheus template [1], and this BZ, which targets OpenShift Ansible. IIUC the existing playbooks don't configure Prometheus to scrape the router endpoint: this is the configuration snippet that you're not getting currently. I'll address this. That being said, the merged PR [2] is relevant for both cases.

[1] https://github.com/openshift/origin/tree/master/examples/prometheus
[2] https://github.com/openshift/origin/pull/19318

I've checked further: with the current openshift/origin and openshift-ansible, Prometheus doesn't scrape the router's metrics because the router's service doesn't have the "prometheus.io/scrape: true" annotation anymore. I've submitted https://github.com/openshift/openshift-ansible/pull/8512 for Prometheus to scrape the metrics.

https://github.com/openshift/openshift-ansible/pull/8512 has been merged.

The clusterrole router-metrics is added in the prometheus namespace, and router metrics can be accessed.
openshift-ansible version: openshift-ansible-3.10.0-0.58.0.git.0.d8f6377.el7.noarch.rpm

Created attachment 1447321: openshift-router target