Description of problem:
When deploying Prometheus on OCP 3.9 using openshift-ansible, the router's metrics are not available: the router metrics endpoint is protected and prometheus can't scrape it.

Version-Release number of selected component (if applicable):
atomic-openshift-3.9.14-1.git.0.4efa2ca.el7.x86_64
openshift-ansible-3.9.14-1.git.3.c62bc34.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy prometheus metrics on an OCP 3.9 cluster via openshift-ansible: https://docs.openshift.com/container-platform/3.9/install_config/cluster_metrics.html#openshift-prometheus
2. Check the kubernetes-service-endpoints target for the router metrics endpoint

Actual results:
level=debug ts=2018-04-09T11:18:56.431809488Z caller=scrape.go:676 component="scrape manager" scrape_pool=kubernetes-service-endpoints target=http://192.168.55.143:1936/metrics msg="Scrape failed" err="server returned HTTP status 403 Forbidden"

Expected results:
Router metrics can be scraped by prometheus

Additional info:
This is reported upstream in https://github.com/openshift/origin/issues/17685
The upstream bug is fixed on master (upcoming 3.10).
Doc is LGTM
@Oved The Target Release is set to 3.11; I think it should be 3.10.
We need new prometheus images to test this defect. The following configuration is not in /etc/prometheus/prometheus.yml of the prometheus container:

# Scrape config for the router
- job_name: 'openshift-router'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
    server_name: router.default.svc
  bearer_token_file: /var/run/secrets/kubernetes.io/scraper/token
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - default
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: router;1936-tcp
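For readers unfamiliar with the relabel rule in the snippet above: Prometheus joins the values of source_labels with ";" (the default separator) and keeps a target only when the regex matches the whole joined string. The following is a minimal Python sketch of that behaviour, not Prometheus code. The helper name keep_target and the sample label values are hypothetical, and Python's re module stands in for the RE2 engine Prometheus uses (the two agree for a literal pattern like this one).

```python
import re

def keep_target(labels, source_labels, regex, separator=";"):
    """Mimic a Prometheus 'keep' relabel rule for one target (illustrative helper)."""
    # Labels that are absent contribute an empty string to the joined value.
    joined = separator.join(labels.get(name, "") for name in source_labels)
    # Prometheus anchors the regex at both ends, hence fullmatch.
    return re.fullmatch(regex, joined) is not None

SOURCE_LABELS = [
    "__meta_kubernetes_service_name",
    "__meta_kubernetes_endpoint_port_name",
]
REGEX = "router;1936-tcp"

# Hypothetical discovered targets in the "default" namespace.
router_metrics = {
    "__meta_kubernetes_service_name": "router",
    "__meta_kubernetes_endpoint_port_name": "1936-tcp",
}
router_http = {
    "__meta_kubernetes_service_name": "router",
    "__meta_kubernetes_endpoint_port_name": "80-tcp",
}

print(keep_target(router_metrics, SOURCE_LABELS, REGEX))
print(keep_target(router_http, SOURCE_LABELS, REGEX))
```

Only the endpoint whose service name is router and whose port name is 1936-tcp survives the rule; the router's other ports are dropped from the openshift-router job.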
Right, there's a difference between the upstream issue that was focused on 'oc cluster up' + the example Prometheus template [1] and this BZ which targets OpenShift Ansible. IIUC the existing playbooks don't configure Prometheus to scrape the router endpoint: this is the configuration snippet that you're not getting currently. I'll address this.

That being said, the merged PR [2] is relevant for both cases.

[1] https://github.com/openshift/origin/tree/master/examples/prometheus
[2] https://github.com/openshift/origin/pull/19318
I've checked further: with the current openshift/origin and openshift-ansible, Prometheus doesn't scrape the router's metrics because the router's service doesn't have the "prometheus.io/scrape: true" annotation anymore. I've submitted https://github.com/openshift/openshift-ansible/pull/8512 for Prometheus to scrape the metrics.
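A quick way to confirm the missing annotation is to inspect the service manifest. The sketch below assumes the manifest has already been loaded into a Python dict (for example from the JSON printed by `oc get svc router -n default -o json`); the helper name is_scrape_annotated and the sample manifests are hypothetical.

```python
def is_scrape_annotated(service):
    """Return True if a Service manifest carries prometheus.io/scrape: "true"."""
    # metadata.annotations may be absent or null on a service without annotations.
    annotations = (service.get("metadata") or {}).get("annotations") or {}
    # Kubernetes annotation values are always strings.
    return annotations.get("prometheus.io/scrape") == "true"

# Hypothetical manifests: one annotated, one without annotations.
annotated = {
    "metadata": {"name": "router", "annotations": {"prometheus.io/scrape": "true"}}
}
bare = {"metadata": {"name": "router"}}

print(is_scrape_annotated(annotated))
print(is_scrape_annotated(bare))
```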
https://github.com/openshift/openshift-ansible/pull/8512 has been merged.
The router-metrics clusterrole has been added in the prometheus namespace, and the router metrics can now be accessed.

openshift-ansible version: openshift-ansible-3.10.0-0.58.0.git.0.d8f6377.el7.noarch.rpm
Created attachment 1447321 [details] openshift-router target