Description of problem:

When a service monitor defines a sample limit (which is possible for user-workload monitoring), the reporting metrics (up, scrape_samples_scraped, ...) may not be inserted by Prometheus if the number of samples exposed by the target is close to the limit. See https://github.com/prometheus/prometheus/issues/9990 for the details.

Version-Release number of selected component (if applicable):

4.10

How reproducible:

Always

Steps to Reproduce:

1. Follow the OCP documentation to deploy the sample application, which exposes only one metric:
   https://docs.openshift.com/container-platform/4.9/monitoring/managing-metrics.html#setting-up-metrics-collection-for-user-defined-projects_managing-metrics

2. Add a sample limit of 1 to the application's service monitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: prometheus-example-monitor
  name: prometheus-example-monitor
  namespace: ns1
spec:
  endpoints:
  - interval: 30s
    port: web
    scheme: http
  sampleLimit: 1
  selector:
    matchLabels:
      app: prometheus-example-app

Actual results:

The target is scraped successfully (no target reported as down) but the up metric is missing, as are the other reporting metrics.

Expected results:

Reporting metrics should be present.

Additional info:

Fixed in Prometheus v2.32.1.
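To observe the symptom directly, the reporting series can be queried through the Thanos Querier, in the same way as the verification below (a minimal sketch; it assumes $token holds a bearer token authorized to query metrics for the ns1 project). On an affected cluster the query returns an empty result even though the Targets page shows the scrape as successful:

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=up%7Bnamespace%3D%22ns1%22%7D' | jq '.data.result'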
Tested with 4.10.0-0.nightly-2021-12-21-130047, followed the steps in comment 0, and could see the up metric.

# oc -n openshift-monitoring logs -c prometheus prometheus-k8s-0
ts=2021-12-21T23:54:33.692Z caller=main.go:532 level=info msg="Starting Prometheus" version="(version=2.32.1, branch=rhaos-4.10-rhel-8, revision=2003b6cb83d933ad154a6dcd6bc6b497488b8501)"

# oc -n openshift-user-workload-monitoring exec -c prometheus prometheus-user-workload-0 -- cat /etc/prometheus/config_out/prometheus.env.yaml
scrape_configs:
- job_name: serviceMonitor/ns1/prometheus-example-monitor/0
  ...
  - source_labels:
    - __tmp_hash
    regex: 0
    action: keep
  sample_limit: 1
  metric_relabel_configs:
  - target_label: namespace
    replacement: ns1

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=up%7Bnamespace%3D%22ns1%22%7D' | jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "up",
          "endpoint": "web",
          "instance": "10.131.0.152:8080",
          "job": "prometheus-example-app",
          "namespace": "ns1",
          "pod": "prometheus-example-app-8659789999-nwh2k",
          "prometheus": "openshift-user-workload-monitoring/user-workload",
          "service": "prometheus-example-app"
        },
        "value": [
          1640145070.945,
          "1"
        ]
      }
    ]
  }
}
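For completeness, the version of the user-workload Prometheus (the instance that actually scrapes the example app) can be checked the same way (a sketch, assuming the default pod name prometheus-user-workload-0):

# oc -n openshift-user-workload-monitoring logs -c prometheus prometheus-user-workload-0 | grep "Starting Prometheus"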
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056