Bug 1986243
| Summary: | delete user-workload-monitoring-config configmap, can not find user metrics although no setting for enforcedTargetLimit | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Junqi Zhao <juzhao> |
| Component: | Monitoring | Assignee: | Arunprasad Rajkumar <arajkuma> |
| Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.9 | CC: | amuller, anpicker, aos-bugs, arajkuma, erooth, spasquie |
| Target Milestone: | --- | | |
| Target Release: | 4.9.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-10-18 17:41:26 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Junqi Zhao
2021-07-27 03:55:48 UTC
from "lastError": "target_limit exceeded (number of targets: 2, limit: 0)" reason maybe, if we delete user-workload-monitoring-config configmap, the target limit value is 0, 0 should mean no limit (as for Prometheus), but it's treated as number 0 here, see bug 1982931 # oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/targets' | jq | grep prometheus-example-monitor -C10 "__meta_kubernetes_pod_name": "prometheus-example-app-d748cfb54-h27vb", "__meta_kubernetes_pod_node_name": "ip-10-0-177-101.us-east-2.compute.internal", "__meta_kubernetes_pod_phase": "Running", "__meta_kubernetes_pod_ready": "true", "__meta_kubernetes_pod_uid": "fa7aa1b4-f4d2-4880-b5ac-9d239787bc72", "__meta_kubernetes_service_label_app": "prometheus-example-app", "__meta_kubernetes_service_labelpresent_app": "true", "__meta_kubernetes_service_name": "prometheus-example-app", "__metrics_path__": "/metrics", "__scheme__": "http", "job": "serviceMonitor/ns1/prometheus-example-monitor/0", "prometheus": "openshift-user-workload-monitoring/user-workload" }, "labels": { "endpoint": "web", "instance": "10.129.2.12:8080", "job": "prometheus-example-app", "namespace": "ns1", "pod": "prometheus-example-app-d748cfb54-h27vb", "prometheus": "openshift-user-workload-monitoring/user-workload", "service": "prometheus-example-app" }, "scrapePool": "serviceMonitor/ns1/prometheus-example-monitor/0", "scrapeUrl": "http://10.129.2.12:8080/metrics", "globalUrl": "http://10.129.2.12:8080/metrics", "lastError": "target_limit exceeded (number of targets: 2, limit: 0)", "lastScrape": "2021-07-27T04:19:26.780629467Z", "lastScrapeDuration": 3.5823e-05, "health": "down" }, { "discoveredLabels": { "__address__": "10.131.0.91:8080", -- "__meta_kubernetes_pod_name": "prometheus-example-app-d748cfb54-wpjw9", "__meta_kubernetes_pod_node_name": "ip-10-0-199-189.us-east-2.compute.internal", "__meta_kubernetes_pod_phase": "Running", "__meta_kubernetes_pod_ready": "true", "__meta_kubernetes_pod_uid": "eead5b2d-8bea-43e0-bff5-a9a2c493c052", "__meta_kubernetes_service_label_app": "prometheus-example-app", "__meta_kubernetes_service_labelpresent_app": "true", "__meta_kubernetes_service_name": "prometheus-example-app", "__metrics_path__": "/metrics", "__scheme__": "http", "job": "serviceMonitor/ns1/prometheus-example-monitor/0", "prometheus": "openshift-user-workload-monitoring/user-workload" }, "labels": { "endpoint": "web", "instance": "10.131.0.91:8080", "job": "prometheus-example-app", "namespace": "ns1", "pod": "prometheus-example-app-d748cfb54-wpjw9", "prometheus": "openshift-user-workload-monitoring/user-workload", "service": "prometheus-example-app" }, "scrapePool": "serviceMonitor/ns1/prometheus-example-monitor/0", "scrapeUrl": "http://10.131.0.91:8080/metrics", "globalUrl": "http://10.131.0.91:8080/metrics", "lastError": "target_limit exceeded (number of targets: 2, limit: 0)", "lastScrape": "2021-07-27T04:19:26.501800903Z", "lastScrapeDuration": 1.7929e-05, "health": "down" }, { "discoveredLabels": { "__address__": "10.0.129.103:10250", I could reproduce this issue on upstream prometheus 2.28.1 by setting `target_limit: 1` then back to `target_limit: 0` with SIGHUP to reload config. ``` scrape_configs: - job_name: 'prometheus-k8s' target_limit: 0 static_configs: - targets: - 'localhost:9090' labels: pod: prometheus-k8s-0 service: prometheus-k8s - targets: - 'localhost:9090' labels: pod: prometheus-k8s-1 service: prometheus-k8s ``` 1. 
I could reproduce this issue on upstream Prometheus 2.28.1 by setting `target_limit: 1` and then changing it back to `target_limit: 0` with a SIGHUP to reload the config:

```
scrape_configs:
  - job_name: 'prometheus-k8s'
    target_limit: 0
    static_configs:
      - targets:
          - 'localhost:9090'
        labels:
          pod: prometheus-k8s-0
          service: prometheus-k8s
      - targets:
          - 'localhost:9090'
        labels:
          pod: prometheus-k8s-1
          service: prometheus-k8s
```

1. Start Prometheus with the following config: $ ./prometheus --config.file=./config.yaml

```
scrape_configs:
  - job_name: 'prometheus-k8s'
    target_limit: 1
    static_configs:
      - targets:
          - 'localhost:9090'
        labels:
          pod: prometheus-k8s-0
          service: prometheus-k8s
      - targets:
          - 'localhost:9090'
        labels:
          pod: prometheus-k8s-1
          service: prometheus-k8s
```

2. Change to `target_limit: 0` and send a signal to Prometheus to reload the config: $ kill -SIGHUP $(ps | awk -F ' ' '!/awk/ && /prometheus/{print $1}')

3. Check the targets status; it shows the error "target_limit exceeded (number of targets: 2, limit: 0)".

The fix will be available in the Prometheus 2.29.0 upstream release.

Tested with 4.9.0-0.nightly-2021-08-19-184748 and followed the steps in Comment 0: after deleting the user-workload-monitoring-config configmap, the user metrics can now be seen.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
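For anyone re-checking the verification above, one way to confirm the result is to re-run the targets query from Comment 0 and check that the example targets report health "up" with an empty lastError. This is only a sketch based on the data shown earlier in this report; the $token variable and the jq filter on the ns1 scrape pool are assumptions:

```
# $token is assumed to be a bearer token authorized to query thanos-querier,
# e.g. token=$(oc -n openshift-monitoring sa get-token prometheus-k8s)
# The scrapePool value matches the ServiceMonitor used in Comment 0.
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
  curl -k -H "Authorization: Bearer $token" \
  'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/targets' \
  | jq '.data.activeTargets[]
        | select(.scrapePool == "serviceMonitor/ns1/prometheus-example-monitor/0")
        | {scrapeUrl, health, lastError}'
```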