Bug 1986243
| Summary: | delete user-workload-monitoring-config configmap, can not find user metrics although no setting for enforcedTargetLimit | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Junqi Zhao <juzhao> |
| Component: | Monitoring | Assignee: | Arunprasad Rajkumar <arajkuma> |
| Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.9 | CC: | amuller, anpicker, aos-bugs, arajkuma, erooth, spasquie |
| Target Milestone: | --- | | |
| Target Release: | 4.9.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-10-18 17:41:26 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description (Junqi Zhao, 2021-07-27 03:55:48 UTC)
from "lastError": "target_limit exceeded (number of targets: 2, limit: 0)" reason maybe, if we delete user-workload-monitoring-config configmap, the target limit value is 0, 0 should mean no limit (as for Prometheus), but it's treated as number 0 here, see bug 1982931 # oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/targets' | jq | grep prometheus-example-monitor -C10 "__meta_kubernetes_pod_name": "prometheus-example-app-d748cfb54-h27vb", "__meta_kubernetes_pod_node_name": "ip-10-0-177-101.us-east-2.compute.internal", "__meta_kubernetes_pod_phase": "Running", "__meta_kubernetes_pod_ready": "true", "__meta_kubernetes_pod_uid": "fa7aa1b4-f4d2-4880-b5ac-9d239787bc72", "__meta_kubernetes_service_label_app": "prometheus-example-app", "__meta_kubernetes_service_labelpresent_app": "true", "__meta_kubernetes_service_name": "prometheus-example-app", "__metrics_path__": "/metrics", "__scheme__": "http", "job": "serviceMonitor/ns1/prometheus-example-monitor/0", "prometheus": "openshift-user-workload-monitoring/user-workload" }, "labels": { "endpoint": "web", "instance": "10.129.2.12:8080", "job": "prometheus-example-app", "namespace": "ns1", "pod": "prometheus-example-app-d748cfb54-h27vb", "prometheus": "openshift-user-workload-monitoring/user-workload", "service": "prometheus-example-app" }, "scrapePool": "serviceMonitor/ns1/prometheus-example-monitor/0", "scrapeUrl": "http://10.129.2.12:8080/metrics", "globalUrl": "http://10.129.2.12:8080/metrics", "lastError": "target_limit exceeded (number of targets: 2, limit: 0)", "lastScrape": "2021-07-27T04:19:26.780629467Z", "lastScrapeDuration": 3.5823e-05, "health": "down" }, { "discoveredLabels": { "__address__": "10.131.0.91:8080", -- "__meta_kubernetes_pod_name": "prometheus-example-app-d748cfb54-wpjw9", "__meta_kubernetes_pod_node_name": "ip-10-0-199-189.us-east-2.compute.internal", "__meta_kubernetes_pod_phase": "Running", "__meta_kubernetes_pod_ready": "true", "__meta_kubernetes_pod_uid": "eead5b2d-8bea-43e0-bff5-a9a2c493c052", "__meta_kubernetes_service_label_app": "prometheus-example-app", "__meta_kubernetes_service_labelpresent_app": "true", "__meta_kubernetes_service_name": "prometheus-example-app", "__metrics_path__": "/metrics", "__scheme__": "http", "job": "serviceMonitor/ns1/prometheus-example-monitor/0", "prometheus": "openshift-user-workload-monitoring/user-workload" }, "labels": { "endpoint": "web", "instance": "10.131.0.91:8080", "job": "prometheus-example-app", "namespace": "ns1", "pod": "prometheus-example-app-d748cfb54-wpjw9", "prometheus": "openshift-user-workload-monitoring/user-workload", "service": "prometheus-example-app" }, "scrapePool": "serviceMonitor/ns1/prometheus-example-monitor/0", "scrapeUrl": "http://10.131.0.91:8080/metrics", "globalUrl": "http://10.131.0.91:8080/metrics", "lastError": "target_limit exceeded (number of targets: 2, limit: 0)", "lastScrape": "2021-07-27T04:19:26.501800903Z", "lastScrapeDuration": 1.7929e-05, "health": "down" }, { "discoveredLabels": { "__address__": "10.0.129.103:10250", I could reproduce this issue on upstream prometheus 2.28.1 by setting `target_limit: 1` then back to `target_limit: 0` with SIGHUP to reload config.
I could reproduce this issue on upstream prometheus 2.28.1 by setting `target_limit: 1` and then going back to `target_limit: 0` with a SIGHUP to reload the config.

1. Start prometheus with the following config:

$ ./prometheus --config.file=./config.yaml

```
scrape_configs:
- job_name: 'prometheus-k8s'
  target_limit: 1
  static_configs:
  - targets:
    - 'localhost:9090'
    labels:
      pod: prometheus-k8s-0
      service: prometheus-k8s
  - targets:
    - 'localhost:9090'
    labels:
      pod: prometheus-k8s-1
      service: prometheus-k8s
```

2. Change the config to `target_limit: 0` and send a SIGHUP to prometheus to reload it:

$ kill -SIGHUP $(ps | awk -F ' ' '!/awk/ && /prometheus/{print $1}')

```
scrape_configs:
- job_name: 'prometheus-k8s'
  target_limit: 0
  static_configs:
  - targets:
    - 'localhost:9090'
    labels:
      pod: prometheus-k8s-0
      service: prometheus-k8s
  - targets:
    - 'localhost:9090'
    labels:
      pod: prometheus-k8s-1
      service: prometheus-k8s
```

3. Check the targets status; it will show the error "target_limit exceeded (number of targets: 2, limit: 0)".
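One way to observe the regression after step 2 is to query the standard Prometheus targets API; this is a sketch (jq is assumed to be available) rather than the exact command used in the report.

```
# After the reload, both targets still report the target_limit error even though
# the limit was reset to 0; once fixed, lastError is empty and health is "up".
$ curl -s http://localhost:9090/api/v1/targets \
  | jq '.data.activeTargets[] | {scrapeUrl, health, lastError}'
```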
The fix will be available in the prometheus 2.29.0 upstream release.

Tested with 4.9.0-0.nightly-2021-08-19-184748, following the steps in Comment 0: after deleting the user-workload-monitoring-config configmap, the user metrics are now visible.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
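For reference, a sketch of the kind of query used for the verification above; the namespace and token follow the example earlier in this report, and the exact commands used for verification are not part of the record.

```
# Illustrative verification: with the fix in place and the configmap deleted,
# the example app's series should be queryable again through thanos-querier.
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
    curl -k -G -H "Authorization: Bearer $token" \
    --data-urlencode 'query=up{namespace="ns1"}' \
    'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query'
```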