Bug 1819565
| Summary: | user-workload-monitoring prometheus-operator endpoint is down due to x509 issue | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Junqi Zhao <juzhao> |
| Component: | Monitoring | Assignee: | Pawel Krupa <pkrupa> |
| Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.5 | CC: | alegrand, anpicker, erooth, kakkoyun, lcosic, mloibl, pkrupa, surbania |
| Target Milestone: | --- | Keywords: | Regression |
| Target Release: | 4.5.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-07-13 17:24:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Created attachment 1675292
user-workload-monitoring prometheus-operator endpoint is down
Tested with 4.5.0-0.nightly-2020-04-01-232323: the user-workload-monitoring prometheus-operator endpoint is up.
```yaml
- job_name: openshift-monitoring/prometheus-operator/0
  honor_labels: true
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - openshift-monitoring
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
    server_name: prometheus-operator.openshift-monitoring.svc
    insecure_skip_verify: false
```
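In this job, `server_name` is `prometheus-operator.openshift-monitoring.svc`, i.e. the `<service>.<namespace>.svc` form. The sketch below is a minimal illustration, assuming (from the SANs quoted in the x509 error in this report) that a Service's serving certificate covers `<service>.<namespace>.svc` and `<service>.<namespace>.svc.cluster.local`. The helper names `serving_cert_dns_names` and `server_name_is_covered` are hypothetical; this is not OpenShift's service-CA code.

```python
# Minimal sketch (assumption): the DNS names a Service's serving certificate
# is expected to cover, following the SAN pattern visible in this report's
# x509 error. These helpers are hypothetical, not an OpenShift API.
from typing import List


def serving_cert_dns_names(service: str, namespace: str) -> List[str]:
    """Return the in-cluster DNS names for a Service, short form first."""
    base = f"{service}.{namespace}.svc"
    return [base, f"{base}.cluster.local"]


def server_name_is_covered(server_name: str, service: str, namespace: str) -> bool:
    """True if a scrape job's tls_config.server_name is one of those names."""
    # Compare case-insensitively and ignore a trailing root dot.
    wanted = server_name.lower().rstrip(".")
    return wanted in serving_cert_dns_names(service, namespace)


# Assuming the job above scrapes the prometheus-operator Service in
# openshift-monitoring, its server_name is covered.
print(server_name_is_covered(
    "prometheus-operator.openshift-monitoring.svc",
    service="prometheus-operator",
    namespace="openshift-monitoring",
))  # prints True
```

Deriving the name from the Service and namespace, rather than hard-coding it, makes a mismatch like the one in the description (a Service name and namespace that do not match the pod's actual Service) easy to spot.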
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
Description of problem:
With techPreviewUserWorkload enabled, the user-workload-monitoring prometheus-operator endpoint is down due to an x509 error.

```
# oc -n openshift-monitoring get cm cluster-monitoring-config -oyaml
apiVersion: v1
data:
  config.yaml: |
    techPreviewUserWorkload:
      enabled: true
kind: ConfigMap
metadata:
  creationTimestamp: "2020-04-01T04:14:22Z"
  name: cluster-monitoring-config
  namespace: openshift-monitoring
...
```

The user-workload-monitoring prometheus-operator endpoint is down with this x509 error (see the attached picture):

```
Get https://10.130.0.51:8443/metrics: x509: certificate is valid for prometheus-operator.openshift-user-workload-monitoring.svc, prometheus-operator.openshift-user-workload-monitoring.svc.cluster.local, not prometheus-operator-user-workload.openshift-monitoring.svc
```

```
# oc -n openshift-user-workload-monitoring get pod -o wide | grep prometheus-operator
prometheus-operator-8687bb4d7c-qpz2q   2/2   Running   0   3h19m   10.130.0.51   ip-10-0-173-92.us-east-2.compute.internal   <none>   <none>
```

However, there is no issue when querying the endpoint directly:

```
# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-user-workload-monitoring exec -c prometheus-operator prometheus-operator-8687bb4d7c-qpz2q -- curl -k -H "Authorization: Bearer $token" https://10.130.0.51:8443/metrics | head -n 5
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.2129e-05
go_gc_duration_seconds{quantile="0.25"} 2.1588e-05
go_gc_duration_seconds{quantile="0.5"} 4.0001e-05
```

In the configuration file, server_name is prometheus-operator-user-workload.openshift-monitoring.svc:

```yaml
- job_name: openshift-user-workload-monitoring/prometheus-operator/0
  honor_labels: true
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - openshift-user-workload-monitoring
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
    server_name: prometheus-operator-user-workload.openshift-monitoring.svc
    insecure_skip_verify: false
```

Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-03-31-203533

How reproducible:
Always

Steps to Reproduce:
1. See the description.

Actual results:
The user-workload-monitoring prometheus-operator endpoint is down.

Expected results:
The user-workload-monitoring prometheus-operator endpoint should be up.

Additional info:
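The x509 error comes from TLS hostname verification. The `server_name` Prometheus uses must match one of the certificate's DNS SANs, but here it names a Service (`prometheus-operator-user-workload`) in a different namespace (`openshift-monitoring`) from the one the certificate was issued for. `curl -k` skips certificate verification entirely, which is why the manual request succeeded while the scrape, with `insecure_skip_verify: false`, failed. The sketch below is a minimal exact-match check, assuming no wildcard or IP SANs; `verify_hostname` is a hypothetical helper that only imitates the message format of Go's `crypto/x509` hostname error.

```python
# Minimal sketch (assumption): exact-match DNS SAN verification that mirrors
# the shape of the Go x509 error quoted in this report. verify_hostname() is
# hypothetical; real TLS stacks also handle wildcard and IP SANs.
from typing import List, Optional


def verify_hostname(dns_names: List[str], server_name: str) -> Optional[str]:
    """Return None if server_name matches a SAN, else an x509-style error."""
    # Compare case-insensitively and ignore a trailing root dot.
    wanted = server_name.lower().rstrip(".")
    if any(name.lower() == wanted for name in dns_names):
        return None
    return f"x509: certificate is valid for {', '.join(dns_names)}, not {server_name}"


# SANs taken from the error message in this report.
sans = [
    "prometheus-operator.openshift-user-workload-monitoring.svc",
    "prometheus-operator.openshift-user-workload-monitoring.svc.cluster.local",
]

# server_name from the openshift-user-workload-monitoring job: the check fails
# with the same text as the scrape error.
print(verify_hostname(sans, "prometheus-operator-user-workload.openshift-monitoring.svc"))

# A server_name equal to one of the SANs passes the check (prints None).
print(verify_hostname(sans, "prometheus-operator.openshift-user-workload-monitoring.svc"))
```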