Bug 1954421
| Field | Value |
|---|---|
| Summary: | Get 'Application is not available' when access Prometheus UI |
| Product: | OpenShift Container Platform |
| Reporter: | hongyan li <hongyli> |
| Component: | Monitoring |
| Assignee: | Pawel Krupa <pkrupa> |
| Status: | CLOSED ERRATA |
| QA Contact: | hongyan li <hongyli> |
| Severity: | high |
| Docs Contact: | |
| Priority: | unspecified |
| Version: | 4.8 |
| CC: | alegrand, anpicker, erooth, juzhao, kakkoyun, lcosic, pkrupa, spasquie |
| Target Milestone: | --- |
| Keywords: | Regression |
| Target Release: | 4.8.0 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Whiteboard: | |
| Fixed In Version: | |
| Doc Type: | If docs needed, set a value |
| Doc Text: | |
| Story Points: | --- |
| Clone Of: | |
| Environment: | |
| Last Closed: | 2021-07-27 23:04:13 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Embargoed: | |
| Attachments: | |
Description (hongyan li, 2021-04-28 07:13:48 UTC)
The Monitoring → Metrics console works well and the Grafana UI displays well, but we get 'Application is not available' when accessing both the Prometheus UI and the Alertmanager UI.

```
# oc -n openshift-monitoring get svc prometheus-k8s
NAME             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
prometheus-k8s   ClusterIP   172.30.87.10   <none>        9091/TCP,9092/TCP   4h42m

# oc -n openshift-monitoring get pod -o wide | grep prometheus-k8s
prometheus-k8s-0   7/7   Running   1   4h20m   10.131.0.33   ip-10-0-197-153.ap-northeast-1.compute.internal   <none>   <none>
prometheus-k8s-1   7/7   Running   1   4h17m   10.128.2.30   ip-10-0-142-190.ap-northeast-1.compute.internal   <none>   <none>

# oc debug node/ip-10-0-197-153.ap-northeast-1.compute.internal
sh-4.4# chroot /host
sh-4.4# iptables-save | grep 9091
-A KUBE-SERVICES ! -s 10.128.0.0/14 -d 172.30.212.37/32 -p tcp -m comment --comment "openshift-monitoring/thanos-querier:web cluster IP" -m tcp --dport 9091 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 172.30.212.37/32 -p tcp -m comment --comment "openshift-monitoring/thanos-querier:web cluster IP" -m tcp --dport 9091 -j KUBE-SVC-G5A7ID5ATXHWKRS5
-A KUBE-SEP-6K6USKENEZOOGRJZ -p tcp -m comment --comment "openshift-monitoring/thanos-querier:web" -m tcp -j DNAT --to-destination 10.131.0.30:9091
-A KUBE-SEP-SSSHBF3TPULS5UN7 -p tcp -m comment --comment "openshift-monitoring/thanos-querier:web" -m tcp -j DNAT --to-destination 10.128.2.25:9091
-A KUBE-SERVICES -d 172.30.87.10/32 -p tcp -m comment --comment "openshift-monitoring/prometheus-k8s:web has no endpoints" -m tcp --dport 9091 -j REJECT --reject-with icmp-port-unreachable
```

There is no `prometheus-k8s:web cluster IP` rule in the result. For comparison, the expected rule would look like:

```
-A KUBE-SERVICES -d 172.30.87.10/32 -p tcp -m comment --comment "openshift-monitoring/prometheus-k8s:web cluster IP" -m tcp --dport 9091 -j KUBE-SVC-DCLNKYLNAMROIJRV
```

```
# oc -n openshift-monitoring get ep
NAME                            ENDPOINTS                                                            AGE
alertmanager-main               <none>                                                               4h55m
alertmanager-operated           10.128.2.31:9095,10.128.2.32:9095,10.131.0.35:9095 + 6 more...       4h55m
cluster-monitoring-operator     10.128.0.89:8443                                                     5h7m
grafana                         10.128.2.29:3000                                                     4h55m
kube-state-metrics              10.131.0.26:8443,10.131.0.26:9443                                    5h7m
node-exporter                   10.0.142.190:9100,10.0.153.211:9100,10.0.168.159:9100 + 3 more...    5h7m
openshift-state-metrics         10.131.0.29:8443,10.131.0.29:9443                                    5h7m
prometheus-adapter              10.128.2.27:6443,10.131.0.27:6443                                    5h7m
prometheus-k8s                  <none>                                                               4h55m
prometheus-k8s-thanos-sidecar   <none>                                                               4h55m
prometheus-operated             10.128.2.30:9091,10.131.0.33:9091,10.128.2.30:10901 + 1 more...      4h55m
prometheus-operator             10.130.0.79:8080,10.130.0.79:8443                                    5h7m
telemeter-client                10.128.2.24:8443                                                     5h7m
thanos-querier                  10.128.2.25:9093,10.131.0.30:9093,10.128.2.25:9092 + 3 more...       5h7m

# oc -n openshift-monitoring get ep prometheus-k8s -oyaml
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    endpoints.kubernetes.io/last-change-trigger-time: "2021-04-28T03:51:36Z"
  creationTimestamp: "2021-04-28T03:51:48Z"
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/managed-by: cluster-monitoring-operator
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: openshift-monitoring
    app.kubernetes.io/version: 2.24.0
    prometheus: k8s
  name: prometheus-k8s
  namespace: openshift-monitoring
  resourceVersion: "22584"
  uid: 7f3573f1-e28d-4785-a172-de03797da1cb
```

Checked with 4.8.0-0.nightly-2021-04-29-063720: can log in to the Prometheus UI now (see the attached picture), and all the endpoints are normal.

```
# oc -n openshift-monitoring get ep
NAME                            ENDPOINTS                                                           AGE
alertmanager-main               10.128.2.27:9095,10.131.0.32:9095,10.131.0.38:9095 + 3 more...      66m
alertmanager-operated           10.128.2.27:9095,10.131.0.32:9095,10.131.0.38:9095 + 6 more...      66m
cluster-monitoring-operator     10.130.0.76:8443                                                    74m
grafana                         10.128.2.24:3000                                                    66m
kube-state-metrics              10.131.0.26:8443,10.131.0.26:9443                                   74m
node-exporter                   10.0.0.3:9100,10.0.0.4:9100,10.0.0.5:9100 + 3 more...               74m
openshift-state-metrics         10.131.0.30:8443,10.131.0.30:9443                                   74m
prometheus-adapter              10.128.2.23:6443,10.131.0.29:6443                                   74m
prometheus-k8s                  10.128.2.29:9092,10.131.0.34:9092,10.128.2.29:9091 + 1 more...      66m
prometheus-k8s-thanos-sidecar   10.128.2.29:10902,10.131.0.34:10902                                 66m
prometheus-operated             10.128.2.29:9091,10.131.0.34:9091,10.128.2.29:10901 + 1 more...     66m
prometheus-operator             10.128.0.96:8080,10.128.0.96:8443                                   74m
telemeter-client                10.128.2.25:8443                                                    74m
thanos-querier                  10.128.2.36:9093,10.129.2.33:9093,10.128.2.36:9092 + 3 more...      74m

# oc -n openshift-monitoring get sts prometheus-k8s -oyaml
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/managed-by: cluster-monitoring-operator
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: openshift-monitoring
    app.kubernetes.io/version: 2.24.0

# oc -n openshift-monitoring get sts alertmanager-main -oyaml
  labels:
    alertmanager: main
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/managed-by: cluster-monitoring-operator
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: openshift-monitoring
    app.kubernetes.io/version: 0.21.0

# oc -n openshift-monitoring get pod --show-labels | grep -E "prometheus-k8s|alertmanager-main"
alertmanager-main-0   5/5   Running   0   46m   alertmanager=main,app.kubernetes.io/component=alert-router,app.kubernetes.io/managed-by=cluster-monitoring-operator,app.kubernetes.io/name=alertmanager,app.kubernetes.io/part-of=openshift-monitoring,app.kubernetes.io/version=0.21.0,app=alertmanager,controller-revision-hash=alertmanager-main-78f7cc764d,statefulset.kubernetes.io/pod-name=alertmanager-main-0
alertmanager-main-1   5/5   Running   0   51m   alertmanager=main,app.kubernetes.io/component=alert-router,app.kubernetes.io/managed-by=cluster-monitoring-operator,app.kubernetes.io/name=alertmanager,app.kubernetes.io/part-of=openshift-monitoring,app.kubernetes.io/version=0.21.0,app=alertmanager,controller-revision-hash=alertmanager-main-78f7cc764d,statefulset.kubernetes.io/pod-name=alertmanager-main-1
alertmanager-main-2   5/5   Running   0   45m   alertmanager=main,app.kubernetes.io/component=alert-router,app.kubernetes.io/managed-by=cluster-monitoring-operator,app.kubernetes.io/name=alertmanager,app.kubernetes.io/part-of=openshift-monitoring,app.kubernetes.io/version=0.21.0,app=alertmanager,controller-revision-hash=alertmanager-main-78f7cc764d,statefulset.kubernetes.io/pod-name=alertmanager-main-2
prometheus-k8s-0      7/7   Running   1   45m   app.kubernetes.io/component=prometheus,app.kubernetes.io/managed-by=cluster-monitoring-operator,app.kubernetes.io/name=prometheus,app.kubernetes.io/part-of=openshift-monitoring,app.kubernetes.io/version=2.24.0,app=prometheus,controller-revision-hash=prometheus-k8s-588f669d48,operator.prometheus.io/name=k8s,operator.prometheus.io/shard=0,prometheus=k8s,statefulset.kubernetes.io/pod-name=prometheus-k8s-0
prometheus-k8s-1      7/7   Running   1   51m   app.kubernetes.io/component=prometheus,app.kubernetes.io/managed-by=cluster-monitoring-operator,app.kubernetes.io/name=prometheus,app.kubernetes.io/part-of=openshift-monitoring,app.kubernetes.io/version=2.24.0,app=prometheus,controller-revision-hash=prometheus-k8s-588f669d48,operator.prometheus.io/name=k8s,operator.prometheus.io/shard=0,prometheus=k8s,statefulset.kubernetes.io/pod-name=prometheus-k8s-1
```

No need to remove the managed-by label; this may be caused by other issues.

Created attachment 1777086 [details]
Can log in to the Prometheus UI now
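The label listings above come down to how Kubernetes endpoint selection works: a Service's selector must be a subset of a Pod's labels for the Pod to be added to the Endpoints object, so a selector expecting a label the pods no longer carry yields `<none>`. A minimal sketch of that subset rule (plain Python for illustration, not the actual controller code; the mismatched selector below is hypothetical):

```python
def selects(selector: dict, pod_labels: dict) -> bool:
    """A Service selects a Pod when every selector key/value pair
    appears verbatim in the Pod's labels (subset match)."""
    return all(pod_labels.get(k) == v for k, v in selector.items())

# A subset of the labels shown on the prometheus-k8s pods above.
pod_labels = {
    "app": "prometheus",
    "app.kubernetes.io/name": "prometheus",
    "app.kubernetes.io/managed-by": "cluster-monitoring-operator",
    "prometheus": "k8s",
}

# A selector that matches the pod labels produces endpoints...
print(selects({"app.kubernetes.io/name": "prometheus"}, pod_labels))  # True

# ...while a selector expecting a value the pods do not carry
# (e.g. a stale selector left over after a label change) matches
# nothing, so the Endpoints object stays empty (<none>).
print(selects({"app": "prometheus-k8s"}, pod_labels))  # False
```

This is why the verification step compares the StatefulSet and Pod labels against what the Service expects.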
We couldn't merge the PR [1] that fixed the selector labels in time because the CMO CI pipeline was broken (for other reasons), so we decided to revert the prometheus-operator bump [2]. I'm moving the bug to MODIFIED.

[1] https://github.com/openshift/cluster-monitoring-operator/pull/1138
[2] https://github.com/openshift/prometheus-operator/pull/116

Verified with payload 4.8.0-0.nightly-2021-04-29-151418; the Prometheus UI works well.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
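The iptables output in the description also explains the symptom itself: when a Service has no endpoints, kube-proxy (iptables mode) programs a single REJECT rule for the cluster IP instead of DNAT rules to backend pods, so connections fail immediately and the router shows 'Application is not available'. A rough sketch of that decision (illustrative only, not kube-proxy source; `service_rules` is a made-up helper):

```python
def service_rules(name: str, cluster_ip: str, port: int, endpoints: list) -> list:
    """Sketch of kube-proxy's iptables programming: one DNAT rule per
    endpoint when backends exist, a REJECT rule when there are none."""
    if not endpoints:
        return [
            f'-A KUBE-SERVICES -d {cluster_ip}/32 -p tcp '
            f'-m comment --comment "{name} has no endpoints" '
            f'-m tcp --dport {port} -j REJECT --reject-with icmp-port-unreachable'
        ]
    return [
        f'-A KUBE-SEP -p tcp -m comment --comment "{name}" '
        f'-m tcp -j DNAT --to-destination {ep}'
        for ep in endpoints
    ]

# The buggy state: empty endpoints, so the cluster IP is rejected.
print(service_rules("openshift-monitoring/prometheus-k8s:web",
                    "172.30.87.10", 9091, [])[0])

# The fixed state: traffic is DNATed to the prometheus-k8s pods.
for rule in service_rules("openshift-monitoring/prometheus-k8s:web",
                          "172.30.87.10", 9091,
                          ["10.131.0.33:9091", "10.128.2.30:9091"]):
    print(rule)
```

This matches the captured output: the thanos-querier service (which had endpoints) got DNAT rules, while prometheus-k8s got only the `has no endpoints` REJECT rule.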