Bug 1954421
| Summary: | Get 'Application is not available' when accessing Prometheus UI | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | hongyan li <hongyli> |
| Component: | Monitoring | Assignee: | Pawel Krupa <pkrupa> |
| Status: | CLOSED ERRATA | QA Contact: | hongyan li <hongyli> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.8 | CC: | alegrand, anpicker, erooth, juzhao, kakkoyun, lcosic, pkrupa, spasquie |
| Target Milestone: | --- | Keywords: | Regression |
| Target Release: | 4.8.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-07-27 23:04:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | prometheus UI can login now (attachment 1777086) | | |
Description (hongyan li, 2021-04-28 07:13:48 UTC)
The Monitoring-Metrics console works well and the Grafana UI displays well, but accessing both the Prometheus UI and the Alertmanager UI returns 'Application is not available'.

# oc -n openshift-monitoring get svc prometheus-k8s
NAME             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
prometheus-k8s   ClusterIP   172.30.87.10   <none>        9091/TCP,9092/TCP   4h42m

# oc -n openshift-monitoring get pod -o wide | grep prometheus-k8s
prometheus-k8s-0   7/7   Running   1   4h20m   10.131.0.33   ip-10-0-197-153.ap-northeast-1.compute.internal   <none>   <none>
prometheus-k8s-1   7/7   Running   1   4h17m   10.128.2.30   ip-10-0-142-190.ap-northeast-1.compute.internal   <none>   <none>

# oc debug node/ip-10-0-197-153.ap-northeast-1.compute.internal
sh-4.4# chroot /host
sh-4.4# iptables-save | grep 9091
-A KUBE-SERVICES ! -s 10.128.0.0/14 -d 172.30.212.37/32 -p tcp -m comment --comment "openshift-monitoring/thanos-querier:web cluster IP" -m tcp --dport 9091 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 172.30.212.37/32 -p tcp -m comment --comment "openshift-monitoring/thanos-querier:web cluster IP" -m tcp --dport 9091 -j KUBE-SVC-G5A7ID5ATXHWKRS5
-A KUBE-SEP-6K6USKENEZOOGRJZ -p tcp -m comment --comment "openshift-monitoring/thanos-querier:web" -m tcp -j DNAT --to-destination 10.131.0.30:9091
-A KUBE-SEP-SSSHBF3TPULS5UN7 -p tcp -m comment --comment "openshift-monitoring/thanos-querier:web" -m tcp -j DNAT --to-destination 10.128.2.25:9091
-A KUBE-SERVICES -d 172.30.87.10/32 -p tcp -m comment --comment "openshift-monitoring/prometheus-k8s:web has no endpoints" -m tcp --dport 9091 -j REJECT --reject-with icmp-port-unreachable

************************************************
There is no "prometheus-k8s:web cluster IP" rule in the result. A healthy cluster would have one, for example:
-A KUBE-SERVICES -d 172.30.87.10/32 -p tcp -m comment --comment "openshift-monitoring/prometheus-k8s:web cluster IP" -m tcp --dport 9091 -j KUBE-SVC-DCLNKYLNAMROIJRV

# oc -n openshift-monitoring get ep
NAME ENDPOINTS AGE
alertmanager-main <none> 4h55m
alertmanager-operated 10.128.2.31:9095,10.128.2.32:9095,10.131.0.35:9095 + 6 more... 4h55m
cluster-monitoring-operator 10.128.0.89:8443 5h7m
grafana 10.128.2.29:3000 4h55m
kube-state-metrics 10.131.0.26:8443,10.131.0.26:9443 5h7m
node-exporter 10.0.142.190:9100,10.0.153.211:9100,10.0.168.159:9100 + 3 more... 5h7m
openshift-state-metrics 10.131.0.29:8443,10.131.0.29:9443 5h7m
prometheus-adapter 10.128.2.27:6443,10.131.0.27:6443 5h7m
prometheus-k8s <none> 4h55m
prometheus-k8s-thanos-sidecar <none> 4h55m
prometheus-operated 10.128.2.30:9091,10.131.0.33:9091,10.128.2.30:10901 + 1 more... 4h55m
prometheus-operator 10.130.0.79:8080,10.130.0.79:8443 5h7m
telemeter-client 10.128.2.24:8443 5h7m
thanos-querier 10.128.2.25:9093,10.131.0.30:9093,10.128.2.25:9092 + 3 more... 5h7m
# oc -n openshift-monitoring get ep prometheus-k8s -oyaml
apiVersion: v1
kind: Endpoints
metadata:
annotations:
endpoints.kubernetes.io/last-change-trigger-time: "2021-04-28T03:51:36Z"
creationTimestamp: "2021-04-28T03:51:48Z"
labels:
app.kubernetes.io/component: prometheus
app.kubernetes.io/managed-by: cluster-monitoring-operator
app.kubernetes.io/name: prometheus
app.kubernetes.io/part-of: openshift-monitoring
app.kubernetes.io/version: 2.24.0
prometheus: k8s
name: prometheus-k8s
namespace: openshift-monitoring
resourceVersion: "22584"
uid: 7f3573f1-e28d-4785-a172-de03797da1cb
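Note that the Endpoints object above has no subsets, which is consistent with the "has no endpoints" REJECT rule from iptables: the prometheus-k8s Service selector no longer matches any running pod. A minimal sketch of how to confirm such a mismatch is shown below; the label selector in the second command is only an illustrative assumption, not necessarily what the broken cluster actually had.

# print the selector the Service uses to pick its endpoint pods
oc -n openshift-monitoring get svc prometheus-k8s -o jsonpath='{.spec.selector}{"\n"}'
# list pods matching that selector (substitute the labels printed above);
# an empty result means the Endpoints object stays empty and kube-proxy installs the REJECT rule
oc -n openshift-monitoring get pod -l app=prometheus,prometheus=k8s --show-labels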
Checked with 4.8.0-0.nightly-2021-04-29-063720: the Prometheus UI can be logged in to now (see the attached picture), and all the endpoints are normal.
# oc -n openshift-monitoring get ep
NAME ENDPOINTS AGE
alertmanager-main 10.128.2.27:9095,10.131.0.32:9095,10.131.0.38:9095 + 3 more... 66m
alertmanager-operated 10.128.2.27:9095,10.131.0.32:9095,10.131.0.38:9095 + 6 more... 66m
cluster-monitoring-operator 10.130.0.76:8443 74m
grafana 10.128.2.24:3000 66m
kube-state-metrics 10.131.0.26:8443,10.131.0.26:9443 74m
node-exporter 10.0.0.3:9100,10.0.0.4:9100,10.0.0.5:9100 + 3 more... 74m
openshift-state-metrics 10.131.0.30:8443,10.131.0.30:9443 74m
prometheus-adapter 10.128.2.23:6443,10.131.0.29:6443 74m
prometheus-k8s 10.128.2.29:9092,10.131.0.34:9092,10.128.2.29:9091 + 1 more... 66m
prometheus-k8s-thanos-sidecar 10.128.2.29:10902,10.131.0.34:10902 66m
prometheus-operated 10.128.2.29:9091,10.131.0.34:9091,10.128.2.29:10901 + 1 more... 66m
prometheus-operator 10.128.0.96:8080,10.128.0.96:8443 74m
telemeter-client 10.128.2.25:8443 74m
thanos-querier 10.128.2.36:9093,10.129.2.33:9093,10.128.2.36:9092 + 3 more... 74m
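The 'Application is not available' page is served by the OpenShift router when a Route's backing Service has no ready endpoints, so once the prometheus-k8s endpoints are populated the UI becomes reachable again. For reference, one way to fetch the URL being tested, assuming the default prometheus-k8s Route in the openshift-monitoring namespace:

# print the externally reachable Prometheus UI URL exposed by the default route
oc -n openshift-monitoring get route prometheus-k8s -o jsonpath='https://{.spec.host}{"\n"}'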
# oc -n openshift-monitoring get sts prometheus-k8s -oyaml
labels:
app.kubernetes.io/component: prometheus
app.kubernetes.io/managed-by: cluster-monitoring-operator
app.kubernetes.io/name: prometheus
app.kubernetes.io/part-of: openshift-monitoring
app.kubernetes.io/version: 2.24.0
# oc -n openshift-monitoring get sts alertmanager-main -oyaml
labels:
alertmanager: main
app.kubernetes.io/component: alert-router
app.kubernetes.io/managed-by: cluster-monitoring-operator
app.kubernetes.io/name: alertmanager
app.kubernetes.io/part-of: openshift-monitoring
app.kubernetes.io/version: 0.21.0
# oc -n openshift-monitoring get pod --show-labels | grep -E "prometheus-k8s|alertmanager-main"
alertmanager-main-0 5/5 Running 0 46m alertmanager=main,app.kubernetes.io/component=alert-router,app.kubernetes.io/managed-by=cluster-monitoring-operator,app.kubernetes.io/name=alertmanager,app.kubernetes.io/part-of=openshift-monitoring,app.kubernetes.io/version=0.21.0,app=alertmanager,controller-revision-hash=alertmanager-main-78f7cc764d,statefulset.kubernetes.io/pod-name=alertmanager-main-0
alertmanager-main-1 5/5 Running 0 51m alertmanager=main,app.kubernetes.io/component=alert-router,app.kubernetes.io/managed-by=cluster-monitoring-operator,app.kubernetes.io/name=alertmanager,app.kubernetes.io/part-of=openshift-monitoring,app.kubernetes.io/version=0.21.0,app=alertmanager,controller-revision-hash=alertmanager-main-78f7cc764d,statefulset.kubernetes.io/pod-name=alertmanager-main-1
alertmanager-main-2 5/5 Running 0 45m alertmanager=main,app.kubernetes.io/component=alert-router,app.kubernetes.io/managed-by=cluster-monitoring-operator,app.kubernetes.io/name=alertmanager,app.kubernetes.io/part-of=openshift-monitoring,app.kubernetes.io/version=0.21.0,app=alertmanager,controller-revision-hash=alertmanager-main-78f7cc764d,statefulset.kubernetes.io/pod-name=alertmanager-main-2
prometheus-k8s-0 7/7 Running 1 45m app.kubernetes.io/component=prometheus,app.kubernetes.io/managed-by=cluster-monitoring-operator,app.kubernetes.io/name=prometheus,app.kubernetes.io/part-of=openshift-monitoring,app.kubernetes.io/version=2.24.0,app=prometheus,controller-revision-hash=prometheus-k8s-588f669d48,operator.prometheus.io/name=k8s,operator.prometheus.io/shard=0,prometheus=k8s,statefulset.kubernetes.io/pod-name=prometheus-k8s-0
prometheus-k8s-1 7/7 Running 1 51m app.kubernetes.io/component=prometheus,app.kubernetes.io/managed-by=cluster-monitoring-operator,app.kubernetes.io/name=prometheus,app.kubernetes.io/part-of=openshift-monitoring,app.kubernetes.io/version=2.24.0,app=prometheus,controller-revision-hash=prometheus-k8s-588f669d48,operator.prometheus.io/name=k8s,operator.prometheus.io/shard=0,prometheus=k8s,statefulset.kubernetes.io/pod-name=prometheus-k8s-1
There is no need to remove the managed-by label; the failure was probably caused by other issues.
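For illustration only, a Service whose selector lines up with the pod labels shown above would look roughly like the sketch below. This is an assumption-based example to show the selector/label relationship, not the actual manifest generated by cluster-monitoring-operator, and only the 9091/web port is shown; the port and target names are guesses based on the output earlier in this bug.

apiVersion: v1
kind: Service
metadata:
  name: prometheus-k8s
  namespace: openshift-monitoring
spec:
  # the selector must be a subset of the labels carried by the prometheus-k8s pods;
  # if an operator bump changes the pod labels without updating this selector,
  # the Endpoints object ends up empty and kube-proxy installs the REJECT rule seen above
  selector:
    app.kubernetes.io/name: prometheus
    prometheus: k8s
  ports:
  - name: web
    port: 9091
    targetPort: web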
Created attachment 1777086 [details]
prometheus UI can login now
We couldn't merge the PR [1] that fixed the selector labels in time because the CMO CI pipeline was broken (for other reasons), so we decided to revert the prometheus-operator bump [2]. I'm moving the bug to MODIFIED.

[1] https://github.com/openshift/cluster-monitoring-operator/pull/1138
[2] https://github.com/openshift/prometheus-operator/pull/116

Verified with payload 4.8.0-0.nightly-2021-04-29-151418; the Prometheus UI works well.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438