+++ This bug was initially created as a clone of Bug #1957646 +++

The issue is fixed with 4.7.0-0.nightly-2021-05-07-004616.

# oc -n openshift-user-workload-monitoring get po --show-labels
NAME                                  READY   STATUS    RESTARTS   AGE    LABELS
prometheus-operator-8d4d69888-fc8k9   2/2     Running   0          3m8s   app.kubernetes.io/component=controller,app.kubernetes.io/name=prometheus-operator,app.kubernetes.io/version=v0.44.1,pod-template-hash=8d4d69888
prometheus-user-workload-0            5/5     Running   1          3m4s   app=prometheus,controller-revision-hash=prometheus-user-workload-99c9d5494,operator.prometheus.io/name=user-workload,operator.prometheus.io/shard=0,prometheus=user-workload,statefulset.kubernetes.io/pod-name=prometheus-user-workload-0
prometheus-user-workload-1            5/5     Running   1          3m4s   app=prometheus,controller-revision-hash=prometheus-user-workload-99c9d5494,operator.prometheus.io/name=user-workload,operator.prometheus.io/shard=0,prometheus=user-workload,statefulset.kubernetes.io/pod-name=prometheus-user-workload-1
thanos-ruler-user-workload-0          3/3     Running   0          3m1s   app=thanos-ruler,controller-revision-hash=thanos-ruler-user-workload-7bbdf8c4,statefulset.kubernetes.io/pod-name=thanos-ruler-user-workload-0,thanos-ruler=user-workload
thanos-ruler-user-workload-1          3/3     Running   0          3m1s   app=thanos-ruler,controller-revision-hash=thanos-ruler-user-workload-7bbdf8c4,statefulset.kubernetes.io/pod-name=thanos-ruler-user-workload-1,thanos-ruler=user-workload
Reproduced with 4.6.0-0.nightly-2021-05-15-131411.

# oc -n openshift-user-workload-monitoring get po | grep thanos-ruler-user-workload
thanos-ruler-user-workload-0   2/3   CrashLoopBackOff   3   84s
thanos-ruler-user-workload-1   2/3   CrashLoopBackOff   3   84s

# oc -n openshift-user-workload-monitoring describe pod thanos-ruler-user-workload-0
  thanos-ruler:
    ...
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Message:      cords \"_web._tcp.alertmanager-operated.openshift-monitoring.svc\": lookup _web._tcp.alertmanager-operated.openshift-monitoring.svc on 172.30.0.10:53: cannot unmarshal DNS message"
level=warn ts=2021-05-19T08:10:30.051226335Z caller=intrumentation.go:54 component=rules msg="changing probe status" status=not-ready reason="lookup SRV records \"_web._tcp.alertmanager-operated.openshift-monitoring.svc\": lookup _web._tcp.alertmanager-operated.openshift-monitoring.svc on 172.30.0.10:53: cannot unmarshal DNS message"
level=info ts=2021-05-19T08:10:30.051238915Z caller=http.go:64 component=rules service=http/server component=rule msg="internal server is shutting down" err="lookup SRV records \"_web._tcp.alertmanager-operated.openshift-monitoring.svc\": lookup _web._tcp.alertmanager-operated.openshift-monitoring.svc on 172.30.0.10:53: cannot unmarshal DNS message"
level=info ts=2021-05-19T08:10:30.551420564Z caller=http.go:83 component=rules service=http/server component=rule msg="internal server is shutdown gracefully" err="lookup SRV records \"_web._tcp.alertmanager-operated.openshift-monitoring.svc\": lookup _web._tcp.alertmanager-operated.openshift-monitoring.svc on 172.30.0.10:53: cannot unmarshal DNS message"
level=info ts=2021-05-19T08:10:30.551515258Z caller=intrumentation.go:66 component=rules msg="changing probe status" status=not-healthy reason="lookup SRV records \"_web._tcp.alertmanager-operated.openshift-monitoring.svc\": lookup _web._tcp.alertmanager-operated.openshift-monitoring.svc on 172.30.0.10:53: cannot unmarshal DNS message"
level=error ts=2021-05-19T08:10:30.55161247Z caller=main.go:212 err="lookup SRV records \"_web._tcp.alertmanager-operated.openshift-monitoring.svc\": lookup _web._tcp.alertmanager-operated.openshift-monitoring.svc on 172.30.0.10:53: cannot unmarshal DNS message\nrule command failed\nmain.main\n\t/go/src/github.com/improbable-eng/thanos/cmd/thanos/main.go:212\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:204\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1374"
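For reference, the failing lookup can be checked by hand against the cluster DNS address shown in the error (172.30.0.10). This is only a sketch: it assumes a debug shell in that namespace with dig available, which the thanos-ruler image may not provide.

# oc -n openshift-user-workload-monitoring debug thanos-ruler-user-workload-0
# dig SRV _web._tcp.alertmanager-operated.openshift-monitoring.svc @172.30.0.10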
Tested with the not-yet-merged PR; the issue is fixed.

# oc -n openshift-user-workload-monitoring get po --show-labels
NAME                                   READY   STATUS    RESTARTS   AGE   LABELS
prometheus-operator-644fd69b76-pwgfk   2/2     Running   0          17m   app.kubernetes.io/component=controller,app.kubernetes.io/name=prometheus-operator,app.kubernetes.io/version=v0.42.1,pod-template-hash=644fd69b76
prometheus-user-workload-0             4/4     Running   1          17m   app=prometheus,controller-revision-hash=prometheus-user-workload-587d78bbdc,prometheus=user-workload,statefulset.kubernetes.io/pod-name=prometheus-user-workload-0
prometheus-user-workload-1             4/4     Running   1          17m   app=prometheus,controller-revision-hash=prometheus-user-workload-587d78bbdc,prometheus=user-workload,statefulset.kubernetes.io/pod-name=prometheus-user-workload-1
thanos-ruler-user-workload-0           3/3     Running   0          17m   app=thanos-ruler,controller-revision-hash=thanos-ruler-user-workload-7d4c766bc6,statefulset.kubernetes.io/pod-name=thanos-ruler-user-workload-0,thanos-ruler=user-workload
thanos-ruler-user-workload-1           3/3     Running   0          17m   app=thanos-ruler,controller-revision-hash=thanos-ruler-user-workload-7d4c766bc6,statefulset.kubernetes.io/pod-name=thanos-ruler-user-workload-1,thanos-ruler=user-workload
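As an additional sanity check after the fix, the ruler log can be grepped for the Alertmanager discovery. The exact wording of the log messages differs between Thanos versions, so treat this as a sketch rather than an expected output.

# oc -n openshift-user-workload-monitoring logs thanos-ruler-user-workload-0 -c thanos-ruler | grep -i alertmanager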
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.31 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2100