Description of problem:
Hi, I am trying to configure user workload monitoring on OCP 4.7.10, but the thanos-ruler pods fail to start with "cannot unmarshal DNS message".

Version-Release number of selected component (if applicable):
# oc version
Client Version: 4.7.10
Server Version: 4.7.10
Kubernetes Version: v1.20.0+e3fdce4

# oc -n openshift-user-workload-monitoring get po
NAME                                 READY   STATUS             RESTARTS   AGE
prometheus-operator-f754fdb6-r7s4g   2/2     Running            0          3m9s
prometheus-user-workload-0           5/5     Running            1          3m4s
prometheus-user-workload-1           5/5     Running            1          3m4s
thanos-ruler-user-workload-0         2/3     CrashLoopBackOff   4          3m2s
thanos-ruler-user-workload-1         2/3     CrashLoopBackOff   4          3m2s

pod/thanos-ruler-user-workload-0 log
--------------------------------------------
level=warn ts=2021-05-19T09:04:11.681836334Z caller=intrumentation.go:54 component=rules msg="changing probe status" status=not-ready reason="lookup SRV records \"_web._tcp.alertmanager-operated.openshift-monitoring.svc\": lookup _web._tcp.alertmanager-operated.openshift-monitoring.svc on 172.30.0.10:53: cannot unmarshal DNS message"
level=info ts=2021-05-19T09:04:11.681850165Z caller=grpc.go:123 component=rules service=gRPC/server component=rule msg="internal server is shutting down" err="lookup SRV records \"_web._tcp.alertmanager-operated.openshift-monitoring.svc\": lookup _web._tcp.alertmanager-operated.openshift-monitoring.svc on 172.30.0.10:53: cannot unmarshal DNS message"
level=info ts=2021-05-19T09:04:11.681873189Z caller=grpc.go:136 component=rules service=gRPC/server component=rule msg="gracefully stopping internal server"
level=info ts=2021-05-19T09:04:11.681917501Z caller=grpc.go:149 component=rules service=gRPC/server component=rule msg="internal server is shutdown gracefully" err="lookup SRV records \"_web._tcp.alertmanager-operated.openshift-monitoring.svc\": lookup _web._tcp.alertmanager-operated.openshift-monitoring.svc on 172.30.0.10:53: cannot unmarshal DNS message"
level=warn ts=2021-05-19T09:04:11.681933626Z caller=intrumentation.go:54 component=rules msg="changing probe status" status=not-ready reason="lookup SRV records \"_web._tcp.alertmanager-operated.openshift-monitoring.svc\": lookup _web._tcp.alertmanager-operated.openshift-monitoring.svc on 172.30.0.10:53: cannot unmarshal DNS message"
level=info ts=2021-05-19T09:04:11.681942583Z caller=http.go:65 component=rules service=http/server component=rule msg="internal server is shutting down" err="lookup SRV records \"_web._tcp.alertmanager-operated.openshift-monitoring.svc\": lookup _web._tcp.alertmanager-operated.openshift-monitoring.svc on 172.30.0.10:53: cannot unmarshal DNS message"
level=error ts=2021-05-19T09:04:12.086697314Z caller=rule.go:774 component=rules err="read query instant response: perform GET request against https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query: Post \"https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query\": context canceled" query="absent(up{job=\"prometheus-example-app\",namespace=\"mon-ns1\"} == 1)"
level=warn ts=2021-05-19T09:04:12.086746013Z caller=manager.go:598 component=rules group=example msg="Evaluating rule failed" rule="alert: prometheus-example-app-instance-down-alert\nexpr: absent(up{job=\"prometheus-example-app\",namespace=\"mon-ns1\"} == 1)\nlabels:\n namespace: mon-ns1\n severity: warning\nannotations:\n message: Instance down alert triggered, prometheus-example-app instance was down.\n" err="no query API server reachable"
level=info ts=2021-05-19T09:04:12.182198137Z caller=http.go:84 component=rules service=http/server component=rule msg="internal server is shutdown gracefully" err="lookup SRV records \"_web._tcp.alertmanager-operated.openshift-monitoring.svc\": lookup _web._tcp.alertmanager-operated.openshift-monitoring.svc on 172.30.0.10:53: cannot unmarshal DNS message"
level=info ts=2021-05-19T09:04:12.182617499Z caller=intrumentation.go:66 component=rules msg="changing probe status" status=not-healthy reason="lookup SRV records \"_web._tcp.alertmanager-operated.openshift-monitoring.svc\": lookup _web._tcp.alertmanager-operated.openshift-monitoring.svc on 172.30.0.10:53: cannot unmarshal DNS message"
level=error ts=2021-05-19T09:04:12.182982177Z caller=main.go:157 err="lookup SRV records \"_web._tcp.alertmanager-operated.openshift-monitoring.svc\": lookup _web._tcp.alertmanager-operated.openshift-monitoring.svc on 172.30.0.10:53: cannot unmarshal DNS message\nrule command failed\nmain.main\n\t/go/src/github.com/improbable-eng/thanos/cmd/thanos/main.go:157\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:204\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_s390x.s:779"

How reproducible:
Always

Steps to Reproduce:
1. Enable user workload monitoring (enableUserWorkload).

Actual results:
The thanos-ruler pods fail to start with the error "cannot unmarshal DNS message".

Expected results:
The thanos-ruler pods should reach the Running state.

Additional info:
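The error text "cannot unmarshal DNS message" appears to come from the pure-Go DNS resolver in Go's net package, and the stack trace in the log shows thanos-ruler is a Go binary. One way to check whether the failure lies in the DNS response rather than in Thanos itself is to issue the same SRV query with that resolver. The following is a minimal diagnostic sketch, assuming it is built for the same architecture and run from a pod inside the cluster so that 172.30.0.10 is the configured nameserver; the program and its srvQueryName helper are hypothetical and not part of Thanos or OpenShift.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// srvQueryName builds the RFC 2782 owner name "_service._proto.name" that
// LookupSRV queries. Hypothetical helper, used only to show the queried name.
func srvQueryName(service, proto, name string) string {
	return "_" + service + "._" + proto + "." + name
}

func main() {
	// Values taken from the thanos-ruler log in this report.
	service, proto, name := "web", "tcp", "alertmanager-operated.openshift-monitoring.svc"
	fmt.Println("query:", srvQueryName(service, proto, name))

	// PreferGo selects Go's built-in resolver, which on Linux reads
	// /etc/resolv.conf and queries the nameserver directly instead of
	// going through the libc resolver.
	r := &net.Resolver{PreferGo: true}

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	_, addrs, err := r.LookupSRV(ctx, service, proto, name)
	if err != nil {
		// On an affected cluster this would presumably print an error
		// ending in "cannot unmarshal DNS message".
		fmt.Println("lookup failed:", err)
		return
	}
	for _, a := range addrs {
		fmt.Printf("target=%s port=%d priority=%d weight=%d\n", a.Target, a.Port, a.Priority, a.Weight)
	}
}
```

If this program reproduces the same error while a lookup of a non-SRV record for the same service succeeds, that would point at how the SRV answer is encoded by the cluster DNS rather than at the Thanos configuration.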
*** This bug has been marked as a duplicate of bug 1953518 ***