Description of the problem:
Since the upgrade to OpenShift 4.7.11, the Observability thanos-store-shard pods are in CrashLoopBackOff state.

observability-observatorium-thanos-store-shard-0-0   0/1   CrashLoopBackOff   194   16h
observability-observatorium-thanos-store-shard-1-0   0/1   CrashLoopBackOff   195   16h
observability-observatorium-thanos-store-shard-2-0   0/1   CrashLoopBackOff   195   16h

The pods complain about the DNS request for the memcached service:

lookup _client._tcp.observability-observatorium-thanos-store-memcached.open-cluster-management-observability.svc on 172.30.0.10:53: cannot unmarshal DNS message

I found a similar issue, but I am not sure whether it is related:
https://bugzilla.redhat.com/show_bug.cgi?id=1953518

Release version: Advanced Cluster Management 2.2.3
OCP version: OpenShift 4.7.13

Steps to reproduce:
1 - Install ACM operator
2 - Install MCM resource
3 - Install Observability resource

Thanks,
Not sure if this is important, but I have experienced a similar situation with OCP version 4.6.31 and RHACM 2.2.3.

I reviewed the above-mentioned Bugzilla record https://bugzilla.redhat.com/show_bug.cgi?id=1953518 and the links referenced there, mainly:
https://bugzilla.redhat.com/show_bug.cgi?id=1966116
https://access.redhat.com/solutions/5984291

With a slight modification of the patch deployment procedure described in https://access.redhat.com/solutions/5984291, I was able to get the observability-observatorium-thanos-store-shard-X-0 pods out of CrashLoopBackOff, and errors about "unmarshal DNS message" like the following one no longer appear:

level=error ts=2021-06-14T13:41:24.803494563Z caller=memcached_client.go:560 msg="failed to resolve addresses for memcached" addresses=dnssrv+_client._tcp.observability-observatorium-thanos-store-memcached.open-cluster-management-observability.svc err="lookup SRV records \"_client._tcp.observability-observatorium-thanos-store-memcached.open-cluster-management-observability.svc\": lookup _client._tcp.observability-observatorium-thanos-store-memcached.open-cluster-management-observability.svc on 172.30.0.10:53: cannot unmarshal DNS message"

I tried both settings (bufsize as well as force_tcp), but bufsize: 512 alone was enough to solve the issue in my case.

Can you disclose how this issue will be solved in RHACM 2.2.4?
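
For anyone trying to check whether they are hitting the same condition, here is a minimal diagnostic sketch in Python. It is not part of the referenced solution. It sends a plain UDP SRV query (no EDNS0) for the memcached service and reports the size of the raw reply and whether the truncation flag is set. The resolver address 172.30.0.10 is taken from the error message above. The .cluster.local suffix is an assumption (the default cluster domain), and the helper names build_srv_query and summarize_reply are hypothetical.

```python
import socket
import struct

SRV_TYPE = 33  # DNS record type SRV
IN_CLASS = 1   # DNS class IN


def build_srv_query(name: str, query_id: int = 0x1234) -> bytes:
    """Encode a plain (no EDNS0) DNS query for the SRV records of `name`."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", SRV_TYPE, IN_CLASS)


def summarize_reply(reply: bytes) -> dict:
    """Report size, truncation (TC) flag and answer count of a raw DNS reply."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    return {
        "size": len(reply),
        "truncated": bool(flags & 0x0200),  # TC bit of the header flags
        "answers": ancount,
        "over_512": len(reply) > 512,
    }


if __name__ == "__main__":
    # Assumption: the cluster domain is the default "cluster.local".
    name = (
        "_client._tcp.observability-observatorium-thanos-store-memcached."
        "open-cluster-management-observability.svc.cluster.local"
    )
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(3)
    # Resolver address as shown in the error message above.
    sock.sendto(build_srv_query(name), ("172.30.0.10", 53))
    reply, _ = sock.recvfrom(65535)
    print(summarize_reply(reply))
```

The script has to run somewhere that can reach the cluster DNS service, for example from a pod in the cluster. It only describes the reply; it does not change any DNS configuration.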
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Advanced Cluster Management 2.2.4 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2461