Bug 1967890
Summary: | Observability Thanos store shard crashing - cannot unmarshal DNS message | ||
---|---|---|---|
Product: | Red Hat Advanced Cluster Management for Kubernetes | Reporter: | Martin Ouimet <mouimet> |
Component: | Core Services / Observability | Assignee: | Chunlin Yang <chuyang> |
Status: | CLOSED ERRATA | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | rhacm-2.2 | CC: | borazem |
Target Milestone: | --- | Flags: | ming: rhacm-2.2.z+ |
Target Release: | rhacm-2.2.4 | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | --- |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2021-06-16 19:28:30 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Martin Ouimet
2021-06-04 10:20:17 UTC
Not sure if this is important, but I have experienced a similar situation with OCP 4.6.31 and RHACM 2.2.3. I reviewed the above-mentioned Bugzilla record https://bugzilla.redhat.com/show_bug.cgi?id=1953518 and the links referenced there, mainly https://bugzilla.redhat.com/show_bug.cgi?id=1966116 and https://access.redhat.com/solutions/5984291. With a slight modification of the patch deployment procedure described in https://access.redhat.com/solutions/5984291, I was able to get the observability-observatorium-thanos-store-shard-X-0 pods out of CrashLoopBackOff and free of "unmarshal DNS message" errors such as the following one:

level=error ts=2021-06-14T13:41:24.803494563Z caller=memcached_client.go:560 msg="failed to resolve addresses for memcached" addresses=dnssrv+_client._tcp.observability-observatorium-thanos-store-memcached.open-cluster-management-observability.svc err="lookup SRV records \"_client._tcp.observability-observatorium-thanos-store-memcached.open-cluster-management-observability.svc\": lookup _client._tcp.observability-observatorium-thanos-store-memcached.open-cluster-management-observability.svc on 172.30.0.10:53: cannot unmarshal DNS message"

I tried both settings (bufsize as well as force_tcp), but in my case bufsize: 512 alone was enough to solve the issue.

Can you disclose how this issue will be solved in RHACM 2.2.4?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Advanced Cluster Management 2.2.4 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2461 |
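To check from inside a pod whether the SRV lookup that fails in the log above succeeds (for example, before and after applying the bufsize workaround), a small Go program can repeat it. This is a minimal sketch, not the Thanos implementation: the helper name srvName is hypothetical, and the sketch assumes that stripping the dnssrv+ prefix and querying the remaining name directly approximates what the store shard does. The "cannot unmarshal DNS message" text in the log matches the error returned by Go's built-in resolver, so the sketch sets PreferGo to exercise that resolver rather than the system one.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"os"
	"strings"
	"time"
)

// srvName (hypothetical helper) strips the "dnssrv+" prefix from a
// memcached address and reports whether the prefix was present.
func srvName(addr string) (string, bool) {
	const prefix = "dnssrv+"
	if !strings.HasPrefix(addr, prefix) {
		return addr, false
	}
	return strings.TrimPrefix(addr, prefix), true
}

func main() {
	// Default to the address from the log; allow overriding it on the command line.
	addr := "dnssrv+_client._tcp.observability-observatorium-thanos-store-memcached.open-cluster-management-observability.svc"
	if len(os.Args) > 1 {
		addr = os.Args[1]
	}
	name, ok := srvName(addr)
	if !ok {
		fmt.Fprintln(os.Stderr, "address has no dnssrv+ prefix:", addr)
		os.Exit(2)
	}

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// PreferGo selects Go's built-in DNS resolver instead of the cgo/system one.
	r := &net.Resolver{PreferGo: true}
	// With empty service and proto, LookupSRV queries the given name directly.
	_, srvs, err := r.LookupSRV(ctx, "", "", name)
	if err != nil {
		fmt.Fprintln(os.Stderr, "SRV lookup failed:", err)
		os.Exit(1)
	}
	// Print each SRV target as host:port, dropping the trailing dot of the FQDN.
	for _, s := range srvs {
		fmt.Printf("%s:%d\n", strings.TrimSuffix(s.Target, "."), s.Port)
	}
}
```

Run inside the open-cluster-management-observability namespace, it prints one host:port line per memcached SRV target, or the resolver error if the lookup still fails.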