Bug 1993757
Summary: | OCP 4.8 etcd unhealthy | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | kevin <welin> |
Component: | Etcd | Assignee: | Sam Batschelet <sbatsche> |
Status: | CLOSED ERRATA | QA Contact: | Sandeep <skundu> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.8 | CC: | geliu, skundu |
Target Milestone: | --- | ||
Target Release: | 4.9.0 | ||
Hardware: | x86_64 | ||
OS: | Unspecified | ||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-10-18 17:46:20 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Bug Blocks: | 1994483 |
Description
kevin
2021-08-16 05:42:22 UTC
This issue is cosmetic; the workaround for now is to drop the --cluster flag from the etcdctl command.

```
etcdctl endpoint health
```

Steps and observations on a 4.8 cluster:

$ oc get nodes
NAME                                                        STATUS   ROLES    AGE     VERSION
skundu-ver-1-g8d9k-master-0.c.openshift-qe.internal         Ready    master   3h23m   v1.21.1+9807387
skundu-ver-1-g8d9k-master-1.c.openshift-qe.internal         Ready    master   3h23m   v1.21.1+9807387
skundu-ver-1-g8d9k-master-2.c.openshift-qe.internal         Ready    master   3h23m   v1.21.1+9807387
skundu-ver-1-g8d9k-worker-a-pc869.c.openshift-qe.internal   Ready    worker   3h16m   v1.21.1+9807387
skundu-ver-1-g8d9k-worker-b-z4r5c.c.openshift-qe.internal   Ready    worker   3h16m   v1.21.1+9807387
skundu-ver-1-g8d9k-worker-c-2xnvk.c.openshift-qe.internal   Ready    worker   3h17m   v1.21.1+9807387

$ oc rsh -n openshift-etcd etcd-skundu-ver-1-g8d9k-master-0.c.openshift-qe.internal
Defaulting container name to etcdctl.
Use 'oc describe pod/etcd-skundu-ver-1-g8d9k-master-0.c.openshift-qe.internal -n openshift-etcd' to see all of the containers in this pod.
sh-4.4#
sh-4.4# etcd --version
etcd Version: 3.4.14
Git SHA: 95a9769
Go Version: go1.12.12
Go OS/Arch: linux/amd64
sh-4.4# etcdctl member list -w table
+------------------+---------+-----------------------------------------------------+-----------------------+------------------------------------------+------------+
|        ID        | STATUS  |                        NAME                         |      PEER ADDRS       |               CLIENT ADDRS               | IS LEARNER |
+------------------+---------+-----------------------------------------------------+-----------------------+------------------------------------------+------------+
| 44f821c73f39e4fc | started | skundu-ver-1-g8d9k-master-2.c.openshift-qe.internal | https://10.0.0.2:2380 | https://10.0.0.2:2379,unixs://10.0.0.2:0 |      false |
| 6b00e473bd74e3cb | started | skundu-ver-1-g8d9k-master-1.c.openshift-qe.internal | https://10.0.0.5:2380 | https://10.0.0.5:2379,unixs://10.0.0.5:0 |      false |
| a1a7b97340cb643c | started | skundu-ver-1-g8d9k-master-0.c.openshift-qe.internal | https://10.0.0.4:2380 | https://10.0.0.4:2379,unixs://10.0.0.4:0 |      false |
+------------------+---------+-----------------------------------------------------+-----------------------+------------------------------------------+------------+

sh-4.4# etcdctl endpoint health --cluster
{"level":"warn","ts":"2021-08-24T14:54:12.808Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-8d09a2a4-911c-45f1-9053-6f6115b2551f/10.0.0.5:0","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix 10.0.0.5:0: connect: no such file or directory\""}
{"level":"warn","ts":"2021-08-24T14:54:12.808Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-8c368e3c-5ebb-4e70-9b9d-f8ba25bbab1c/10.0.0.4:0","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix 10.0.0.4:0: connect: no such file or directory\""}
{"level":"warn","ts":"2021-08-24T14:54:12.808Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-aeabbe4e-9930-4a77-ba02-795b82c8541d/10.0.0.2:0","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix 10.0.0.2:0: connect: no such file or directory\""}
https://10.0.0.4:2379 is healthy: successfully committed proposal: took = 16.253078ms
https://10.0.0.5:2379 is healthy: successfully committed proposal: took = 20.409062ms
https://10.0.0.2:2379 is healthy: successfully committed proposal: took = 21.275478ms
unixs://10.0.0.5:0 is unhealthy: failed to commit proposal: context deadline exceeded
unixs://10.0.0.4:0 is unhealthy: failed to commit proposal: context deadline exceeded
unixs://10.0.0.2:0 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster
sh-4.4#

The issue as reported in the bug continues to exist on 4.8. All three https:// endpoints are healthy; only the unixs:// client URLs advertised by each member fail, and those failures alone produce the "unhealthy cluster" error.

sh-4.4# etcdctl endpoint health
https://10.0.0.5:2379 is healthy: successfully committed proposal: took = 29.146886ms
https://10.0.0.4:2379 is healthy: successfully committed proposal: took = 29.136991ms
https://10.0.0.2:2379 is healthy: successfully committed proposal: took = 41.413365ms

The workaround mentioned above (running the command without the --cluster flag) works correctly.

@skundu, I will change the status for you based on comment 3, since you do not have the access rights to change the bug status.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
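
For anyone who still wants to name every member explicitly on an affected 4.8 cluster, here is a minimal sketch of one way to do it. It is not taken from the fix for this bug. It assumes the `etcdctl member list -w table` layout shown in this report, with CLIENT ADDRS as the fifth data column, and the function name `https_client_endpoints` is hypothetical. The helper keeps only the https:// client URLs and drops the unixs:// entries that cause the false "unhealthy" result.

```shell
#!/usr/bin/env bash
# https_client_endpoints: hypothetical helper, not part of etcdctl or OpenShift.
# Reads `etcdctl member list -w table` output on stdin and prints the
# https:// client URLs as one comma-separated string.
# Assumption: the table layout matches the one shown in this bug report.
https_client_endpoints() {
  awk -F'|' '
    NF >= 7 {                                   # skip the +----+ border lines
      n = split($6, urls, ",")                  # field 6 is the CLIENT ADDRS column
      for (i = 1; i <= n; i++) {
        gsub(/^[ \t]+|[ \t]+$/, "", urls[i])    # trim column padding
        if (urls[i] ~ /^https:\/\//) {          # drop unixs:// and the header cell
          out = (out == "" ? urls[i] : out "," urls[i])
        }
      }
    }
    END { print out }
  '
}
```

Inside the etcdctl container, the result could be passed to the global --endpoints flag, for example: etcdctl endpoint health --endpoints="$(etcdctl member list -w table | https_client_endpoints)". On the cluster in this report, plain `etcdctl endpoint health` already checked all three members, so the helper is only useful where the explicit list is wanted.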