Bug 2003775
| Summary: | etcd pod on CrashLoopBackOff after master replacement procedure | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | rlobillo |
| Component: | Etcd | Assignee: | Nobody <nobody> |
| Status: | CLOSED ERRATA | QA Contact: | ge liu <geliu> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.9 | CC: | alray, htariq, mcornea, yprokule |
| Target Milestone: | --- | | |
| Target Release: | 4.10.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-03-10 16:10:01 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2016174 | | |
| Attachments: | | | |
Description
rlobillo
2021-09-13 16:54:11 UTC
Can you please verify each step you performed against a link to the steps you followed?
For example, are you sure that you stopped etcd by moving etcd-pod.yaml out of /etc/kubernetes/manifests?
Then removed the data directory of the failed member:
`rm -rf /var/lib/etcd`
Next, removed the etcd member: `etcdctl member remove $ID`
Then, after that, forced a new rollout:
`oc patch etcd cluster -p='{"spec": {"forceRedeploymentReason": "single-master-recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge `
Finally, I assume master-0 was the member you replaced?
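The nested quoting in the rollout command above is easy to get wrong when retyping it. As a minimal sketch, assuming a POSIX shell and GNU `date`, the payload can be built separately and inspected before it is passed to `oc patch`. The helper name `build_redeploy_patch` is hypothetical and not part of `oc`:

```shell
# Hypothetical helper: build the merge-patch payload that forces an etcd redeployment.
# $1 is an optional unique suffix; when omitted, the current timestamp is used
# (GNU date is assumed for --rfc-3339=ns).
build_redeploy_patch() {
  suffix="${1:-$(date --rfc-3339=ns)}"
  printf '{"spec": {"forceRedeploymentReason": "single-master-recovery-%s"}}\n' "$suffix"
}

# Print the payload with a fixed suffix so the result can be checked by eye.
build_redeploy_patch "2021-09-13"

# On a real cluster the payload would be passed to oc, for example:
#   oc patch etcd cluster --type=merge -p "$(build_redeploy_patch)"
```

Any change to `forceRedeploymentReason` should trigger a new rollout, which is why the original command appends a timestamp to keep the value unique.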
I see the ansible logs now... reviewing.
I believe this is an upstream bug related to new logic around the handling of membership data [1], [2].
[1] https://github.com/etcd-io/etcd/issues/13196
[2] https://github.com/etcd-io/etcd/pull/13348
Thanks Sam. Removing NEEDINFO flag.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:0056