Bug 2016174 - etcd pod on CrashLoopBackOff after master replacement procedure
Summary: etcd pod on CrashLoopBackOff after master replacement procedure
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.9.z
Assignee: Sam Batschelet
QA Contact: ge liu
URL:
Whiteboard:
Depends On: 2003775
Blocks:
Reported: 2021-10-20 20:29 UTC by OpenShift BugZilla Robot
Modified: 2023-11-16 14:57 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-10 21:02:40 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift etcd pull 99 | 0 | None | open | [openshift-4.9] Bug 2016174: UPSTREAM: <carry>: server: Fix for v3.5 Ensure that cluster members stored in v2store and b... | 2021-10-29 01:00:34 UTC
Red Hat Product Errata | RHBA-2021:4119 | 0 | None | None | None | 2021-11-10 21:02:54 UTC

Comment 3 ge liu 2021-11-02 08:04:46 UTC
Verified with 4.9.0-0.nightly-2021-11-01-180550.

Replaced the crashed master:

# oc get node 
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-149-187.us-east-2.compute.internal   Ready    worker   141m    v1.22.0-rc.0+a44d0f0
ip-10-0-159-4.us-east-2.compute.internal     Ready    master   146m    v1.22.0-rc.0+a44d0f0
ip-10-0-187-82.us-east-2.compute.internal    Ready    master   146m    v1.22.0-rc.0+a44d0f0
ip-10-0-188-189.us-east-2.compute.internal   Ready    worker   140m    v1.22.0-rc.0+a44d0f0
ip-10-0-205-206.us-east-2.compute.internal   Ready    worker   135m    v1.22.0-rc.0+a44d0f0
ip-10-0-209-57.us-east-2.compute.internal    Ready    master   5m15s   v1.22.0-rc.0+a44d0f0



Then patched the etcd cluster to force a redeployment:

# oc patch etcd cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge
etcd.operator.openshift.io/cluster patched
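
The quoting in the patch command above is easy to get wrong, so here is a minimal sketch that builds the same JSON body in a helper. The function name build_redeploy_patch is hypothetical and not part of oc; the timestamp is there only so that the forceRedeploymentReason value differs from any earlier one.

```shell
# Build the merge patch body used in the command above.
# build_redeploy_patch is an illustrative helper, not an oc feature.
build_redeploy_patch() {
  # $1 (optional): timestamp to embed; defaults to the current time.
  # `date --rfc-3339=ns` is GNU date, as in the original command.
  ts="${1:-$(date --rfc-3339=ns)}"
  printf '{"spec": {"forceRedeploymentReason": "recovery-%s"}}\n' "$ts"
}

# Usage (requires a logged-in oc session):
#   oc patch etcd cluster --type=merge -p "$(build_redeploy_patch)"
```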


# oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.9.0-0.nightly-2021-11-01-180550   True        False         False      13m     
baremetal                                  4.9.0-0.nightly-2021-11-01-180550   True        False         False      144m    
cloud-controller-manager                   4.9.0-0.nightly-2021-11-01-180550   True        False         False      147m    
cloud-credential                           4.9.0-0.nightly-2021-11-01-180550   True        False         False      147m    
cluster-autoscaler                         4.9.0-0.nightly-2021-11-01-180550   True        False         False      144m    
config-operator                            4.9.0-0.nightly-2021-11-01-180550   True        False         False      146m    
console                                    4.9.0-0.nightly-2021-11-01-180550   True        False         False      126m    
csi-snapshot-controller                    4.9.0-0.nightly-2021-11-01-180550   True        False         False      145m    
dns                                        4.9.0-0.nightly-2021-11-01-180550   True        False         False      144m    
etcd                                       4.9.0-0.nightly-2021-11-01-180550   True        True          False      144m    NodeInstallerProgressing: 2 nodes are at revision 5; 1 nodes are at revision 6
image-registry                             4.9.0-0.nightly-2021-11-01-180550   True        False         False      132m    
ingress                                    4.9.0-0.nightly-2021-11-01-180550   True        False         False      130m    
insights                                   4.9.0-0.nightly-2021-11-01-180550   True        False         False      130m    
kube-apiserver                             4.9.0-0.nightly-2021-11-01-180550   True        True          False      132m    NodeInstallerProgressing: 1 nodes are at revision 8; 2 nodes are at revision 9
kube-controller-manager                    4.9.0-0.nightly-2021-11-01-180550   True        False         False      144m    
kube-scheduler                             4.9.0-0.nightly-2021-11-01-180550   True        False         False      143m    
kube-storage-version-migrator              4.9.0-0.nightly-2021-11-01-180550   True        False         False      145m    
...........................
..............
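
To follow the rollout without reading the whole table, the PROGRESSING column can be filtered. This is a rough sketch that assumes the default `oc get co` column order shown above; progressing_operators is a hypothetical helper name.

```shell
# Print the names of cluster operators whose PROGRESSING column is True.
# Reads `oc get co` table output on stdin; assumes the default column
# order NAME VERSION AVAILABLE PROGRESSING DEGRADED ...
progressing_operators() {
  awk 'NR > 1 && $4 == "True" { print $1 }'
}

# Usage (requires a logged-in oc session):
#   oc get co | progressing_operators
```

Against the output above this would list etcd and kube-apiserver, the two operators still rolling out new revisions.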

The etcd cluster recovered quorum; all three members are started:

sh-4.4# etcdctl member list -w table
+------------------+---------+-------------------------------------------+--------------------------+--------------------------+------------+
|        ID        | STATUS  |                   NAME                    |        PEER ADDRS        |       CLIENT ADDRS       | IS LEARNER |
+------------------+---------+-------------------------------------------+--------------------------+--------------------------+------------+
| 3820c8756d3e1144 | started | ip-10-0-187-82.us-east-2.compute.internal | https://10.0.187.82:2380 | https://10.0.187.82:2379 |      false |
| af71e9bd16b6458f | started | ip-10-0-209-57.us-east-2.compute.internal | https://10.0.209.57:2380 | https://10.0.209.57:2379 |      false |
| f83c9bc01cbbe3e7 | started |  ip-10-0-159-4.us-east-2.compute.internal |  https://10.0.159.4:2380 |  https://10.0.159.4:2379 |      false |
+------------------+---------+-------------------------------------------+--------------------------+--------------------------+------------+
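
The same check can be scripted. The sketch below parses the table format shown above and succeeds only when the expected number of members are all started and none is a learner. The helper name all_members_started is hypothetical, and the parsing assumes the column layout of `etcdctl member list -w table` as printed here.

```shell
# Succeed only if the table on stdin lists exactly $1 members, all of them
# "started" and none of them a learner.
all_members_started() {
  awk -F'|' -v want="$1" '
    # Data rows only: border lines contain no "|", the header row has "ID".
    NF >= 7 && $2 !~ /ID/ {
      gsub(/ /, "", $3)   # STATUS column
      gsub(/ /, "", $7)   # IS LEARNER column
      total++
      if ($3 == "started" && $7 == "false") ok++
    }
    END { exit (total == want && ok == want) ? 0 : 1 }
  '
}

# Usage (inside an etcd pod):
#   etcdctl member list -w table | all_members_started 3 && echo healthy
```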

Comment 6 errata-xmlrpc 2021-11-10 21:02:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.6 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4119

