Bug 1797897
Summary: | After masters stopped and restarted, cluster is dead | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Xingxing Xia <xxia> |
Component: | Etcd Operator | Assignee: | Sam Batschelet <sbatsche> |
Status: | CLOSED ERRATA | QA Contact: | Xingxing Xia <xxia> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 4.4 | CC: | aos-bugs, geliu, jialiu, mfojtik, tnozicka |
Target Milestone: | --- | Keywords: | TestBlocker |
Target Release: | 4.4.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2020-05-04 11:33:06 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Xingxing Xia
2020-02-04 07:49:04 UTC
etcd is not starting, and I fear the cause is the static etcd pod's init container, which waits for kube:

```
[root@xxia04-6q4xb-m-0 core]# crictl ps -a | grep wait-for-kube
61d65b5ac1d93 c5fb513ba6473e74dfe8606378886fa8402c24df3c96ff2203db4593ed35a9fa 50 seconds ago Exited wait-for-kube 59 b274aa7d51007
[root@xxia04-6q4xb-m-0 core]# crictl logs -f 61d65b5ac1d93
F0204 08:38:02.018434 1 waitforkube.go:36] kube env not populated
```

The node journal shows the init container in CrashLoopBackOff:

```
Feb 04 07:55:14 xxia04-6q4xb-m-0.c.openshift-qe.internal hyperkube[10302]: E0204 07:55:14.607865 10302 pod_workers.go:191] Error syncing pod 3bb792a6006f0085d193d2c3c95dccf0 ("etcd-member-xxia04-6q4xb-m-0.c.openshift-qe.internal_openshift-etcd(3bb792a6006f0085d193d2c3c95dccf0)"), skipping: failed to "StartContainer" for "wait-for-kube" with CrashLoopBackOff: "back-off 5m0s restarting failed container=wait-for-kube pod=etcd-member-xxia04-6q4xb-m-0.c.openshift-qe.internal_openshift-etcd(3bb792a6006f0085d193d2c3c95dccf0)"
Feb 04 07:55:14 xxia04-6q4xb-m-0.c.openshift-qe.internal hyperkube[10302]: I0204 07:55:14.607870 10302 event.go:281] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-etcd", Name:"etcd-member-xxia04-6q4xb-m-0.c.openshift-qe.internal", UID:"3bb792a6006f0085d193d2c3c95dccf0", APIVersion:"v1", ResourceVersion:"", FieldPath:"spec.initContainers{wait-for-kube}"}): type: 'Warning' reason: 'BackOff' Back-off restarting failed container
Feb 04 07:55:18 xxia04-6q4xb-m-0.c.openshift-qe.internal hyperkube[10302]: I0204 07:55:18.972697 10302 worker.go:215] Non-running container probed: etcd-member-xxia04-6q4xb-m-0.c.openshift-qe.internal_openshift-etcd(3bb792a6006f0085d193d2c3c95dccf0) - etcd-member
```

Adding the TestBlocker keyword, since this blocks DR scenario testing and the verification of bug 1771410.

I also hit the same issue in a UPI on bare metal install with 4.4.0-0.nightly-2020-02-11-035407.

Since the issue is not cloud-specific, I tried IPI on AWS with 4.4.0-0.nightly-2020-02-13-212616.
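The container and journal log lines in this report use the klog header format `Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg`, where the leading letter is the severity (I, W, E or F). The `F` on the `wait-for-kube` line marks a fatal message, and klog's fatal calls exit the process after logging, which is consistent with the container showing `Exited` and the kubelet reporting CrashLoopBackOff. Below is a minimal Python sketch for pulling the severity and source location out of such lines. It assumes only the standard klog header layout; `parse_klog` and `KlogEntry` are hypothetical names, not part of any OpenShift or Kubernetes tool.

```python
import re
from dataclasses import dataclass
from typing import Optional

# klog/glog header: Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg
# L is the severity letter: I (info), W (warning), E (error), F (fatal).
KLOG_HEADER = re.compile(
    r"(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<file>[^\s:]+):(?P<line>\d+)\] (?P<msg>.*)"
)

SEVERITY = {"I": "INFO", "W": "WARNING", "E": "ERROR", "F": "FATAL"}


@dataclass
class KlogEntry:
    severity: str
    month: int
    day: int
    time: str
    thread: int
    source: str   # "file:line" of the logging call
    message: str


def parse_klog(line: str) -> Optional[KlogEntry]:
    """Return the klog entry found in `line`, or None if there is no header.

    re.search (not re.match) is used so that a journald prefix such as
    'Feb 04 07:55:14 host hyperkube[10302]: ' in front of the header is skipped.
    """
    m = KLOG_HEADER.search(line)
    if m is None:
        return None
    return KlogEntry(
        severity=SEVERITY[m["level"]],
        month=int(m["month"]),
        day=int(m["day"]),
        time=m["time"],
        thread=int(m["thread"]),
        source=f'{m["file"]}:{m["line"]}',
        message=m["msg"],
    )


entry = parse_klog("F0204 08:38:02.018434 1 waitforkube.go:36] kube env not populated")
print(entry.severity, entry.source, entry.message)
# FATAL waitforkube.go:36 kube env not populated
```

The same function accepts the journal lines from the report, because the journald prefix before the klog header is skipped. This makes it easy to filter a node journal down to the E and F entries.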
The above issue is fixed, so I am moving this bug to VERIFIED. However, I hit another issue: bug 1802944.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581