Bug 1797897
| Summary: | After masters stopped and restarted, cluster is dead | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Xingxing Xia <xxia> |
| Component: | Etcd Operator | Assignee: | Sam Batschelet <sbatsche> |
| Status: | CLOSED ERRATA | QA Contact: | Xingxing Xia <xxia> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 4.4 | CC: | aos-bugs, geliu, jialiu, mfojtik, tnozicka |
| Target Milestone: | --- | Keywords: | TestBlocker |
| Target Release: | 4.4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-05-04 11:33:06 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
|
Description
Xingxing Xia
2020-02-04 07:49:04 UTC
etcd is not starting, and I am inclined to think it is because the static etcd pod's init container is stuck waiting for kube:
```
[root@xxia04-6q4xb-m-0 core]# crictl ps -a | grep wait-for-kube
61d65b5ac1d93 c5fb513ba6473e74dfe8606378886fa8402c24df3c96ff2203db4593ed35a9fa 50 seconds ago Exited wait-for-kube 59 b274aa7d51007
[root@xxia04-6q4xb-m-0 core]# crictl logs -f 61d65b5ac1d93
F0204 08:38:02.018434 1 waitforkube.go:36] kube env not populated
```
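The manual check above (find the exited wait-for-kube container, then read its logs) could be scripted roughly as follows. This is only a sketch: the helper name `wait_for_kube_ids` is hypothetical, and it assumes the container ID is the first column of `crictl ps -a` output, as in the listing above.

```shell
# Hypothetical helper: read `crictl ps -a` output on stdin and print the
# container ID (first column) of every line that mentions wait-for-kube.
wait_for_kube_ids() {
  awk '/wait-for-kube/ { print $1 }'
}

# Possible usage on a master node, as root (not executed here):
#   crictl ps -a | wait_for_kube_ids | while read -r id; do
#     crictl logs "$id"
#   done
```

Keeping the filter separate from the `crictl` calls means it can be tried against saved output before running it on a node.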
Feb 04 07:55:14 xxia04-6q4xb-m-0.c.openshift-qe.internal hyperkube[10302]: E0204 07:55:14.607865 10302 pod_workers.go:191] Error syncing pod 3bb792a6006f0085d193d2c3c95dccf0 ("etcd-member-xxia04-6q4xb-m-0.c.openshift-qe.internal_openshift-etcd(3bb792a6006f0085d193d2c3c95dccf0)"), skipping: failed to "StartContainer" for "wait-for-kube" with CrashLoopBackOff: "back-off 5m0s restarting failed container=wait-for-kube pod=etcd-member-xxia04-6q4xb-m-0.c.openshift-qe.internal_openshift-etcd(3bb792a6006f0085d193d2c3c95dccf0)"
Feb 04 07:55:14 xxia04-6q4xb-m-0.c.openshift-qe.internal hyperkube[10302]: I0204 07:55:14.607870 10302 event.go:281] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-etcd", Name:"etcd-member-xxia04-6q4xb-m-0.c.openshift-qe.internal", UID:"3bb792a6006f0085d193d2c3c95dccf0", APIVersion:"v1", ResourceVersion:"", FieldPath:"spec.initContainers{wait-for-kube}"}): type: 'Warning' reason: 'BackOff' Back-off restarting failed container
Feb 04 07:55:18 xxia04-6q4xb-m-0.c.openshift-qe.internal hyperkube[10302]: I0204 07:55:18.972697 10302 worker.go:215] Non-running container probed: etcd-member-xxia04-6q4xb-m-0.c.openshift-qe.internal_openshift-etcd(3bb792a6006f0085d193d2c3c95dccf0) - etcd-member
Adding the TestBlocker keyword since this is blocking DR scenario testing and verification of bug 1771410.
I also hit the same issue in a UPI on bare metal install with 4.4.0-0.nightly-2020-02-11-035407.
Since the issue is not specific to any cloud, I tried IPI on AWS with 4.4.0-0.nightly-2020-02-13-212616. The above issue is fixed, so I am moving this bug to VERIFIED, but I hit another issue: bug 1802944.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581 |