Bug 1742744
| Summary: | [ci][upgrade] Cluster upgrade fails because of machine config wedging | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Clayton Coleman <ccoleman> |
| Component: | Machine Config Operator | Assignee: | Kirsten Garrison <kgarriso> |
| Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.2.0 | CC: | alegrand, anpicker, erooth, ffranz, geliu, hongli, juzhao, kgarriso, lcosic, mloibl, pkrupa, pmuller, sbatsche, surbania |
| Target Milestone: | --- | ||
| Target Release: | 4.2.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1740372 | Environment: | |
| Last Closed: | 2019-10-16 06:36:19 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Clayton Coleman
2019-08-16 20:11:37 UTC
Seeing this in https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.1-to-4.2/98/artifacts/e2e-aws-upgrade/pods/openshift-machine-config-operator_machine-config-daemon-txcfh_machine-config-daemon.log

```
I0815 07:01:21.402817 130669 update.go:89] error when evicting pod "etcd-quorum-guard-85c9bf4f89-tczjg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0815 07:01:26.409812 130669 update.go:89] error when evicting pod "etcd-quorum-guard-85c9bf4f89-tczjg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
..
..
I0815 08:34:14.949610 130669 update.go:89] error when evicting pod "etcd-quorum-guard-85c9bf4f89-tczjg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0815 08:34:19.956818 130669 update.go:89] error when evicting pod "etcd-quorum-guard-85c9bf4f89-tczjg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
```

for approx 1.5 hours?

*** Bug 1733305 has been marked as a duplicate of this bug. ***

*** Bug 1737678 has been marked as a duplicate of this bug. ***

> I0815 08:34:14.949610 130669 update.go:89] error when evicting pod "etcd-quorum-guard-85c9bf4f89-tczjg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
> I0815 08:34:19.956818 130669 update.go:89] error when evicting pod "etcd-quorum-guard-85c9bf4f89-tczjg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

This is actually etcd-quorum-guard working correctly. If we look at the pods[1] for this run, you will notice that only 2 exist for etcd. That is because the manifest deployed for that member does not contain an image for setup-etcd-environment. Recent changes to the MCO made this image part of the MCO image rather than a separate one; might those be to blame?
Because one etcd pod was missing, we could not lose another, per the message. We now have a few other bugs attached to this one, and I want to make sure they are all the same issue before we close. But I think it would be interesting to know exactly why the image was not populated for the spec.

```
Aug 15 08:36:54 ip-10-0-136-216 hyperkube[1515]: E0815 08:36:54.063849 1515 file.go:187] Can't process manifest file "/etc/kubernetes/manifests/etcd-member.yaml": invalid pod: [spec.initContainers[0].image: Required value]
```

[1] https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.1-to-4.2/98/artifacts/e2e-aws-upgrade/pods/

*** Bug 1737799 has been marked as a duplicate of this bug. ***

Sam and I would like to try to get https://github.com/openshift/machine-config-operator/pull/1057 merged, which updates the etcd DR images, to see if this helps.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922
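
For reference, the repeated "Cannot evict pod" message in the daemon log is what a pod disruption budget is expected to produce once one covered pod is already unhealthy. The sketch below is a simplified model of that arithmetic, not the Kubernetes implementation. The numbers are assumptions, not values taken from this report: 3 expected etcd-quorum-guard pods, a budget of maxUnavailable 1, and a guard pod counted as unhealthy when its node has no running etcd member. The function names are hypothetical.

```python
def disruptions_allowed(expected_pods: int, max_unavailable: int, current_healthy: int) -> int:
    """Simplified PDB arithmetic: how many more pods may be voluntarily evicted."""
    # The budget requires at least this many healthy pods at all times.
    desired_healthy = max(expected_pods - max_unavailable, 0)
    # Only healthy pods above that floor can be disrupted.
    return max(current_healthy - desired_healthy, 0)


def can_evict(expected_pods: int, max_unavailable: int, current_healthy: int) -> bool:
    """An eviction is admitted only if at least one disruption is still allowed."""
    return disruptions_allowed(expected_pods, max_unavailable, current_healthy) > 0


if __name__ == "__main__":
    # Assumed healthy cluster: 3 of 3 guard pods healthy, so one node can drain.
    print(can_evict(expected_pods=3, max_unavailable=1, current_healthy=3))
    # Assumed state in this bug: one etcd member never started, so only 2 are
    # healthy and every eviction attempt is rejected until the third recovers.
    print(can_evict(expected_pods=3, max_unavailable=1, current_healthy=2))
```

Under these assumptions the drain can never make progress on its own: the retry loop keeps failing until the missing etcd member comes back, which matches the roughly 1.5 hours of retries in the log.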