https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.1-to-4.2/98
https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.1-to-4.2/98/artifacts/e2e-aws-upgrade/e2e.log

The job upgrades from 4.1 to 4.2, after which it rolls back. The job never reaches success, stopping at 88% while waiting for machine-config:

```
Aug 15 06:33:53.499 W clusterversion/version changed Progressing to True: DownloadingUpdate: Working towards registry.svc.ci.openshift.org/ocp/release:4.2.0-0.ci-2019-08-14-232724: downloading update
Aug 15 06:33:55.215 - 15s W clusterversion/version cluster is updating to
Aug 15 06:34:25.215 - 7140s W clusterversion/version cluster is updating to 4.2.0-0.ci-2019-08-14-232724
Aug 15 07:13:08.600 E clusterversion/version changed Failing to True: ClusterOperatorNotAvailable: Cluster operator machine-config is still updating
```

Urgent because 4.1 to 4.2 upgrades should never break.
Seeing the following in https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.1-to-4.2/98/artifacts/e2e-aws-upgrade/pods/openshift-machine-config-operator_machine-config-daemon-txcfh_machine-config-daemon.log:

```
I0815 07:01:21.402817  130669 update.go:89] error when evicting pod "etcd-quorum-guard-85c9bf4f89-tczjg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0815 07:01:26.409812  130669 update.go:89] error when evicting pod "etcd-quorum-guard-85c9bf4f89-tczjg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
..
..
I0815 08:34:14.949610  130669 update.go:89] error when evicting pod "etcd-quorum-guard-85c9bf4f89-tczjg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0815 08:34:19.956818  130669 update.go:89] error when evicting pod "etcd-quorum-guard-85c9bf4f89-tczjg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
```

Why does the same eviction failure repeat for approximately 1.5 hours?
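The retry pattern in the log comes from the Eviction API refusing to evict a pod whose PodDisruptionBudget would be violated; the drainer just retries every 5s. A minimal sketch of that budget check (hypothetical function names, not the actual MCO or Kubernetes code; assumes the etcd-quorum-guard PDB tolerates one disruption across 3 replicas, i.e. an effective minAvailable of 2):

```go
package main

import "fmt"

// evictionAllowed is a stand-in for the PDB check the Eviction API
// performs: evicting one pod must not drop the healthy count below
// the budget's minimum. Hypothetical sketch, not kubelet/apiserver code.
func evictionAllowed(healthy, minAvailable int) bool {
	return healthy-1 >= minAvailable
}

func main() {
	// Normal drain: all 3 quorum-guard pods healthy, one may be evicted.
	fmt.Println(evictionAllowed(3, 2)) // true
	// This run: one etcd member's pod never started, only 2 healthy,
	// so every eviction attempt is rejected and retried after 5s.
	fmt.Println(evictionAllowed(2, 2)) // false
}
```

Under this assumption the 1.5 hours of retries is the expected steady state: the budget can never be satisfied while one quorum-guard pod is missing, so the drain loop spins forever.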
*** Bug 1733305 has been marked as a duplicate of this bug. ***
*** Bug 1737678 has been marked as a duplicate of this bug. ***
> I0815 08:34:14.949610  130669 update.go:89] error when evicting pod "etcd-quorum-guard-85c9bf4f89-tczjg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
> I0815 08:34:19.956818  130669 update.go:89] error when evicting pod "etcd-quorum-guard-85c9bf4f89-tczjg" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

This is actually etcd-quorum-guard working correctly. If we look at the pods[1] for this run, you will notice that only 2 exist for etcd. That is because the manifest deployed for that member does not contain an image for setup-etcd-environment. Recent changes in the MCO made this image part of the MCO image rather than a separate one, which might be to blame. Because one etcd pod was already missing, we could not lose another, per the message.

We have a few other bugs attached to this; I want to make sure they are all the same issue before we close. But it would be interesting to know why exactly the image was not populated in the spec:

```
Aug 15 08:36:54 ip-10-0-136-216 hyperkube[1515]: E0815 08:36:54.063849    1515 file.go:187] Can't process manifest file "/etc/kubernetes/manifests/etcd-member.yaml": invalid pod: [spec.initContainers[0].image: Required value]
```

[1] https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.1-to-4.2/98/artifacts/e2e-aws-upgrade/pods/
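The kubelet error above is its static pod manifest validation rejecting etcd-member.yaml because the setup-etcd-environment init container has an empty image field. A rough sketch of what that validation amounts to (hypothetical types and function, not the actual kubelet code):

```go
package main

import "fmt"

// container is a minimal stand-in for a pod spec container entry.
type container struct {
	Name  string
	Image string
}

// validateInitContainers returns a field-path error for every init
// container missing a required image, mirroring the message in the
// kubelet log. Hypothetical sketch, not the real validation code.
func validateInitContainers(initContainers []container) []string {
	var errs []string
	for i, c := range initContainers {
		if c.Image == "" {
			errs = append(errs, fmt.Sprintf("spec.initContainers[%d].image: Required value", i))
		}
	}
	return errs
}

func main() {
	// The broken etcd-member.yaml: setup-etcd-environment with no image
	// rendered into the manifest.
	pod := []container{{Name: "setup-etcd-environment", Image: ""}}
	for _, e := range validateInitContainers(pod) {
		fmt.Println(e) // spec.initContainers[0].image: Required value
	}
}
```

Because the whole manifest is rejected, that node's etcd pod never starts at all, which is what leaves only 2 quorum-guard-protected members and blocks the drain above.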
*** Bug 1737799 has been marked as a duplicate of this bug. ***
Sam and I would like to try to get https://github.com/openshift/machine-config-operator/pull/1057 in, which updates the etcd DR images, to see if this helps.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922