Description of problem:

Upgrading from 4.2.12 to 4.3.0-0.nightly-2019-12-20-145137 got stuck at 13%:

  Cluster did not complete upgrade: timed out waiting for the condition: Working towards 4.3.0-0.nightly-2019-12-20-145137: 13% complete

The machine-config clusteroperator reported the following status conditions:

  "conditions": [
    {
      "lastTransitionTime": "2019-12-20T16:16:38Z",
      "message": "Cluster not available for 4.3.0-0.nightly-2019-12-20-145137",
      "status": "False",
      "type": "Available"
    },
    {
      "lastTransitionTime": "2019-12-20T16:00:59Z",
      "message": "Working towards 4.3.0-0.nightly-2019-12-20-145137",
      "status": "True",
      "type": "Progressing"
    },
    {
      "lastTransitionTime": "2019-12-20T16:16:38Z",
      "message": "Unable to apply 4.3.0-0.nightly-2019-12-20-145137: timed out waiting for the condition during syncRequiredMachineConfigPools: pool master has not progressed to latest configuration: controller version mismatch for rendered-master-f0cd2de7cae40c363de564c65600efa1 expected 23a6e6fb37e73501bc3216183ef5e6ebb15efc7a has d780d197a9c5848ba786982c0c4aaa7487297046, retrying",
      "reason": "RequiredPoolsFailed",
      "status": "True",
      "type": "Degraded"
    },
    {
      "lastTransitionTime": "2019-12-20T15:22:23Z",
      "reason": "AsExpected",
      "status": "True",
      "type": "Upgradeable"
    }
  ],

Two of the nodes report "Kubelet stopped posting node status." in all their status conditions.

See https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/13004

Additional info:

The "controller version mismatch" error looks similar to bug 1781141 ("controller version mismatch" when upgrading from 4.2.9 to 4.3).
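For anyone triaging a similar failure on a live cluster rather than from CI artifacts, a minimal sketch of how to pull the same data (assumes an `oc` session with cluster-admin; `jq` is just one convenient way to filter the output):

  # Conditions on the machine-config clusteroperator, as quoted above
  oc get clusteroperator machine-config -o json | jq '.status.conditions'

  # Nodes that have stopped posting status will show NotReady here
  oc get nodes -o wide

  # The master pool that syncRequiredMachineConfigPools is waiting on
  oc get machineconfigpool master -o yaml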
I don't see this error in the MCC log: https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/13004/artifacts/e2e-aws-upgrade/pods/openshift-machine-config-operator_machine-config-controller-64798bf44b-vwxfv_machine-config-controller.log

Looking here, it seems the cluster was simply in the middle of the upgrade and nothing is degraded: one worker and one master are unavailable. https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/13004/artifacts/e2e-aws-upgrade/machineconfigpools.json

The problem seems to lie in kube-apiserver?:

  Dec 20 16:15:09.190: INFO: cluster upgrade is Progressing: Working towards 4.3.0-0.nightly-2019-12-20-145137: 84% complete, waiting on machine-config
  Dec 20 16:15:19.186: INFO: cluster upgrade is Progressing: Unable to apply 4.3.0-0.nightly-2019-12-20-145137: the cluster operator kube-apiserver is degraded
  Dec 20 16:15:19.186: INFO: cluster upgrade is Failing: Cluster operator kube-apiserver is reporting a failure: NodeControllerDegraded: The master node(s) "ip-10-0-136-200.ec2.internal" not ready
  ...
  Dec 20 16:23:39.186: INFO: cluster upgrade is Progressing: Working towards 4.3.0-0.nightly-2019-12-20-145137: 13% complete
  Dec 20 16:23:49.185: INFO: cluster upgrade is Progressing: Unable to apply 4.3.0-0.nightly-2019-12-20-145137: the cluster operator kube-apiserver is degraded
  Dec 20 16:23:49.185: INFO: cluster upgrade is Failing: Cluster operator kube-apiserver is reporting a failure: NodeControllerDegraded: The master node(s) "ip-10-0-136-200.ec2.internal" not ready
  Dec 20 16:23:59.186: INFO: cluster upgrade is Progressing: Unable to apply 4.3.0-0.nightly-2019-12-20-145137: the cluster operator kube-apiserver is degraded

Also seeing in https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/13004:

  Dec 20 16:15:18.481 E clusterversion/version changed Failing to True: ClusterOperatorDegraded: Cluster operator kube-apiserver is reporting a failure: NodeControllerDegraded: The master node(s) "ip-10-0-136-200.ec2.internal" not ready
  Dec 20 16:16:38.498 E clusteroperator/machine-config changed Degraded to True: RequiredPoolsFailed: Unable to apply 4.3.0-0.nightly-2019-12-20-145137: timed out waiting for the condition during syncRequiredMachineConfigPools: pool master has not progressed to latest configuration: controller version mismatch for rendered-master-f0cd2de7cae40c363de564c65600efa1 expected 23a6e6fb37e73501bc3216183ef5e6ebb15efc7a has d780d197a9c5848ba786982c0c4aaa7487297046, retrying
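As a rough sketch of how the "controller version mismatch" and the kube-apiserver degradation could be cross-checked on a live cluster (these commands are illustrative and not taken from the CI artifacts; in particular, the generated-by-controller-version annotation is my assumption about where the rendered config records the controller hash):

  # Controller version recorded on the rendered master config named in the Degraded message
  oc get machineconfig rendered-master-f0cd2de7cae40c363de564c65600efa1 -o yaml \
    | grep generated-by-controller-version

  # Which rendered configs the master pool is moving between (spec vs. status)
  oc get machineconfigpool master \
    -o jsonpath='{.spec.configuration.name}{"\n"}{.status.configuration.name}{"\n"}'

  # kube-apiserver operator conditions, including NodeControllerDegraded
  oc get clusteroperator kube-apiserver -o json | jq '.status.conditions'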
Actually, is this a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1778904?
Per comment #2, closing as DUPLICATE.

*** This bug has been marked as a duplicate of bug 1778904 ***