Description of problem:
Upgraded cluster from 4.1.9 to 4.2.0-0.nightly-2019-08-01-113533; only the machine-config operator is not upgraded.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-08-01-113533

How reproducible:
50%

Steps to Reproduce:
1.
2.
3.

Actual results:
$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.0-0.nightly-2019-08-01-113533   True        False         False      22h
cloud-credential                           4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
cluster-autoscaler                         4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
console                                    4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
dns                                        4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
image-registry                             4.2.0-0.nightly-2019-08-01-113533   True        False         False      7h19m
ingress                                    4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
kube-apiserver                             4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
kube-controller-manager                    4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
kube-scheduler                             4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
machine-api                                4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
machine-config                             4.1.9                               False       True          True       7h50m
marketplace                                4.2.0-0.nightly-2019-08-01-113533   True        False         False      8h
monitoring                                 4.2.0-0.nightly-2019-08-01-113533   False       True          True       7h4m
network                                    4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
node-tuning                                4.2.0-0.nightly-2019-08-01-113533   True        False         False      8h
openshift-apiserver                        4.2.0-0.nightly-2019-08-01-113533   True        False         False      92m
openshift-controller-manager               4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
openshift-samples                          4.2.0-0.nightly-2019-08-01-113533   True        False         False      18h
operator-lifecycle-manager                 4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
operator-lifecycle-manager-catalog         4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
operator-lifecycle-manager-packageserver   4.2.0-0.nightly-2019-08-01-113533   True        False         False      4h7m
service-ca                                 4.2.0-0.nightly-2019-08-01-113533   True        False         False      23h
service-catalog-apiserver                  4.2.0-0.nightly-2019-08-01-113533   True        False         False      4h6m
service-catalog-controller-manager         4.2.0-0.nightly-2019-08-01-113533   True        False         False      17h
storage                                    4.2.0-0.nightly-2019-08-01-113533   True        False         False      18h
support                                    4.2.0-0.nightly-2019-08-01-113533   True        False         False      18h

$ oc get co/machine-config -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: "2019-08-05T03:47:57Z"
  generation: 1
  name: machine-config
  resourceVersion: "1094784"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/machine-config
  uid: d04f2a1b-b733-11e9-ad70-02e77de128dc
spec: {}
status:
  conditions:
  - lastTransitionTime: "2019-08-05T19:05:28Z"
    message: Cluster not available for 4.2.0-0.nightly-2019-08-01-113533
    status: "False"
    type: Available
  - lastTransitionTime: "2019-08-05T18:42:37Z"
    message: Working towards 4.2.0-0.nightly-2019-08-01-113533
    status: "True"
    type: Progressing
  - lastTransitionTime: "2019-08-05T19:05:28Z"
    message: 'Unable to apply 4.2.0-0.nightly-2019-08-01-113533: timed out waiting
      for the condition during syncRequiredMachineConfigPools: pool master has not
      progressed to latest configuration: controller version mismatch for rendered-master-ce8adbbe7a871e63d2f9fe30bf489c6f
      expected 6e75b3fe9bb02eeef9756d8b6ff1a85e790944e3 has 83392b13a5c17e56656acf3f7b0031e3303fb5c0,
      retrying'
    reason: FailedToSync
    status: "True"
    type: Degraded
  - lastTransitionTime: "2019-08-05T19:05:28Z"
    reason: AsExpected
    status: "True"
    type: Upgradeable
  extension:
    lastSyncError: 'pool master has not progressed to latest configuration: controller
      version mismatch for rendered-master-ce8adbbe7a871e63d2f9fe30bf489c6f expected
      6e75b3fe9bb02eeef9756d8b6ff1a85e790944e3 has 83392b13a5c17e56656acf3f7b0031e3303fb5c0,
      retrying'
    worker: all 3 nodes are at latest configuration rendered-worker-5f6dd4e5c2ad1322fbf6120f4d0916d7
  relatedObjects:
  - group: ""
    name: openshift-machine-config-operator
    resource: namespaces
  - group: machineconfiguration.openshift.io
    name: master
    resource: machineconfigpools
  - group: machineconfiguration.openshift.io
    name: worker
    resource: machineconfigpools
  - group: machineconfiguration.openshift.io
    name: cluster
    resource: controllerconfigs
  versions:
  - name: operator
    version: 4.1.9

Expected results:
machine-config operator can be upgraded from 4.1 to 4.2.

Additional info:
Tried to upgrade another cluster with the same upgrade path and it succeeded.
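For anyone hitting the same mismatch, one way to see both sides of the error is to compare the controller hash recorded on the rendered MachineConfig against the machine-config-controller that is actually running. A rough sketch (the generated-by-controller-version annotation name and the k8s-app=machine-config-controller label are from memory, so treat them as assumptions):

# Hash the rendered config was generated with ("has" in the error)
$ oc get machineconfig rendered-master-ce8adbbe7a871e63d2f9fe30bf489c6f \
    -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/generated-by-controller-version}'

# Image of the controller currently running (should correspond to "expected")
$ oc -n openshift-machine-config-operator get pods -l k8s-app=machine-config-controller \
    -o jsonpath='{.items[0].spec.containers[0].image}'

If the annotation still shows the old hash, the new MCC never re-rendered the master pool's configuration.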
This looks similar to the error in BZ#1734531
Does this reconcile eventually? That message says the MCC hasn't generated the newest rendered MachineConfigs for the new version, which is OK if the MCC simply hasn't run yet.
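If it doesn't reconcile, a quick way to check whether the new MCC has come up and what it is doing would be something like the following (the deployment name matches the usual MCO layout; adjust if it differs on your cluster):

$ oc -n openshift-machine-config-operator get deployments
$ oc -n openshift-machine-config-operator logs deployment/machine-config-controller | tail -n 50
$ oc get machineconfigpool master -o yaml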
No, it was stuck in that state for more than one day, until the cluster was destroyed. The issue is not 100% reproducible; another cluster was upgraded successfully.
Two other QE engineers (including myself) reproduced this yesterday. Let me know what additional info is required and I can attempt it again.
This could mean the new MCC hasn't rolled out yet. We need a must-gather to check the system logs as well.
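For reference, a must-gather covering the MCO pod logs can be collected and packaged for attachment roughly like this (the output lands in a must-gather.local.* directory by default):

$ oc adm must-gather
$ tar czf must-gather.tar.gz must-gather.local.*/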
@sgreen, can you confirm that you are seeing etcd-quorum-guard issues that might be impacting this upgrade? If so, you can mark this as a dupe of bug 1742744.
Yep, I can confirm that this is an etcd-quorum-guard issue, visible in must-gather/namespaces/openshift-machine-config-operator/pods/machine-config-daemon-z8l6v/machine-config-daemon/machine-config-daemon/logs/current.log. There are several thousand lines of the following error:

2019-08-06T03:14:57.218366261Z I0806 03:14:57.218315  123397 update.go:89] error when evicting pod "etcd-quorum-guard-8646778784-phtjq" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

Marking as a dupe.
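For anyone verifying the same signature on their own cluster, the disruption budget that blocks the eviction can be inspected directly. A sketch, assuming the etcd-quorum-guard deployment lives in openshift-machine-config-operator on 4.1/4.2 (adjust the namespace if the grep below shows otherwise):

$ oc get pdb --all-namespaces | grep etcd-quorum-guard
$ oc -n openshift-machine-config-operator describe pdb etcd-quorum-guard
$ oc -n openshift-machine-config-operator get pods -o wide | grep etcd-quorum-guard

The eviction fails when allowed disruptions is 0, i.e. one of the guard pods is already unready, typically because a master is down or has not yet come back from a reboot.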
*** This bug has been marked as a duplicate of bug 1742744 ***