Description of problem:

Cluster started as 4.4.27 and upgraded to 4.5.14 OK. Upgrading from there to 4.6.0-rc.3 hung with the nodes at kube 1.19 and the masters at 1.18, and this error for the MCO:

Extension: Last Sync Error: pool master has not progressed to latest configuration: controller version mismatch for rendered-master-c34c90c3ab3bcf286f4074a555f7c1ad expected 48d52f385642cbecf5c95e0ac4b0ec8c37664fe7 has 13dd7810adc20c7e6d99adc4179969eac54e7783: 0 (ready 0) out of 3 nodes are updating to latest configuration rendered-master-1e06e226a3c41e1ba728addf7e860cf6, retrying
Master: 0 (ready 0) out of 3 nodes are updating to latest configuration rendered-master-1e06e226a3c41e1ba728addf7e860cf6
Worker: all 3 nodes are at latest configuration rendered-worker-74d8bdd6b44d383a653c72938cb7d6e8

Will add must-gather location shortly. Cluster is a UPI install with the OVN network plugin on GCP and I will keep it around until tomorrow (14-Oct).

Version-Release number of selected component (if applicable):
4.6.0-rc.3

How reproducible:
Unknown

Steps to Reproduce:
1. Installed UPI on GCP at 4.4.27
2. Upgraded to 4.5.14 successfully.
3. Set channel to 4.6-candidate and upgraded to 4.6.0-rc.3

Actual results:
Upgrade hangs with an MCO error and mixed Kubernetes versions for master/compute nodes
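The "controller version mismatch" means the rendered master MachineConfig still carries the hash stamped by the 4.5 machine-config-controller while the operator expects the 4.6 controller's hash. A rough way to surface which rendered configs are stale (a sketch: on a live cluster the name/hash pairs would come from the `machineconfiguration.openshift.io/generated-by-controller-version` annotation on each rendered MachineConfig; here the two hashes from the error above are inlined, and the worker pair is illustrative only):

```shell
# Expected controller hash, taken from the MCO error above.
want=48d52f385642cbecf5c95e0ac4b0ec8c37664fe7

# Print any rendered config whose recorded controller hash differs
# from the expected one. (Sample name/hash pairs inlined.)
awk -v want="$want" '$2 != want {print $1 " still at " $2}' <<'EOF'
rendered-master-c34c90c3ab3bcf286f4074a555f7c1ad 13dd7810adc20c7e6d99adc4179969eac54e7783
rendered-worker-74d8bdd6b44d383a653c72938cb7d6e8 48d52f385642cbecf5c95e0ac4b0ec8c37664fe7
EOF
```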
Adding Tim to assess whether those nodes just lose connectivity (ready 0).
12 hours later the cluster is still in this state - I'll put a kubeconfig location in a private comment.

root@ip-172-31-64-58: ~ # oc get nodes
NAME                                                          STATUS   ROLES    AGE   VERSION
mffiedler1013b-zk5xv-m-0.c.openshift-qe.internal              Ready    master   16h   v1.18.3+970c1b3
mffiedler1013b-zk5xv-m-1.c.openshift-qe.internal              Ready    master   16h   v1.18.3+970c1b3
mffiedler1013b-zk5xv-m-2.c.openshift-qe.internal              Ready    master   16h   v1.18.3+970c1b3
mffiedler1013b-zk5xv-worker-a-2cngf.c.openshift-qe.internal   Ready    worker   16h   v1.19.0+d59ce34
mffiedler1013b-zk5xv-worker-b-hvln5.c.openshift-qe.internal   Ready    worker   16h   v1.19.0+d59ce34
mffiedler1013b-zk5xv-worker-c-ghllj.c.openshift-qe.internal   Ready    worker   16h   v1.19.0+d59ce34

root@ip-172-31-64-58: ~ # oc get co
NAME                                       VERSION      AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.6.0-rc.3   True        False         False      6h10m
cloud-credential                           4.6.0-rc.3   True        False         False      16h
cluster-autoscaler                         4.6.0-rc.3   True        False         False      16h
config-operator                            4.6.0-rc.3   True        False         False      15h
console                                    4.6.0-rc.3   True        False         False      14h
csi-snapshot-controller                    4.6.0-rc.3   True        False         False      14h
dns                                        4.6.0-rc.3   True        False         False      14h
etcd                                       4.6.0-rc.3   True        False         False      16h
image-registry                             4.6.0-rc.3   True        False         False      16h
ingress                                    4.6.0-rc.3   True        False         False      14h
insights                                   4.6.0-rc.3   True        False         False      16h
kube-apiserver                             4.6.0-rc.3   True        False         False      16h
kube-controller-manager                    4.6.0-rc.3   True        False         False      16h
kube-scheduler                             4.6.0-rc.3   True        False         False      16h
kube-storage-version-migrator              4.6.0-rc.3   True        False         False      13h
machine-api                                4.6.0-rc.3   True        False         False      16h
machine-approver                           4.6.0-rc.3   True        False         False      15h
machine-config                             4.5.14       False       True          True       14h
marketplace                                4.6.0-rc.3   True        False         False      14h
monitoring                                 4.6.0-rc.3   True        False         False      14h
network                                    4.6.0-rc.3   True        False         False      16h
node-tuning                                4.6.0-rc.3   True        False         False      14h
openshift-apiserver                        4.6.0-rc.3   True        False         False      14h
openshift-controller-manager               4.6.0-rc.3   True        False         False      16h
openshift-samples                          4.6.0-rc.3   True        False         False      14h
operator-lifecycle-manager                 4.6.0-rc.3   True        False         False      16h
operator-lifecycle-manager-catalog         4.6.0-rc.3   True        False         False      16h
operator-lifecycle-manager-packageserver   4.6.0-rc.3   True        False         False      14h
service-ca                                 4.6.0-rc.3   True        False         False      16h
storage                                    4.6.0-rc.3   True        False         False      14h
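The `oc get co` listing can be reduced to just the stragglers. A small filter (a sketch; it assumes the default `oc get co` column order of NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE, and inlines two sample rows from this cluster in place of live output):

```shell
# Flag any ClusterOperator whose VERSION column is not the upgrade target.
# On a live cluster, replace the heredoc with: oc get co --no-headers
target=4.6.0-rc.3

awk -v want="$target" '$2 != want {print $1 " is at " $2}' <<'EOF'
kube-scheduler 4.6.0-rc.3 True False False 16h
machine-config 4.5.14 False True True 14h
EOF
```

On this cluster the only hit is machine-config, still reporting 4.5.14.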
This looks the same as the other upgrade issues with OVN; the node running 4.5 has a screwed up br-local bridge with an extra patch port, so kapi access won't work:

[root@mffiedler1013b-zk5xv-m-0 ~]# ovs-vsctl show
df305a14-74e2-4694-8e42-bebcc55fe21d
    Bridge br-local
        Port patch-lnet-node_local_switch-to-br-int
            Interface patch-lnet-node_local_switch-to-br-int
                type: patch
                options: {peer=patch-br-int-to-lnet-node_local_switch}
        Port ovn-k8s-gw0
            Interface ovn-k8s-gw0
                type: internal
        Port br-local
            Interface br-local
                type: internal
        Port patch--to-br-int
            Interface patch--to-br-int
                type: patch
                options: {peer=patch-br-int-to-}
        Port patch-br-local_mffiedler1013b-zk5xv-m-0.c.openshift-qe.internal-to-br-int
            Interface patch-br-local_mffiedler1013b-zk5xv-m-0.c.openshift-qe.internal-to-br-int
                type: patch
                options: {peer=patch-br-int-to-br-local_mffiedler1013b-zk5xv-m-0.c.openshift-qe.internal}
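The bogus port stands out by its empty node-name segment ("patch--to-br-int", nothing between "patch-" and "-to"), sitting alongside the correctly named per-node patch port. A quick check for that pattern (a sketch; a trimmed copy of the `ovs-vsctl show` output above is inlined, on the node you would pipe the live command output instead):

```shell
# Any hit here is a patch port that was generated with an empty node name.
grep -- 'patch--to-' <<'EOF'
        Port patch-lnet-node_local_switch-to-br-int
        Port patch--to-br-int
        Port patch-br-local_mffiedler1013b-zk5xv-m-0.c.openshift-qe.internal-to-br-int
EOF
```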
*** This bug has been marked as a duplicate of bug 1880591 ***
Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475