All of the CNO OVN-Kubernetes upgrade logic assumes the presence of a master DaemonSet. This is clearly wrong -- Hypershift uses a StatefulSet.
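As a quick illustration of the mismatch (resource names assume the usual OVN-Kubernetes layout; the grep is just to avoid guessing the HyperShift namespace):

  # Standalone OCP deploys ovnkube-master as a DaemonSet:
  oc get daemonset -n openshift-ovn-kubernetes ovnkube-master
  # Under HyperShift it is a StatefulSet instead:
  oc get statefulset -A | grep ovnkube-master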
Management cluster upgrade from 4.11.0-fc.3 to 4.11.0-rc.0 is stuck after 11h17m:

  version   4.11.0-fc.3   True   True   11h   Working towards 4.11.0-rc.0: 647 of 802 done (80% complete), waiting on network

  [
    {
      "completionTime": null,
      "image": "quay.io/openshift-release-dev/ocp-release:4.11.0-rc.0-x86_64",
      "startedTime": "2022-06-30T14:13:44Z",
      "state": "Partial",
      "verified": false,
      "version": "4.11.0-rc.0"
    },
    {
      "completionTime": "2022-06-30T13:15:52Z",
      "image": "quay.io/openshift-release-dev/ocp-release@sha256:af2fc44a39aaef937ce2eb895c61f2c40d8ec721c99eb866cc8e6d1a4c1b0401",
      "startedTime": "2022-06-30T12:47:58Z",
      "state": "Completed",
      "verified": false,
      "version": "4.11.0-fc.3"
    }
  ]

This is blocking HyperShift OVN upgrades.
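For reference, the status above was presumably gathered with something like the following (the clusterversion object is named "version" by default; these exact commands are an assumption, not a paste from the original report):

  oc get clusterversion version
  oc get clusterversion version -o jsonpath='{.status.history}' | jq .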
Err, wait, the StatefulSet is in the hostedcluster, not the management cluster, so the original mgmt-cluster upgrade test was wrong. Triggering the hosted cluster upgrade instead:

  oc patch -n clusters hostedcluster $(oc get -n clusters hostedcluster -o jsonpath='{.items[0].metadata.name}') -p='{"spec": {"release": {"image": "quay.io/openshift-release-dev/ocp-release:4.12.0-ec.4-x86_64"}}}' --type=merge

  oc get -n clusters hostedcluster -o jsonpath='{.items[*].status.version.history}' | jq '. | sort_by(.startedTime)'
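To follow the rollout after the patch, a plain watch should also do (assuming the HostedCluster printer columns expose the current version and progress, which recent HyperShift builds do):

  oc get -n clusters hostedcluster -w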
Hosted cluster upgrade from 4.11.9 to 4.12.0-ec.4 failed in ovnkube-node, so this may be a different issue, but it blocks full upgrade success:

  kube-scheduler                  4.12.0-ec.4   True   False   False   8h
  kube-storage-version-migrator   4.12.0-ec.4   True   False   False   8h
  monitoring                      4.12.0-ec.4   True   False   False   5h49m
  network                         4.11.9        True   True    True    8h      DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2022-10-13T22:14:09Z
  node-tuning                     4.12.0-ec.4   True   True    False   3h38m   Waiting for 1/3 Profiles to be applied
  openshift-apiserver             4.12.0-ec.4   True   False   False   8h
  openshift-controller-manager    4.12.0-ec.4   True   False   False   8h

  ovnkube-node-lnnxh   4/5   CrashLoopBackOff   21 (3m2s ago)   85m   10.0.141.120   ip-10-0-141-120.compute.internal   <none>   <none>

Will gather logs and file a new bug.
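For the log gathering, something along these lines against the hosted cluster's kubeconfig should identify the crashing container (only 4/5 are ready) and pull its logs from before the last restart; the container name below is a placeholder:

  oc describe pod -n openshift-ovn-kubernetes ovnkube-node-lnnxh
  oc logs -n openshift-ovn-kubernetes ovnkube-node-lnnxh -c <crashing-container> --previous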
Intermittent failures when upgrading the hostedCluster and then the nodepool. If I first upgrade the mgmt cluster, the hostedCluster upgrade then seems to succeed. Verified on 4.11.9 to 4.12.0-0.nightly-2022-10-15-094115:

  oc get -n clusters hostedcluster -o jsonpath='{.items[*].status.version.history}' | jq '. | sort_by(.startedTime)'

  [
    {
      "completionTime": "2022-10-16T20:46:50Z",
      "image": "quay.io/openshift-release-dev/ocp-release:4.11.9-x86_64",
      "startedTime": "2022-10-16T20:34:57Z",
      "state": "Completed",
      "verified": false,
      "version": "4.11.9"
    },
    {
      "completionTime": "2022-10-17T00:15:14Z",
      "image": "registry.ci.openshift.org/ocp/release:4.12.0-0.nightly-2022-10-15-094115",
      "startedTime": "2022-10-16T23:58:21Z",
      "state": "Completed",
      "verified": false,
      "version": "4.12.0-0.nightly-2022-10-15-094115"
    }
  ]
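For the nodepool step, the analogous patch would look roughly like this (NodePool carries the same spec.release.image field; the nodepool name is a placeholder, so list them first):

  oc get -n clusters nodepool
  oc patch -n clusters nodepool <nodepool-name> -p='{"spec": {"release": {"image": "registry.ci.openshift.org/ocp/release:4.12.0-0.nightly-2022-10-15-094115"}}}' --type=merge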
OCP is no longer using Bugzilla and this bug appears to have been left in an orphaned state. If the bug is still relevant, please open a new issue in the OCPBUGS Jira project: https://issues.redhat.com/projects/OCPBUGS/summary