Bug 2086506

Summary: OVN-Kubernetes CNO upgrade logic broken for Hypershift
Product: OpenShift Container Platform
Component: Networking
Networking sub component: ovn-kubernetes
Version: 4.11
Status: VERIFIED
Severity: high
Priority: high
Keywords: TestBlocker
Type: Bug
Hardware: Unspecified
OS: Unspecified
Reporter: Casey Callendrello <cdc>
Assignee: zenghui.shi <zshi>
QA Contact: Ross Brattain <rbrattai>
CC: rbrattai

Description Casey Callendrello 2022-05-16 11:14:16 UTC
All of the CNO OVN-Kubernetes upgrade logic assumes the presence of a master DaemonSet. This is clearly wrong -- Hypershift uses a StatefulSet.
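
To illustrate the kind of check that cannot assume a DaemonSet, here is a minimal sketch of a rollout test that accepts either workload kind. It is not the CNO's actual code: the function name `rollout_complete` and the plain-dict input are assumptions made for illustration. Only the status field names (`observedGeneration`, `desiredNumberScheduled`, `updatedNumberScheduled`, `numberAvailable`, `updatedReplicas`, `readyReplicas`, `currentRevision`, `updateRevision`) come from the Kubernetes apps/v1 API.

```python
def rollout_complete(workload: dict) -> bool:
    """Return True when a DaemonSet or StatefulSet has finished rolling out.

    Hypothetical helper: `workload` is the object as a plain dict, for
    example the parsed output of `oc get <kind> <name> -o json`.
    """
    kind = workload.get("kind")
    status = workload.get("status", {})
    generation = workload.get("metadata", {}).get("generation", 0)

    # The controller must have observed the latest spec before the
    # counters below describe the current rollout.
    if status.get("observedGeneration", 0) < generation:
        return False

    if kind == "DaemonSet":
        desired = status.get("desiredNumberScheduled", 0)
        return (
            status.get("updatedNumberScheduled", 0) == desired
            and status.get("numberAvailable", 0) == desired
        )

    if kind == "StatefulSet":
        # spec.replicas defaults to 1 when unset.
        replicas = workload.get("spec", {}).get("replicas", 1)
        return (
            status.get("updatedReplicas", 0) == replicas
            and status.get("readyReplicas", 0) == replicas
            and status.get("currentRevision") == status.get("updateRevision")
        )

    # Fail loudly rather than silently treating an unknown kind as a DaemonSet.
    raise ValueError(f"unsupported workload kind: {kind!r}")
```

Raising on an unknown kind is a deliberate choice in this sketch: the failure described in this bug comes from silently assuming one workload type.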

Comment 5 Ross Brattain 2022-07-01 01:34:20 UTC
Management cluster upgrade from 4.11.0-fc.3 to 4.11.0-rc.0 stuck after 11h17m

version   4.11.0-fc.3   True        True          11h     Working towards 4.11.0-rc.0: 647 of 802 done (80% complete), waiting on network

[
  {
    "completionTime": null,
    "image": "quay.io/openshift-release-dev/ocp-release:4.11.0-rc.0-x86_64",
    "startedTime": "2022-06-30T14:13:44Z",
    "state": "Partial",
    "verified": false,
    "version": "4.11.0-rc.0"
  },
  {
    "completionTime": "2022-06-30T13:15:52Z",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:af2fc44a39aaef937ce2eb895c61f2c40d8ec721c99eb866cc8e6d1a4c1b0401",
    "startedTime": "2022-06-30T12:47:58Z",
    "state": "Completed",
    "verified": false,
    "version": "4.11.0-fc.3"
  }
]

Comment 7 Ross Brattain 2022-07-18 16:28:20 UTC
This is blocking Hypershift OVN upgrades.

Comment 10 Ross Brattain 2022-10-13 23:37:16 UTC
Err, wait, the StatefulSet is in the hostedcluster, not the management cluster.

So the original mgmt upgrade test was wrong.

oc patch -n clusters hostedcluster $(oc get -n clusters hostedcluster  -o jsonpath='{.items[0].metadata.name}') -p='{"spec": {"release": {"image": "quay.io/openshift-release-dev/ocp-release:4.12.0-ec.4-x86_64"}}}' --type=merge

oc get -n clusters hostedcluster  -o jsonpath='{.items[*].status.version.history}' |jq '. | sort_by(.startedTime) '

Comment 11 Ross Brattain 2022-10-14 16:13:25 UTC
Hosted cluster upgrade from 4.11.9 to 4.12.0-ec.4 failed in ovnkube-node. This may be a different issue, but it is blocking the upgrade from fully succeeding.

kube-scheduler                             4.12.0-ec.4   True        False         False      8h
kube-storage-version-migrator              4.12.0-ec.4   True        False         False      8h
monitoring                                 4.12.0-ec.4   True        False         False      5h49m
network                                    4.11.9        True        True          True       8h      DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2022-10-13T22:14:09Z
node-tuning                                4.12.0-ec.4   True        True          False      3h38m   Waiting for 1/3 Profiles to be applied
openshift-apiserver                        4.12.0-ec.4   True        False         False      8h
openshift-controller-manager               4.12.0-ec.4   True        False         False      8h

ovnkube-node-lnnxh   4/5     CrashLoopBackOff   21 (3m2s ago)   85m     10.0.141.120   ip-10-0-141-120.compute.internal   <none>           <none>

Will gather logs and file a new bug.

Comment 12 Ross Brattain 2022-10-17 23:54:29 UTC
Intermittent failures when upgrading the hostedCluster and then the nodepool.

If I just upgrade the mgmt cluster, then the hostedCluster upgrade seems to succeed.

Verified upgrading from 4.11.9 to 4.12.0-0.nightly-2022-10-15-094115.

oc get -n clusters hostedcluster  -o jsonpath='{.items[*].status.version.history}' |jq '. | sort_by(.startedTime) '

[
  {
    "completionTime": "2022-10-16T20:46:50Z",
    "image": "quay.io/openshift-release-dev/ocp-release:4.11.9-x86_64",
    "startedTime": "2022-10-16T20:34:57Z",
    "state": "Completed",
    "verified": false,
    "version": "4.11.9"
  },
  {
    "completionTime": "2022-10-17T00:15:14Z",
    "image": "registry.ci.openshift.org/ocp/release:4.12.0-0.nightly-2022-10-15-094115",
    "startedTime": "2022-10-16T23:58:21Z",
    "state": "Completed",
    "verified": false,
    "version": "4.12.0-0.nightly-2022-10-15-094115"
  }
]
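
The same history check can be scripted instead of read by eye. The following is a minimal sketch, assuming the JSON has the shape shown above; the helper names `latest_upgrade` and `upgrade_succeeded` are hypothetical, and the sample data is trimmed to the fields the check needs.

```python
import json


def latest_upgrade(history: list) -> dict:
    """Return the most recently started entry of a version history list."""
    # Timestamps in the same ISO 8601 UTC format sort correctly as strings.
    return max(history, key=lambda entry: entry["startedTime"])


def upgrade_succeeded(history: list, target_version: str) -> bool:
    """True if the newest history entry is the target and it completed."""
    latest = latest_upgrade(history)
    return latest["version"] == target_version and latest["state"] == "Completed"


# Entries copied from the history above, without the image and verified fields.
SAMPLE = json.loads("""
[
  {
    "completionTime": "2022-10-16T20:46:50Z",
    "startedTime": "2022-10-16T20:34:57Z",
    "state": "Completed",
    "version": "4.11.9"
  },
  {
    "completionTime": "2022-10-17T00:15:14Z",
    "startedTime": "2022-10-16T23:58:21Z",
    "state": "Completed",
    "version": "4.12.0-0.nightly-2022-10-15-094115"
  }
]
""")

print(upgrade_succeeded(SAMPLE, "4.12.0-0.nightly-2022-10-15-094115"))  # prints True
```

An entry whose state is "Partial", like the stuck management cluster upgrade in comment 5, makes `upgrade_succeeded` return False.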