Bug 2086506 - OVN-Kubernetes CNO upgrade logic broken for Hypershift
Summary: OVN-Kubernetes CNO upgrade logic broken for Hypershift
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: zenghui.shi
QA Contact: Ross Brattain
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-05-16 11:14 UTC by Casey Callendrello
Modified: 2024-04-30 18:04 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-04-30 18:04:53 UTC
Target Upstream Version:
Embargoed:



Links
System: Github
ID: openshift cluster-network-operator pull 1447
Private: 0
Priority: None
Status: open
Summary: Bug 2086506: hypershift: respect statefulset when upgrading ovnk
Last Updated: 2022-05-17 08:55:37 UTC

Description Casey Callendrello 2022-05-16 11:14:16 UTC
All of the CNO OVN-Kubernetes upgrade logic assumes the presence of a master DaemonSet. This is clearly wrong -- Hypershift uses a StatefulSet.
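
A minimal sketch of the failure mode described above, not the operator's actual code (CNO is written in Go; Python is used here only for brevity). The function names, the `ovnkube-master` workload name, the version string, and the in-memory `workloads` mapping are all assumptions made for illustration. It contrasts a lookup that only considers a DaemonSet with one that also accepts a StatefulSet.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory stand-in for the cluster: (kind, name) -> version.
Workloads = Dict[Tuple[str, str], str]

MASTER_NAME = "ovnkube-master"  # assumed workload name, for illustration only


def master_version_daemonset_only(workloads: Workloads) -> Optional[str]:
    """Models the assumption in the description: only a DaemonSet is considered."""
    return workloads.get(("DaemonSet", MASTER_NAME))


def master_version_any_kind(workloads: Workloads) -> Optional[str]:
    """Accepts either workload kind, so a StatefulSet-based master is found too."""
    for kind in ("DaemonSet", "StatefulSet"):
        version = workloads.get((kind, MASTER_NAME))
        if version is not None:
            return version
    return None


# A Hypershift-style control plane modeled with a StatefulSet master.
# The version string is made up for the example.
hypershift = {("StatefulSet", MASTER_NAME): "4.11.0"}
print(master_version_daemonset_only(hypershift))  # master appears to be absent
print(master_version_any_kind(hypershift))        # master is found
```

With a DaemonSet-only lookup, upgrade logic sees no master at all in the StatefulSet case and cannot reason about its rollout state.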

Comment 5 Ross Brattain 2022-07-01 01:34:20 UTC
Management cluster upgrade from 4.11.0-fc.3 to 4.11.0-rc.0 stuck after 11h17m

version   4.11.0-fc.3   True        True          11h     Working towards 4.11.0-rc.0: 647 of 802 done (80% complete), waiting on network

[
  {
    "completionTime": null,
    "image": "quay.io/openshift-release-dev/ocp-release:4.11.0-rc.0-x86_64",
    "startedTime": "2022-06-30T14:13:44Z",
    "state": "Partial",
    "verified": false,
    "version": "4.11.0-rc.0"
  },
  {
    "completionTime": "2022-06-30T13:15:52Z",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:af2fc44a39aaef937ce2eb895c61f2c40d8ec721c99eb866cc8e6d1a4c1b0401",
    "startedTime": "2022-06-30T12:47:58Z",
    "state": "Completed",
    "verified": false,
    "version": "4.11.0-fc.3"
  }
]

Comment 7 Ross Brattain 2022-07-18 16:28:20 UTC
Blocking Hypershift OVN upgrades.

Comment 10 Ross Brattain 2022-10-13 23:37:16 UTC
Err, wait, the StatefulSet is in the hostedcluster, not the management cluster.

So the original mgmt upgrade test was wrong.

oc patch -n clusters hostedcluster $(oc get -n clusters hostedcluster  -o jsonpath='{.items[0].metadata.name}') -p='{"spec": {"release": {"image": "quay.io/openshift-release-dev/ocp-release:4.12.0-ec.4-x86_64"}}}' --type=merge

oc get -n clusters hostedcluster  -o jsonpath='{.items[*].status.version.history}' |jq '. | sort_by(.startedTime) '

Comment 11 Ross Brattain 2022-10-14 16:13:25 UTC
Hosted cluster upgrade from 4.11.9 to 4.12.0-ec.4 failed in ovnkube-node, so this may be a different issue, but it is blocking full upgrade success.

kube-scheduler                             4.12.0-ec.4   True        False         False      8h
kube-storage-version-migrator              4.12.0-ec.4   True        False         False      8h
monitoring                                 4.12.0-ec.4   True        False         False      5h49m
network                                    4.11.9        True        True          True       8h      DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2022-10-13T22:14:09Z
node-tuning                                4.12.0-ec.4   True        True          False      3h38m   Waiting for 1/3 Profiles to be applied
openshift-apiserver                        4.12.0-ec.4   True        False         False      8h
openshift-controller-manager               4.12.0-ec.4   True        False         False      8h

ovnkube-node-lnnxh   4/5     CrashLoopBackOff   21 (3m2s ago)   85m     10.0.141.120   ip-10-0-141-120.compute.internal   <none>           <none>

Will gather logs and file new bug.

Comment 12 Ross Brattain 2022-10-17 23:54:29 UTC
Intermittent failures when upgrading the hostedCluster and then the nodepool.

If I just upgrade the mgmt cluster, then the hostedCluster upgrade seems to succeed.

Verified on 4.11.9 to 4.12.0-0.nightly-2022-10-15-094115

oc get -n clusters hostedcluster  -o jsonpath='{.items[*].status.version.history}' |jq '. | sort_by(.startedTime) '

[
  {
    "completionTime": "2022-10-16T20:46:50Z",
    "image": "quay.io/openshift-release-dev/ocp-release:4.11.9-x86_64",
    "startedTime": "2022-10-16T20:34:57Z",
    "state": "Completed",
    "verified": false,
    "version": "4.11.9"
  },
  {
    "completionTime": "2022-10-17T00:15:14Z",
    "image": "registry.ci.openshift.org/ocp/release:4.12.0-0.nightly-2022-10-15-094115",
    "startedTime": "2022-10-16T23:58:21Z",
    "state": "Completed",
    "verified": false,
    "version": "4.12.0-0.nightly-2022-10-15-094115"
  }
]
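
The same check can be scripted instead of read by eye. The following is a small sketch, assuming the history JSON has the shape shown above; the helper names `latest_upgrade` and `upgrade_finished` are hypothetical, and the sample data is copied from the output above with the `image` and `verified` fields omitted.

```python
import json

# History entries copied from the output above (trimmed to the fields used here).
HISTORY_JSON = """
[
  {"completionTime": "2022-10-16T20:46:50Z", "startedTime": "2022-10-16T20:34:57Z",
   "state": "Completed", "version": "4.11.9"},
  {"completionTime": "2022-10-17T00:15:14Z", "startedTime": "2022-10-16T23:58:21Z",
   "state": "Completed", "version": "4.12.0-0.nightly-2022-10-15-094115"}
]
"""


def latest_upgrade(history):
    """Return the most recently started history entry.

    UTC timestamps in this fixed ISO 8601 format order correctly as plain
    strings, which is the ordering jq's sort_by(.startedTime) also uses.
    """
    return max(history, key=lambda entry: entry["startedTime"])


def upgrade_finished(entry):
    """True when the entry reports state Completed and has a completionTime."""
    return entry["state"] == "Completed" and entry["completionTime"] is not None


latest = latest_upgrade(json.loads(HISTORY_JSON))
print(latest["version"], upgrade_finished(latest))
```

An entry like the stuck one in Comment 5 (state "Partial", completionTime null) would make `upgrade_finished` return False.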

Comment 13 Rory Thrasher 2024-04-30 18:04:53 UTC
OCP is no longer using Bugzilla and this bug appears to have been left in an orphaned state. If the bug is still relevant, please open a new issue in the OCPBUGS Jira project: https://issues.redhat.com/projects/OCPBUGS/summary

