+++ This bug was initially created as a clone of Bug #1998552 +++

The API Server Operator will set Upgradeable=False whenever any of the nodes within the cluster are at the skew limit; that is, when an upgrade of the API Server would exceed the allowable kubelet version skew.
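As a rough illustration of the rule described above, this POSIX shell sketch compares a kubelet's minor version against the API server's. It assumes a maximum allowed skew of two minor versions (`MAX_SKEW=2`, an assumption based on the upstream Kubernetes version skew policy of that era, not a value read from the operator). The helper names `minor_of` and `blocks_upgrade` are hypothetical; this is not the operator's actual implementation.

```shell
# Sketch only: MAX_SKEW=2 is an ASSUMPTION (upstream Kubernetes skew policy of
# that era), not a value taken from the kube-apiserver operator.
MAX_SKEW=2

# minor_of VERSION (hypothetical helper): print the minor component of a
# Kubernetes version string, e.g. "v1.18.3+d8ef5ad" -> 18, "1.20.11" -> 20.
minor_of() {
  local v=${1#v}    # drop an optional leading "v"
  v=${v#*.}         # drop the major component: "18.3+d8ef5ad"
  echo "${v%%.*}"   # keep everything before the next dot: "18"
}

# blocks_upgrade API_VERSION KUBELET_VERSION (hypothetical helper):
# print "blocked" if the kubelet is already MAX_SKEW or more minor versions
# behind the API server, so one more API-server minor bump would exceed the
# allowed skew; print "ok" otherwise.
blocks_upgrade() {
  local api kubelet
  api=$(minor_of "$1")
  kubelet=$(minor_of "$2")
  if [ $((api - kubelet)) -ge "$MAX_SKEW" ]; then
    echo blocked
  else
    echo ok
  fi
}
```

For example, against the 1.20.11 target API server version reported in the must-gather, `blocks_upgrade 1.20.11 v1.18.3+d8ef5ad` prints `blocked`, while `blocks_upgrade 1.20.11 v1.19.16+845f228` prints `ok`.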
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.
The LifecycleStale keyword was removed because the bug moved to QE. The bug assignee was notified.
@luis After pausing the worker machine-config pool, the 4.5 -> 4.6 -> 4.7 upgrade fails and multiple cluster operators are degraded.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-12-15-152825   True        True          9h      Unable to apply 4.7.0-0.nightly-2021-12-17-022306: an unknown error has occurred: MultipleErrors

NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.nightly-2021-12-17-022306   False       True          True       111m
baremetal                                  4.7.0-0.nightly-2021-12-17-022306   True        False         False      117m
cloud-credential                           4.7.0-0.nightly-2021-12-17-022306   True        False         False      5h53m
cluster-autoscaler                         4.7.0-0.nightly-2021-12-17-022306   True        False         False      5h38m
config-operator                            4.7.0-0.nightly-2021-12-17-022306   True        False         False      5h38m
console                                    4.7.0-0.nightly-2021-12-17-022306   True        False         True       114m
csi-snapshot-controller                    4.7.0-0.nightly-2021-12-17-022306   True        False         False      115m
dns                                        4.6.0-0.nightly-2021-12-15-152825   True        False         False      175m
etcd                                       4.7.0-0.nightly-2021-12-17-022306   True        False         False      5h42m
image-registry                             4.7.0-0.nightly-2021-12-17-022306   True        False         False      5h33m
ingress                                    4.7.0-0.nightly-2021-12-17-022306   True        False         True       3h12m
insights                                   4.7.0-0.nightly-2021-12-17-022306   True        False         False      5h39m
kube-apiserver                             4.7.0-0.nightly-2021-12-17-022306   True        False         False      5h41m
kube-controller-manager                    4.7.0-0.nightly-2021-12-17-022306   True        False         False      5h41m
kube-scheduler                             4.7.0-0.nightly-2021-12-17-022306   True        False         False      5h41m
kube-storage-version-migrator              4.7.0-0.nightly-2021-12-17-022306   True        False         False      5h33m
machine-api                                4.7.0-0.nightly-2021-12-17-022306   True        False         False      5h36m
machine-approver                           4.7.0-0.nightly-2021-12-17-022306   True        False         False      5h39m
machine-config                             4.6.0-0.nightly-2021-12-15-152825   True        False         False      152m
marketplace                                4.7.0-0.nightly-2021-12-17-022306   True        False         False      114m
monitoring                                 4.7.0-0.nightly-2021-12-17-022306   True        False         False      164m
network                                    4.6.0-0.nightly-2021-12-15-152825   True        True          True       5h43m
node-tuning                                4.7.0-0.nightly-2021-12-17-022306   True        False         False      115m
openshift-apiserver                        4.7.0-0.nightly-2021-12-17-022306   True        False         False      3h
openshift-controller-manager               4.7.0-0.nightly-2021-12-17-022306   True        False         False      3h1m
openshift-samples                          4.7.0-0.nightly-2021-12-17-022306   True        False         False      115m
operator-lifecycle-manager                 4.7.0-0.nightly-2021-12-17-022306   True        False         False      5h43m
operator-lifecycle-manager-catalog         4.7.0-0.nightly-2021-12-17-022306   True        False         False      5h43m
operator-lifecycle-manager-packageserver   4.7.0-0.nightly-2021-12-17-022306   True        False         False      3h1m
service-ca                                 4.7.0-0.nightly-2021-12-17-022306   True        False         False      5h44m
storage                                    4.7.0-0.nightly-2021-12-17-022306   True        False         False      161m

Details:
http://pastebin.test.redhat.com/1016562
http://pastebin.test.redhat.com/1016594
http://pastebin.test.redhat.com/1016598

must-gather: https://drive.google.com/file/d/1PXHgSRiDbliTSqOgYdH5AfV_oo1CkjfJ/view?usp=sharing
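One way to triage a ClusterOperator listing like the one above is to filter for operators that are not Available, are Degraded, or have not yet reached the target version. This sketch assumes the default `oc get clusteroperators` column order (NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE) arrives on stdin; `unhealthy_operators` is a hypothetical helper name, and the caller supplies the target version.

```shell
# unhealthy_operators TARGET (hypothetical helper): read `oc get clusteroperators`
# style output on stdin (columns assumed: NAME VERSION AVAILABLE PROGRESSING
# DEGRADED SINCE) and print every operator that is not Available, is Degraded,
# or does not yet report the TARGET version.
unhealthy_operators() {
  awk -v target="$1" '
    NR == 1 { next }   # skip the NAME/VERSION/... header row
    $3 != "True" || $5 == "True" || $2 != target {
      printf "%s version=%s available=%s degraded=%s\n", $1, $2, $3, $5
    }'
}
```

For example, `oc get clusteroperators | unhealthy_operators 4.7.0-0.nightly-2021-12-17-022306`. Against the listing above, this would report authentication, console, dns, ingress, machine-config, and network.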
Must gather from comment 5:

$ tar xz --strip-components=1 <must-gather.local.4931375460397206808.tar.gz
$ yaml2json <cluster-scoped-resources/config.openshift.io/clusterversions.yaml | jq -r '.items[].status.history[] | .startedTime + " " + (.completionTime // "-") + " " + .state + " " + .version'
2021-12-17T08:24:55Z - Partial 4.7.0-0.nightly-2021-12-17-022306
2021-12-17T07:14:20Z 2021-12-17T08:20:10Z Completed 4.6.0-0.nightly-2021-12-15-152825
2021-12-17T04:57:57Z 2021-12-17T05:29:20Z Completed 4.5.0-0.nightly-2021-09-07-164108
$ yaml2json <cluster-scoped-resources/config.openshift.io/clusterversions.yaml | jq -r '.items[].status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
2021-12-17T05:29:20Z Available=True : Done applying 4.6.0-0.nightly-2021-12-15-152825
2021-12-17T13:56:13Z Failing=False :
2021-12-17T08:24:55Z Progressing=True : Working towards 4.7.0-0.nightly-2021-12-17-022306: 505 of 668 done (75% complete)
2021-12-17T04:57:57Z RetrievedUpdates=False VersionNotFound: Unable to retrieve available updates: currently reconciling cluster version 4.7.0-0.nightly-2021-12-17-022306 not found in the "stable-4.5" channel
2021-12-17T08:31:49Z Upgradeable=False KubeletMinorVersion_KubeletMinorVersionUnsupported: Cluster operator kube-apiserver cannot be upgraded between minor versions: KubeletMinorVersionUpgradeable: Unsupported kubelet minor versions on nodes ip-10-0-129-213.us-east-2.compute.internal, ip-10-0-186-172.us-east-2.compute.internal, and ip-10-0-213-183.us-east-2.compute.internal are too far behind the target API server version (1.20.11).

Hmm... No cluster-version operator logs in this must-gather?

$ ls namespaces/openshift-cluster-version/
monitoring.coreos.com

Hard to know for sure without CVO logs, but I suspect bug 2018368 may be involved in part of this. Doesn't explain why the other operators would be degraded though, and I haven't poked into those.
Anyhow, for the purpose of this backport, you can see that we got far enough into 4.7 to update the Kube API-server operator, and that the new API-server operator is appropriately complaining about the old 1.18 kubelets on the compute nodes stuck on 4.5:

$ grep -r ' kubeletVersion' cluster-scoped-resources/core/nodes
cluster-scoped-resources/core/nodes/ip-10-0-166-226.us-east-2.compute.internal.yaml: kubeletVersion: v1.19.16+845f228
cluster-scoped-resources/core/nodes/ip-10-0-152-231.us-east-2.compute.internal.yaml: kubeletVersion: v1.19.16+845f228
cluster-scoped-resources/core/nodes/ip-10-0-186-172.us-east-2.compute.internal.yaml: kubeletVersion: v1.18.3+d8ef5ad
cluster-scoped-resources/core/nodes/ip-10-0-129-213.us-east-2.compute.internal.yaml: kubeletVersion: v1.18.3+d8ef5ad
cluster-scoped-resources/core/nodes/ip-10-0-221-161.us-east-2.compute.internal.yaml: kubeletVersion: v1.19.16+845f228
cluster-scoped-resources/core/nodes/ip-10-0-213-183.us-east-2.compute.internal.yaml: kubeletVersion: v1.18.3+d8ef5ad

Bug 2018356 is up with some wording suggestions, but that would be its own backport series if those get picked up. So I think this bug can be marked VERIFIED as it stands, with its successful demonstration that the 4.7 Kube API-server operator complains as we'd expect it to. The other issues that kept this from being a nice, clean update can be followed up in other bugs.
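To line the grep output up against the nodes named in the Upgradeable=False condition, a small filter can rewrite each match as a version/node pair and sort it so nodes on the same kubelet version are grouped. This is a sketch that assumes the `.../<node>.yaml: kubeletVersion: <version>` shape of the grep output above; `kubelet_versions_by_node` is a hypothetical helper name.

```shell
# kubelet_versions_by_node (hypothetical helper): read lines shaped like
#   .../nodes/<node>.yaml: kubeletVersion: <version>
# on stdin, print "<version> <node>", and sort (byte order, for stable output)
# so nodes that share a kubelet version are listed together.
kubelet_versions_by_node() {
  sed -E 's|^.*/([^/]+)\.yaml:[[:space:]]*kubeletVersion:[[:space:]]*(.*)$|\2 \1|' |
    LC_ALL=C sort
}
```

Running `grep -r ' kubeletVersion' cluster-scoped-resources/core/nodes | kubelet_versions_by_node` on this must-gather should group ip-10-0-129-213, ip-10-0-186-172, and ip-10-0-213-183 under v1.18.3+d8ef5ad, which are the three nodes listed in the Upgradeable=False condition.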
@Trevor Please move the bug back to ON_QA, and then we can move it to VERIFIED.
@Trevor As per your comment, and after checking with the SDN team: the upgrade is not failing because of the PR's code. It is failing because of an SDN issue, https://bugzilla.redhat.com/show_bug.cgi?id=1916029. Moving the bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.7.41 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:0117