Bug 2091806
Field | Value
---|---
Summary | Cluster upgrade stuck due to "resource deletions in progress"
Product | OpenShift Container Platform
Component | Cluster Version Operator
Version | 4.9
Target Release | 4.9.z
Status | CLOSED ERRATA
Severity | medium
Priority | medium
Keywords | Upgrades
Hardware | All
OS | Other
Type | Bug
Reporter | Paul Webster <pauwebst>
Assignee | Jack Ottofaro <jack.ottofaro>
QA Contact | liujia <jiajliu>
CC | aos-team-ota, ddelcian, jack.ottofaro, lmohanty, yanyang
Last Closed | 2022-08-09 14:00:58 UTC
Bug Depends On | 2064991
Description
Paul Webster
2022-05-31 05:52:24 UTC
(In reply to Paul Webster from comment #0)
> Description of problem:
> While attempting to upgrade cluster from 4.6.17 to 4.10.9 through the
> upgrade path detailed below, the final upgrade from 4.9.28 to 4.10.9 was
> blocked with message:
>
> Cluster minor level upgrades are not allowed while resource deletions are in
> progress; resources=PrometheusRule
> "openshift-authentication-operator/authentication-operator",rolebinding
> "openshift-machine-api/machine-api-termination-handler",PrometheusRule
> "openshift-kube-apiserver/kube-apiserver",role
> "openshift-machine-api/machine-api-termination-handler"
>
> The issue was eventually resolved by resetting the cluster upgrade using the
> command:
>
> $ oc adm upgrade --clear

Any idea how long they waited on the first upgrade request before "clear"ing it?

This is a known issue and will require a backport of https://bugzilla.redhat.com/show_bug.cgi?id=1822752 to fix.

Reproduced on path 4.8.36 -> 4.9.28 -> 4.10.9

1. Trigger the upgrade from 4.8.36 to 4.9.28.

2. Monitor that upgrade. Once it finishes, immediately trigger a new upgrade to 4.10 (w/o --force) while the Upgradeable=False condition is still present. (There is only a very short period before the cluster reaches the ResourceDeletesInProgress status; if the upgrade is not triggered within this period, the issue does not occur.)

3. After triggering the upgrade w/o --force while Upgradeable=False, no upgrade happens, as expected, and the `it may not be safe to apply this update` error is reported due to Upgradeable=False.

    # ./oc get clusterversion
    NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.9.28    True        True          23m     Unable to apply 4.10.23: it may not be safe to apply this update

4. Do nothing and wait for the ResourceDeletes (>30min). The ResourceDeletes do not complete and the cluster stays stuck in the above status (unexpected).

    # ./oc adm upgrade
    info: An upgrade is in progress. Unable to apply 4.10.9: it may not be safe to apply this update

    Upgradeable=False

      Reason: ResourceDeletesInProgress
      Message: Cluster minor level upgrades are not allowed while resource deletions are in progress; resources=PrometheusRule "openshift-kube-apiserver/kube-apiserver"

5. Run `oc adm upgrade --clear` to cancel the update to 4.10.9 that is blocked by Upgradeable=False, then re-trigger the update. The upgrade starts successfully.

Verified on 4.8.46 -> 4.9.0-0.nightly-2022-07-21-221241 -> 4.10.24

At the beginning, the error is still reported because the upgrade was triggered while Upgradeable=False.

    # ./oc adm upgrade
    info: An upgrade is in progress. Unable to apply 4.10.24: it may not be safe to apply this update

    Upgradeable=False

      Reason: ResourceDeletesInProgress
      Message: Cluster minor level upgrades are not allowed while resource deletions are in progress; resources=PrometheusRule "openshift-kube-apiserver/kube-apiserver"

Do nothing and wait for the ResourceDeletes. After several minutes, the ResourceDeletes complete, the upgrade starts, and it eventually succeeds.

    # ./oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.9.0-0.nightly-2022-07-21-221241    True        True          4m57s   Working towards 4.10.24: 95 of 773 done (12% complete)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.9.45 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5879
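The reproduction steps in this report suggest the hang only occurs when the 4.10 update is requested while Upgradeable=False is still set after the previous update. On a cluster that does not yet have the fix, one way to avoid that window might be to poll the condition before requesting the next minor update. The following is a minimal sketch, not something taken from this bug: the function names, the attempt count, and the interval are hypothetical, and it assumes that an absent Upgradeable condition means the update is not blocked.

```shell
#!/bin/sh
# Sketch only: wait for the ClusterVersion Upgradeable condition to stop
# being "False" before requesting a minor-version update.

# Print the status of the Upgradeable condition: "True", "False",
# or an empty string when the condition is absent.
upgradeable_status() {
  oc get clusterversion version \
    -o jsonpath='{.status.conditions[?(@.type=="Upgradeable")].status}'
}

# wait_until_upgradeable ATTEMPTS INTERVAL_SECONDS  (hypothetical helper)
# Returns 0 as soon as a query succeeds and the status is not "False",
# and 1 if all attempts are used up first.
wait_until_upgradeable() {
  local attempts="${1:-60}"
  local interval="${2:-30}"
  local i=0
  local status
  while [ "$i" -lt "$attempts" ]; do
    # A failed query is treated as "not ready yet" and simply retried.
    if status="$(upgradeable_status)" && [ "$status" != "False" ]; then
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  return 1
}

# Example use (left commented out so that nothing runs on load):
#   wait_until_upgradeable 60 30 && oc adm upgrade --to=4.10.9
```

If the wait times out on an affected version, the workaround from the report still applies: run `oc adm upgrade --clear` and request the update again.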