Description of problem:
Ran an upgrade test from 4.4.0-rc.6 to 4.4.0-0.nightly-2020-04-04-025830 with the Upgradeable=False condition set. The upgrade cannot start because the precondition check fails.

# ./oc adm upgrade
info: An upgrade is in progress. Unable to apply registry.svc.ci.openshift.org/ocp/release@sha256:5e727bba8407a963fb2bdd95aaa2e2ba6aa63bc58da1f7e69ea28c3f43b90dea: it may not be safe to apply this update

E0409 04:04:55.478434 1 precondition.go:59] Precondition "ClusterVersionUpgradeable" failed: Disabling ownership via cluster version overrides prevents upgrades. Please remove overrides before continuing.

Version-Release number of the following components:
4.4.0-0.nightly-2020-04-04-025830

How reproducible:
always

Steps to Reproduce:
1. Install a 4.4.0-rc.6 cluster.
2. Use oc patch clusterversion to override network-operator.
3. Upgrade from 4.4.0-rc.6 to 4.4.0-0.nightly-2020-04-04-025830:
./oc adm upgrade --to-image registry.svc.ci.openshift.org/ocp/release@sha256:5e727bba8407a963fb2bdd95aaa2e2ba6aa63bc58da1f7e69ea28c3f43b90dea --allow-explicit-upgrade=true

Actual results:
The upgrade fails the precondition check.

Expected results:
The upgrade succeeds.

Additional info:
Hit this issue while regression-testing the change in https://bugzilla.redhat.com/show_bug.cgi?id=1797624. After talking with dev, it is probably related to the "--to-image" name extraction that he is working on.

Tried upgrading with "--to" instead, which avoids the "--to-image" issue:
1. Use oc patch clusterversion to override network-operator.
2. Change the channel to candidate-4.4 and upgrade from 4.4.0-rc.4 to 4.4.0-rc.6:
# ./oc adm upgrade --to 4.4.0-rc.6
Updating to 4.4.0-rc.6
3. The upgrade succeeds.
>Additional info:
>Hit the issue when do regression test against the change in https://bugzilla.redhat.com/show_bug.cgi?id=1797624.Communicated with dev, should be related with "--to-image" name extraction which he works on. Try another way to do upgrade with "--to" which can avoid "--to-image" issue.
>1. oc patch clusterversion to override network-operator
>2. change channel to candidate-4.4 and do upgrade against 4.4.0-rc.4 to 4.4.0-rc.6 # ./oc adm upgrade --to 4.4.0-rc.6 Updating to 4.4.0-rc.6
>3. upgrade succeed.

Updating the result: the upgrade can start (unlike with --to-image), but it seems stuck at 78% complete (for more than 2 hours). I think that is a separate issue.

# ./oc adm upgrade
info: An upgrade is in progress. Working towards 4.4.0-rc.6: 78% complete

# ./oc get co|grep rc.4
dns              4.4.0-rc.4   True   False   False   5h26m
machine-config   4.4.0-rc.4   True   False   False   5h22m
network          4.4.0-rc.4   True   False   False   5h27m

E0409 10:20:58.526070 1 task.go:81] error running apply for clusteroperator "network" (457 of 580): Cluster operator network is still updating
I0409 10:20:58.526151 1 task_graph.go:568] Canceled worker 13
I0409 10:20:58.526192 1 task_graph.go:588] Workers finished
I0409 10:20:58.526233 1 task_graph.go:516] No more reachable nodes in graph, continue
I0409 10:20:58.526256 1 task_graph.go:552] No more work
I0409 10:20:58.526271 1 task_graph.go:596] Result of work: [Cluster operator network is still updating]
I0409 10:20:58.526289 1 sync_worker.go:783] Summarizing 1 errors
I0409 10:20:58.526297 1 sync_worker.go:787] Update error 457 of 580: ClusterOperatorNotAvailable Cluster operator network is still updating (*errors.errorString: cluster operator network is still updating)
E0409 10:20:58.526324 1 sync_worker.go:329] unable to synchronize image (waiting 2m52.525702462s): Cluster operator network is still updating

# ./oc get clusterversion version -o json|jq .status.conditions[-1]
{
  "lastTransitionTime": "2020-04-09T05:58:13Z",
  "message": "Disabling ownership via cluster version overrides prevents upgrades. Please remove overrides before continuing.",
  "reason": "ClusterVersionOverridesSet",
  "status": "False",
  "type": "Upgradeable"
}

# ./oc get clusterversion version -o json|jq .spec.overrides
[
  {
    "group": "apps/v1",
    "kind": "Deployment",
    "name": "network-operator",
    "namespace": "openshift-network-operator",
    "unmanaged": true
  }
]

@king, I think it doesn't work as expected even with --to. I'm attaching the result here first; if needed, we can file a new bug to track it, since the original bz1797624 is verified.
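For anyone checking a cluster for the same condition, the .spec.overrides output above can be inspected programmatically. The following is a minimal Python sketch, not part of oc or the CVO; `unmanaged_overrides` is a hypothetical helper name, and it assumes the input is the JSON printed by `oc get clusterversion version -o json`.

```python
import json
from typing import List


def unmanaged_overrides(clusterversion_json: str) -> List[str]:
    """List overrides marked unmanaged in a ClusterVersion JSON document.

    Hypothetical helper: the input is assumed to be the output of
    `oc get clusterversion version -o json`.
    """
    cv = json.loads(clusterversion_json)
    # .spec.overrides may be absent or null when no overrides are set.
    overrides = (cv.get("spec") or {}).get("overrides") or []
    found = []
    for o in overrides:
        if o.get("unmanaged"):
            # Namespace may be empty for cluster-scoped resources.
            found.append(f'{o.get("kind")} {o.get("namespace", "")}/{o.get("name")}')
    return found
```

An empty result means no unmanaged overrides are set; any entry corresponds to an override that, per the condition message above, needs to be removed before continuing.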
> Update the result, the upgrade can start(not the same with --to-image)...

So that means we were fine to mark bug 1797624 VERIFIED; the CVO has no problems with the --to bump. And we need this bug about getting --to-image working.

> ...but seems stuck at 78% complete(more than 2 hrs). I think it's another issue.

Yeah, that seems like a separate issue. Can you post the network ClusterOperator? It might be that they are not actually on board with allowing z-stream updates when they set Upgradeable=False.
Possibly also the logs of the network operator pod.
(In reply to W. Trevor King from comment #2)
> > ...but seems stuck at 78% complete(more than 2 hrs). I think it's another issue.
>
> Yeah, that seems like a separate issue. Can you post the network
> ClusterOperator? It might be that they are not actually on board with
> allowing z-stream updates when they set Upgradeable=False.

Sure, I gave it another try to collect more logs and filed a new bug, https://bugzilla.redhat.com/show_bug.cgi?id=1822844, to track that issue separately. Let's track only the '--to-image' issue here.
Deferring to 4.5. I'm agnostic about whether we backport this once we have a fix.
--to-image is not a common customer use case, lowering priority on this one.
Changing the reproducer for this bug. With the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1822844, the expected behaviour will be to reject a z-level upgrade if overrides are set. However, the following will reproduce this bug's issue:

Steps to Reproduce:
1. Install a 4.4.0-rc.6 cluster.
2. Set the TechPreviewNoUpgrade feature set:
./oc patch featuregate cluster --type json -p '[{"op": "add", "path": "/spec/featureSet", "value": "TechPreviewNoUpgrade"}]'
featuregate.config.openshift.io/cluster patched
3. Upgrade from 4.4.0-rc.6 to 4.4.0-rc.7:
./oc adm upgrade --allow-explicit-upgrade=true --to-image quay.io/openshift-release-dev/ocp-release@sha256:2532227a868fca11a0cb7563232a26ab9a682d8ee1bb72fd416c4e7789d7ce11

Actual results:
The upgrade fails the precondition check. CVO log:

E0626 18:40:35.598906 1 precondition.go:59] Precondition "ClusterVersionUpgradeable" failed: Cluster operator kube-apiserver cannot be upgraded: FeatureGatesUpgradeable: "TechPreviewNoUpgrade" does not allow updates
E0626 18:40:35.598965 1 sync_worker.go:329] unable to synchronize image (waiting 2m52.525702462s): Precondition "ClusterVersionUpgradeable" failed because of "FeatureGates_RestrictedFeatureGates_TechPreviewNoUpgrade": Cluster operator kube-apiserver cannot be upgraded: FeatureGatesUpgradeable: "TechPreviewNoUpgrade" does not allow updates

Expected results:
The upgrade succeeds.

Additional info:
Using "./oc adm upgrade --to 4.4.0-rc.7", the upgrade succeeds.
With the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1822844 deployed, preconditions such as ClusterVersion overrides should block all upgrades, including z-level upgrades. Other preconditions should not.

Currently, when an upgrade is performed using `oc adm upgrade` with the `--to-image` option, the CVO thread that updates the version history can add a new cluster version history entry before the thread loading the upgrade version has set the version field, which leaves that field empty. This field is used to extract the minor version of the upgrade target, which is compared with the minor version of the current version to decide whether this is a z-level upgrade and the precondition can therefore be bypassed.
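The effect of an empty version field on that minor-version comparison can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the CVO's Go code: `minor_of` and `is_z_level_update` are hypothetical names, and the pattern assumes version strings start with major.minor.patch.

```python
import re
from typing import Optional

# Assumption: versions look like "4.4.0-rc.6" or
# "4.4.0-0.nightly-2020-04-04-025830", i.e. they start with X.Y.Z.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)")


def minor_of(version: str) -> Optional[int]:
    """Return the minor number of a version string, or None if unparseable."""
    match = _VERSION_RE.match(version)
    return int(match.group(2)) if match else None


def is_z_level_update(current: str, target: str) -> bool:
    """True only when both minors can be parsed and they are equal.

    An empty version string yields no minor number, so the update is not
    recognized as z-level and the precondition bypass does not apply.
    """
    current_minor = minor_of(current)
    target_minor = minor_of(target)
    if current_minor is None or target_minor is None:
        return False
    return current_minor == target_minor
```

With both versions populated, 4.4.0-rc.6 to 4.4.0-rc.7 is recognized as a z-level update; with either side empty, it is not, which matches the rejected upgrade seen with --to-image.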
Adding UpcomingSprint keyword. Bug is still in work.
Disregard my comment https://bugzilla.redhat.com/show_bug.cgi?id=1822513#c12 as it is not completely accurate. The real issue is that currentMinor is always being pulled from cv.Status.History[0].Version (https://github.com/openshift/cluster-version-operator/blob/40ec7e4f90b9fa0992145b926bd5f5bf6bd973a3/pkg/payload/precondition/clusterversion/upgradeable.go#L65) which contains the version being upgraded to and not the current version. In this bug's specific case, when --to-image is used, cv.Status.History[0].Version = "" which then fails the check for a z-level upgrade. Instead we should iterate the version history to find and use the first version with State == configv1.CompletedUpdate, which will yield the current version, and pull currentMinor from it.
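The proposed lookup can be sketched as below. This is a Python illustration of the idea rather than the actual Go change: `current_version` is a hypothetical name, the history is modelled as a list of dicts ordered newest first (as in .status.history), and "Completed" is the string value of configv1.CompletedUpdate.

```python
from typing import Dict, List, Optional

COMPLETED = "Completed"  # string value of configv1.CompletedUpdate


def current_version(history: List[Dict[str, str]]) -> Optional[str]:
    """Return the version of the newest completed history entry.

    history is assumed to be ordered newest first. While an update is in
    progress, history[0] describes the target (and may carry an empty
    version), so it is skipped unless its state is Completed.
    """
    for entry in history:
        if entry.get("state") == COMPLETED:
            # Treat an empty version on a completed entry as unknown.
            return entry.get("version") or None
    return None
```

For a history whose first entry is a partial update with an empty version and whose second entry is a completed 4.4.0-rc.6, this returns "4.4.0-rc.6", so currentMinor would come from the version the cluster is actually running.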
Version: 4.6.0-0.nightly-2020-08-02-091622

1. Install a 4.6.0-0.nightly-2020-08-02-044648 cluster.

2. Set the TechPreviewNoUpgrade feature set:
# ./oc patch featuregate cluster --type json -p '[{"op": "add", "path": "/spec/featureSet", "value": "TechPreviewNoUpgrade"}]'
featuregate.config.openshift.io/cluster patched

# ./oc get clusterversion -o json|jq -r '.items[0].status.conditions[-1]'
{
  "lastTransitionTime": "2020-08-03T03:55:52Z",
  "message": "Cluster operator kube-apiserver cannot be upgraded between minor versions: FeatureGatesUpgradeable: \"TechPreviewNoUpgrade\" does not allow updates",
  "reason": "FeatureGates_RestrictedFeatureGates_TechPreviewNoUpgrade",
  "status": "False",
  "type": "Upgradeable"
}

3. Upgrade from 4.6.0-0.nightly-2020-08-02-044648 to 4.6.0-0.nightly-2020-08-02-091622 with the --to-image command:
# ./oc adm upgrade --to-image registry.svc.ci.openshift.org/ocp/release@sha256:a0cd5e461757e8c0d0f4e6563ffd716dca90e8ed2956bd6b1405223e74da057c --allow-explicit-upgrade
Updating to release image registry.svc.ci.openshift.org/ocp/release@sha256:a0cd5e461757e8c0d0f4e6563ffd716dca90e8ed2956bd6b1405223e74da057c

The upgrade succeeds.

# ./oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-02-091622   True        False         7m9s    Cluster version is 4.6.0-0.nightly-2020-08-02-091622
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196