Description of problem:
On OpenShift 4.4 and later clusters, the Performance Addon Operator can be applied to get a realtime kernel, CPU pinning, etc.: https://github.com/openshift-kni/performance-addon-operators
This works fine, but it has a side effect on the cluster: it blocks upgrades. As soon as I apply this operator, I receive this error on ClusterVersion:

- lastTransitionTime: "2020-07-28T08:48:15Z"
  message: 'Cluster operator kube-apiserver cannot be upgraded: FeatureGatesUpgradeable: "LatencySensitive" does not allow updates'
  reason: FeatureGates_RestrictedFeatureGates_LatencySensitive
  status: "False"
  type: Upgradeable

This seems to be caused by https://github.com/openshift/cluster-kube-apiserver-operator/blob/f73bebb6361c3649dab5305d8c7d1cd9753e61aa/pkg/operator/featureupgradablecontroller/feature_upgradeable_controller.go#L18 . That line needs to list "LatencySensitive" to allow upgrades of this component. This bug affects 4.4 and later.
According to https://github.com/openshift/api/blob/7192180f496aab1f7659d8660fc360498bab498b/config/v1/types_feature.go#L38 , LatencySensitive should not block upgrades. This feature gate is needed to enable TopologyManager in OCP 4.4 and is explicitly mentioned in the OCP docs (with no warning about upgrades being blocked): https://docs.openshift.com/container-platform/4.4/scalability_and_performance/using-topology-manager.html#seting_up_topology_manager_using-topology-manager
What's the next step here? Should Yolanda reproduce this and give you access to the setup? Martin, should you/QE meanwhile try to reproduce and investigate this in parallel?
We already know what happened. The next step is for the kube-apiserver team to review our findings.
1. PAO enabled the LatencySensitive feature gate (FG).
2. kube-apiserver set status.upgradeable to False because it does not recognize this feature gate as allowed.
3. CVO interrogated all operators and noticed it can't upgrade.
I believe step 2 is a bug, because another place in the sources explicitly says that the LatencySensitive FG does not block upgrades.
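The three steps above can be modeled with a small sketch. This is a hypothetical Python model of the described behaviour, not the operator's actual Go code: the names `ALLOWED_BEFORE_FIX`, `ALLOWED_AFTER_FIX`, `Condition`, and `upgradeable_condition` are invented for illustration, and treating the default (empty) feature set as the only allowed entry before the fix is an assumption. The reason and message strings mirror the condition quoted in the bug description.

```python
from dataclasses import dataclass

# Assumption: the operator keeps an allowlist of feature sets that may be
# upgraded. The default (empty) feature set is assumed to be on it, and
# "LatencySensitive" is the entry this bug asks to add.
ALLOWED_BEFORE_FIX = {""}
ALLOWED_AFTER_FIX = {"", "LatencySensitive"}


@dataclass
class Condition:
    type: str
    status: str
    reason: str = ""
    message: str = ""


def upgradeable_condition(feature_set: str, allowed: set) -> Condition:
    """Return the Upgradeable condition reported for the given feature set."""
    if feature_set in allowed:
        return Condition(type="Upgradeable", status="True")
    # Feature set not on the allowlist: report Upgradeable=False, which the
    # CVO then surfaces on ClusterVersion.
    return Condition(
        type="Upgradeable",
        status="False",
        reason=f"FeatureGates_RestrictedFeatureGates_{feature_set}",
        message=f'FeatureGatesUpgradeable: "{feature_set}" does not allow updates',
    )


before = upgradeable_condition("LatencySensitive", ALLOWED_BEFORE_FIX)
after = upgradeable_condition("LatencySensitive", ALLOWED_AFTER_FIX)
print(before.status, before.reason)
print(after.status)
```

Under these assumptions, adding one entry to the allowlist is enough to flip the condition, which matches the one-line change requested in the description.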
I want to add that the upgrade failed only for z-stream (patch) upgrades, for example from 4.4.10 to 4.4.15. A minor-version upgrade (4.4.z -> 4.5.z) completed successfully:

> Initially I had 4.4.15
# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.15    True        False         92m     Cluster version is 4.4.15

> Start upgrading to 4.5.4
# oc adm upgrade --to-image "registry.svc.ci.openshift.org/ocp/release:4.5.4" --allow-explicit-upgrade --force
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.5.4

> Upgrade in progress
# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.15    True        True          16m     Working towards 4.5.4: 76% complete

> Successfully finished after ~2hrs
# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.4     True        False         2m31s   Cluster version is 4.5.4

> All operators are active and upgraded:
# oc get clusteroperator
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.5.4     True        False         False      170m
cloud-credential                           4.5.4     True        False         False      3h32m
cluster-autoscaler                         4.5.4     True        False         False      3h13m
config-operator                            4.5.4     True        False         False      76m
console                                    4.5.4     True        False         False      32m
csi-snapshot-controller                    4.5.4     True        False         False      38m
dns                                        4.5.4     True        False         False      3h20m
etcd                                       4.5.4     True        False         False      3h19m
image-registry                             4.5.4     True        False         False      3h14m
ingress                                    4.5.4     True        False         False      133m
insights                                   4.5.4     True        False         False      3h14m
kube-apiserver                             4.5.4     True        False         False      3h18m
kube-controller-manager                    4.5.4     True        False         False      3h19m
kube-scheduler                             4.5.4     True        False         False      3h18m
kube-storage-version-migrator              4.5.4     True        False         False      38m
machine-api                                4.5.4     True        False         False      3h14m
machine-approver                           4.5.4     True        False         False      63m
machine-config                             4.5.4     True        False         False      7m3s
marketplace                                4.5.4     True        False         False      37m
monitoring                                 4.5.4     True        False         False      7m7s
network                                    4.5.4     True        False         False      3h21m
node-tuning                                4.5.4     True        False         False      65m
openshift-apiserver                        4.5.4     True        False         False      26m
openshift-controller-manager               4.5.4     True        False         False      64m
openshift-samples                          4.5.4     True        False         False      65m
operator-lifecycle-manager                 4.5.4     True        False         False      3h21m
operator-lifecycle-manager-catalog         4.5.4     True        False         False      3h21m
operator-lifecycle-manager-packageserver   4.5.4     True        False         False      26m
service-ca                                 4.5.4     True        False         False      3h21m
storage                                    4.5.4     True        False         False      65m
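Checking a table like the one above by eye is error-prone, so here is a minimal sketch that scans `oc get clusteroperator` text output for operators that are not at the expected version or are not in the Available=True, Progressing=False, Degraded=False state. The helper name `unhealthy_operators` is hypothetical, and the third sample row was altered on purpose to show a failing case; it is not taken from the cluster above.

```python
def unhealthy_operators(table: str, expected_version: str) -> list:
    """Return names of operators not at expected_version or not healthy."""
    problems = []
    lines = table.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        name, version, available, progressing, degraded = line.split()[:5]
        healthy = (available, progressing, degraded) == ("True", "False", "False")
        if version != expected_version or not healthy:
            problems.append(name)
    return problems


# First two rows follow the output above; the "monitoring" row is a
# made-up failing example (old version, still progressing).
sample = """\
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
kube-apiserver   4.5.4     True        False         False      3h18m
machine-config   4.5.4     True        False         False      7m3s
monitoring       4.4.15    True        True          False      7m7s
"""

print(unhealthy_operators(sample, "4.5.4"))
```

An empty list would mean every listed operator reports the expected version and a healthy state.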
Verified with OCP 4.6.0-0.nightly-2020-07-31-080025; the steps are below.

$ oc edit featuregate/cluster
$ oc describe featuregate/cluster
Name:         cluster
Namespace:
Labels:       <none>
Annotations:  release.openshift.io/create-only: true
API Version:  config.openshift.io/v1
Kind:         FeatureGate
...
Spec:
  Feature Set:  LatencySensitive
Events:         <none>

$ oc create -f topologymanager-kubeletconfig.yaml
kubeletconfig.machineconfiguration.openshift.io/cpumanager-enabled created

$ oc get KubeletConfig
NAME                 AGE
cpumanager-enabled   17s

Looked for a 4.6 nightly payload that includes the bug fix PR:

$ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-07-31-080025 | grep kube-apiserver
cluster-kube-apiserver-operator https://github.com/openshift/cluster-kube-apiserver-operator 19c2ecc4e39d7da2388265c3e85dbd17e8b1fd1c

$ git log --date local --pretty="%h %an %cd - %s" 19c2ecc4e | grep '#920'
19c2ecc4 OpenShift Merge Robot Thu Jul 30 22:31:42 2020 - Merge pull request #920 from MarSik/bug_1861431

The build 4.6.0-0.nightly-2020-07-31-080025 is just what we wanted.

$ oc patch clusterversion/version --patch '{"spec":{"upstream":"https://openshift-release.svc.ci.openshift.org/graph"}}' --type=merge
clusterversion.config.openshift.io/version patched

$ oc adm upgrade
Cluster version is 4.5.4
No updates available. You may force an upgrade to a specific release image, but doing so may not be supported and result in downtime or data loss.

Because the build 4.6.0-0.nightly-2020-07-31-080025 has not been signed, we have to upgrade with the --force parameter:

$ oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-07-31-080025 --allow-explicit-upgrade=true --force
warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
warning: The requested upgrade image is not one of the available updates.
You have used --allow-explicit-upgrade to the update to preceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-07-31-080025

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.4     True        True          35s     Working towards 4.6.0-0.nightly-2020-07-31-080025: 0% complete

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-07-31-080025   True        False         64m     Cluster version is 4.6.0-0.nightly-2020-07-31-080025

$ oc get clusterversion -o json|jq ".items[0].spec"
{
  "channel": "stable-4.5",
  "clusterID": "607a8084-b37d-4f17-9f43-122d38d382e4",
  "desiredUpdate": {
    "force": true,
    "image": "registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-07-31-080025",
    "version": ""
  },
  "upstream": "https://openshift-release.svc.ci.openshift.org/graph"
}

$ oc get clusterversion -o json|jq ".items[0].status.history"
[
  {
    "completionTime": "2020-07-31T11:46:08Z",
    "image": "registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-07-31-080025",
    "startedTime": "2020-07-31T10:28:03Z",
    "state": "Completed",
    "verified": false,
    "version": "4.6.0-0.nightly-2020-07-31-080025"
  },
  {
    "completionTime": "2020-07-31T09:57:40Z",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:02dfcae8f6a67e715380542654c952c981c59604b1ba7f569b13b9e5d0fbbed3",
    "startedTime": "2020-07-31T09:27:53Z",
    "state": "Completed",
    "verified": false,
    "version": "4.5.4"
  }
]

$ oc describe featuregate/cluster
Name:         cluster
Namespace:
Labels:       <none>
Annotations:  release.openshift.io/create-only: true
API Version:  config.openshift.io/v1
Kind:         FeatureGate
Metadata:
  Creation Timestamp:  2020-07-31T09:28:23Z
  Generation:          3
  Managed Fields:
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:release.openshift.io/create-only:
      f:spec:
    Manager:      cluster-version-operator
    Operation:    Update
    Time:         2020-07-31T09:28:23Z
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:featureSet:
    Manager:         oc
    Operation:       Update
    Time:            2020-07-31T10:15:49Z
  Resource Version:  28617
  Self Link:         /apis/config.openshift.io/v1/featuregates/cluster
  UID:               70c3346d-5ef3-4238-a64b-4cf00545ac37
Spec:
  Feature Set:  LatencySensitive
Events:         <none>

$ oc get KubeletConfig
NAME                 AGE
cpumanager-enabled   154m

$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.6.0-0.nightly-2020-07-31-080025   True        False         False      161m
cloud-credential                           4.6.0-0.nightly-2020-07-31-080025   True        False         False      4h51m
cluster-autoscaler                         4.6.0-0.nightly-2020-07-31-080025   True        False         False      4h36m
config-operator                            4.6.0-0.nightly-2020-07-31-080025   True        False         False      4h37m
console                                    4.6.0-0.nightly-2020-07-31-080025   True        False         False      158m
csi-snapshot-controller                    4.6.0-0.nightly-2020-07-31-080025   True        False         False      4h31m
dns                                        4.6.0-0.nightly-2020-07-31-080025   True        False         False      4h41m
etcd                                       4.6.0-0.nightly-2020-07-31-080025   True        False         False      4h41m
image-registry                             4.6.0-0.nightly-2020-07-31-080025   True        False         False      4h31m
ingress                                    4.6.0-0.nightly-2020-07-31-080025   True        False         False      3h46m
insights                                   4.6.0-0.nightly-2020-07-31-080025   True        False         False      4h37m
kube-apiserver                             4.6.0-0.nightly-2020-07-31-080025   True        False         False      4h39m
kube-controller-manager                    4.6.0-0.nightly-2020-07-31-080025   True        False         False      4h40m
kube-scheduler                             4.6.0-0.nightly-2020-07-31-080025   True        False         False      4h39m
kube-storage-version-migrator              4.6.0-0.nightly-2020-07-31-080025   True        False         False      163m
machine-api                                4.6.0-0.nightly-2020-07-31-080025   True        False         False      4h34m
machine-approver                           4.6.0-0.nightly-2020-07-31-080025   True        False         False      4h38m
machine-config                             4.6.0-0.nightly-2020-07-31-080025   True        False         False      154m
marketplace                                4.6.0-0.nightly-2020-07-31-080025   True        False         False      162m
monitoring                                 4.6.0-0.nightly-2020-07-31-080025   True        False         False      175m
network                                    4.6.0-0.nightly-2020-07-31-080025   True        False         False      4h43m
node-tuning                                4.6.0-0.nightly-2020-07-31-080025   True        False         False      3h45m
openshift-apiserver                        4.6.0-0.nightly-2020-07-31-080025   True        False         False      4h38m
openshift-controller-manager               4.6.0-0.nightly-2020-07-31-080025   True        False         False      3h45m
openshift-samples                          4.6.0-0.nightly-2020-07-31-080025   True        False         False      3h45m
operator-lifecycle-manager                 4.6.0-0.nightly-2020-07-31-080025   True        False         False      4h41m
operator-lifecycle-manager-catalog         4.6.0-0.nightly-2020-07-31-080025   True        False         False      4h41m
operator-lifecycle-manager-packageserver   4.6.0-0.nightly-2020-07-31-080025   True        False         False      158m
service-ca                                 4.6.0-0.nightly-2020-07-31-080025   True        False         False      4h42m
storage                                    4.6.0-0.nightly-2020-07-31-080025   True        False         False      3h45m

We can see all is well.
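For reference, the `startedTime` and `completionTime` fields in the `status.history` output above can be turned into upgrade durations. The sketch below is only an illustration: the helper name `upgrade_duration_seconds` is hypothetical, and the entries are copied from the history above with the other fields left out.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def upgrade_duration_seconds(entry: dict) -> int:
    """Return the seconds between startedTime and completionTime."""
    started = datetime.strptime(entry["startedTime"], TIME_FORMAT)
    completed = datetime.strptime(entry["completionTime"], TIME_FORMAT)
    return int((completed - started).total_seconds())


# Timestamps copied from the status.history output above.
history = [
    {
        "version": "4.6.0-0.nightly-2020-07-31-080025",
        "startedTime": "2020-07-31T10:28:03Z",
        "completionTime": "2020-07-31T11:46:08Z",
    },
    {
        "version": "4.5.4",
        "startedTime": "2020-07-31T09:27:53Z",
        "completionTime": "2020-07-31T09:57:40Z",
    },
]

for entry in history:
    seconds = upgrade_duration_seconds(entry)
    print(f"{entry['version']}: {seconds // 60}m{seconds % 60}s")
```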
I also tried upgrading OCP 4.5.4 to a 4.6 nightly that does not include the fix PR and ran into the same problem; see below.

$ oc patch clusterversion/version --patch '{"spec":{"upstream":"https://openshift-release.svc.ci.openshift.org/graph"}}' --type=merge
clusterversion.config.openshift.io/version patched

$ oc adm upgrade
Cluster version is 4.5.4
No updates available. You may force an upgrade to a specific release image, but doing so may not be supported and result in downtime or data loss.

$ oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-07-30-112525 --allow-explicit-upgrade --force
warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade to the update to preceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-07-30-112525

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.4     True        True          17s     Unable to apply registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-07-30-112525: could not download the update

$ oc get clusterversion -o json
{
...
  "status": {
    "availableUpdates": null,
    "conditions": [
      {
        "lastTransitionTime": "2020-07-31T14:24:15Z",
        "message": "Done applying 4.5.4",
        "status": "True",
        "type": "Available"
      },
      {
        "lastTransitionTime": "2020-07-31T14:35:37Z",
        "status": "False",
        "type": "Failing"
      },
      {
        "lastTransitionTime": "2020-07-31T14:35:12Z",
        "message": "Working towards 4.6.0-0.nightly-2020-07-30-112525: 1% complete",
        "status": "True",
        "type": "Progressing"
      },
      {
        "lastTransitionTime": "2020-07-31T13:55:07Z",
        "status": "True",
        "type": "RetrievedUpdates"
      },
      {
        "lastTransitionTime": "2020-07-31T14:26:57Z",
        "message": "Multiple cluster operators cannot be upgraded between minor versions:\n* Cluster operator kube-apiserver cannot be upgraded between minor versions: FeatureGates_RestrictedFeatureGates_LatencySensitive: FeatureGatesUpgradeable: \"LatencySensitive\" does not allow updates\n* Cluster operator marketplace cannot be upgraded between minor versions: DeprecatedAPIsInUse: The cluster has custom OperatorSource, which is deprecated in future versions. Please visit this link for further details: https://docs.openshift.com/container-platform/4.4/release_notes/ocp-4-4-release-notes.html#ocp-4-4-marketplace-apis-deprecated",
        "reason": "ClusterOperatorsNotUpgradeable",
        "status": "False",
        "type": "Upgradeable"
      }
    ],
...

Comparing the two test results above, we can see that the problem has been fixed, so I am moving the bug to Verified.
I tried one upgrade from 4.6 nightly 4.6.0-0.nightly-2020-07-25-091217 to 4.6.0-0.nightly-2020-07-31-080025 and hit the bug again; details are below.

$ oc describe featuregate/cluster
Name:         cluster
Namespace:
Labels:       <none>
Annotations:  release.openshift.io/create-only: true
API Version:  config.openshift.io/v1
Kind:         FeatureGate
...
Spec:
  Feature Set:  LatencySensitive

$ oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-07-31-080025 --force=true --allow-explicit-upgrade=true
warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade to the update to preceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-07-31-080025

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-07-25-091217   True        False         36m     Cluster version is 4.6.0-0.nightly-2020-07-25-091217

$ oc get clusterversion -o json
...
{
  "lastTransitionTime": "2020-07-31T16:08:54Z",
  "message": "Cluster operator kube-apiserver cannot be upgraded between minor versions: FeatureGatesUpgradeable: \"LatencySensitive\" does not allow updates",
  "reason": "FeatureGates_RestrictedFeatureGates_LatencySensitive",
  "status": "False",
  "type": "Upgradeable"
}

Can anyone take a look at this problem?
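The Upgradeable conditions quoted in this thread can also be extracted programmatically instead of reading the full JSON. The following is a rough sketch, not part of any OpenShift tooling: `blocking_operators` is a hypothetical helper, the regular expression assumes the "Cluster operator <name> cannot be upgraded" wording seen in the messages above, and the sample is a trimmed copy of the conditions reported for the 4.5.4 to 4.6 nightly attempt (the marketplace message is cut short).

```python
import json
import re


def blocking_operators(cluster_version_json: str) -> list:
    """Return operator names mentioned in any Upgradeable=False condition."""
    doc = json.loads(cluster_version_json)
    names = []
    # `oc get clusterversion -o json` returns a List with an "items" array.
    for item in doc.get("items", []):
        for cond in item.get("status", {}).get("conditions", []):
            if cond.get("type") == "Upgradeable" and cond.get("status") == "False":
                names += re.findall(
                    r"Cluster operator (\S+) cannot be upgraded",
                    cond.get("message", ""),
                )
    return names


# Trimmed sample based on the conditions quoted in this thread.
sample = json.dumps({
    "items": [{
        "status": {
            "conditions": [
                {"type": "Available", "status": "True",
                 "message": "Done applying 4.5.4"},
                {
                    "type": "Upgradeable",
                    "status": "False",
                    "reason": "ClusterOperatorsNotUpgradeable",
                    "message": (
                        "Multiple cluster operators cannot be upgraded between minor versions:\n"
                        "* Cluster operator kube-apiserver cannot be upgraded between minor versions: "
                        "FeatureGates_RestrictedFeatureGates_LatencySensitive: "
                        'FeatureGatesUpgradeable: "LatencySensitive" does not allow updates\n'
                        "* Cluster operator marketplace cannot be upgraded between minor versions: "
                        "DeprecatedAPIsInUse: The cluster has custom OperatorSource, "
                        "which is deprecated in future versions."
                    ),
                },
            ]
        }
    }]
})

print(blocking_operators(sample))
```

On a live cluster, the input would be the output of `oc get clusterversion -o json`; an empty result means no Upgradeable=False condition names a cluster operator.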
The PR was merged 3 days ago, so you should probably use a newer starting version for the upgrade. Can you try the upgrade from 4.6.0-0.nightly-2020-07-31-080025 -> 4.6.0-0.nightly-2020-08-02-044648?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196