Bug 1862156

Summary: Cannot upgrade a cluster when adding Performance Profile Operator
Product: OpenShift Container Platform Reporter: Martin Sivák <msivak>
Component: kube-apiserver Assignee: Stefan Schimanski <sttts>
Status: CLOSED ERRATA QA Contact: Xingxing Xia <xxia>
Severity: high Docs Contact:
Priority: urgent    
Version: 4.5 CC: aos-bugs, dshchedr, fromani, fsimonce, grajaiya, kewang, mfojtik, msivak, sttts, wking, xxia, yjoseph, yprokule, yroblamo
Target Milestone: ---   
Target Release: 4.5.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Known Issue
Doc Text:
Cause: A bug in the feature gate upgradeability logic.
Consequence: The CVO marked the cluster as not upgradeable to the next minor version when the LatencySensitive FeatureGate was in use.
Workaround (if any): Upgrade to a version that has this bug fixed.
Result: The upgrade is performed, and the upgraded version includes this bug fix, so the CVO no longer treats the LatencySensitive FeatureGate as blocking minor-version upgrades.
Story Points: ---
Clone Of: 1861431
: 1863076 Environment:
Last Closed: 2020-08-10 13:50:53 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1861431    
Bug Blocks: 1863076    

Comment 3 Ke Wang 2020-08-03 15:31:25 UTC
Verified with OCP 4.5.0-0.nightly-2020-08-01-204100; steps below.

Case 1: Upgrade 4.4.z to 4.5.z

$ oc edit featuregate/cluster

$ oc describe featuregate/cluster
Name:         cluster
Namespace:    
Labels:       <none>
Annotations:  release.openshift.io/create-only: true
API Version:  config.openshift.io/v1
Kind:         FeatureGate
...
Spec:
  Feature Set:  LatencySensitive
Events:         <none>

$ cat topologymanager-kubeletconfig.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled
  kubeletConfig:
    cpuManagerPolicy: static
    cpuManagerReconcilePeriod: 5s
    topologyManagerPolicy: single-numa-node
     
$ oc create -f topologymanager-kubeletconfig.yaml 
kubeletconfig.machineconfiguration.openshift.io/cpumanager-enabled created

$ oc get KubeletConfig
NAME                 AGE
cpumanager-enabled   35s

$ oc patch clusterversion/version --patch '{"spec":{"upstream":"https://openshift-release.svc.ci.openshift.org/graph"}}' --type=merge
clusterversion.config.openshift.io/version patched

$ oc adm upgrade
Cluster version is 4.4.15

Updates:

VERSION                      IMAGE
4.4.0-0.ci-2020-07-31-153948 registry.svc.ci.openshift.org/ocp/release@sha256:816d581120c2f4e42ae99600cd5e475be0e253a42a416dc12ea418fd4c7697a3

$ oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-08-01-204100 --allow-explicit-upgrade --force
warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
warning: The requested upgrade image is not one of the available updates.  You have used --allow-explicit-upgrade to the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-08-01-204100

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.15    True        True          22m     Working towards 4.5.0-0.nightly-2020-08-01-204100: 79% complete

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-08-01-204100   True        False         52m     Cluster version is 4.5.0-0.nightly-2020-08-01-204100

$ oc get clusterversion -o json | jq '.items[0].status'
{
  "availableUpdates": null,
  "conditions": [
    {
      "lastTransitionTime": "2020-08-03T12:12:06Z",
      "message": "Done applying 4.5.0-0.nightly-2020-08-01-204100",
      "status": "True",
      "type": "Available"
    },
    {
      "lastTransitionTime": "2020-08-03T14:21:16Z",
      "status": "False",
      "type": "Failing"
    },
    {
      "lastTransitionTime": "2020-08-03T14:22:31Z",
      "message": "Cluster version is 4.5.0-0.nightly-2020-08-01-204100",
      "status": "False",
      "type": "Progressing"
    },
    {
      "lastTransitionTime": "2020-08-03T13:36:24Z",
      "status": "True",
      "type": "RetrievedUpdates"
    },
    {
      "lastTransitionTime": "2020-08-03T12:14:48Z",
      "message": "Cluster operator marketplace cannot be upgraded between minor versions: The cluster has custom OperatorSource, which is deprecated in future versions. Please visit this link for further details: https://docs.openshift.com/container-platform/4.4/release_notes/ocp-4-4-release-notes.html#ocp-4-4-marketplace-apis-deprecated",
      "reason": "DeprecatedAPIsInUse",
      "status": "False",
      "type": "Upgradeable"
    }
  ],
  "desired": {
    "force": true,
    "image": "registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-08-01-204100",
    "version": "4.5.0-0.nightly-2020-08-01-204100"
  },
  "history": [
    {
      "completionTime": "2020-08-03T14:22:31Z",
      "image": "registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-08-01-204100",
      "startedTime": "2020-08-03T13:37:47Z",
      "state": "Completed",
      "verified": false,
      "version": "4.5.0-0.nightly-2020-08-01-204100"
    },
    {
      "completionTime": "2020-08-03T12:12:06Z",
      "image": "quay.io/openshift-release-dev/ocp-release@sha256:cf3f799779fb0646c43dd16d376bf67fddd29597009d21223f956f5dd7a4c02f",
      "startedTime": "2020-08-03T11:48:16Z",
      "state": "Completed",
      "verified": false,
      "version": "4.4.15"
    }
  ],
  "observedGeneration": 3,
  "versionHash": "BQVhuXCVbRE="
}

-----------------------------------------

Case 2: Upgrade between 4.5.z versions
Applied the same featuregate settings as above.

$ oc patch clusterversion/version --patch '{"spec":{"upstream":"https://openshift-release.svc.ci.openshift.org/graph"}}' --type=merge
clusterversion.config.openshift.io/version patched

$ oc adm upgrade
Cluster version is 4.5.4

No updates available. You may force an upgrade to a specific release image, but doing so may not be supported and result in downtime or data loss.

$ oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-08-01-204100 --allow-explicit-upgrade --force
warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
warning: The requested upgrade image is not one of the available updates.  You have used --allow-explicit-upgrade to the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-08-01-204100

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.4     True        True          12m     Working towards 4.5.0-0.nightly-2020-08-01-204100: 77% complete

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-08-01-204100   True        False         48m     Cluster version is 4.5.0-0.nightly-2020-08-01-204100

$ oc get clusterversion -o json | jq '.items[0].status'
{
  "availableUpdates": [
    {
      "force": false,
      "image": "registry.svc.ci.openshift.org/ocp/release@sha256:0a2171761c02ca895b33887d5d4991e932f4e87c3e07670557a06fca9923ff87",
      "version": "4.5.0-0.nightly-2020-08-03-123303"
    }
  ],
  "conditions": [
    {
      "lastTransitionTime": "2020-08-03T12:04:27Z",
      "message": "Done applying 4.5.0-0.nightly-2020-08-01-204100",
      "status": "True",
      "type": "Available"
    },
    {
      "lastTransitionTime": "2020-08-03T14:09:15Z",
      "status": "False",
      "type": "Failing"
    },
    {
      "lastTransitionTime": "2020-08-03T14:26:39Z",
      "message": "Cluster version is 4.5.0-0.nightly-2020-08-01-204100",
      "status": "False",
      "type": "Progressing"
    },
    {
      "lastTransitionTime": "2020-08-03T11:34:59Z",
      "status": "True",
      "type": "RetrievedUpdates"
    },
    {
      "lastTransitionTime": "2020-08-03T12:07:09Z",
      "message": "Cluster operator marketplace cannot be upgraded between minor versions: The cluster has custom OperatorSource, which is deprecated in future versions. Please visit this link for further details: https://docs.openshift.com/container-platform/4.4/release_notes/ocp-4-4-release-notes.html#ocp-4-4-marketplace-apis-deprecated",
      "reason": "DeprecatedAPIsInUse",
      "status": "False",
      "type": "Upgradeable"
    }
  ],
  "desired": {
    "force": true,
    "image": "registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-08-01-204100",
    "version": "4.5.0-0.nightly-2020-08-01-204100"
  },
  "history": [
    {
      "completionTime": "2020-08-03T14:26:39Z",
      "image": "registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-08-01-204100",
      "startedTime": "2020-08-03T13:48:01Z",
      "state": "Completed",
      "verified": false,
      "version": "4.5.0-0.nightly-2020-08-01-204100"
    },
    {
      "completionTime": "2020-08-03T12:04:27Z",
      "image": "quay.io/openshift-release-dev/ocp-release@sha256:02dfcae8f6a67e715380542654c952c981c59604b1ba7f569b13b9e5d0fbbed3",
      "startedTime": "2020-08-03T11:34:59Z",
      "state": "Completed",
      "verified": false,
      "version": "4.5.4"
    }
  ],
  "observedGeneration": 3,
  "versionHash": "BQVhuXCVbRE="
}

The test results above show that the fix works, so I am moving the bug to VERIFIED.
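
The status checks above can also be scripted. What follows is a minimal sketch in Python, not part of the original verification: it reads a status object shaped like the `jq` output above and reports whether the update settled and what the Upgradeable condition says. The helper names `find_condition` and `update_completed` are hypothetical, the sample is a trimmed copy of the Case 1 output, and the sketch assumes the newest history entry comes first, as it does in both outputs above. Note that Upgradeable is still False in both cases, but the reason is DeprecatedAPIsInUse from the marketplace operator, not the LatencySensitive feature gate.

```python
import json

# Trimmed copy of the Case 1 status output above; only the fields used here are kept.
SAMPLE_STATUS = json.loads("""
{
  "conditions": [
    {"type": "Available", "status": "True"},
    {"type": "Failing", "status": "False"},
    {"type": "Progressing", "status": "False",
     "message": "Cluster version is 4.5.0-0.nightly-2020-08-01-204100"},
    {"type": "Upgradeable", "status": "False", "reason": "DeprecatedAPIsInUse"}
  ],
  "history": [
    {"state": "Completed", "version": "4.5.0-0.nightly-2020-08-01-204100"},
    {"state": "Completed", "version": "4.4.15"}
  ]
}
""")


def find_condition(status, cond_type):
    """Return the condition dict with the given type, or None if absent."""
    for cond in status.get("conditions") or []:
        if cond.get("type") == cond_type:
            return cond
    return None


def update_completed(status, version):
    """True if the newest history entry is a completed update to `version`
    and the Progressing and Failing conditions are both False.

    Assumes history is ordered newest first, as in the outputs above.
    """
    history = status.get("history") or []
    if not history:
        return False
    newest = history[0]
    settled = all(
        (find_condition(status, t) or {}).get("status") == "False"
        for t in ("Progressing", "Failing")
    )
    return (
        settled
        and newest.get("state") == "Completed"
        and newest.get("version") == version
    )


if __name__ == "__main__":
    print(update_completed(SAMPLE_STATUS, "4.5.0-0.nightly-2020-08-01-204100"))
    upgradeable = find_condition(SAMPLE_STATUS, "Upgradeable")
    print(upgradeable["status"], upgradeable["reason"])
```

On a live cluster the same functions could be fed the output of `oc get clusterversion -o json` after selecting `items[0].status`.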

Comment 4 Scott Dodson 2020-08-04 19:37:00 UTC
The Upgradeable=False condition applies to the currently running version and blocks upgrades to the next minor version. This bug therefore shouldn't block 4.4 to 4.5 upgrades, but it would block upgrades from 4.5 to 4.6. The minimum version for the 4.5 to 4.6 upgrade should therefore be at least 4.5.5, and we should consider raising the minimum 4.4 version once the fix has been backported to 4.4. Based on telemetry data, there appear to be four affected clusters at present.
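
The rule described in this comment can be illustrated with a minimal sketch in Python. It is not the CVO's implementation; the function names `major_minor` and `blocked_by_upgradeable` are hypothetical, and treating a major-version change the same way as a minor-version change is an assumption made only to keep the comparison simple.

```python
def major_minor(version):
    """Parse the leading 'major.minor' of a version string such as '4.5.4'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def blocked_by_upgradeable(current, target, upgradeable):
    """Illustrate the rule from the comment above.

    Upgradeable=False on the running version blocks a move to a later
    minor version, but not a patch-level (z-stream) update.
    Assumption: a later major version is treated like a later minor version.
    """
    if upgradeable:
        return False
    return major_minor(target) > major_minor(current)


if __name__ == "__main__":
    # An affected 4.5 cluster: patch updates pass, the minor upgrade is blocked.
    print(blocked_by_upgradeable("4.5.4", "4.5.5", upgradeable=False))
    print(blocked_by_upgradeable("4.5.4", "4.6.0", upgradeable=False))
```

Under this sketch, an affected 4.5.4 cluster can still take the 4.5.5 patch update that carries the fix, which matches the workaround in the doc text.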

Comment 6 errata-xmlrpc 2020-08-10 13:50:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.5 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3188

Comment 7 W. Trevor King 2020-08-11 20:12:33 UTC
Updating the doc text as described in [1], although with the release shipped it may be too late for updates ;).

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1863076#c5

Comment 8 W. Trevor King 2021-04-05 17:46:55 UTC
Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1].  If you feel like this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475