Bug 1863076 - Cannot upgrade a cluster when adding Performance Profile Operator
Summary: Cannot upgrade a cluster when adding Performance Profile Operator
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.4.z
Assignee: Stefan Schimanski
QA Contact: Ke Wang
URL:
Whiteboard:
Depends On: 1862156
Blocks:
 
Reported: 2020-08-03 15:03 UTC by Francesco Romani
Modified: 2021-04-05 17:36 UTC (History)

Fixed In Version:
Doc Type: Known Issue
Doc Text:
Cause: A bug in the feature gate upgradeability logic.
Consequence: The CVO was marking the cluster as not upgradeable to the next minor version when LatencySensitive FeatureGate was in use.
Workaround (if any): Upgrade to a version that has this bug fixed.
Result: Upgrade is performed and the upgraded version includes this bug fix so CVO no longer treats LatencySensitive FeatureGate as blocking for minor-version upgrades.
Clone Of: 1862156
Environment:
Last Closed: 2020-08-18 11:45:30 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-kube-apiserver-operator pull 923 | 0 | None | closed | [release-4.4] Bug 1863076: LatencySensitive feature gate allows upgrades | 2020-09-02 07:44:41 UTC
Red Hat Product Errata | RHBA-2020:3334 | 0 | None | None | None | 2020-08-18 11:45:39 UTC

Comment 4 Ke Wang 2020-08-11 05:16:14 UTC
Verified with the latest OCP 4.4 nightly build; see the steps below.

Case 1: Upgrade 4.3.z to 4.4.z

$ oc edit featuregate/cluster
...
spec:
  featureSet: "LatencySensitive"

$ oc describe featuregate/cluster
Name:         cluster
Namespace:    
Labels:       <none>
Annotations:  release.openshift.io/create-only: true
API Version:  config.openshift.io/v1
Kind:         FeatureGate
...
Spec:
  Feature Set:  LatencySensitive
Events:         <none>

$ cat topologymanager-kubeletconfig.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled
  kubeletConfig:
     cpuManagerPolicy: static
     cpuManagerReconcilePeriod: 5s
     topologyManagerPolicy: single-numa-node
     
$ oc create -f topologymanager-kubeletconfig.yaml 
kubeletconfig.machineconfiguration.openshift.io/cpumanager-enabled created

$ oc get KubeletConfig
NAME                 AGE
cpumanager-enabled   35s

$  oc patch clusterversion/version --patch '{"spec":{"upstream":"https://openshift-release.svc.ci.openshift.org/graph"}}' --type=merge
clusterversion.config.openshift.io/version patched

$  oc adm upgrade
Cluster version is 4.3.31

No updates available. You may force an upgrade to a specific release image, but doing so may not be supported and result in downtime or data loss.

$ oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-08-10-180247 --allow-explicit-upgrade --force
warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
warning: The requested upgrade image is not one of the available updates.  You have used --allow-explicit-upgrade to the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-08-10-180247

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.31    True        True          34m     Working towards 4.4.0-0.nightly-2020-08-10-180247: 83% complete

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-08-10-180247   True        False         25m     Cluster version is 4.4.0-0.nightly-2020-08-10-180247

$ oc get clusterversion -o json | jq .items[0].status
...
    {
      "lastTransitionTime": "2020-08-11T04:09:02Z",
      "message": "Cluster operator marketplace cannot be upgraded: The cluster has custom OperatorSource/CatalogSourceConfig, which are deprecated in future versions. Please visit this link for further deatils: https://docs.openshift.com/container-platform/4.4/release_notes/ocp-4-4-release-notes.html#ocp-4-4-marketplace-apis-deprecated",
      "reason": "DeprecatedAPIsInUse",
      "status": "False",
      "type": "Upgradeable"
    }
  ],
  "desired": {
    "force": true,
    "image": "registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-08-10-180247",
    "version": "4.4.0-0.nightly-2020-08-10-180247"
  },
  "history": [
    {
      "completionTime": "2020-08-11T04:44:15Z",
      "image": "registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-08-10-180247",
      "startedTime": "2020-08-11T03:50:05Z",
      "state": "Completed",
      "verified": false,
      "version": "4.4.0-0.nightly-2020-08-10-180247"
    },
    {
      "completionTime": "2020-08-11T03:45:20Z",
      "image": "quay.io/openshift-release-dev/ocp-release@sha256:6395ddd44276c4a1d760c77f9f5d8dabf302df7b84afd7b3147c97bdf268ab0f",
      "startedTime": "2020-08-11T03:23:37Z",
      "state": "Completed",
      "verified": false,
      "version": "4.3.31"
    }
  ],
  "observedGeneration": 3,
  "versionHash": "2Ijnq012QfQ="
}

--------------------------------------
Case 2: Upgrade between 4.4.z

$ oc edit featuregate/cluster
...
spec:
  featureSet: "LatencySensitive"

$ oc describe featuregate/cluster
Name:         cluster
Namespace:    
Labels:       <none>
Annotations:  release.openshift.io/create-only: true
API Version:  config.openshift.io/v1
Kind:         FeatureGate
...
Spec:
  Feature Set:  LatencySensitive
Events:         <none>

$ cat topologymanager-kubeletconfig.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled
  kubeletConfig:
     cpuManagerPolicy: static
     cpuManagerReconcilePeriod: 5s
     topologyManagerPolicy: single-numa-node
     
$ oc create -f topologymanager-kubeletconfig.yaml 
kubeletconfig.machineconfiguration.openshift.io/cpumanager-enabled created

$ oc get KubeletConfig
NAME                 AGE
cpumanager-enabled   35s

$  oc patch clusterversion/version --patch '{"spec":{"upstream":"https://openshift-release.svc.ci.openshift.org/graph"}}' --type=merge
clusterversion.config.openshift.io/version patched

$  oc adm upgrade
Cluster version is 4.4.16

Updates:

VERSION                      IMAGE
4.4.0-0.ci-2020-08-06-125105 registry.svc.ci.openshift.org/ocp/release@sha256:7e950235c9ad92bf1ee11897a401bb682bcd698ff7593139857a9debb5c05b88

$ oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-08-08-143753 --allow-explicit-upgrade --force
warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
warning: The requested upgrade image is not one of the available updates.  You have used --allow-explicit-upgrade to the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-08-08-143753

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-08-07-130733   True        True          7s      Working towards 4.4.0-0.nightly-2020-08-08-143753: 1% complete

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-08-08-143753   True        False         17m     Cluster version is 4.4.0-0.nightly-2020-08-08-143753

$ oc get clusterversion -o json | jq .items[0].status
{
  ...
  "desired": {
    "force": true,
    "image": "registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-08-08-143753",
    "version": "4.4.0-0.nightly-2020-08-08-143753"
  },
  "history": [
    {
      "completionTime": "2020-08-10T06:48:41Z",
      "image": "registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-08-08-143753",
      "startedTime": "2020-08-10T06:38:50Z",
      "state": "Completed",
      "verified": false,
      "version": "4.4.0-0.nightly-2020-08-08-143753"
    },
    {
      "completionTime": "2020-08-10T03:53:18Z",
      "image": "registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-08-07-130733",
      "startedTime": "2020-08-10T02:57:27Z",
      "state": "Completed",
      "verified": false,
      "version": "4.4.0-0.nightly-2020-08-07-130733"
    },
    {
      "completionTime": "2020-08-10T02:02:55Z",
      "image": "quay.io/openshift-release-dev/ocp-release@sha256:74b5cefea8c8bac158468458dbcb0fdb213aafdcf5b11c9aaefc75e2f7b9fe96",
      "startedTime": "2020-08-10T01:41:06Z",
      "state": "Completed",
      "verified": false,
      "version": "4.4.16"
    }
  ],
...

The test results above show that the fix works, so I am moving the bug to Verified.
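
Both forced updates in this comment trigger the `oc` warning that by-tag pull specs are dangerous. A minimal sketch of building the by-digest form that the warning recommends is shown here. The helper name `by_digest_pullspec` is hypothetical, the digest has to be obtained separately for the exact image being applied, and the pull spec is assumed to end in `:tag`.

```shell
# Turn a by-tag pull spec plus a digest into a by-digest pull spec.
# $1: by-tag pull spec in "repository:tag" form
# $2: image digest in "sha256:<hex>" form (obtained separately)
# Assumption: $1 ends in ":tag"; a spec without a tag is not handled.
by_digest_pullspec() {
  # ${1%:*} strips the shortest trailing ":..." match, i.e. the tag.
  printf '%s@%s\n' "${1%:*}" "$2"
}
```

The result could then be passed to `oc adm upgrade --to-image=...` in place of the tag-based spec used in the transcripts.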

Comment 5 W. Trevor King 2020-08-11 20:09:43 UTC
(In reply to Ke Wang from comment #4)
> $  oc adm upgrade
> Cluster version is 4.4.16
> ...
> $ oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-08-08-143753 --allow-explicit-upgrade --force
> warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
> warning: The requested upgrade image is not one of the available updates.  You have used --allow-explicit-upgrade to the update to proceed anyway
> warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
> Updating to release image registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-08-08-143753

As 'oc' says, using --force is a big hammer.  And because this 4.4 -> 4.4 update does not change the minor version, Upgradeable=False should not block the update (see bug 1797624 and bug 1823306).  So folks should not need to force this particular update.  I'm updating the proposed doc text to mention the scope as blocked minor-version updates and removing the "force" mention.  Folks can resolve an impacted 4.4.16 cluster with an unforced update to 4.4.(z-with-this-fix) followed by subsequent unforced updates to later 4.4 and 4.5 releases as they see fit.
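
The minor-versus-patch distinction described in this comment can be sketched as a small shell helper. This only illustrates the rule as stated here and is not the cluster-version operator's actual logic. The function names `minor_of` and `upgradeable_blocks` are hypothetical, and version strings are assumed to start with `major.minor.` as in `4.4.16` or `4.4.0-0.nightly-2020-08-08-143753`.

```shell
# Print the "major.minor" prefix of a version string such as 4.4.16.
minor_of() {
  printf '%s\n' "$1" | cut -d. -f1,2
}

# Print "blocked" when Upgradeable=False would stop an unforced update,
# which is only the case when the target's major.minor differs from the
# current version's major.minor. Otherwise print "allowed".
# $1: current version, $2: target version, $3: Upgradeable status (True/False)
upgradeable_blocks() {
  if [ "$3" = "False" ] && [ "$(minor_of "$1")" != "$(minor_of "$2")" ]; then
    echo blocked
  else
    echo allowed
  fi
}
```

For example, `upgradeable_blocks 4.4.16 4.4.17 False` prints `allowed`, while `upgradeable_blocks 4.4.16 4.5.0 False` prints `blocked`.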

Comment 6 W. Trevor King 2020-08-11 22:59:55 UTC
> ... Upgradeable=False should not block the update (see bug 1797624 and bug 1823306).

Ah, but updating to an unsigned nightly will block the update.  And indeed, that nightly is unsigned:

$ curl -Is https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release@sha256=7e950235c9ad92bf1ee11897a401bb682bcd698ff7593139857a9debb5c05b88/signature-1 | head -n1
HTTP/1.1 404 Not Found
$ curl -Is https://mirror.openshift.com/pub/openshift-v4/signatures/openshift-release-dev/ocp-release-nightly@sha256=7e950235c9ad92bf1ee11897a401bb682bcd698ff7593139857a9debb5c05b88/signature-1 | head -n1
HTTP/1.1 404 Not Found

But once we have a signed release, you should not need to use --force to apply the patch updates, so I think my doc-text edit is still correct.
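
The two curl checks in this comment follow one URL pattern, which can be wrapped in a helper as a rough sketch. The function names `signature_url` and `has_signature` are hypothetical; the URL layout is copied from the commands in this comment and is not otherwise verified here.

```shell
# Build the signature URL for a release digest, following the URL pattern
# used in the curl commands in this comment.
# $1: signature path (e.g. openshift/release), $2: sha256 hex digest
signature_url() {
  printf 'https://mirror.openshift.com/pub/openshift-v4/signatures/%s@sha256=%s/signature-1\n' "$1" "$2"
}

# Succeed if a HEAD request for the signature returns no HTTP error.
# curl -f turns HTTP errors such as 404 into a non-zero exit code.
has_signature() {
  curl -fsI "$(signature_url "$1" "$2")" >/dev/null
}
```

A caller could run `has_signature openshift/release <digest>` and treat a non-zero exit status as "no signature found at that path".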

Comment 8 errata-xmlrpc 2020-08-18 11:45:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.4.17 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3334

Comment 9 W. Trevor King 2021-04-05 17:36:34 UTC
Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1].  If you feel this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475

