Bug 2080429 - CVO must ensure non-upgrade related changes are saved when desired payload fails to load
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Jack Ottofaro
QA Contact: Yang Yang
URL:
Whiteboard:
Depends On:
Blocks: 2090150
Reported: 2022-04-29 15:47 UTC by Jack Ottofaro
Modified: 2022-08-10 11:10 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:09:50 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-version-operator pull 770 | 0 | None | open | Bug 2080429: pkg/cvo/sync_worker.go: Save overrides and capabilities | 2022-05-02 15:35:55 UTC
Red Hat Product Errata | RHSA-2022:5069 | 0 | None | None | None | 2022-08-10 11:10:26 UTC

Description Jack Ottofaro 2022-04-29 15:47:43 UTC
Description of problem:

The CVO Update function is called continuously to reconcile changes to the desired version, overrides, and capabilities. Any or all of these may have changed since the last call to Update. Capability changes are determined up front [1] and used later by loadUpdatedPayload if the desired version has also changed. The capabilities saved at [2] and [3] may therefore include changes that result from the desired update version. If the desired version's payload fails to load, those changes should not be saved. However, there may also be admin-initiated capability changes, made manually and therefore independent of the desired version update, and these do need to be saved.

[1] https://github.com/openshift/cluster-version-operator/blob/118e938999ce7bf90c1c5c5e311a15258b942acc/pkg/cvo/sync_worker.go#L412
[2] https://github.com/openshift/cluster-version-operator/blob/118e938999ce7bf90c1c5c5e311a15258b942acc/pkg/cvo/sync_worker.go#L418
[3] https://github.com/openshift/cluster-version-operator/blob/118e938999ce7bf90c1c5c5e311a15258b942acc/pkg/cvo/sync_worker.go#L457

Comment 4 Yang Yang 2022-05-11 09:55:20 UTC
Reproducing it with 4.11.0-0.nightly-2022-05-06-060226

Steps to reproduce:
1. Install a cluster with only marketplace enabled:
# oc get clusterversion/version -ojson | jq -r '.spec, .status.capabilities'
{
  "capabilities": {
    "additionalEnabledCapabilities": [
      "marketplace"
    ],
    "baselineCapabilitySet": "None"
  },
  "channel": "stable-4.11",
  "clusterID": "8620a2f5-e766-44ab-b197-ca6bad13dae3"
}
{
  "enabledCapabilities": [
    "marketplace"
  ],
  "knownCapabilities": [
    "baremetal",
    "marketplace",
    "openshift-samples"
  ]
}


2. Upgrade to an unsigned build
# oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release@sha256:3b1a0e94da50bb6faa35be08227e7ab1942dfcf0976ee894417a95aeed2111a3 --allow-explicit-upgrade 
warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
Updating to release image registry.ci.openshift.org/ocp/release@sha256:3b1a0e94da50bb6faa35be08227e7ab1942dfcf0976ee894417a95aeed2111a3

3. Check cv
# oc get clusterversion/version -ojson | jq -r '.spec, .status.capabilities, .status.conditions'
{
  "capabilities": {
    "additionalEnabledCapabilities": [
      "marketplace"
    ],
    "baselineCapabilitySet": "None"
  },
  "channel": "stable-4.11",
  "clusterID": "8620a2f5-e766-44ab-b197-ca6bad13dae3",
  "desiredUpdate": {
    "force": false,
    "image": "registry.ci.openshift.org/ocp/release@sha256:3b1a0e94da50bb6faa35be08227e7ab1942dfcf0976ee894417a95aeed2111a3",
    "version": ""
  }
}
{
  "enabledCapabilities": [
    "marketplace"
  ],
  "knownCapabilities": [
    "baremetal",
    "marketplace",
    "openshift-samples"
  ]
}
[
  {
    "lastTransitionTime": "2022-05-11T08:37:58Z",
    "message": "Unable to retrieve available updates: currently reconciling cluster version 4.11.0-0.nightly-2022-05-06-060226 not found in the \"stable-4.11\" channel",
    "reason": "VersionNotFound",
    "status": "False",
    "type": "RetrievedUpdates"
  },
  {
    "lastTransitionTime": "2022-05-11T08:37:58Z",
    "message": "Kubernetes 1.25 and therefore OpenShift 4.12 remove several APIs which require admin consideration. Please see\nthe knowledge article https://access.redhat.com/articles/6955381 for details and instructions.\n",
    "reason": "AdminAckRequired",
    "status": "False",
    "type": "Upgradeable"
  },
  {
    "lastTransitionTime": "2022-05-11T08:37:58Z",
    "message": "Capabilities match configured spec",
    "reason": "AsExpected",
    "status": "False",
    "type": "ImplicitlyEnabledCapabilities"
  },
  {
    "lastTransitionTime": "2022-05-11T09:08:34Z",
    "message": "Retrieving payload failed version=\"\" image=\"registry.ci.openshift.org/ocp/release@sha256:3b1a0e94da50bb6faa35be08227e7ab1942dfcf0976ee894417a95aeed2111a3\" failure=The update cannot be verified: unable to locate a valid signature for one or more sources",
    "reason": "RetrievePayload",
    "status": "False",
    "type": "ReleaseAccepted"
  },
  {
    "lastTransitionTime": "2022-05-11T08:57:04Z",
    "message": "Done applying 4.11.0-0.nightly-2022-05-06-060226",
    "status": "True",
    "type": "Available"
  },
  {
    "lastTransitionTime": "2022-05-11T08:57:04Z",
    "status": "False",
    "type": "Failing"
  },
  {
    "lastTransitionTime": "2022-05-11T08:57:04Z",
    "message": "Cluster version is 4.11.0-0.nightly-2022-05-06-060226",
    "status": "False",
    "type": "Progressing"
  }
]

Fine, the payload failed to load.

4. Enable baremetal
# oc get clusterversion/version -ojson | jq -r '.spec, .status.capabilities, .status.conditions'
{
  "capabilities": {
    "additionalEnabledCapabilities": [
      "marketplace",
      "baremetal"
    ],
    "baselineCapabilitySet": "None"
  },
  "channel": "stable-4.11",
  "clusterID": "8620a2f5-e766-44ab-b197-ca6bad13dae3",
  "desiredUpdate": {
    "force": false,
    "image": "registry.ci.openshift.org/ocp/release@sha256:3b1a0e94da50bb6faa35be08227e7ab1942dfcf0976ee894417a95aeed2111a3",
    "version": ""
  }
}
{
  "enabledCapabilities": [
    "marketplace"
  ],
  "knownCapabilities": [
    "baremetal",
    "marketplace",
    "openshift-samples"
  ]
}
[
  {
    "lastTransitionTime": "2022-05-11T08:37:58Z",
    "message": "Unable to retrieve available updates: currently reconciling cluster version 4.11.0-0.nightly-2022-05-06-060226 not found in the \"stable-4.11\" channel",
    "reason": "VersionNotFound",
    "status": "False",
    "type": "RetrievedUpdates"
  },
  {
    "lastTransitionTime": "2022-05-11T08:37:58Z",
    "message": "Kubernetes 1.25 and therefore OpenShift 4.12 remove several APIs which require admin consideration. Please see\nthe knowledge article https://access.redhat.com/articles/6955381 for details and instructions.\n",
    "reason": "AdminAckRequired",
    "status": "False",
    "type": "Upgradeable"
  },
  {
    "lastTransitionTime": "2022-05-11T08:37:58Z",
    "message": "Capabilities match configured spec",
    "reason": "AsExpected",
    "status": "False",
    "type": "ImplicitlyEnabledCapabilities"
  },
  {
    "lastTransitionTime": "2022-05-11T09:08:34Z",
    "message": "Retrieving payload failed version=\"\" image=\"registry.ci.openshift.org/ocp/release@sha256:3b1a0e94da50bb6faa35be08227e7ab1942dfcf0976ee894417a95aeed2111a3\" failure=The update cannot be verified: unable to locate a valid signature for one or more sources",
    "reason": "RetrievePayload",
    "status": "False",
    "type": "ReleaseAccepted"
  },
  {
    "lastTransitionTime": "2022-05-11T08:57:04Z",
    "message": "Done applying 4.11.0-0.nightly-2022-05-06-060226",
    "status": "True",
    "type": "Available"
  },
  {
    "lastTransitionTime": "2022-05-11T08:57:04Z",
    "status": "False",
    "type": "Failing"
  },
  {
    "lastTransitionTime": "2022-05-11T08:57:04Z",
    "message": "Cluster version is 4.11.0-0.nightly-2022-05-06-060226",
    "status": "False",
    "type": "Progressing"
  }
]

status.capabilities doesn't show baremetal, but it is installed:

# oc get co baremetal
NAME        VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
baremetal   4.11.0-0.nightly-2022-05-06-060226   True        False         False      31m
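
The mismatch seen in step 4 can also be checked programmatically. The Go sketch below is only an illustration: the clusterVersion struct and the missingFromStatus helper are hypothetical names, the struct models just the fields shown in the output above, and the input is assumed to be the full `oc get clusterversion/version -ojson` document rather than the jq-filtered view. It ignores baselineCapabilitySet, which is reasonable here only because the baseline is "None".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clusterVersion models only the fields used by this check.
type clusterVersion struct {
	Spec struct {
		Capabilities struct {
			AdditionalEnabledCapabilities []string `json:"additionalEnabledCapabilities"`
		} `json:"capabilities"`
	} `json:"spec"`
	Status struct {
		Capabilities struct {
			EnabledCapabilities []string `json:"enabledCapabilities"`
		} `json:"capabilities"`
	} `json:"status"`
}

// missingFromStatus returns the capabilities requested in spec that are
// not listed in status.capabilities.enabledCapabilities.
func missingFromStatus(raw []byte) ([]string, error) {
	var cv clusterVersion
	if err := json.Unmarshal(raw, &cv); err != nil {
		return nil, err
	}
	enabled := map[string]bool{}
	for _, name := range cv.Status.Capabilities.EnabledCapabilities {
		enabled[name] = true
	}
	var missing []string
	for _, name := range cv.Spec.Capabilities.AdditionalEnabledCapabilities {
		if !enabled[name] {
			missing = append(missing, name)
		}
	}
	return missing, nil
}

func main() {
	// Trimmed copy of the step 4 state: baremetal requested, not reported.
	raw := []byte(`{
	  "spec": {"capabilities": {
	    "additionalEnabledCapabilities": ["marketplace", "baremetal"],
	    "baselineCapabilitySet": "None"}},
	  "status": {"capabilities": {"enabledCapabilities": ["marketplace"]}}
	}`)
	missing, err := missingFromStatus(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("requested but not enabled:", missing)
}
```

For the state reproduced above, the helper reports baremetal as requested but not enabled.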

Comment 5 Yang Yang 2022-05-11 10:07:52 UTC
Verified with 4.11.0-0.nightly-2022-05-09-224745

Steps to verify:
1. Install a cluster with only marketplace enabled
# oc get clusterversion/version -ojson | jq -r '.spec, .status.capabilities'
{
  "capabilities": {
    "additionalEnabledCapabilities": [
      "marketplace"
    ],
    "baselineCapabilitySet": "None"
  },
  "clusterID": "8e822d81-b05b-46d2-b43a-1731d8760353",
  "upstream": "https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/graph"
}
{
  "enabledCapabilities": [
    "marketplace"
  ],
  "knownCapabilities": [
    "baremetal",
    "marketplace",
    "openshift-samples"
  ]
}

2. Upgrade to an unsigned build
# oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release@sha256:3b1a0e94da50bb6faa35be08227e7ab1942dfcf0976ee894417a95aeed2111a3 --allow-explicit-upgrade
warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
Updating to release image registry.ci.openshift.org/ocp/release@sha256:3b1a0e94da50bb6faa35be08227e7ab1942dfcf0976ee894417a95aeed2111a3

3. Check cv
# oc get clusterversion/version -ojson | jq -r '.spec, .status.capabilities, .status.conditions'
{
  "capabilities": {
    "additionalEnabledCapabilities": [
      "marketplace"
    ],
    "baselineCapabilitySet": "None"
  },
  "clusterID": "8e822d81-b05b-46d2-b43a-1731d8760353",
  "desiredUpdate": {
    "force": false,
    "image": "registry.ci.openshift.org/ocp/release@sha256:3b1a0e94da50bb6faa35be08227e7ab1942dfcf0976ee894417a95aeed2111a3",
    "version": ""
  },
  "upstream": "https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/graph"
}
{
  "enabledCapabilities": [
    "marketplace"
  ],
  "knownCapabilities": [
    "baremetal",
    "marketplace",
    "openshift-samples"
  ]
}
[
  {
    "lastTransitionTime": "2022-05-11T09:05:08Z",
    "message": "The update channel has not been configured.",
    "reason": "NoChannel",
    "status": "False",
    "type": "RetrievedUpdates"
  },
  {
    "lastTransitionTime": "2022-05-11T08:34:36Z",
    "message": "Kubernetes 1.25 and therefore OpenShift 4.12 remove several APIs which require admin consideration. Please see\nthe knowledge article https://access.redhat.com/articles/6955381 for details and instructions.\n",
    "reason": "AdminAckRequired",
    "status": "False",
    "type": "Upgradeable"
  },
  {
    "lastTransitionTime": "2022-05-11T08:34:36Z",
    "message": "Capabilities match configured spec",
    "reason": "AsExpected",
    "status": "False",
    "type": "ImplicitlyEnabledCapabilities"
  },
  {
    "lastTransitionTime": "2022-05-11T09:35:26Z",
    "message": "Retrieving payload failed version=\"\" image=\"registry.ci.openshift.org/ocp/release@sha256:3b1a0e94da50bb6faa35be08227e7ab1942dfcf0976ee894417a95aeed2111a3\" failure=The update cannot be verified: unable to locate a valid signature for one or more sources",
    "reason": "RetrievePayload",
    "status": "False",
    "type": "ReleaseAccepted"
  },
  {
    "lastTransitionTime": "2022-05-11T08:55:15Z",
    "message": "Done applying 4.11.0-0.nightly-2022-05-09-224745",
    "status": "True",
    "type": "Available"
  },
  {
    "lastTransitionTime": "2022-05-11T08:55:15Z",
    "status": "False",
    "type": "Failing"
  },
  {
    "lastTransitionTime": "2022-05-11T08:55:15Z",
    "message": "Cluster version is 4.11.0-0.nightly-2022-05-09-224745",
    "status": "False",
    "type": "Progressing"
  }
]

Fine, payload failed to load.

4. Enable baremetal
# oc get clusterversion/version -ojson | jq -r '.spec, .status.capabilities, .status.conditions'
{
  "capabilities": {
    "additionalEnabledCapabilities": [
      "marketplace",
      "baremetal"
    ],
    "baselineCapabilitySet": "None"
  },
  "clusterID": "8e822d81-b05b-46d2-b43a-1731d8760353",
  "desiredUpdate": {
    "force": false,
    "image": "registry.ci.openshift.org/ocp/release@sha256:3b1a0e94da50bb6faa35be08227e7ab1942dfcf0976ee894417a95aeed2111a3",
    "version": ""
  },
  "upstream": "https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/graph"
}
{
  "enabledCapabilities": [
    "baremetal",
    "marketplace"
  ],
  "knownCapabilities": [
    "baremetal",
    "marketplace",
    "openshift-samples"
  ]
}
[
  {
    "lastTransitionTime": "2022-05-11T09:05:08Z",
    "message": "The update channel has not been configured.",
    "reason": "NoChannel",
    "status": "False",
    "type": "RetrievedUpdates"
  },
  {
    "lastTransitionTime": "2022-05-11T08:34:36Z",
    "message": "Kubernetes 1.25 and therefore OpenShift 4.12 remove several APIs which require admin consideration. Please see\nthe knowledge article https://access.redhat.com/articles/6955381 for details and instructions.\n",
    "reason": "AdminAckRequired",
    "status": "False",
    "type": "Upgradeable"
  },
  {
    "lastTransitionTime": "2022-05-11T08:34:36Z",
    "message": "Capabilities match configured spec",
    "reason": "AsExpected",
    "status": "False",
    "type": "ImplicitlyEnabledCapabilities"
  },
  {
    "lastTransitionTime": "2022-05-11T09:35:26Z",
    "message": "Retrieving payload failed version=\"\" image=\"registry.ci.openshift.org/ocp/release@sha256:3b1a0e94da50bb6faa35be08227e7ab1942dfcf0976ee894417a95aeed2111a3\" failure=The update cannot be verified: unable to locate a valid signature for one or more sources",
    "reason": "RetrievePayload",
    "status": "False",
    "type": "ReleaseAccepted"
  },
  {
    "lastTransitionTime": "2022-05-11T08:55:15Z",
    "message": "Done applying 4.11.0-0.nightly-2022-05-09-224745",
    "status": "True",
    "type": "Available"
  },
  {
    "lastTransitionTime": "2022-05-11T08:55:15Z",
    "status": "False",
    "type": "Failing"
  },
  {
    "lastTransitionTime": "2022-05-11T08:55:15Z",
    "message": "Cluster version is 4.11.0-0.nightly-2022-05-09-224745",
    "status": "False",
    "type": "Progressing"
  }
]

Woohoo, cv.status.capabilities.enabledCapabilities now shows baremetal.

# oc get co baremetal
NAME        VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
baremetal   4.11.0-0.nightly-2022-05-09-224745   True        False         False      22m 

It's installed successfully.

Looks good to me. Moving it to verified state.
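
The "payload failed to load" check from step 3 can be expressed as a lookup on the ReleaseAccepted condition. The Go sketch below is an illustration only: the condition struct and the payloadRejected helper are hypothetical names, and the input is assumed to be the `.status.conditions` array as printed by the jq filter above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// condition models the fields of a ClusterVersion condition used here.
type condition struct {
	Type   string `json:"type"`
	Status string `json:"status"`
	Reason string `json:"reason"`
}

// payloadRejected reports whether the ReleaseAccepted condition is False,
// and returns its reason. It returns false and an empty reason when the
// condition is absent.
func payloadRejected(raw []byte) (bool, string, error) {
	var conds []condition
	if err := json.Unmarshal(raw, &conds); err != nil {
		return false, "", err
	}
	for _, c := range conds {
		if c.Type == "ReleaseAccepted" {
			return c.Status == "False", c.Reason, nil
		}
	}
	return false, "", nil
}

func main() {
	// Trimmed copy of the conditions shown in step 3.
	raw := []byte(`[
	  {"type": "ReleaseAccepted", "status": "False", "reason": "RetrievePayload"},
	  {"type": "Available", "status": "True"}
	]`)
	rejected, reason, err := payloadRejected(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(rejected, reason)
}
```

For the conditions above this prints `true RetrievePayload`, matching the state in which the capability change was verified.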

Comment 7 errata-xmlrpc 2022-08-10 11:09:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

