Bug 2080429

Summary: CVO must ensure non-upgrade related changes are saved when desired payload fails to load
Product: OpenShift Container Platform
Reporter: Jack Ottofaro <jack.ottofaro>
Component: Cluster Version Operator
Assignee: Jack Ottofaro <jack.ottofaro>
Status: CLOSED ERRATA
QA Contact: Yang Yang <yanyang>
Severity: high
Docs Contact:
Priority: high
Version: 4.10
CC: evakhoni, wking, yanyang
Target Milestone: ---
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-08-10 11:09:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2090150

Description Jack Ottofaro 2022-04-29 15:47:43 UTC
Description of problem:

The CVO Update function is called continuously to reconcile changes to the desired version, overrides, and capabilities. Any or all of these items may have changed since the last call to Update. Capability changes are determined up front [1] and are used later by loadUpdatedPayload if the desired version has also changed. The capabilities saved at [2] and [3] may therefore include changes that result from the desired update version. If the desired version's payload fails to load, those changes should not be saved. However, there may also be capability changes that an admin made manually. These are independent of the desired version update and do need to be saved.

[1] https://github.com/openshift/cluster-version-operator/blob/118e938999ce7bf90c1c5c5e311a15258b942acc/pkg/cvo/sync_worker.go#L412
[2] https://github.com/openshift/cluster-version-operator/blob/118e938999ce7bf90c1c5c5e311a15258b942acc/pkg/cvo/sync_worker.go#L418
[3] https://github.com/openshift/cluster-version-operator/blob/118e938999ce7bf90c1c5c5e311a15258b942acc/pkg/cvo/sync_worker.go#L457

Comment 4 Yang Yang 2022-05-11 09:55:20 UTC
Reproducing it with 4.11.0-0.nightly-2022-05-06-060226

Steps to reproduce:
1. Install a cluster with only marketplace enabled:
# oc get clusterversion/version -ojson | jq -r '.spec, .status.capabilities'
{
  "capabilities": {
    "additionalEnabledCapabilities": [
      "marketplace"
    ],
    "baselineCapabilitySet": "None"
  },
  "channel": "stable-4.11",
  "clusterID": "8620a2f5-e766-44ab-b197-ca6bad13dae3"
}
{
  "enabledCapabilities": [
    "marketplace"
  ],
  "knownCapabilities": [
    "baremetal",
    "marketplace",
    "openshift-samples"
  ]
}


2. Upgrade to an unsigned build
# oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release@sha256:3b1a0e94da50bb6faa35be08227e7ab1942dfcf0976ee894417a95aeed2111a3 --allow-explicit-upgrade 
warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
Updating to release image registry.ci.openshift.org/ocp/release@sha256:3b1a0e94da50bb6faa35be08227e7ab1942dfcf0976ee894417a95aeed2111a3

3. Check cv
# oc get clusterversion/version -ojson | jq -r '.spec, .status.capabilities, .status.conditions'
{
  "capabilities": {
    "additionalEnabledCapabilities": [
      "marketplace"
    ],
    "baselineCapabilitySet": "None"
  },
  "channel": "stable-4.11",
  "clusterID": "8620a2f5-e766-44ab-b197-ca6bad13dae3",
  "desiredUpdate": {
    "force": false,
    "image": "registry.ci.openshift.org/ocp/release@sha256:3b1a0e94da50bb6faa35be08227e7ab1942dfcf0976ee894417a95aeed2111a3",
    "version": ""
  }
}
{
  "enabledCapabilities": [
    "marketplace"
  ],
  "knownCapabilities": [
    "baremetal",
    "marketplace",
    "openshift-samples"
  ]
}
[
  {
    "lastTransitionTime": "2022-05-11T08:37:58Z",
    "message": "Unable to retrieve available updates: currently reconciling cluster version 4.11.0-0.nightly-2022-05-06-060226 not found in the \"stable-4.11\" channel",
    "reason": "VersionNotFound",
    "status": "False",
    "type": "RetrievedUpdates"
  },
  {
    "lastTransitionTime": "2022-05-11T08:37:58Z",
    "message": "Kubernetes 1.25 and therefore OpenShift 4.12 remove several APIs which require admin consideration. Please see\nthe knowledge article https://access.redhat.com/articles/6955381 for details and instructions.\n",
    "reason": "AdminAckRequired",
    "status": "False",
    "type": "Upgradeable"
  },
  {
    "lastTransitionTime": "2022-05-11T08:37:58Z",
    "message": "Capabilities match configured spec",
    "reason": "AsExpected",
    "status": "False",
    "type": "ImplicitlyEnabledCapabilities"
  },
  {
    "lastTransitionTime": "2022-05-11T09:08:34Z",
    "message": "Retrieving payload failed version=\"\" image=\"registry.ci.openshift.org/ocp/release@sha256:3b1a0e94da50bb6faa35be08227e7ab1942dfcf0976ee894417a95aeed2111a3\" failure=The update cannot be verified: unable to locate a valid signature for one or more sources",
    "reason": "RetrievePayload",
    "status": "False",
    "type": "ReleaseAccepted"
  },
  {
    "lastTransitionTime": "2022-05-11T08:57:04Z",
    "message": "Done applying 4.11.0-0.nightly-2022-05-06-060226",
    "status": "True",
    "type": "Available"
  },
  {
    "lastTransitionTime": "2022-05-11T08:57:04Z",
    "status": "False",
    "type": "Failing"
  },
  {
    "lastTransitionTime": "2022-05-11T08:57:04Z",
    "message": "Cluster version is 4.11.0-0.nightly-2022-05-06-060226",
    "status": "False",
    "type": "Progressing"
  }
]

Fine, the payload fails to load.

4. Enable baremetal
# oc get clusterversion/version -ojson | jq -r '.spec, .status.capabilities, .status.conditions'
{
  "capabilities": {
    "additionalEnabledCapabilities": [
      "marketplace",
      "baremetal"
    ],
    "baselineCapabilitySet": "None"
  },
  "channel": "stable-4.11",
  "clusterID": "8620a2f5-e766-44ab-b197-ca6bad13dae3",
  "desiredUpdate": {
    "force": false,
    "image": "registry.ci.openshift.org/ocp/release@sha256:3b1a0e94da50bb6faa35be08227e7ab1942dfcf0976ee894417a95aeed2111a3",
    "version": ""
  }
}
{
  "enabledCapabilities": [
    "marketplace"
  ],
  "knownCapabilities": [
    "baremetal",
    "marketplace",
    "openshift-samples"
  ]
}
[
  {
    "lastTransitionTime": "2022-05-11T08:37:58Z",
    "message": "Unable to retrieve available updates: currently reconciling cluster version 4.11.0-0.nightly-2022-05-06-060226 not found in the \"stable-4.11\" channel",
    "reason": "VersionNotFound",
    "status": "False",
    "type": "RetrievedUpdates"
  },
  {
    "lastTransitionTime": "2022-05-11T08:37:58Z",
    "message": "Kubernetes 1.25 and therefore OpenShift 4.12 remove several APIs which require admin consideration. Please see\nthe knowledge article https://access.redhat.com/articles/6955381 for details and instructions.\n",
    "reason": "AdminAckRequired",
    "status": "False",
    "type": "Upgradeable"
  },
  {
    "lastTransitionTime": "2022-05-11T08:37:58Z",
    "message": "Capabilities match configured spec",
    "reason": "AsExpected",
    "status": "False",
    "type": "ImplicitlyEnabledCapabilities"
  },
  {
    "lastTransitionTime": "2022-05-11T09:08:34Z",
    "message": "Retrieving payload failed version=\"\" image=\"registry.ci.openshift.org/ocp/release@sha256:3b1a0e94da50bb6faa35be08227e7ab1942dfcf0976ee894417a95aeed2111a3\" failure=The update cannot be verified: unable to locate a valid signature for one or more sources",
    "reason": "RetrievePayload",
    "status": "False",
    "type": "ReleaseAccepted"
  },
  {
    "lastTransitionTime": "2022-05-11T08:57:04Z",
    "message": "Done applying 4.11.0-0.nightly-2022-05-06-060226",
    "status": "True",
    "type": "Available"
  },
  {
    "lastTransitionTime": "2022-05-11T08:57:04Z",
    "status": "False",
    "type": "Failing"
  },
  {
    "lastTransitionTime": "2022-05-11T08:57:04Z",
    "message": "Cluster version is 4.11.0-0.nightly-2022-05-06-060226",
    "status": "False",
    "type": "Progressing"
  }
]

status.capabilities doesn't show baremetal, even though it is installed.

# oc get co baremetal
NAME        VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
baremetal   4.11.0-0.nightly-2022-05-06-060226   True        False         False      31m

Comment 5 Yang Yang 2022-05-11 10:07:52 UTC
Verified with 4.11.0-0.nightly-2022-05-09-224745

Steps to verify:
1. Install a cluster with only marketplace enabled
# oc get clusterversion/version -ojson | jq -r '.spec, .status.capabilities'
{
  "capabilities": {
    "additionalEnabledCapabilities": [
      "marketplace"
    ],
    "baselineCapabilitySet": "None"
  },
  "clusterID": "8e822d81-b05b-46d2-b43a-1731d8760353",
  "upstream": "https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/graph"
}
{
  "enabledCapabilities": [
    "marketplace"
  ],
  "knownCapabilities": [
    "baremetal",
    "marketplace",
    "openshift-samples"
  ]
}

2. Upgrade to an unsigned build
# oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release@sha256:3b1a0e94da50bb6faa35be08227e7ab1942dfcf0976ee894417a95aeed2111a3 --allow-explicit-upgrade
warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
Updating to release image registry.ci.openshift.org/ocp/release@sha256:3b1a0e94da50bb6faa35be08227e7ab1942dfcf0976ee894417a95aeed2111a3

3. Check cv
# oc get clusterversion/version -ojson | jq -r '.spec, .status.capabilities, .status.conditions'
{
  "capabilities": {
    "additionalEnabledCapabilities": [
      "marketplace"
    ],
    "baselineCapabilitySet": "None"
  },
  "clusterID": "8e822d81-b05b-46d2-b43a-1731d8760353",
  "desiredUpdate": {
    "force": false,
    "image": "registry.ci.openshift.org/ocp/release@sha256:3b1a0e94da50bb6faa35be08227e7ab1942dfcf0976ee894417a95aeed2111a3",
    "version": ""
  },
  "upstream": "https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/graph"
}
{
  "enabledCapabilities": [
    "marketplace"
  ],
  "knownCapabilities": [
    "baremetal",
    "marketplace",
    "openshift-samples"
  ]
}
[
  {
    "lastTransitionTime": "2022-05-11T09:05:08Z",
    "message": "The update channel has not been configured.",
    "reason": "NoChannel",
    "status": "False",
    "type": "RetrievedUpdates"
  },
  {
    "lastTransitionTime": "2022-05-11T08:34:36Z",
    "message": "Kubernetes 1.25 and therefore OpenShift 4.12 remove several APIs which require admin consideration. Please see\nthe knowledge article https://access.redhat.com/articles/6955381 for details and instructions.\n",
    "reason": "AdminAckRequired",
    "status": "False",
    "type": "Upgradeable"
  },
  {
    "lastTransitionTime": "2022-05-11T08:34:36Z",
    "message": "Capabilities match configured spec",
    "reason": "AsExpected",
    "status": "False",
    "type": "ImplicitlyEnabledCapabilities"
  },
  {
    "lastTransitionTime": "2022-05-11T09:35:26Z",
    "message": "Retrieving payload failed version=\"\" image=\"registry.ci.openshift.org/ocp/release@sha256:3b1a0e94da50bb6faa35be08227e7ab1942dfcf0976ee894417a95aeed2111a3\" failure=The update cannot be verified: unable to locate a valid signature for one or more sources",
    "reason": "RetrievePayload",
    "status": "False",
    "type": "ReleaseAccepted"
  },
  {
    "lastTransitionTime": "2022-05-11T08:55:15Z",
    "message": "Done applying 4.11.0-0.nightly-2022-05-09-224745",
    "status": "True",
    "type": "Available"
  },
  {
    "lastTransitionTime": "2022-05-11T08:55:15Z",
    "status": "False",
    "type": "Failing"
  },
  {
    "lastTransitionTime": "2022-05-11T08:55:15Z",
    "message": "Cluster version is 4.11.0-0.nightly-2022-05-09-224745",
    "status": "False",
    "type": "Progressing"
  }
]

Fine, the payload failed to load.

4. Enable baremetal
# oc get clusterversion/version -ojson | jq -r '.spec, .status.capabilities, .status.conditions'
{
  "capabilities": {
    "additionalEnabledCapabilities": [
      "marketplace",
      "baremetal"
    ],
    "baselineCapabilitySet": "None"
  },
  "clusterID": "8e822d81-b05b-46d2-b43a-1731d8760353",
  "desiredUpdate": {
    "force": false,
    "image": "registry.ci.openshift.org/ocp/release@sha256:3b1a0e94da50bb6faa35be08227e7ab1942dfcf0976ee894417a95aeed2111a3",
    "version": ""
  },
  "upstream": "https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/graph"
}
{
  "enabledCapabilities": [
    "baremetal",
    "marketplace"
  ],
  "knownCapabilities": [
    "baremetal",
    "marketplace",
    "openshift-samples"
  ]
}
[
  {
    "lastTransitionTime": "2022-05-11T09:05:08Z",
    "message": "The update channel has not been configured.",
    "reason": "NoChannel",
    "status": "False",
    "type": "RetrievedUpdates"
  },
  {
    "lastTransitionTime": "2022-05-11T08:34:36Z",
    "message": "Kubernetes 1.25 and therefore OpenShift 4.12 remove several APIs which require admin consideration. Please see\nthe knowledge article https://access.redhat.com/articles/6955381 for details and instructions.\n",
    "reason": "AdminAckRequired",
    "status": "False",
    "type": "Upgradeable"
  },
  {
    "lastTransitionTime": "2022-05-11T08:34:36Z",
    "message": "Capabilities match configured spec",
    "reason": "AsExpected",
    "status": "False",
    "type": "ImplicitlyEnabledCapabilities"
  },
  {
    "lastTransitionTime": "2022-05-11T09:35:26Z",
    "message": "Retrieving payload failed version=\"\" image=\"registry.ci.openshift.org/ocp/release@sha256:3b1a0e94da50bb6faa35be08227e7ab1942dfcf0976ee894417a95aeed2111a3\" failure=The update cannot be verified: unable to locate a valid signature for one or more sources",
    "reason": "RetrievePayload",
    "status": "False",
    "type": "ReleaseAccepted"
  },
  {
    "lastTransitionTime": "2022-05-11T08:55:15Z",
    "message": "Done applying 4.11.0-0.nightly-2022-05-09-224745",
    "status": "True",
    "type": "Available"
  },
  {
    "lastTransitionTime": "2022-05-11T08:55:15Z",
    "status": "False",
    "type": "Failing"
  },
  {
    "lastTransitionTime": "2022-05-11T08:55:15Z",
    "message": "Cluster version is 4.11.0-0.nightly-2022-05-09-224745",
    "status": "False",
    "type": "Progressing"
  }
]

Woohoo, cv.status.capabilities.enabledCapabilities now shows baremetal.

# oc get co baremetal
NAME        VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
baremetal   4.11.0-0.nightly-2022-05-09-224745   True        False         False      22m 

It's installed successfully.

Looks good to me. Moving it to verified state.

Comment 7 errata-xmlrpc 2022-08-10 11:09:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069