Bug 1826115 - ClusterVersion status.desired bumped during precondition checks before new version begins reconciling
Keywords:
Status: CLOSED DUPLICATE of bug 1822752
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Over the Air Updates
QA Contact: liujia
URL:
Whiteboard:
Duplicates: 1827166
Depends On:
Blocks:
Reported: 2020-04-20 22:26 UTC by W. Trevor King
Modified: 2022-05-06 12:29 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-02-24 19:24:29 UTC
Target Upstream Version:
Embargoed:



Description W. Trevor King 2020-04-20 22:26:22 UTC
Following [1] to get stuck in a precondition check on 4.3.10:

$ oc get -o json clusteroperators | jq -r '.items[] | .upgradeable = ([.status.conditions[] | select(.type == "Upgradeable")][0]) | select(.upgradeable.status == "False") | .upgradeable.lastTransitionTime + " " + .metadata.name + " " + .upgradeable.reason' | sort
...no output...
$ oc patch scc privileged --type json -p '[{"op": "add", "path": "/users/-", "value": "kubeadmin"}]'
$ oc get -o json clusteroperators | jq -r '.items[] | .upgradeable = ([.status.conditions[] | select(.type == "Upgradeable")][0]) | select(.upgradeable.status == "False") | .upgradeable.lastTransitionTime + " " + .metadata.name + " " + .upgradeable.reason' | sort
2020-04-20T22:00:51Z kube-apiserver DefaultSecurityContextConstraints_Mutated
$ oc adm upgrade --to 4.3.13
Updating to 4.3.13
$ oc get -o json clusterversion version | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + " " + .status + " " + .message' | sort
2020-04-20T20:57:12Z RetrievedUpdates True 
2020-04-20T21:23:40Z Available True Done applying 4.3.10
2020-04-20T22:01:55Z Upgradeable False Cluster operator kube-apiserver cannot be upgraded: DefaultSecurityContextConstraintsUpgradeable: Default SecurityContextConstraints object(s) have mutated [privileged]
2020-04-20T22:16:40Z Failing True Precondition "ClusterVersionUpgradeable" failed because of "DefaultSecurityContextConstraints_Mutated": Cluster operator kube-apiserver cannot be upgraded: DefaultSecurityContextConstraintsUpgradeable: Default SecurityContextConstraints object(s) have mutated [privileged]
2020-04-20T22:16:40Z Progressing True Unable to apply 4.3.13: it may not be safe to apply this update
$ oc get -o json clusterversion version | jq -r '.status.desired.version'
4.3.13
$ oc adm upgrade --clear
Cleared the update field, still at 4.3.13
$ oc get -o json clusterversion version | jq -r '.status.desired.version'
4.3.10

So the issue is that ClusterVersion's status has only a 'desired' property, and the CVO updates it to match .spec.desiredUpdate before running the precondition checks, that is, before the CVO has been updated to run the next version.  When the client-side 'oc' then clears .spec.desiredUpdate, it has no way to figure out which version we're reverting to (short of peering into .status.history and hoping it's current).

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1822752#c0
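The ".status.history" fallback mentioned above can be sketched as follows. This is a minimal illustration, not code from 'oc' or the CVO: the function name last_completed_version is hypothetical, and the history entries in the sample are made up to resemble the stuck 4.3.10 cluster (the report does not show .status.history). It assumes history is ordered newest first and that entries carry a "state" of "Completed" or "Partial".

```python
from typing import Optional


def last_completed_version(cluster_version: dict) -> Optional[str]:
    """Return the version of the newest "Completed" history entry, if any."""
    history = cluster_version.get("status", {}).get("history", [])
    # Assumption: status.history is ordered newest first, so the first
    # completed entry is the most recent release that fully rolled out.
    for entry in history:
        if entry.get("state") == "Completed":
            return entry.get("version")
    return None


# Hypothetical ClusterVersion snippet modeled on the report: the 4.3.13
# update is stuck in preconditions, yet status.desired already names it.
# The history entries are illustrative, not captured from the cluster.
stuck = {
    "status": {
        "desired": {"version": "4.3.13"},
        "history": [
            {"state": "Partial", "version": "4.3.13"},
            {"state": "Completed", "version": "4.3.10"},
        ],
    }
}

print(stuck["status"]["desired"]["version"])  # 4.3.13
print(last_completed_version(stuck))          # 4.3.10
```

A client relying on this still has to hope that the newest completed entry really is the release the CVO is reconciling, which is the weakness the description points out.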

Comment 1 W. Trevor King 2020-04-20 22:30:23 UTC
Another symptom of this failure mode is that:

$ oc adm upgrade --to=4.3.13 --force
info: Cluster is already at version 4.3.13

makes it sound like the current version is 4.3.13, even if we're still stuck in preconditions on a 4.3.10 CVO.
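One way to read this symptom, sketched under the assumption that the client decides "already at version" purely from status.desired.version. The function naive_already_at is hypothetical and is not taken from the 'oc' source; it only shows why a prematurely bumped status.desired produces the misleading message.

```python
def naive_already_at(cluster_version: dict, target: str) -> bool:
    """Hypothetical check: treat status.desired.version as the current version."""
    desired = cluster_version.get("status", {}).get("desired", {})
    return desired.get("version") == target


# State from the description: the cluster still runs a 4.3.10 CVO, but
# status.desired was already bumped to 4.3.13 during precondition checks.
stuck = {"status": {"desired": {"version": "4.3.13"}}}

if naive_already_at(stuck, "4.3.13"):
    print("info: Cluster is already at version 4.3.13")
```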

Comment 4 W. Trevor King 2020-06-21 14:19:12 UTC
Haven't had time to get back around to this one.  Adding UpcomingSprint.

Comment 6 W. Trevor King 2020-07-10 21:32:25 UTC
Comment 4 is still current.

Comment 7 W. Trevor King 2020-08-01 05:41:46 UTC
Comment 4 is still current.

Comment 8 W. Trevor King 2020-08-08 21:04:43 UTC
*** Bug 1827166 has been marked as a duplicate of this bug. ***

Comment 9 W. Trevor King 2020-08-21 22:26:34 UTC
Comment 4 is still current.

Comment 10 W. Trevor King 2020-09-13 05:05:55 UTC
Comment 4 is still current.

Comment 11 W. Trevor King 2020-10-02 23:14:14 UTC
Comment 4 is still current.

Comment 12 W. Trevor King 2020-10-25 15:44:51 UTC
Comment 4 is still current.

Comment 13 W. Trevor King 2020-12-04 22:37:12 UTC
Comment 4 is still current.

Comment 14 liujia 2020-12-24 09:57:49 UTC
Continue tracking https://bugzilla.redhat.com/show_bug.cgi?id=1827166 here.

The original issue in #1827166 looks a little different now (upgrading from v4.6 to v4.7 with an Upgradeable=False condition): the upgrade hangs at the "downloading update" step instead of actually starting.

# ./oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.9     True        True          13m     Unable to apply 4.7.0-0.nightly-2020-12-19-113414: could not download the update

But this is still not the expected result: the Y-version upgrade should be blocked directly, with a meaningful message.

The following is the earlier blocked result when upgrading with --to-image:
# ./oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.9     True        True          8s      Unable to apply 4.7.0-0.nightly-2020-12-20-055006: it may not be safe to apply this update

Comment 18 W. Trevor King 2021-07-23 23:31:50 UTC
I'm removing myself as the assignee.  Anyone who wants this can take it.

Comment 20 W. Trevor King 2022-02-24 19:24:29 UTC
The reshuffle for bug 1822752 addressed this.  Testing in a 4.11 nightly via cluster-bot (which clears the channel):

  $ oc get -o json clusterversion version | jq .status.desired
  {
    "image": "registry.build01.ci.openshift.org/ci-ln-98zz55b/release@sha256:ba95a556da080b887baa8801e1f020d97c985f52a4de52c1027e1a710738fd97",
    "version": "4.11.0-0.nightly-2022-02-23-185405"
  }

Updating to an unsigned CI release [1]:

  $ oc adm upgrade --to-image registry.ci.openshift.org/ocp/release@sha256:49bae27d97ff71b14ff614de0242c0b229959130e6e0fe587ade92f56e8a035e
  error: cannot refresh available updates:
    Reason: NoChannel
    Message: The update channel has not been configured.

  specify --allow-explicit-upgrade to continue with the update.
  $ oc adm upgrade --allow-explicit-upgrade --to-image registry.ci.openshift.org/ocp/release@sha256:49bae27d97ff71b14ff614de0242c0b229959130e6e0fe587ade92f56e8a035e
  warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
  Updating to release image registry.ci.openshift.org/ocp/release@sha256:49bae27d97ff71b14ff614de0242c0b229959130e6e0fe587ade92f56e8a035e

Waiting while the CVO mulls it over.  Then:

  $ oc get -o json clusterversion version | jq '.status | {desired, conditionsFiltered: [.conditions[] | select(.type == "ReleaseAccepted")]}'
  {
    "desired": {
      "image": "registry.build01.ci.openshift.org/ci-ln-98zz55b/release@sha256:ba95a556da080b887baa8801e1f020d97c985f52a4de52c1027e1a710738fd97",
      "version": "4.11.0-0.nightly-2022-02-23-185405"
    },
    "conditionsFiltered": [
      {
        "lastTransitionTime": "2022-02-24T19:13:25Z",
        "message": "Retrieving payload failed version=\"\" image=\"registry.ci.openshift.org/ocp/release@sha256:49bae27d97ff71b14ff614de0242c0b229959130e6e0fe587ade92f56e8a035e\" failure=The update cannot be verified: unable to locate a valid signature for one or more sources",
        "reason": "RetrievePayload",
        "status": "False",
        "type": "ReleaseAccepted"
      }
    ]
  }

So great, we have status.desired still pointing at the old release (which, since bug 1822752, we're still reconciling), and the CVO complaining that it doesn't like the requested target.  Forcing through the CVO's concerns:
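The state just described (status.desired on the old release, ReleaseAccepted=False for the requested target) can be told apart from an accepted target with a small check. This is a sketch only: the function names and the "none"/"accepted"/"rejected"/"pending" labels are hypothetical, the comparison is by image alone, and the spec.desiredUpdate value in the sample is assumed from the --to-image command rather than shown in the output above.

```python
from typing import Optional

OLD = "registry.build01.ci.openshift.org/ci-ln-98zz55b/release@sha256:ba95a556da080b887baa8801e1f020d97c985f52a4de52c1027e1a710738fd97"
NEW = "registry.ci.openshift.org/ocp/release@sha256:49bae27d97ff71b14ff614de0242c0b229959130e6e0fe587ade92f56e8a035e"


def find_condition(cluster_version: dict, condition_type: str) -> Optional[dict]:
    """Return the first status condition of the given type, if present."""
    for condition in cluster_version.get("status", {}).get("conditions", []):
        if condition.get("type") == condition_type:
            return condition
    return None


def target_state(cluster_version: dict) -> str:
    """Classify the requested update (labels are illustrative, not an API)."""
    update = cluster_version.get("spec", {}).get("desiredUpdate")
    if not update:
        return "none"
    desired = cluster_version.get("status", {}).get("desired", {})
    if desired.get("image") == update.get("image"):
        return "accepted"  # status.desired has moved to the requested image
    accepted = find_condition(cluster_version, "ReleaseAccepted")
    if accepted and accepted.get("status") == "False":
        return "rejected"  # still reconciling the old release
    return "pending"


# Sample built from the output above; spec.desiredUpdate is an assumption.
rejected = {
    "spec": {"desiredUpdate": {"image": NEW}},
    "status": {
        "desired": {"image": OLD, "version": "4.11.0-0.nightly-2022-02-23-185405"},
        "conditions": [
            {"type": "ReleaseAccepted", "status": "False", "reason": "RetrievePayload"}
        ],
    },
}

print(target_state(rejected))  # rejected
```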

  $ oc adm upgrade --force --allow-explicit-upgrade --to-image registry.ci.openshift.org/ocp/release@sha256:49bae27d97ff71b14ff614de0242c0b229959130e6e0fe587ade92f56e8a035e
  warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
  warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
  Updating to release image registry.ci.openshift.org/ocp/release@sha256:49bae27d97ff71b14ff614de0242c0b229959130e6e0fe587ade92f56e8a035e
  $ oc get -o json clusterversion version | jq .spec.desiredUpdate
  {
    "force": true,
    "image": "registry.ci.openshift.org/ocp/release@sha256:49bae27d97ff71b14ff614de0242c0b229959130e6e0fe587ade92f56e8a035e",
    "version": ""
  }
  $ oc get -o json clusterversion version | jq '.status | {desired, conditionsFiltered: [.conditions[] | select(.type == "ReleaseAccepted")]}'
  {
    "desired": {
      "image": "registry.ci.openshift.org/ocp/release@sha256:49bae27d97ff71b14ff614de0242c0b229959130e6e0fe587ade92f56e8a035e",
      "version": "4.11.0-0.ci-2022-02-24-025459"
    },
    "conditionsFiltered": [
      {
        "lastTransitionTime": "2022-02-24T19:15:33Z",
        "message": "Payload loaded version=\"4.11.0-0.ci-2022-02-24-025459\" image=\"registry.ci.openshift.org/ocp/release@sha256:49bae27d97ff71b14ff614de0242c0b229959130e6e0fe587ade92f56e8a035e\"",
        "reason": "PayloadLoaded",
        "status": "True",
        "type": "ReleaseAccepted"
      }
    ]
  }

Hooray.

[1]: https://amd64.ocp.releases.ci.openshift.org/releasestream/4.11.0-0.ci/release/4.11.0-0.ci-2022-02-24-025459

*** This bug has been marked as a duplicate of bug 1822752 ***

