Bug 2072389

| Field | Value |
|---|---|
| Summary | CVO exits upgrade immediately rather than waiting for etcd backup |
| Product | OpenShift Container Platform |
| Component | Cluster Version Operator |
| Version | 4.10 |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Reporter | Yang Yang <yanyang> |
| Assignee | Jack Ottofaro <jack.ottofaro> |
| QA Contact | Yang Yang <yanyang> |
| CC | jack.ottofaro, jhou, lmohanty, lxia, wking |
| Keywords | TestBlocker, UpgradeBlocker, Upgrades |
| Target Release | 4.11.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Whiteboard | UpdateRecommendationsBlocked |
| Doc Type | No Doc Update |
| Type | Bug |
| Last Closed | 2022-08-10 11:04:00 UTC |
| Bug Blocks | 2076793, 2083370 |

Description
Yang Yang
2022-04-06 08:35:17 UTC
(In reply to Yang Yang from comment #0)
> During minor upgrade from 4.10 to 4.11, CVO sets ReleaseAccepted=False once
> it finds etcd RecentBackup not true so that upgrade is never started.
> Previously, CVO checked etcd RecentBackup if it’s not true, CVO set
> Failing=true, then etcd started to take backup. After backup has been taken,
> CVO set Failing to false and proceeded the upgrade.
> ...
> Steps to Reproduce:
> 1. Install a 4.10 cluster
> 2. Upgrade to 4.11

This makes it a problem in the 4.10 CVO, probably introduced into 4.10.z by [1].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2064991

And we'll want etcd snapshots working again by the time we are recommending 4.10 -> 4.11 updates, so setting blocker+ on this 4.11.0-targeted bug.

It seems we need to wait for signed 4.12 builds to become available before we can do the verification.

This is blocker+, so we're committed to fixing before 4.11 GAs, which means we're unlikely to make update-graph changes based on this. Since update-graph changes are what UpgradeBlocker is for [1], I'm dropping the keyword.

[1]: https://github.com/openshift/enhancements/blob/bdf15e7a57a1f5a766e67c27c4ed9e0d03ef4bb4/enhancements/update/update-blocker-lifecycle/README.md

Verifying with 4.11.0-0.nightly-2022-05-06-215225

Steps are as below:

1. Install a cluster with 4.11.0-0.nightly-2022-05-06-215225

2. Override openshift-network-operator/network-operator

# oc patch clusterversion version --type=merge -p '{"spec": {"overrides":[{"kind": "Deployment", "name": "network-operator", "namespace": "openshift-network-operator", "unmanaged": true, "group": "apps/v1"}]}}'

3. Upgrade to 4.11.0-0.nightly-2022-05-07-161754

# oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release@sha256:a655fcffc1bf299563471eb71625eedf142b4a953f15dc2c8fa76438092495ac --allow-explicit-upgrade
warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
Updating to release image registry.ci.openshift.org/ocp/release@sha256:a655fcffc1bf299563471eb71625eedf142b4a953f15dc2c8fa76438092495ac

4. Check cv conditions

# oc get -o json clusterversion version | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
2022-05-09T06:14:01Z RetrievedUpdates=False VersionNotFound: Unable to retrieve available updates: currently reconciling cluster version 4.11.0-0.nightly-2022-05-06-215225 not found in the "stable-4.11" channel
2022-05-09T06:14:01Z Upgradeable=False MultipleReasons: Cluster should not be upgraded between minor versions for multiple reasons: ClusterVersionOverridesSet,AdminAckRequired
* Disabling ownership via cluster version overrides prevents upgrades. Please remove overrides before continuing.
* Kubernetes 1.25 and therefore OpenShift 4.12 remove several APIs which require admin consideration. Please see the knowledge article https://access.redhat.com/articles/6955381 for details and instructions.
2022-05-09T06:14:01Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec
2022-05-09T06:51:11Z ReleaseAccepted=False PreconditionChecks: Preconditions failed for payload loaded version="4.11.0-0.nightly-2022-05-07-161754" image="registry.ci.openshift.org/ocp/release@sha256:a655fcffc1bf299563471eb71625eedf142b4a953f15dc2c8fa76438092495ac": Precondition "ClusterVersionUpgradeable" failed because of "ClusterVersionOverridesSet": Disabling ownership via cluster version overrides prevents upgrades. Please remove overrides before continuing.
2022-05-09T06:34:01Z Available=True : Done applying 4.11.0-0.nightly-2022-05-06-215225
2022-05-09T06:49:31Z Failing=False :
2022-05-09T06:49:46Z Progressing=False : Cluster version is 4.11.0-0.nightly-2022-05-06-215225
2022-05-09T06:50:47Z UpgradeableAdminAckRequired=False AdminAckRequired: Kubernetes 1.25 and therefore OpenShift 4.12 remove several APIs which require admin consideration. Please see the knowledge article https://access.redhat.com/articles/6955381 for details and instructions.
2022-05-09T06:50:47Z UpgradeableClusterVersionOverrides=False ClusterVersionOverridesSet: Disabling ownership via cluster version overrides prevents upgrades. Please remove overrides before continuing.

Nice, we get ReleaseAccepted=False due to overrides.

5. Remove overrides

# oc patch clusterversion version --type json -p '[{"op": "remove", "path": "/spec/overrides"}]'
clusterversion.config.openshift.io/version patched

6. Check cv conditions

# oc get -o json clusterversion version | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
2022-05-09T06:14:01Z RetrievedUpdates=False VersionNotFound: Unable to retrieve available updates: currently reconciling cluster version 4.11.0-0.nightly-2022-05-07-161754 not found in the "stable-4.11" channel
2022-05-09T06:14:01Z Upgradeable=False AdminAckRequired: Kubernetes 1.25 and therefore OpenShift 4.12 remove several APIs which require admin consideration. Please see the knowledge article https://access.redhat.com/articles/6955381 for details and instructions.
2022-05-09T06:14:01Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec
2022-05-09T06:56:36Z ReleaseAccepted=True PayloadLoaded: Payload loaded version="4.11.0-0.nightly-2022-05-07-161754" image="registry.ci.openshift.org/ocp/release@sha256:a655fcffc1bf299563471eb71625eedf142b4a953f15dc2c8fa76438092495ac"
2022-05-09T06:34:01Z Available=True : Done applying 4.11.0-0.nightly-2022-05-06-215225
2022-05-09T06:49:31Z Failing=False :
2022-05-09T06:56:36Z Progressing=True : Working towards 4.11.0-0.nightly-2022-05-07-161754: 105 of 795 done (13% complete)

Payload loaded and upgrade proceeds. Looks good to me.

Thanks to Trevor and Justin, we finally have a 4.12.

Verifying with 4.11.0-0.nightly-2022-05-07-161754

Steps to verify:

1. Install a 4.11 cluster

2. Upgrade to 4.12

# oc adm upgrade --to-image=quay.io/openshift-release-dev/ocp-release-nightly@sha256:fb152ef66937c9cbb05467ff5b23f3b327485a90cae6686a5742375c980fea26 --allow-explicit-upgrade

3. Check cv conditions

# oc get -o json clusterversion version | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
2022-05-10T01:12:02Z RetrievedUpdates=False VersionNotFound: Unable to retrieve available updates: currently reconciling cluster version 4.11.0-0.nightly-2022-05-07-161754 not found in the "stable-4.11" channel
2022-05-10T01:12:02Z Upgradeable=False AdminAckRequired: Kubernetes 1.25 and therefore OpenShift 4.12 remove several APIs which require admin consideration. Please see the knowledge article https://access.redhat.com/articles/6955381 for details and instructions.
2022-05-10T01:12:02Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec
2022-05-10T01:45:35Z ReleaseAccepted=False PreconditionChecks: Preconditions failed for payload loaded version="4.12.0-0.nightly-0" image="quay.io/openshift-release-dev/ocp-release-nightly@sha256:fb152ef66937c9cbb05467ff5b23f3b327485a90cae6686a5742375c980fea26": Multiple precondition checks failed:
* Precondition "ClusterVersionUpgradeable" failed because of "AdminAckRequired": Kubernetes 1.25 and therefore OpenShift 4.12 remove several APIs which require admin consideration. Please see the knowledge article https://access.redhat.com/articles/6955381 for details and instructions.
* Precondition "EtcdRecentBackup" failed because of "UpgradeBackupInProgress": RecentBackup: Backup pod phase: "Pending"
2022-05-10T01:30:19Z Available=True : Done applying 4.11.0-0.nightly-2022-05-07-161754
2022-05-10T01:30:19Z Failing=False :
2022-05-10T01:30:19Z Progressing=False : Cluster version is 4.11.0-0.nightly-2022-05-07-161754

ReleaseAccepted=False due to EtcdRecentBackup and AdminAckRequired.

# oc get -o json clusterversion version | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
2022-05-10T01:12:02Z RetrievedUpdates=False VersionNotFound: Unable to retrieve available updates: currently reconciling cluster version 4.11.0-0.nightly-2022-05-07-161754 not found in the "stable-4.11" channel
2022-05-10T01:12:02Z Upgradeable=False AdminAckRequired: Kubernetes 1.25 and therefore OpenShift 4.12 remove several APIs which require admin consideration. Please see the knowledge article https://access.redhat.com/articles/6955381 for details and instructions.
2022-05-10T01:12:02Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec
2022-05-10T01:45:35Z ReleaseAccepted=False PreconditionChecks: Preconditions failed for payload loaded version="4.12.0-0.nightly-0" image="quay.io/openshift-release-dev/ocp-release-nightly@sha256:fb152ef66937c9cbb05467ff5b23f3b327485a90cae6686a5742375c980fea26": Precondition "ClusterVersionUpgradeable" failed because of "AdminAckRequired": Kubernetes 1.25 and therefore OpenShift 4.12 remove several APIs which require admin consideration. Please see the knowledge article https://access.redhat.com/articles/6955381 for details and instructions.
2022-05-10T01:30:19Z Available=True : Done applying 4.11.0-0.nightly-2022-05-07-161754
2022-05-10T01:30:19Z Failing=False :
2022-05-10T01:30:19Z Progressing=False : Cluster version is 4.11.0-0.nightly-2022-05-07-161754

EtcdRecentBackup precondition validation passed. Then we manually provide the administrator acknowledgement.

# oc -n openshift-config patch cm admin-acks --patch '{"data":{"ack-4.11-kube-1.25-api-removals-in-4.12":"true"}}' --type=merge
configmap/admin-acks patched

# oc get -o json clusterversion version | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + "=" + .status + " " + .reason + ": " + .message'
2022-05-10T01:12:02Z RetrievedUpdates=False VersionNotFound: Unable to retrieve available updates: currently reconciling cluster version 4.12.0-0.nightly-0 not found in the "stable-4.11" channel
2022-05-10T01:12:02Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec
2022-05-10T01:47:37Z ReleaseAccepted=True PayloadLoaded: Payload loaded version="4.12.0-0.nightly-0" image="quay.io/openshift-release-dev/ocp-release-nightly@sha256:fb152ef66937c9cbb05467ff5b23f3b327485a90cae6686a5742375c980fea26"
2022-05-10T01:30:19Z Available=True : Done applying 4.11.0-0.nightly-2022-05-07-161754
2022-05-10T01:30:19Z Failing=False :
2022-05-10T01:47:37Z Progressing=True : Working towards 4.12.0-0.nightly-0: 9 of 795 done (1% complete)

AdminAck precondition validation passed, and the upgrade proceeds. Looks good to me. Moving it to verified state.

Impact assessment

Which 4.y.z to 4.y'.z' updates increase vulnerability? Which types of clusters?
* Upgrades will be impacted if the current cluster version is >= 4.10.8 and <= 4.10.14.
* Upgrades to 4.11 are especially impacted because, as part of the upgrade, we need to take a backup of etcd, and because of this bug CVO does not wait for it.

What is the impact? Is it serious enough to warrant removing update recommendations?
* In the event of an initial precondition check failure, CVO does not re-check the state of the precondition.
* The upgrade will not proceed: because of this bug, CVO will not accept the new target release and will continue to reconcile the current version.

How involved is remediation?
* The bug only impacts upgrades, so as long as the cluster stays on the current version there is nothing to remediate.
* To avoid the etcd backup issue, we suggest updating to 4.10.15 or a later version (which contains the fix) before updating to 4.11.z, because for z-stream updates CVO does not check the etcd backup precondition.
* If you are stuck in ReleaseAccepted=False, address the condition, clear the upgrade with "$ oc adm upgrade --clear", and try to upgrade again.

Is this a regression?
* Yes. The fix for https://bugzilla.redhat.com/show_bug.cgi?id=2064991 introduced it.

[1] landed a conditional risk for 4.10.14 to 4.11.0-fc.0, so I'm setting UpdateRecommendationsBlocked.

[1]: https://github.com/openshift/cincinnati-graph-data/pull/2069

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
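
The impact assessment in this thread names an affected range (current version >= 4.10.8 and <= 4.10.14) and keys the symptom on the ReleaseAccepted condition. As a rough illustration only, the sketch below checks both from data an admin already has. It assumes plain "x.y.z" version strings (nightly or pre-release tags are not handled) and a ClusterVersion object already loaded as a dict, for example from the JSON printed by "oc get -o json clusterversion version". The names `is_affected` and `release_accepted` are hypothetical helpers and are not part of oc or the CVO.

```python
from typing import Optional, Tuple

# Inclusive bounds copied from the impact assessment in this bug report.
FIRST_AFFECTED = (4, 10, 8)
LAST_AFFECTED = (4, 10, 14)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain 'x.y.z' string into a tuple of ints.

    Assumption: no nightly or pre-release suffix is present.
    """
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


def is_affected(version: str) -> bool:
    """True if the version lies in the affected range stated in the bug."""
    return FIRST_AFFECTED <= parse_version(version) <= LAST_AFFECTED


def release_accepted(cluster_version: dict) -> Optional[bool]:
    """Return the ReleaseAccepted condition as a bool, or None if it is absent.

    Reads .status.conditions[], the same path the jq expression in the
    verification steps iterates over.
    """
    for cond in cluster_version.get("status", {}).get("conditions", []):
        if cond.get("type") == "ReleaseAccepted":
            return cond.get("status") == "True"
    return None


if __name__ == "__main__":
    for v in ("4.10.7", "4.10.8", "4.10.14", "4.10.15"):
        print(v, "affected" if is_affected(v) else "not affected")
```

A cluster for which `is_affected` returns True would, per the remediation notes, first update to 4.10.15 or later before attempting 4.11.z.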