+++ This bug was initially created as a clone of Bug #2072389 +++

Created attachment 1871007 [details]
CVO log file

Description of problem:
During minor upgrade from 4.10 to 4.11, CVO sets ReleaseAccepted=False once it finds etcd RecentBackup not true so that upgrade is never started. Previously, CVO checked etcd RecentBackup if it’s not true, CVO set Failing=true, then etcd started to take backup. After backup has been taken, CVO set Failing to false and proceeded the upgrade.

# oc get clusterversion -oyaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: ClusterVersion
  metadata:
    creationTimestamp: "2022-04-06T06:34:30Z"
    generation: 3
    name: version
    resourceVersion: "36529"
    uid: d3d4b24e-9b77-4e49-8796-350c2f8cd96f
  spec:
    channel: stable-4.10
    clusterID: c6e49e12-4a08-4795-beb9-fd819a14ea33
    desiredUpdate:
      force: false
      image: registry.ci.openshift.org/ocp/release@sha256:28d4c78bd2ce3fa33479c0ee57372777908fead95e150f664fec8e4310cd85e4
      version: ""
  status:
    availableUpdates: null
    conditions:
    - lastTransitionTime: "2022-04-06T06:34:31Z"
      message: 'Unable to retrieve available updates: currently reconciling cluster version 4.10.7 not found in the "stable-4.10" channel'
      reason: VersionNotFound
      status: "False"
      type: RetrievedUpdates
    - lastTransitionTime: "2022-04-06T07:14:51Z"
      message: 'Preconditions failed for payload loaded version="4.11.0-0.nightly-2022-03-29-152521" image="registry.ci.openshift.org/ocp/release@sha256:28d4c78bd2ce3fa33479c0ee57372777908fead95e150f664fec8e4310cd85e4": Precondition "EtcdRecentBackup" failed because of "ControllerStarted": '
      reason: PreconditionChecks
      status: "False"
      type: ReleaseAccepted
    - lastTransitionTime: "2022-04-06T06:55:47Z"
      message: Done applying 4.10.7
      status: "True"
      type: Available
    - lastTransitionTime: "2022-04-06T06:55:47Z"
      status: "False"
      type: Failing
    - lastTransitionTime: "2022-04-06T07:15:47Z"
      message: Cluster version is 4.10.7
      status: "False"
      type: Progressing
    desired:
      image: quay.io/openshift-release-dev/ocp-release@sha256:347fcefa4cff84074fa56ff73a483b9fee7ba98b9a71752763502f11182a11af
      url: https://access.redhat.com/errata/RHBA-2022:1162
      version: 4.10.7
    history:
    - completionTime: "2022-04-06T06:55:47Z"
      image: quay.io/openshift-release-dev/ocp-release@sha256:347fcefa4cff84074fa56ff73a483b9fee7ba98b9a71752763502f11182a11af
      startedTime: "2022-04-06T06:34:31Z"
      state: Completed
      verified: false
      version: 4.10.7
    observedGeneration: 3
    versionHash: o09_Mvm2ad0=
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

# oc get co/etcd -oyaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  annotations:
    exclude.release.openshift.io/internal-openshift-hosted: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
  creationTimestamp: "2022-04-06T06:34:31Z"
  generation: 1
  name: etcd
  ownerReferences:
  - apiVersion: config.openshift.io/v1
    kind: ClusterVersion
    name: version
    uid: d3d4b24e-9b77-4e49-8796-350c2f8cd96f
  resourceVersion: "29598"
  uid: b77ad218-07a7-4db5-b862-7d3df7954c36
spec: {}
status:
  conditions:
  - lastTransitionTime: "2022-04-06T06:39:47Z"
    message: |-
      NodeControllerDegraded: All master nodes are ready
      EtcdMembersDegraded: No unhealthy members found
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2022-04-06T06:56:30Z"
    message: |-
      NodeInstallerProgressing: 3 nodes are at revision 7
      EtcdMembersProgressing: No unstarted etcd members found
    reason: AsExpected
    status: "False"
    type: Progressing
  - lastTransitionTime: "2022-04-06T06:41:55Z"
    message: |-
      StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 7
      EtcdMembersAvailable: 3 members are available
    reason: AsExpected
    status: "True"
    type: Available
  - lastTransitionTime: "2022-04-06T06:39:47Z"
    message: All is well
    reason: AsExpected
    status: "True"
    type: Upgradeable
  - lastTransitionTime: "2022-04-06T06:39:47Z"
    reason: ControllerStarted
    status: Unknown
    type: RecentBackup
  extension: null
  relatedObjects:
  - group: operator.openshift.io
    name: cluster
    resource: etcds
  - group: ""
    name: openshift-config
    resource: namespaces
  - group: ""
    name: openshift-config-managed
    resource: namespaces
  - group: ""
    name: openshift-etcd-operator
    resource: namespaces
  - group: ""
    name: openshift-etcd
    resource: namespaces
  versions:
  - name: raw-internal
    version: 4.10.7
  - name: etcd
    version: 4.10.7
  - name: operator
    version: 4.10.7

Version-Release number of the following components:
4.10.7

How reproducible:
Always

Steps to Reproduce:
1. Install a 4.10 cluster
2. Upgrade to 4.11
# oc adm upgrade --allow-explicit-upgrade --to-image=registry.ci.openshift.org/ocp/release@sha256:28d4c78bd2ce3fa33479c0ee57372777908fead95e150f664fec8e4310cd85e4

Actual results:
Upgrade exits because precondition check fails on etcd backup

Expected results:
CVO sets failing to true and waits for etcd backup

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

--- Additional comment from W. Trevor King on 2022-04-06 19:09:38 UTC ---

(In reply to Yang Yang from comment #0)
> During minor upgrade from 4.10 to 4.11, CVO sets ReleaseAccepted=False once
> it finds etcd RecentBackup not true so that upgrade is never started.
> Previously, CVO checked etcd RecentBackup if it’s not true, CVO set
> Failing=true, then etcd started to take backup. After backup has been taken,
> CVO set Failing to false and proceeded the upgrade.
> ...
> Steps to Reproduce:
> 1. Install a 4.10 cluster
> 2. Upgrade to 4.11

This makes it a problem in the 4.10 CVO, probably introduced into 4.10.z by [1].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2064991

--- Additional comment from W. Trevor King on 2022-04-06 19:14:06 UTC ---

And we'll want etcd snapshots working again by the time we are recommending 4.10 -> 4.11 updates, so setting blocker+ on this 4.11.0-targeted bug.
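For readers following the thread, the precondition behaviour reported in comment 0 can be sketched roughly as below. This is a hypothetical Python illustration, not the CVO's actual Go code: the function names and the dict-based condition shape are assumptions made for the example. Only the condition type RecentBackup, the precondition name EtcdRecentBackup, and the layout of the failure message are taken from the output quoted above.

```python
from __future__ import annotations

from typing import Optional


def find_condition(conditions: list[dict], cond_type: str) -> Optional[dict]:
    """Return the first condition of the given type, or None if absent."""
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond
    return None


def etcd_recent_backup_precondition(etcd_conditions: list[dict]) -> Optional[str]:
    """Return None when the precondition passes, else a failure message.

    Hypothetical helper: the check passes only when the etcd
    ClusterOperator reports RecentBackup with status "True".
    """
    cond = find_condition(etcd_conditions, "RecentBackup")
    if cond is not None and cond.get("status") == "True":
        return None
    # Assumption: a missing condition is reported with an empty reason and message.
    reason = cond.get("reason", "") if cond else ""
    message = cond.get("message", "") if cond else ""
    return f'Precondition "EtcdRecentBackup" failed because of "{reason}": {message}'


if __name__ == "__main__":
    # RecentBackup condition as reported by `oc get co/etcd -oyaml` in comment 0.
    etcd_conditions = [
        {"type": "Upgradeable", "status": "True", "reason": "AsExpected"},
        {"type": "RecentBackup", "status": "Unknown", "reason": "ControllerStarted"},
    ]
    print(etcd_recent_backup_precondition(etcd_conditions))
```

With status Unknown and reason ControllerStarted, the sketch yields the same trailing fragment that appears in the ReleaseAccepted message in comment 0.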
As of change https://github.com/openshift/cluster-version-operator/pull/683, CVO no longer sets Failing=true when the preconditions, including the etcd backup precondition, fail. CVO now sets the ReleaseAccepted condition to indicate whether the payload has been successfully loaded. The etcd operator should now check for ReleaseAccepted!=true instead.
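A minimal sketch of the difference between the two signals, in Python for illustration only. The helper names are hypothetical and this is not the etcd operator's code; in particular, treating a missing ReleaseAccepted condition as "no backup requested" is an assumption, not something this bug states.

```python
from __future__ import annotations

from typing import Optional


def condition_status(conditions: list[dict], cond_type: str) -> Optional[str]:
    """Return the status string of the named condition, or None if absent."""
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


def legacy_backup_requested(cv_conditions: list[dict]) -> bool:
    """Old signal described in comment 0: ClusterVersion Failing=True."""
    return condition_status(cv_conditions, "Failing") == "True"


def backup_requested(cv_conditions: list[dict]) -> bool:
    """New signal: ReleaseAccepted is present and not "True".

    Assumption: an absent condition does not request a backup.
    """
    status = condition_status(cv_conditions, "ReleaseAccepted")
    return status is not None and status != "True"


if __name__ == "__main__":
    # Relevant ClusterVersion conditions from comment 0.
    cv_conditions = [
        {"type": "ReleaseAccepted", "status": "False", "reason": "PreconditionChecks"},
        {"type": "Failing", "status": "False"},
    ]
    print(legacy_backup_requested(cv_conditions), backup_requested(cv_conditions))
```

Against the conditions from comment 0, the Failing-based check never fires, which matches the reported hang, while the ReleaseAccepted-based check does.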
Verified with 4.11.0-0.nightly-2022-04-26-030643: the upgrade from 4.10.11 to 4.11.0-0.nightly-2022-04-26-030643 succeeded.
Verifying with 4.11.0-0.nightly-2022-04-26-181148 by patching the cv status to set ReleaseAccepted to False.

Before patching cv status:
# oc get co/etcd -oyaml
  - lastTransitionTime: "2022-04-27T05:50:20Z"
    reason: ControllerStarted
    status: Unknown
    type: RecentBackup

Patching cv to set ReleaseAccepted to False:
# oc scale --replicas 0 -n openshift-cluster-version deployments/cluster-version-operator
deployment.apps/cluster-version-operator scaled
# oc proxy &
# curl -k -XPATCH -H "Accept: application/json" -H "Content-Type: application/json-patch+json" 'http://127.0.0.1:8001/apis/config.openshift.io/v1/clusterversions/version/status' -d '[{"op": "add", "path": "/status/conditions", "value": [{"type":"ReleaseAccepted", "status": "False", "reason": "UpgradePreconditionCheckFailed", "message": "EtcdRecentBackup failed", "lastTransitionTime": "2022-04-27T18:25:51Z"}]}]'
{
  "apiVersion": "config.openshift.io/v1",
  "kind": "ClusterVersion",
  "metadata": {
    "creationTimestamp": "2022-04-27T05:47:06Z",
    "generation": 4,
    "managedFields": [
      {
        "apiVersion": "config.openshift.io/v1",
        "fieldsType": "FieldsV1",
        "fieldsV1": {
          "f:spec": {
            ".": {},
            "f:channel": {},
            "f:clusterID": {}
          }
        },
        "manager": "cluster-bootstrap",
        "operation": "Update",
        "time": "2022-04-27T05:47:06Z"
      },
      {
        "apiVersion": "config.openshift.io/v1",
        "fieldsType": "FieldsV1",
        "fieldsV1": {
          "f:status": {
            ".": {},
            "f:availableUpdates": {},
            "f:capabilities": {
              ".": {},
              "f:enabledCapabilities": {},
              "f:knownCapabilities": {}
            },
            "f:desired": {
              ".": {},
              "f:image": {},
              "f:version": {}
            },
            "f:history": {},
            "f:observedGeneration": {},
            "f:versionHash": {}
          }
        },
        "manager": "cluster-version-operator",
        "operation": "Update",
        "subresource": "status",
        "time": "2022-04-27T05:47:10Z"
      },
      {
        "apiVersion": "config.openshift.io/v1",
        "fieldsType": "FieldsV1",
        "fieldsV1": {
          "f:status": {
            "f:conditions": {}
          }
        },
        "manager": "curl",
        "operation": "Update",
        "subresource": "status",
        "time": "2022-04-27T12:42:23Z"
      }
    ],
    "name": "version",
    "resourceVersion": "165197",
    "uid": "f1924212-4134-4bfb-a860-b24d8e084bad"
  },
  "spec": {
    "channel": "stable-4.11",
    "clusterID": "09edcc03-502b-4f63-81f3-d307a002253f"
  },
  "status": {
    "availableUpdates": null,
    "capabilities": {
      "enabledCapabilities": [
        "baremetal",
        "marketplace",
        "openshift-samples"
      ],
      "knownCapabilities": [
        "baremetal",
        "marketplace",
        "openshift-samples"
      ]
    },
    "conditions": [
      {
        "lastTransitionTime": "2022-04-27T18:25:51Z",
        "message": "EtcdRecentBackup failed",
        "reason": "UpgradePreconditionCheckFailed",
        "status": "False",
        "type": "ReleaseAccepted"
      }
    ],
    "desired": {
      "image": "registry.ci.openshift.org/ocp/release@sha256:30452e14cbefed21f883ac38652b9dbaf653a922a1ca0efd6f3a1a10acfc2e1c",
      "version": "4.11.0-0.nightly-2022-04-26-181148"
    },
    "history": [
      {
        "completionTime": "2022-04-27T06:07:32Z",
        "image": "registry.ci.openshift.org/ocp/release@sha256:30452e14cbefed21f883ac38652b9dbaf653a922a1ca0efd6f3a1a10acfc2e1c",
        "startedTime": "2022-04-27T05:47:10Z",
        "state": "Completed",
        "verified": false,
        "version": "4.11.0-0.nightly-2022-04-26-181148"
      }
    ],
    "observedGeneration": 2,
    "versionHash": "QNLRulmodCo="
  }
}

After patching cv status:
# oc get co/etcd -oyaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  annotations:
    exclude.release.openshift.io/internal-openshift-hosted: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
  creationTimestamp: "2022-04-27T05:47:10Z"
  generation: 1
  name: etcd
  ownerReferences:
  - apiVersion: config.openshift.io/v1
    kind: ClusterVersion
    name: version
    uid: f1924212-4134-4bfb-a860-b24d8e084bad
  resourceVersion: "165237"
  uid: 1ebee225-51a9-4de3-9b2d-1a1c9d240a4c
spec: {}
status:
  conditions:
  - lastTransitionTime: "2022-04-27T05:59:50Z"
    message: |-
      NodeControllerDegraded: All master nodes are ready
      EtcdMembersDegraded: No unhealthy members found
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2022-04-27T06:10:04Z"
    message: |-
      EtcdMembersProgressing: No unstarted etcd members found
      NodeInstallerProgressing: 3 nodes are at revision 8
    reason: AsExpected
    status: "False"
    type: Progressing
  - lastTransitionTime: "2022-04-27T05:52:50Z"
    message: |-
      EtcdMembersAvailable: 3 members are available
      StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 8
    reason: AsExpected
    status: "True"
    type: Available
  - lastTransitionTime: "2022-04-27T05:50:19Z"
    message: All is well
    reason: AsExpected
    status: "True"
    type: Upgradeable
  - lastTransitionTime: "2022-04-27T12:42:29Z"
    message: UpgradeBackup pre 4.9 located at path /etc/kubernetes/cluster-backup/upgrade-backup-2022-04-27_124223 on node "yanyang-0427a-j7zrw-master-0.c.openshift-qe.internal"
    reason: UpgradeBackupSuccessful
    status: "True"
    type: RecentBackup
  extension: null
  relatedObjects:
  - group: operator.openshift.io
    name: cluster
    resource: etcds
  - group: ""
    name: openshift-config
    resource: namespaces
  - group: ""
    name: openshift-config-managed
    resource: namespaces
  - group: ""
    name: openshift-etcd-operator
    resource: namespaces
  - group: ""
    name: openshift-etcd
    resource: namespaces
  versions:
  - name: raw-internal
    version: 4.11.0-0.nightly-2022-04-26-181148
  - name: etcd
    version: 4.11.0-0.nightly-2022-04-26-181148
  - name: operator
    version: 4.11.0-0.nightly-2022-04-26-181148

Etcd RecentBackup goes to True. Looks good to me.
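The JSON Patch body passed to curl in the verification above can also be generated rather than typed by hand. The following is a small Python sketch; the function name `release_accepted_patch` is hypothetical, and the field values simply mirror the ones used in this comment. Note that under JSON Patch (RFC 6902) an "add" on an existing object member replaces its value, so this body overwrites the whole /status/conditions list, which is why the response above contains only the ReleaseAccepted condition.

```python
import json


def release_accepted_patch(status: str, reason: str, message: str,
                           last_transition_time: str) -> str:
    """Build a JSON Patch document that sets /status/conditions to a
    single ReleaseAccepted condition (hypothetical helper for testing)."""
    condition = {
        "type": "ReleaseAccepted",
        "status": status,
        "reason": reason,
        "message": message,
        "lastTransitionTime": last_transition_time,
    }
    # "add" on an existing member replaces it, so the entire list is overwritten.
    patch = [{"op": "add", "path": "/status/conditions", "value": [condition]}]
    return json.dumps(patch)


if __name__ == "__main__":
    # Same values as the curl command in this comment.
    print(release_accepted_patch(
        status="False",
        reason="UpgradePreconditionCheckFailed",
        message="EtcdRecentBackup failed",
        last_transition_time="2022-04-27T18:25:51Z",
    ))
```

The printed string can be supplied as the -d argument of the curl command, with the cluster-version-operator scaled down first so that it does not overwrite the patched status.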
Moving this bug to the verified state based on comment #5.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069