Bug 2071211

Summary: CVO does not trigger a new upgrade after failing to update to an unavailable payload
Product: OpenShift Container Platform
Component: Cluster Version Operator
Version: 4.10
Target Release: 4.10.z
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: liujia <jiajliu>
Assignee: Over the Air Updates <aos-team-ota>
QA Contact: liujia <jiajliu>
CC: aos-bugs, jack.ottofaro, wking
Hardware: Unspecified
OS: Unspecified
Clone Of: 2062568
Bug Depends On: 2062568
Last Closed: 2022-05-02 18:38:50 UTC

Comment 2 liujia 2022-04-19 03:21:40 UTC
Checked on a cluster launched by cluster-bot with: 4.10,openshift/cluster-version-operator#764

1. First, try to upgrade using an unavailable repo. The attempt fails.
# ./oc get clusterversion -ojson|jq -r '.items[].status.conditions[]| select(.type=="ReleaseAccepted")'
{
  "lastTransitionTime": "2022-04-19T02:59:18Z",
  "message": "Retrieving payload failed version=\"\" image=\"quay.io/openshift-release-dev-test/ocp-release@sha256:39efe13ef67cb4449f5e6cdd8a26c83c07c6a2ce5d235dfbc3ba58c64418fcf3\" failure=Unable to download and prepare the update: deadline exceeded, reason: \"DeadlineExceeded\", message: \"Job was active longer than specified deadline\"",
  "reason": "RetrievePayload",
  "status": "False",
  "type": "ReleaseAccepted"
}

2. Continue the upgrade to the target payload using the correct repo. The upgrade is still not triggered, but this time it is because of another known issue.
# ./oc get clusterversion -ojson|jq -r '.items[].status.conditions[]| select(.type=="ReleaseAccepted")'
{
  "lastTransitionTime": "2022-04-19T02:59:18Z",
  "message": "Preconditions failed for payload loaded version=\"4.10.10\" image=\"quay.io/openshift-release-dev/ocp-release@sha256:39efe13ef67cb4449f5e6cdd8a26c83c07c6a2ce5d235dfbc3ba58c64418fcf3\": Precondition \"EtcdRecentBackup\" failed because of \"ControllerStarted\": ",
  "reason": "PreconditionChecks",
  "status": "False",
  "type": "ReleaseAccepted"
}
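The jq filter used in steps 1 and 2 can be wrapped in a small helper that prints only the status, reason, and message of the ReleaseAccepted condition, which makes it easier to compare attempts. This is a minimal sketch: the function name release_accepted_summary is hypothetical (it is not part of oc), and it assumes jq is installed and that the `oc get clusterversion -ojson` output is piped in on stdin.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc): prints "<status> <reason>: <message>"
# for every ReleaseAccepted condition in `oc get clusterversion -ojson` output.
# Reads the JSON document on stdin; assumes jq is installed.
release_accepted_summary() {
  jq -r '.items[].status.conditions[]
         | select(.type == "ReleaseAccepted")
         | "\(.status) \(.reason): \(.message)"'
}
```

Usage: `./oc get clusterversion -ojson | release_accepted_summary`. For the output in step 2, this would print a single line beginning with `False PreconditionChecks:`.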

Further checking shows that the new job that downloads the target payload completed successfully, which means the new update is no longer blocked by the issue in this bug.
# ./oc get job
NAME             COMPLETIONS   DURATION   AGE
version--jvldk   1/1           8s         11m
version--snz26   0/1           21m        21m

# ./oc describe pod/version--jvldk-j9zvz|grep "quay.io"
    Image:         quay.io/openshift-release-dev/ocp-release@sha256:39efe13ef67cb4449f5e6cdd8a26c83c07c6a2ce5d235dfbc3ba58c64418fcf3
    Image ID:      quay.io/openshift-release-dev/ocp-release@sha256:39efe13ef67cb4449f5e6cdd8a26c83c07c6a2ce5d235dfbc3ba58c64418fcf3
  Normal  Pulling         10m   kubelet  Pulling image "quay.io/openshift-release-dev/ocp-release@sha256:39efe13ef67cb4449f5e6cdd8a26c83c07c6a2ce5d235dfbc3ba58c64418fcf3"
  Normal  Pulled          10m   kubelet  Successfully pulled image "quay.io/openshift-release-dev/ocp-release@sha256:39efe13ef67cb4449f5e6cdd8a26c83c07c6a2ce5d235dfbc3ba58c64418fcf3" in 2.976460275s
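The job check above can also be done from the Job status itself, listing for each job how many pods succeeded and why it failed, if it did. This is a minimal sketch: job_summary is a hypothetical name, it assumes jq is installed, and it reads only standard Kubernetes Job status fields (status.succeeded and a Failed condition's reason, such as the DeadlineExceeded reason reported in step 1).

```shell
#!/bin/sh
# Hypothetical helper (not part of oc): emits one tab-separated line per Job:
#   <name> <succeeded pod count> <reason(s) of any Failed=True condition>
# Reads `oc get job -ojson` output on stdin; assumes jq is installed.
job_summary() {
  jq -r '.items[]
         | [ .metadata.name,
             ((.status.succeeded // 0) | tostring),
             ([ .status.conditions[]?
                | select(.type == "Failed" and .status == "True")
                | .reason ] | join(",")) ]
         | @tsv'
}
```

Usage: `./oc get job -ojson | job_summary`. A job that ran past its deadline shows a succeeded count of 0 and the reason DeadlineExceeded, while a completed download job shows a count of 1 with an empty reason column.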

Comment 7 errata-xmlrpc 2022-05-02 18:38:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.12 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1601

Comment 8 W. Trevor King 2022-08-18 02:25:03 UTC
We're considering taking this back to 4.9.z in [1].

[1]: https://issues.redhat.com/browse/OCPBUGS-230