Bug 2071211 - CVO does not trigger new upgrade again after fail to update to unavailable payload
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.10.z
Assignee: Over the Air Updates
QA Contact: liujia
Depends On: 2062568
Reported: 2022-04-02 08:03 UTC by liujia
Modified: 2022-08-18 02:25 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2062568
Last Closed: 2022-05-02 18:38:50 UTC
Target Upstream Version:


System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-version-operator pull 764 | 0 | None | Merged | Bug 2071211: lib/resourcebuilder/batch: Stop waiting on Job deadline exceeded | 2022-08-18 01:58:53 UTC
Red Hat Product Errata | RHBA-2022:1601 | 0 | None | None | None | 2022-05-02 18:39:07 UTC

Comment 2 liujia 2022-04-19 03:21:40 UTC
Checked on a cluster launched by cluster-bot with: 4.10,openshift/cluster-version-operator#764

1. First, tried to upgrade to a payload in an unavailable repo. The upgrade failed.
# ./oc get clusterversion -ojson|jq -r '.items[].status.conditions[]| select(.type=="ReleaseAccepted")'
{
  "lastTransitionTime": "2022-04-19T02:59:18Z",
  "message": "Retrieving payload failed version=\"\" image=\"quay.io/openshift-release-dev-test/ocp-release@sha256:39efe13ef67cb4449f5e6cdd8a26c83c07c6a2ce5d235dfbc3ba58c64418fcf3\" failure=Unable to download and prepare the update: deadline exceeded, reason: \"DeadlineExceeded\", message: \"Job was active longer than specified deadline\"",
  "reason": "RetrievePayload",
  "status": "False",
  "type": "ReleaseAccepted"
}

2. Then continued the upgrade to the target payload in the correct repo. The upgrade was still not triggered, but this time because of another known issue.
# ./oc get clusterversion -ojson|jq -r '.items[].status.conditions[]| select(.type=="ReleaseAccepted")'
{
  "lastTransitionTime": "2022-04-19T02:59:18Z",
  "message": "Preconditions failed for payload loaded version=\"4.10.10\" image=\"quay.io/openshift-release-dev/ocp-release@sha256:39efe13ef67cb4449f5e6cdd8a26c83c07c6a2ce5d235dfbc3ba58c64418fcf3\": Precondition \"EtcdRecentBackup\" failed because of \"ControllerStarted\": ",
  "reason": "PreconditionChecks",
  "status": "False",
  "type": "ReleaseAccepted"
}

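The two checks above differ mainly in the "reason" of the ReleaseAccepted condition (RetrievePayload versus PreconditionChecks). As a minimal sketch, not part of the original report, the same jq selection could be written in Python, assuming the output of `oc get clusterversion -ojson` is available as a string. The helper name `release_accepted` and the trimmed sample document are hypothetical.

```python
import json


def release_accepted(clusterversion_json):
    """Return every ReleaseAccepted condition, like the jq filter above."""
    doc = json.loads(clusterversion_json)
    return [
        cond
        for item in doc.get("items", [])
        for cond in item.get("status", {}).get("conditions", [])
        if cond.get("type") == "ReleaseAccepted"
    ]


# Trimmed, hypothetical sample shaped like the output in step 1.
SAMPLE = json.dumps({
    "items": [{
        "status": {"conditions": [
            {"type": "Available", "status": "True"},
            {"type": "ReleaseAccepted", "status": "False",
             "reason": "RetrievePayload"},
        ]}
    }]
})

for cond in release_accepted(SAMPLE):
    # A status of "False" with reason "RetrievePayload" means the payload
    # itself could not be fetched; "PreconditionChecks" means it was loaded
    # but a precondition blocked the update.
    print(cond["reason"], cond["status"])
```
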
A further check shows that the new job that downloads the new target payload succeeded, which means the new update is not blocked by the issue in this bug.
# ./oc get job
version--jvldk   1/1           8s         11m
version--snz26   0/1           21m        21m

# ./oc describe pod/version--jvldk-j9zvz|grep "quay.io"
    Image:         quay.io/openshift-release-dev/ocp-release@sha256:39efe13ef67cb4449f5e6cdd8a26c83c07c6a2ce5d235dfbc3ba58c64418fcf3
    Image ID:      quay.io/openshift-release-dev/ocp-release@sha256:39efe13ef67cb4449f5e6cdd8a26c83c07c6a2ce5d235dfbc3ba58c64418fcf3
  Normal  Pulling         10m   kubelet  Pulling image "quay.io/openshift-release-dev/ocp-release@sha256:39efe13ef67cb4449f5e6cdd8a26c83c07c6a2ce5d235dfbc3ba58c64418fcf3"
  Normal  Pulled          10m   kubelet  Successfully pulled image "quay.io/openshift-release-dev/ocp-release@sha256:39efe13ef67cb4449f5e6cdd8a26c83c07c6a2ce5d235dfbc3ba58c64418fcf3" in 2.976460275s
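
The linked pull request is titled "lib/resourcebuilder/batch: Stop waiting on Job deadline exceeded". One way to read that title is sketched below, assuming a Job status shaped like the Kubernetes batch/v1 API (a "conditions" list whose entries carry "type", "status", and "reason"). This is an illustration in Python, not the operator's Go code, and `job_wait_outcome` is a hypothetical helper.

```python
def job_wait_outcome(job_status):
    """Classify a Job status dict as "done", "failed", or "waiting"."""
    for cond in job_status.get("conditions", []):
        if cond.get("status") != "True":
            continue
        if cond.get("type") == "Complete":
            return "done"
        if cond.get("type") == "Failed":
            # A Failed condition is terminal (for example with reason
            # DeadlineExceeded), so a waiter should stop instead of
            # polling the same Job again.
            return "failed"
    return "waiting"


# Hypothetical status mirroring the reason and message quoted in step 1.
deadline_exceeded = {
    "conditions": [{
        "type": "Failed",
        "status": "True",
        "reason": "DeadlineExceeded",
        "message": "Job was active longer than specified deadline",
    }]
}

print(job_wait_outcome(deadline_exceeded))
```

Treating the failed Job as terminal is what would let a later update request start a fresh download job, as the second job in the listing above did.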

Comment 7 errata-xmlrpc 2022-05-02 18:38:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.12 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


Comment 8 W. Trevor King 2022-08-18 02:25:03 UTC
We're considering taking this back to 4.9.z in [1].

[1]: https://issues.redhat.com/browse/OCPBUGS-230
