Bug 2108692

Summary: CGU status does not reflect timeouts from earlier batches
Product: OpenShift Container Platform Reporter: jun
Component: Telco Edge Assignee: jun
Telco Edge sub component: TALO QA Contact: yliu1
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: medium    
Priority: high CC: ijolliff, jun, rauherna, yliu1
Version: 4.10   
Target Milestone: ---   
Target Release: 4.10.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2087125 Environment:
Last Closed: 2023-07-04 15:16:46 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2108639    
Bug Blocks:    

Description jun 2022-07-19 17:30:35 UTC
+++ This bug was initially created as a clone of Bug #2087125 +++

Description of problem:
For an upgrade with multiple batches, the final CGU status is set to completed as long as the last batch becomes fully compliant within the overall timeout period. Timeouts in earlier batches are not reflected in the final status.

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. 
2.
3.

Actual results:


Expected results:
After the last batch completes, the final status should be set to completed only if all batches are compliant.
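
A minimal sketch of the expected behaviour, not the actual TALM implementation. The `BatchResult` type and `final_cgu_state` function are hypothetical names used only for illustration. `UpgradeTimedOut` is the condition reason shown in the CGU status in Comment 5; `UpgradeCompleted` is assumed here as the label for the success case.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BatchResult:
    """Outcome of one remediation batch (hypothetical model, not a TALM type)."""
    clusters: List[str]
    all_compliant: bool  # False means the batch ended without full compliance


def final_cgu_state(batches: List[BatchResult]) -> str:
    """Return the overall state once the last batch has finished.

    The upgrade counts as completed only if every batch became compliant.
    A batch that ended non-compliant is treated as timed out, and that
    timeout is carried into the final state regardless of later batches.
    """
    if not batches:
        raise ValueError("at least one batch is required")
    if all(batch.all_compliant for batch in batches):
        return "UpgradeCompleted"  # assumed label for the success case
    return "UpgradeTimedOut"


# Cluster names borrowed from the remediation plan in Comment 5, assuming
# the first batch is the one that timed out and the second one passed.
example = [
    BatchResult(clusters=["spoke-5"], all_compliant=False),
    BatchResult(clusters=["spoke-3"], all_compliant=True),
]
print(final_cgu_state(example))  # UpgradeTimedOut
```

The buggy behaviour described above corresponds to checking only `batches[-1].all_compliant`, which would report the example as completed.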

Additional info:

Comment 2 yliu1 2022-08-24 20:22:02 UTC
Verification is currently blocked because the fix for this BZ is not in 4.10 yet: https://bugzilla.redhat.com/show_bug.cgi?id=2117038
Changing state to POST.

Comment 4 yliu1 2022-10-25 23:40:43 UTC
InstallPlans now get approved in about 1 minute using the latest 4.11.2 TALM.

Comment 5 yliu1 2022-10-26 00:28:35 UTC
Build used: topology-aware-lifecycle-manager.4.10.0-202210241500   Topology Aware Lifecycle Manager   4.10.0-202210241500   Succeeded
 
The overall status is reported as timed out even though the second batch passed.

 status:
    computedMaxConcurrency: 1
    conditions:
    - lastTransitionTime: "2022-10-26T00:05:05Z"
      message: The ClusterGroupUpgrade CR policies are taking too long to complete
      reason: UpgradeTimedOut
      status: "False"
      type: Ready
    managedPoliciesContent:
      du-upgrade-cluster-version-policy1: "null"
    managedPoliciesForUpgrade:
    - name: du-upgrade-cluster-version-policy1
      namespace: ztp-upgrade
    managedPoliciesNs:
      du-upgrade-cluster-version-policy1: ztp-upgrade
    precaching:
      spec: {}
    remediationPlan:
    - - spoke-5
    - - spoke-3
    status:
      currentBatch: 2
      currentBatchRemediationProgress:
        spoke-3:
          state: Completed
      currentBatchStartedAt: "2022-10-26T00:15:05Z"
      startedAt: "2022-10-26T00:05:05Z"