Bug 1862524

Summary: CVO marks an upgrade as failed when an operator takes more than 10 minutes to roll out
Product: OpenShift Container Platform
Reporter: Scott Dodson <sdodson>
Component: Cluster Version Operator
Assignee: W. Trevor King <wking>
Status: CLOSED ERRATA
QA Contact: Johnny Liu <jialiu>
Severity: medium
Priority: medium
Version: 4.4
CC: aos-bugs, jokerman
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Clones: 1866480
Last Closed: 2020-10-27 16:21:54 UTC
Type: Bug
Bug Blocks: 1866480

Description Scott Dodson 2020-07-31 15:56:18 UTC
Currently the CVO marks an upgrade as failed whenever an operator takes longer than 10 minutes to roll out. On clusters of any size it is very common for operators with daemonsets running on all hosts, in particular the MCO, network, and DNS operators, to take more than 10 minutes to roll out. Raising this limit to 20 minutes will significantly reduce the noise so we can focus on upgrades that have real problems.

There is follow-up work to make more significant implementation changes here, but we'll push those out more slowly.

https://issues.redhat.com/browse/OTA-247

Comment 3 Johnny Liu 2020-08-05 14:28:05 UTC
Set up a 4.5 cluster with 3 masters and 9 workers and triggered an upgrade to 4.6.0-0.nightly-2020-08-04-035157.

Checked the status every 5 minutes; everything is working well.


08-05 20:50:01 The cluster will be updated to 4.6.0-0.nightly-2020-08-04-035157
08-05 20:50:01 Updating to release image registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-08-04-035157
08-05 20:55:02 Status: Working towards 4.6.0-0.nightly-2020-08-04-035157: 30% complete Progress: True Available: True
08-05 21:00:03 Status: Working towards 4.6.0-0.nightly-2020-08-04-035157: 79% complete Progress: True Available: True
08-05 21:05:03 Status: Working towards 4.6.0-0.nightly-2020-08-04-035157: 79% complete Progress: True Available: True
08-05 21:10:13 Status: Unable to apply 4.6.0-0.nightly-2020-08-04-035157: the cluster operator network has not yet successfully rolled out Progress: True Available: True
08-05 21:15:14 Status: Unable to apply 4.6.0-0.nightly-2020-08-04-035157: the cluster operator network has not yet successfully rolled out Progress: True Available: True
08-05 21:20:15 Status: Unable to apply 4.6.0-0.nightly-2020-08-04-035157: the cluster operator monitoring is degraded Progress: True Available: True
08-05 21:25:15 Status: Unable to apply 4.6.0-0.nightly-2020-08-04-035157: the cluster operator monitoring is degraded Progress: True Available: True
08-05 21:30:16 Status: Unable to apply 4.6.0-0.nightly-2020-08-04-035157: the cluster operator monitoring is degraded Progress: True Available: True
08-05 21:35:20 Status: Unable to apply 4.6.0-0.nightly-2020-08-04-035157: the cluster operator monitoring is degraded Progress: True Available: True
08-05 21:40:22 Status: Unable to apply 4.6.0-0.nightly-2020-08-04-035157: the cluster operator monitoring is degraded Progress: True Available: True
08-05 21:45:22 Status: Unable to apply 4.6.0-0.nightly-2020-08-04-035157: the cluster operator monitoring is degraded Progress: True Available: True
08-05 21:50:23 Status: Working towards 4.6.0-0.nightly-2020-08-04-035157: 79% complete Progress: True Available: True
08-05 21:55:24 Status: Unable to apply 4.6.0-0.nightly-2020-08-04-035157: the cluster operator network has not yet successfully rolled out Progress: True Available: True
08-05 22:00:25 Status: Working towards 4.6.0-0.nightly-2020-08-04-035157: 84% complete Progress: True Available: True
08-05 22:05:25 Status: Working towards 4.6.0-0.nightly-2020-08-04-035157: 84% complete Progress: True Available: True
08-05 22:10:27 Status: Unable to apply 4.6.0-0.nightly-2020-08-04-035157: the cluster operator machine-config has not yet successfully rolled out Progress: True Available: True
08-05 22:15:27 Status: Working towards 4.6.0-0.nightly-2020-08-04-035157: 28% complete Progress: True Available: True
08-05 22:20:28 Status: Cluster version is 4.6.0-0.nightly-2020-08-04-035157 Progress: False Available: True

Comment 4 W. Trevor King 2020-08-05 23:03:46 UTC
I don't think we need doc text for this temporary band-aid. We can add doc text when we raise the limit to infinity ;).

Comment 6 errata-xmlrpc 2020-10-27 16:21:54 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images) and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196