Bug 1862524 - CVO marks an upgrade as failed when an operator takes more than 10 minutes to rollout
Summary: CVO marks an upgrade as failed when an operator takes more than 10 minutes to...
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: W. Trevor King
QA Contact: Johnny Liu
Blocks: 1866480
Reported: 2020-07-31 15:56 UTC by Scott Dodson
Modified: 2020-10-27 16:22 UTC

Doc Type: No Doc Update
Clones: 1866480
Last Closed: 2020-10-27 16:21:54 UTC




Links
Github openshift cluster-version-operator pull 422 (closed): "Bug 1862524: pkg/cvo/status: Raise Operator leveling grace-period to 20 minutes" (last updated 2021-01-28 20:48:39 UTC)
Red Hat Product Errata RHBA-2020:4196 (last updated 2020-10-27 16:22:21 UTC)

Description Scott Dodson 2020-07-31 15:56:18 UTC
Currently the CVO marks an upgrade as failed whenever an operator takes longer than 10 minutes to roll out. On clusters of any size, operators that run daemonsets on every host (in particular the MCO, network, and DNS operators) commonly take more than 10 minutes to roll out. Raising the threshold to 20 minutes will significantly reduce this noise, so we can focus on upgrades that have real problems.

There is follow-up work to make more significant implementation changes here, but we'll push those out more slowly.

https://issues.redhat.com/browse/OTA-247

Comment 3 Johnny Liu 2020-08-05 14:28:05 UTC
Set up a 4.5 cluster with 3 masters and 9 workers and triggered an upgrade to 4.6.0-0.nightly-2020-08-04-035157.

Checked the status once every 5 minutes; everything is working well.


08-05 20:50:01 The cluster will be updated to 4.6.0-0.nightly-2020-08-04-035157
08-05 20:50:01 Updating to release image registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-08-04-035157
08-05 20:55:02 Status: Working towards 4.6.0-0.nightly-2020-08-04-035157: 30% complete Progress: True Available: True
08-05 21:00:03 Status: Working towards 4.6.0-0.nightly-2020-08-04-035157: 79% complete Progress: True Available: True
08-05 21:05:03 Status: Working towards 4.6.0-0.nightly-2020-08-04-035157: 79% complete Progress: True Available: True
08-05 21:10:13 Status: Unable to apply 4.6.0-0.nightly-2020-08-04-035157: the cluster operator network has not yet successfully rolled out Progress: True Available: True
08-05 21:15:14 Status: Unable to apply 4.6.0-0.nightly-2020-08-04-035157: the cluster operator network has not yet successfully rolled out Progress: True Available: True
08-05 21:20:15 Status: Unable to apply 4.6.0-0.nightly-2020-08-04-035157: the cluster operator monitoring is degraded Progress: True Available: True
08-05 21:25:15 Status: Unable to apply 4.6.0-0.nightly-2020-08-04-035157: the cluster operator monitoring is degraded Progress: True Available: True
08-05 21:30:16 Status: Unable to apply 4.6.0-0.nightly-2020-08-04-035157: the cluster operator monitoring is degraded Progress: True Available: True
08-05 21:35:20 Status: Unable to apply 4.6.0-0.nightly-2020-08-04-035157: the cluster operator monitoring is degraded Progress: True Available: True
08-05 21:40:22 Status: Unable to apply 4.6.0-0.nightly-2020-08-04-035157: the cluster operator monitoring is degraded Progress: True Available: True
08-05 21:45:22 Status: Unable to apply 4.6.0-0.nightly-2020-08-04-035157: the cluster operator monitoring is degraded Progress: True Available: True
08-05 21:50:23 Status: Working towards 4.6.0-0.nightly-2020-08-04-035157: 79% complete Progress: True Available: True
08-05 21:55:24 Status: Unable to apply 4.6.0-0.nightly-2020-08-04-035157: the cluster operator network has not yet successfully rolled out Progress: True Available: True
08-05 22:00:25 Status: Working towards 4.6.0-0.nightly-2020-08-04-035157: 84% complete Progress: True Available: True
08-05 22:05:25 Status: Working towards 4.6.0-0.nightly-2020-08-04-035157: 84% complete Progress: True Available: True
08-05 22:10:27 Status: Unable to apply 4.6.0-0.nightly-2020-08-04-035157: the cluster operator machine-config has not yet successfully rolled out Progress: True Available: True
08-05 22:15:27 Status: Working towards 4.6.0-0.nightly-2020-08-04-035157: 28% complete Progress: True Available: True
08-05 22:20:28 Status: Cluster version is 4.6.0-0.nightly-2020-08-04-035157 Progress: False Available: True

Comment 4 W. Trevor King 2020-08-05 23:03:46 UTC
I don't think we need doc text for this temporary band-aid. We can add doc text when we raise the limit to infinity ;).

Comment 6 errata-xmlrpc 2020-10-27 16:21:54 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

