Bug 1703879 - [upgrade] Upgrades take too long, likely due to misbehaving operators
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.1.0
Assignee: Ben Bennett
QA Contact: liujia
URL:
Whiteboard:
Depends On: 1702390 1702414 1703699
Blocks:
Reported: 2019-04-29 01:33 UTC by Clayton Coleman
Modified: 2019-05-07 15:28 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-07 15:27:38 UTC
Target Upstream Version:
Embargoed:


Description Clayton Coleman 2019-04-29 01:33:23 UTC
Current upgrade runs take about 40-60 minutes, even when only some changes are occurring. That is a long period of disruption, and each misbehaving operator needs its own bug. On average, every component except the control plane and the MCD should take less than 5 minutes.

Here is a run that took ~70 minutes. The monitor log contains many errors, which likely delayed the normal rollout. Individual teams need to triage the reported errors and file bugs linked to this issue.

https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/895

Please work with team leads to ensure they investigate runs like this and subdivide work.

Comment 1 Scott Dodson 2019-04-29 18:23:37 UTC
Known bugs related to this overall theme:

https://bugzilla.redhat.com/show_bug.cgi?id=1702414
https://bugzilla.redhat.com/show_bug.cgi?id=1702390

Assigning to the group lead who holds build-cop responsibilities, leaving the component set to Upgrades.

Comment 2 Ben Bennett 2019-05-07 15:27:38 UTC
Closed because the immediate problems are resolved.

I'm still gathering information on how long each component is expected to take, and working out how to measure how long a component actually took to complete the upgrade.
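
One possible way to approach the measurement question, as a minimal sketch only: assume start and end timestamps for each component can be extracted from a run (how to obtain them is not covered here), compute the elapsed time per component, and flag anything at or over the 5-minute expectation from the description. The record layout, timestamp format, function names, and component names below are all hypothetical and are not taken from any OpenShift tool or real upgrade run.

```python
from datetime import datetime, timedelta

# Budget taken from the bug description: most components should
# finish in less than 5 minutes.
BUDGET = timedelta(minutes=5)


def parse_utc(ts: str) -> datetime:
    """Parse a timestamp like '2019-04-29T01:33:23Z' (format is an assumption)."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def component_durations(events):
    """Return {component: duration} from (component, start, end) records."""
    return {name: parse_utc(end) - parse_utc(start) for name, start, end in events}


def over_budget(durations, budget=BUDGET):
    """Return (component, duration) pairs at or over the budget, slowest first."""
    slow = [(name, d) for name, d in durations.items() if d >= budget]
    return sorted(slow, key=lambda item: item[1], reverse=True)


# Hypothetical sample data for illustration only.
SAMPLE_EVENTS = [
    ("example-operator-a", "2019-04-29T01:00:00Z", "2019-04-29T01:03:30Z"),
    ("example-operator-b", "2019-04-29T01:03:30Z", "2019-04-29T01:21:00Z"),
    ("example-operator-c", "2019-04-29T01:21:00Z", "2019-04-29T01:27:15Z"),
]

if __name__ == "__main__":
    for name, duration in over_budget(component_durations(SAMPLE_EVENTS)):
        print(f"{name}: {duration}")
```

Sorting slowest first puts the most likely candidates for individual bugs at the top of the list; components with a legitimately longer expected duration (the control plane and the MCD, per the description) would need their own budget rather than the shared 5-minute one.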

Comment 3 Ben Bennett 2019-05-07 15:28:31 UTC
https://github.com/sjenning/oschart can help identify where the slowdowns occur.

