Bug 1694216 - openshift controller manager failed to upgrade
Summary: openshift controller manager failed to upgrade
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Build
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.1.0
Assignee: Corey Daley
QA Contact: wewang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-03-29 19:46 UTC by Ben Parees
Modified: 2019-06-04 10:46 UTC (History)
3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:46:40 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 None None None 2019-06-04 10:46:47 UTC
Red Hat Bugzilla 1691505 None CLOSED openshift-controller-manager-operator does not report 'Progressing=true' during daemonset rollout 2019-09-18 08:30:26 UTC

Internal Links: 1691505

Description Ben Parees 2019-03-29 19:46:42 UTC
Description of problem:
Mar 29 18:11:08.848: INFO: cluster upgrade is failing: Cluster operator openshift-controller-manager is still updating

https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/753


How reproducible:
flake


Note: We are seeing this form of failure for several operators; I will cross-link them as I open the bugs.

I have not dug into whether this operator specifically failed to upgrade, or if something earlier in the process took so long that this operator was the "victim" of the eventual timeout.  As you investigate the job that failed, feel free to reassign this if you think there is a root cause that is independent of your operator.

If your operator currently lacks sufficient events/logs/etc to determine when it started upgrading and what it was doing when we timed out, consider using this bug to introduce that information.

Comment 1 Ben Parees 2019-03-29 19:54:10 UTC
Related "operator failed to upgrade" bug: https://bugzilla.redhat.com/show_bug.cgi?id=1694220

Comment 2 Ben Parees 2019-03-29 19:56:24 UTC
Related operator failed to upgrade bug: https://bugzilla.redhat.com/show_bug.cgi?id=1694222

Comment 4 Adam Kaplan 2019-04-01 15:01:28 UTC
Potential related/duplicate bug: https://bugzilla.redhat.com/show_bug.cgi?id=1691505

Comment 5 wewang 2019-04-02 07:56:12 UTC
I updated the cluster from 4.0.0-0.nightly-2019-03-25-180911 to 4.0.0-0.nightly-2019-03-28-030453 and did not hit "Cluster operator openshift-controller-manager is still updating", and the pods in openshift-controller-manager are running. How can I reproduce this?
$ oc get clusteroperator/openshift-controller-manager  
NAME                           VERSION                             AVAILABLE   PROGRESSING   FAILING   SINCE
openshift-controller-manager   4.0.0-0.nightly-2019-03-28-030453   True        False         False     23m

Comment 6 wewang 2019-04-02 08:08:45 UTC
FYI, pods are also running in the openshift-controller-manager-cluster namespace, and the images are updated. Since this is a flake, it is hard to reproduce, right?

Comment 7 Adam Kaplan 2019-04-03 20:39:19 UTC
@wewang it may not be consistent - it depends on how long it takes the openshift-controller-manager daemonset to roll out. We've observed this enough to know that it is a bug - at minimum we must have Progressing=true on the cluster operator instance during the daemonset rollout.

Comment 9 Corey Daley 2019-04-15 01:39:41 UTC
Pull request has merged.

Comment 11 wewang 2019-04-16 05:38:57 UTC
Upgraded the clusterversion from 4.1.0-0.ci-2019-04-11-015355 to 4.1.0-0.ci-2019-04-15-224704. During the clusteroperator/openshift-controller-manager upgrade, Progressing is now set to true while the daemonset rolls out, and the clusteroperator updated to 4.1.0-0.ci-2019-04-15-224704. See the following info:


$ oc get clusteroperator/openshift-controller-manager
NAME                           VERSION                        AVAILABLE   PROGRESSING   FAILING   SINCE
openshift-controller-manager   4.1.0-0.ci-2019-04-15-224704   True        False         False     118m

$ oc logs -f pod/openshift-controller-manager-operator-856b59c84c-wb9dx -n openshift-controller-manager-operator --loglevel=5 | grep -i PROGRESSING

I0416 03:35:30.231803       1 status_controller.go:152] clusteroperator/openshift-controller-manager diff {"status":{"conditions":[{"lastTransitionTime":"2019-04-16T03:35:29Z","reason":"AsExpected","status":"False","type":"Failing"},{"lastTransitionTime":"2019-04-16T03:35:30Z","message":"Progressing: daemonset/controller-manager: observed generation is 6, desired generation is 7.\nProgressing: openshiftcontrollermanagers.operator.openshift.io/cluster: observed generation is 3, desired generation is 4.","reason":"Progressing","status":"True","type":"Progressing"},{"lastTransitionTime":"2019-04-16T03:35:29Z","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2019-04-16T03:35:29Z","reason":"NoData","status":"Unknown","type":"Upgradeable"}],"versions":[{"name":"operator","version":"4.1.0-0.ci-2019-04-15-224704"}]}}
I0416 03:35:30.237320       1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-controller-manager-operator", Name:"openshift-controller-manager-operator", UID:"996a0c7e-5fec-11e9-bae2-0279093214fc", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for operator openshift-controller-manager changed: Progressing changed from False to True ("Progressing: daemonset/controller-manager: observed generation is 6, desired generation is 7.\nProgressing: openshiftcontrollermanagers.operator.openshift.io/cluster: observed generation is 3, desired generation is 4.")

Comment 13 errata-xmlrpc 2019-06-04 10:46:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

