Bug 1694216 - openshift controller manager failed to upgrade
Summary: openshift controller manager failed to upgrade
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Build
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.1.0
Assignee: Corey Daley
QA Contact: wewang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-03-29 19:46 UTC by Ben Parees
Modified: 2019-06-04 10:46 UTC (History)
3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:46:40 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 None None None 2019-06-04 10:46:47 UTC
Red Hat Bugzilla 1691505 None CLOSED openshift-controller-manager-operator does not report 'Progressing=true' during daemonset rollout 2019-09-18 08:30:26 UTC

Internal Links: 1691505

Description Ben Parees 2019-03-29 19:46:42 UTC
Description of problem:
Mar 29 18:11:08.848: INFO: cluster upgrade is failing: Cluster operator openshift-controller-manager is still updating

https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/753


How reproducible:
flake


Note: We are seeing this form of failure for several operators; I will cross-link them as I open the bugs.

I have not dug into whether this operator specifically failed to upgrade, or if something earlier in the process took so long that this operator was the "victim" of the eventual timeout.  As you investigate the job that failed, feel free to reassign this if you think there is a root cause that is independent of your operator.

If your operator currently lacks sufficient events/logs/etc to determine when it started upgrading and what it was doing when we timed out, consider using this bug to introduce that information.

Comment 1 Ben Parees 2019-03-29 19:54:10 UTC
Related "operator failed to upgrade" bug: https://bugzilla.redhat.com/show_bug.cgi?id=1694220

Comment 2 Ben Parees 2019-03-29 19:56:24 UTC
Related operator failed to upgrade bug: https://bugzilla.redhat.com/show_bug.cgi?id=1694222

Comment 4 Adam Kaplan 2019-04-01 15:01:28 UTC
Potential related/duplicate bug: https://bugzilla.redhat.com/show_bug.cgi?id=1691505

Comment 5 wewang 2019-04-02 07:56:12 UTC
I updated the cluster from 4.0.0-0.nightly-2019-03-25-180911 to 4.0.0-0.nightly-2019-03-28-030453 and did not hit "Cluster operator openshift-controller-manager is still updating", and the pods in openshift-controller-manager are running. How can I reproduce this?
$ oc get clusteroperator/openshift-controller-manager  
NAME                           VERSION                             AVAILABLE   PROGRESSING   FAILING   SINCE
openshift-controller-manager   4.0.0-0.nightly-2019-03-28-030453   True        False         False     23m

Comment 6 wewang 2019-04-02 08:08:45 UTC
FYI, pods are also running in the openshift-controller-manager-cluster namespace, and the images are updated. Since this is a flake, it is hard to reproduce, right?

Comment 7 Adam Kaplan 2019-04-03 20:39:19 UTC
@wewang it may not be consistent - it depends on how long it takes the openshift-controller-manager daemonset to roll out. We've observed this enough to know that it is a bug - at minimum we must have Progressing=true on the cluster operator instance during the daemonset rollout.

Comment 9 Corey Daley 2019-04-15 01:39:41 UTC
Pull request has merged.

Comment 11 wewang 2019-04-16 05:38:57 UTC
Upgraded the clusterversion from 4.1.0-0.ci-2019-04-11-015355 to 4.1.0-0.ci-2019-04-15-224704. During the clusteroperator/openshift-controller-manager upgrade, Progressing is now set to true while the daemonset rolls out, and the clusteroperator updated to 4.1.0-0.ci-2019-04-15-224704. See the following info:


$ oc get clusteroperator/openshift-controller-manager
NAME                           VERSION                        AVAILABLE   PROGRESSING   FAILING   SINCE
openshift-controller-manager   4.1.0-0.ci-2019-04-15-224704   True        False         False     118m

$ oc logs -f pod/openshift-controller-manager-operator-856b59c84c-wb9dx -n openshift-controller-manager-operator --loglevel=5 | grep -i PROGRESSING

I0416 03:35:30.231803       1 status_controller.go:152] clusteroperator/openshift-controller-manager diff {"status":{"conditions":[{"lastTransitionTime":"2019-04-16T03:35:29Z","reason":"AsExpected","status":"False","type":"Failing"},{"lastTransitionTime":"2019-04-16T03:35:30Z","message":"Progressing: daemonset/controller-manager: observed generation is 6, desired generation is 7.\nProgressing: openshiftcontrollermanagers.operator.openshift.io/cluster: observed generation is 3, desired generation is 4.","reason":"Progressing","status":"True","type":"Progressing"},{"lastTransitionTime":"2019-04-16T03:35:29Z","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2019-04-16T03:35:29Z","reason":"NoData","status":"Unknown","type":"Upgradeable"}],"versions":[{"name":"operator","version":"4.1.0-0.ci-2019-04-15-224704"}]}}
I0416 03:35:30.237320       1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-controller-manager-operator", Name:"openshift-controller-manager-operator", UID:"996a0c7e-5fec-11e9-bae2-0279093214fc", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for operator openshift-controller-manager changed: Progressing changed from False to True ("Progressing: daemonset/controller-manager: observed generation is 6, desired generation is 7.\nProgressing: openshiftcontrollermanagers.operator.openshift.io/cluster: observed generation is 3, desired generation is 4.")

Comment 13 errata-xmlrpc 2019-06-04 10:46:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

