Description of problem:
Mar 29 18:11:08.848: INFO: cluster upgrade is failing: Cluster operator openshift-controller-manager is still updating

https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/753

How reproducible:
flake

Note: We are seeing this form of failure for several operators; I will cross-link them as I open the bugs. I have not dug into whether this operator specifically failed to upgrade, or whether something earlier in the process took so long that this operator was the "victim" of the eventual timeout. As you investigate the failed job, feel free to reassign this if you think there is a root cause that is independent of your operator. If your operator currently lacks sufficient events/logs/etc. to determine when it started upgrading and what it was doing when we timed out, consider using this bug to introduce that information.
Related "operator failed to upgrade" bug: https://bugzilla.redhat.com/show_bug.cgi?id=1694220
Related "operator failed to upgrade" bug: https://bugzilla.redhat.com/show_bug.cgi?id=1694222
Recurrence: https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/719
Potential related/duplicate bug: https://bugzilla.redhat.com/show_bug.cgi?id=1691505
I updated the version from 4.0.0-0.nightly-2019-03-25-180911 to 4.0.0-0.nightly-2019-03-28-030453 and did not hit "Cluster operator openshift-controller-manager is still updating"; the pods are running in openshift-controller-manager. So how can I reproduce it?

$ oc get clusteroperator/openshift-controller-manager
NAME                            VERSION                             AVAILABLE   PROGRESSING   FAILING   SINCE
openshift-controller-manager    4.0.0-0.nightly-2019-03-28-030453   True        False         False     23m
FYI, pods are also running in the openshift-controller-manager-cluster namespace, and the images are updated. I see this is marked as a flake, so it is hard to reproduce, right?
@wewang it may not be consistent - it depends on how long it takes the openshift-controller-manager daemonset to roll out. We've observed this enough to know that it is a bug - at minimum we must have Progressing=true on the cluster operator instance during the daemonset rollout.
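For reference, here is a minimal Go sketch of the expected behavior, i.e. deriving a Progressing signal from the daemonset status during rollout. This is not the operator's actual code; the daemonSetProgressing helper and the generation/pod-count checks are assumptions modeled on the condition message quoted later in this bug.

// Minimal sketch, assuming the operator watches the controller-manager
// daemonset; daemonSetProgressing is a hypothetical helper, not the
// operator's real code.
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// daemonSetProgressing reports whether the daemonset rollout is still in
// flight, mirroring the "observed generation is N, desired generation is M"
// message seen in the operator's Progressing condition.
func daemonSetProgressing(ds *appsv1.DaemonSet) (bool, string) {
	if ds.Status.ObservedGeneration != ds.Generation {
		return true, fmt.Sprintf("daemonset/%s: observed generation is %d, desired generation is %d.",
			ds.Name, ds.Status.ObservedGeneration, ds.Generation)
	}
	if ds.Status.UpdatedNumberScheduled < ds.Status.DesiredNumberScheduled {
		return true, fmt.Sprintf("daemonset/%s: %d of %d scheduled pods are updated.",
			ds.Name, ds.Status.UpdatedNumberScheduled, ds.Status.DesiredNumberScheduled)
	}
	return false, ""
}

func main() {
	// Fake daemonset mid-rollout: desired generation 7, observed generation 6,
	// as in the log message later in this bug.
	ds := &appsv1.DaemonSet{
		ObjectMeta: metav1.ObjectMeta{Name: "controller-manager", Generation: 7},
		Status:     appsv1.DaemonSetStatus{ObservedGeneration: 6},
	}
	progressing, msg := daemonSetProgressing(ds)
	fmt.Println(progressing, msg)
}

While either check returns true, the cluster operator's Progressing condition should be set to True; once the observed generation catches up and all scheduled pods are updated, it can go back to False.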
Corey's PR: https://github.com/openshift/cluster-openshift-controller-manager-operator/pull/85
The pull request has merged.
Upgraded the clusterversion from 4.1.0-0.ci-2019-04-11-015355 to 4.1.0-0.ci-2019-04-15-224704. During the clusteroperator/openshift-controller-manager upgrade, Progressing is now set to True while the daemonset rolls out, and the clusteroperator was updated to 4.1.0-0.ci-2019-04-15-224704. See the following info:

$ oc get clusteroperator/openshift-controller-manager
NAME                            VERSION                        AVAILABLE   PROGRESSING   FAILING   SINCE
openshift-controller-manager    4.1.0-0.ci-2019-04-15-224704   True        False         False     118m

$ oc logs -f pod/openshift-controller-manager-operator-856b59c84c-wb9dx -n openshift-controller-manager-operator --loglevel=5 | grep -i PROGRESSING
I0416 03:35:30.231803       1 status_controller.go:152] clusteroperator/openshift-controller-manager diff {"status":{"conditions":[{"lastTransitionTime":"2019-04-16T03:35:29Z","reason":"AsExpected","status":"False","type":"Failing"},{"lastTransitionTime":"2019-04-16T03:35:30Z","message":"Progressing: daemonset/controller-manager: observed generation is 6, desired generation is 7.\nProgressing: openshiftcontrollermanagers.operator.openshift.io/cluster: observed generation is 3, desired generation is 4.","reason":"Progressing","status":"True","type":"Progressing"},{"lastTransitionTime":"2019-04-16T03:35:29Z","reason":"AsExpected","status":"True","type":"Available"},{"lastTransitionTime":"2019-04-16T03:35:29Z","reason":"NoData","status":"Unknown","type":"Upgradeable"}],"versions":[{"name":"operator","version":"4.1.0-0.ci-2019-04-15-224704"}]}}
I0416 03:35:30.237320       1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-controller-manager-operator", Name:"openshift-controller-manager-operator", UID:"996a0c7e-5fec-11e9-bae2-0279093214fc", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for operator openshift-controller-manager changed: Progressing changed from False to True ("Progressing: daemonset/controller-manager: observed generation is 6, desired generation is 7.\nProgressing: openshiftcontrollermanagers.operator.openshift.io/cluster: observed generation is 3, desired generation is 4.")
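For anyone re-running this verification, here is a small hedged Go sketch (not part of the operator) that parses a trimmed copy of the status diff JSON from the log above and prints the Progressing condition. The condition and statusDiff types are local stand-ins I made up for illustration; the real types live in github.com/openshift/api/config/v1.

package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// Minimal local mirror of the fields in the logged status diff.
type condition struct {
	Type    string `json:"type"`
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

type statusDiff struct {
	Status struct {
		Conditions []condition `json:"conditions"`
	} `json:"status"`
}

func main() {
	// Trimmed copy of the diff logged by status_controller.go in the comment above.
	raw := `{"status":{"conditions":[
	  {"reason":"AsExpected","status":"False","type":"Failing"},
	  {"message":"Progressing: daemonset/controller-manager: observed generation is 6, desired generation is 7.","reason":"Progressing","status":"True","type":"Progressing"},
	  {"reason":"AsExpected","status":"True","type":"Available"}]}}`

	var diff statusDiff
	if err := json.Unmarshal([]byte(raw), &diff); err != nil {
		log.Fatal(err)
	}
	for _, c := range diff.Status.Conditions {
		if c.Type == "Progressing" {
			fmt.Printf("Progressing=%s: %s\n", c.Status, c.Message)
		}
	}
}

The same check can of course be done directly with oc against the live clusteroperator; the point here is just to show which condition the verification above is looking at.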
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758