While debugging CI failures for the 1.18.2 rebase, the console operator was observed in an available=false/progressing=true state for an extended period (30+ minutes) despite the managed resources appearing healthy. The operator log contained the following entry, repeated approximately once per second:

E0503 16:17:08.883797       1 status.go:121] status update error: Operation cannot be fulfilled on consoles.operator.openshift.io "cluster": the object has been modified; please apply your changes to the latest version and try again

A survey of the operator code revealed 3 separate sync loops (clidownloads, service, operator) writing status via the SyncStatus method:

https://github.com/openshift/console-operator/blob/master/pkg/console/status/status.go#L116

It would appear that these 3 controllers are racing each other. I ran a modified operator against the same cluster - with the calls to SyncStatus in the service and clidownloads controllers commented out - and the operator status was updated to available=true/progressing=false in ~20s.

To minimize the potential for conflicting updates to prevent the timely reporting of accurate status, consider updating the controllers to set operator status via library-go's UpdateStatus method:

https://github.com/openshift/library-go/blob/master/pkg/operator/v1helpers/helpers.go#L154

UpdateStatus accepts a set of functions that apply the conditions computed by a sync loop and repeatedly tries to apply them to the current state of the status resource. Its use is recommended when multiple actors in an operator need to set status (see the sketch at the end of this comment). The changes required would involve collecting condition changes rather than setting them directly on the resource to be updated, as per a similar change recently merged to the auth operator:

https://github.com/openshift/cluster-authentication-operator/pull/269

It also appears that no caller of SyncStatus checks the returned error. This suggests adding a golint check to the verify suite to catch this and other common sources of mechanical error.
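For reference, a minimal sketch of the suggested pattern, assuming the library-go v1helpers API at the link above (later versions of the helper also take a context.Context argument); the syncStatus wrapper and the ServiceSyncDegraded condition type are hypothetical names, not the console operator's actual code:

  package controller

  import (
      operatorv1 "github.com/openshift/api/operator/v1"
      "github.com/openshift/library-go/pkg/operator/v1helpers"
  )

  // syncStatus shows each sync loop reporting only the conditions it owns.
  // v1helpers.UpdateStatus re-reads the operator resource and retries on
  // "the object has been modified" conflicts, so concurrent controllers
  // no longer clobber each other's writes.
  func syncStatus(client v1helpers.OperatorClient, degraded bool) error {
      cond := operatorv1.OperatorCondition{
          Type:   "ServiceSyncDegraded", // hypothetical condition type
          Status: operatorv1.ConditionFalse,
      }
      if degraded {
          cond.Status = operatorv1.ConditionTrue
      }
      // UpdateConditionFn merges a single condition into the latest status.
      _, _, err := v1helpers.UpdateStatus(client, v1helpers.UpdateConditionFn(cond))
      return err // unlike the current SyncStatus call sites, propagate the error
  }

Note that the returned error is propagated rather than dropped, which is the behavior the proposed lint check would enforce.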
> While debugging CI failures for the 1.18.2 rebase

Hi, I want to confirm: where did you find the failure log of the console-operator?
(In reply to Yadan Pei from comment #3)
> > While debugging CI failures for the 1.18.2 rebase
>
> Hi, I want to confirm: where did you find the failure log of the console-operator?

The reported failures were observed in the pod log of the operator.

Note that verification of this bz requires that the 1.18.2 rebase be completed, which is still in progress. If a 4.5 cluster based on 1.18.2 deploys healthy, then the problem reported in this bz has been fixed.
Created attachment 1688809 [details]
console cluster operator status Available
Created attachment 1688810 [details]
version is successfully deployed
Thank you, Maru! The latest 4.5 cluster based on 1.18.2 now deploys healthy; screenshots are attached for reference. This bug can therefore be verified on the following version:

OpenShift Version: 4.5.0-0.nightly-2020-05-15-011814
Kubernetes Version: v1.18.2
Channel: stable-4.5
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409