Bug 1855839

Summary: machine-api-operator keeps updating clusteroperator/machine-api in stable state
Product: OpenShift Container Platform
Reporter: Tomáš Nožička <tnozicka>
Component: Cloud Compute
Assignee: Joel Speed <jspeed>
Cloud Compute sub component: Other Providers
QA Contact: Milind Yadav <miyadav>
CC: zhsun
Status: CLOSED ERRATA
Severity: low
Priority: unspecified
Version: 4.5
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
Cause: The operator updated its Progressing status condition on every reconcile, even when no updates were due to be rolled out.
Consequence: The status flipped continuously between two states.
Fix: Only update the status when changes are due to be rolled out.
Result: The status is now stable.
Last Closed: 2020-10-27 16:13:54 UTC
Type: Bug

Description Tomáš Nožička 2020-07-10 16:24:16 UTC
Description of problem:

machine-api-operator keeps updating clusteroperator/machine-api while the cluster is stable; the Progressing condition keeps oscillating between

  - lastTransitionTime: "2020-07-09T11:03:20Z"
    status: "False"
    type: Progressing

and

  - lastTransitionTime: "2020-07-09T11:03:20Z"
    message: 'Running resync for operator: 4.5.0-0.ci-2020-07-08-231102'
    reason: SyncingResources
    status: "False"
    type: Progressing

in a loop.

The operator state isn't changing, so this looks like an accidental update that shouldn't be needed.
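A generic way for a controller to avoid redundant status writes is to build the desired conditions, compare them with the current ones, and call the API only when they differ. The Go sketch below is illustrative only: the trimmed-down `Condition` type and the `needsStatusUpdate` helper are hypothetical, not the real openshift/api types or the machine-api-operator code. It is filled with the two snapshots quoted above and shows that such a guard alone would not stop this loop, because the snapshots differ in `reason` and `message`, so every flip is a genuine change to the object.

```go
package main

import (
	"fmt"
	"reflect"
)

// Condition is a hypothetical, trimmed-down stand-in for a ClusterOperator
// status condition; it is NOT the real openshift/api type.
type Condition struct {
	Type               string
	Status             string
	Reason             string
	Message            string
	LastTransitionTime string
}

// needsStatusUpdate (hypothetical helper) reports whether writing desired
// would change the stored conditions. A controller can skip the API call
// entirely when it returns false.
func needsStatusUpdate(current, desired []Condition) bool {
	return !reflect.DeepEqual(current, desired)
}

func main() {
	// The two snapshots from the report.
	idle := []Condition{{
		Type:               "Progressing",
		Status:             "False",
		LastTransitionTime: "2020-07-09T11:03:20Z",
	}}
	resync := []Condition{{
		Type:               "Progressing",
		Status:             "False",
		Reason:             "SyncingResources",
		Message:            "Running resync for operator: 4.5.0-0.ci-2020-07-08-231102",
		LastTransitionTime: "2020-07-09T11:03:20Z",
	}}

	fmt.Println(needsStatusUpdate(idle, idle))   // false: nothing to write
	fmt.Println(needsStatusUpdate(idle, resync)) // true: reason/message differ
}
```

Because the reconcile loop alternates between these two desired states, the fix has to change how the desired condition is computed, not only how it is written.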


Version-Release number of selected component (if applicable):
Client Version: v4.2.0-alpha.0-670-g9060d2f
Server Version: 4.5.0-0.ci-2020-07-08-231102
Kubernetes Version: v1.18.3


How reproducible:
always


Steps to Reproduce:
1. Start with a stable cluster.
2. $ oc get clusteroperators/machine-api -w -o yaml

Actual results:
The clusteroperator keeps flipping back and forth between two states that carry no new information.


Expected results:
The operator doesn't update the resource while the cluster is in a stable state.


Additional info:

Comment 1 Joel Speed 2020-07-21 10:54:51 UTC
This flickering of the status is currently by design: we update the status at the beginning of each reconcile, whether or not the configuration has drifted.
Because of the way the operator applies updates to the resources it manages, it is not easy for us to determine whether a resource has been modified.
We deemed it safer for the status to suggest that something might change and then have nothing change, rather than for the status to report that everything is in sync and then have the controller update the resources it manages because of a manual change somewhere.

I will investigate how we could determine earlier whether a change needs to be made, and see if we can short-circuit before updating the status.
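A minimal sketch of such a short-circuit, assuming the operator can cheaply compare the live and desired state of each managed resource. The name-to-digest maps, `driftedResources`, `progressingCondition` and the resource name `example-deployment` are all hypothetical and are not taken from the machine-api-operator code. The Progressing condition is decided from the drift check before any status write and is returned untouched when there is nothing to roll out.

```go
package main

import (
	"fmt"
	"sort"
)

// Condition is a hypothetical stand-in for a ClusterOperator condition.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// driftedResources (hypothetical) lists the managed resources whose live
// state differs from the desired state. Both maps are assumed to hold
// resource name -> digest of its spec. Missing live entries count as drift.
func driftedResources(live, desired map[string]string) []string {
	var drifted []string
	for name, want := range desired {
		if got, ok := live[name]; !ok || got != want {
			drifted = append(drifted, name)
		}
	}
	sort.Strings(drifted) // deterministic order; map iteration is random
	return drifted
}

// progressingCondition (hypothetical) announces a resync only when there is
// drift to roll out; otherwise it returns the current condition unchanged,
// so the caller has nothing new to write.
func progressingCondition(current Condition, drifted []string, version string) Condition {
	if len(drifted) == 0 {
		return current
	}
	next := current
	next.Reason = "SyncingResources"
	next.Message = "Running resync for operator: " + version
	return next
}

func main() {
	current := Condition{Type: "Progressing", Status: "False"}
	desired := map[string]string{"example-deployment": "abc"}

	// Stable cluster: live matches desired, so the condition is unchanged.
	fmt.Println(progressingCondition(current, driftedResources(desired, desired), "4.6.0") == current) // true

	// A manual edit drifted the resource: now a resync is announced.
	live := map[string]string{"example-deployment": "edited"}
	fmt.Println(progressingCondition(current, driftedResources(live, desired), "4.6.0").Reason) // SyncingResources
}
```

In this sketch, resources present only in the live map are ignored (pruning is not modeled), and `Status` is left as-is to mirror the report, where `status` stayed `"False"` in both snapshots.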

Comment 2 Joel Speed 2020-07-29 16:32:43 UTC
This will need some time to look into; I will do so next sprint.

Comment 3 Joel Speed 2020-08-18 10:47:42 UTC
I didn't get a chance to look into this during this sprint because higher-priority work took precedence; I will try to look into it next sprint.

Comment 6 Milind Yadav 2020-09-14 05:18:15 UTC
[miyadav@miyadav ~]$ oc project openshift-machine-api
Now using project "openshift-machine-api" on server "https://api.o46-miyadav-0914.qe.devcluster.openshift.com:6443".
[miyadav@miyadav ~]$ oc get clusteroperator/machine-api -w
NAME          VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
machine-api   4.6.0-0.nightly-2020-09-12-230035   True        False         False      41m

..
..
..


Status didn't change. Moving to VERIFIED.

Comment 8 errata-xmlrpc 2020-10-27 16:13:54 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images) and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196