Bug 1855839 - machine-api-operator keeps updating clusteroperator/machine-api in stable state
Summary: machine-api-operator keeps updating clusteroperator/machine-api in stable state
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.6.0
Assignee: Joel Speed
QA Contact: Milind Yadav
Depends On:
Reported: 2020-07-10 16:24 UTC by Tomáš Nožička
Modified: 2020-09-21 10:05 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The status was set to progressing on every reconcile, even when no updates were due to be rolled out. Consequence: The status flickered continuously between two states. Fix: Only update the status when changes are due to be rolled out. Result: The status is now stable.
Clone Of:
Last Closed:
Target Upstream Version:


System: Github
ID: openshift machine-api-operator pull 695
Priority: None
Status: closed
Summary: Bug 1855839: Ensure Progressing condition is stable when no updates to apply
Last Updated: 2020-09-21 10:02:41 UTC

Description Tomáš Nožička 2020-07-10 16:24:16 UTC
Description of problem:

machine-api-operator keeps updating clusteroperator/machine-api while the cluster is in a stable state. The Progressing condition keeps oscillating between

  - lastTransitionTime: "2020-07-09T11:03:20Z"
    status: "False"
    type: Progressing


  - lastTransitionTime: "2020-07-09T11:03:20Z"
    message: 'Running resync for operator: 4.5.0-0.ci-2020-07-08-231102'
    reason: SyncingResources
    status: "False"
    type: Progressing

in a loop.

The operator state isn't changing, so this looks like an accidental update that we shouldn't need.
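
As a rough illustration of the symptom, the following Python sketch counts updates in which the Progressing condition changed even though its status did not. The function name `count_spurious_updates` and the idea of collecting the observed conditions as dictionaries (for example, from the watch output) are assumptions made for this example. The two sample dictionaries mirror the states quoted above.

```python
def count_spurious_updates(snapshots):
    """Count consecutive snapshots that differ although status is unchanged.

    `snapshots` is a list of dicts, one per observed Progressing condition.
    """
    spurious = 0
    for prev, curr in zip(snapshots, snapshots[1:]):
        # A change in reason/message with the same status is a no-op update.
        if prev != curr and prev.get("status") == curr.get("status"):
            spurious += 1
    return spurious


# The two states from the report (hypothetical in-memory representation).
bare = {
    "type": "Progressing",
    "status": "False",
    "lastTransitionTime": "2020-07-09T11:03:20Z",
}
resync = {
    "type": "Progressing",
    "status": "False",
    "lastTransitionTime": "2020-07-09T11:03:20Z",
    "reason": "SyncingResources",
    "message": "Running resync for operator: 4.5.0-0.ci-2020-07-08-231102",
}

if __name__ == "__main__":
    print(count_spurious_updates([bare, resync, bare, resync]))
```

A stable operator would yield a count of zero for any observation window in which nothing is being rolled out.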

Version-Release number of selected component (if applicable):
Client Version: v4.2.0-alpha.0-670-g9060d2f
Server Version: 4.5.0-0.ci-2020-07-08-231102
Kubernetes Version: v1.18.3

How reproducible:

Steps to Reproduce:
1. Start from a stable cluster
2. $ oc get clusteroperators/machine-api -w -o yaml

Actual results:
The clusteroperator keeps updating back and forth with dummy info.

Expected results:
Operator doesn't cause updates to the resource in stable state.

Additional info:

Comment 1 Joel Speed 2020-07-21 10:54:51 UTC
This flickering of the status is currently by design. We currently update the status at the beginning of each reconcile whether there has been a drift in the configuration or not.
It is not particularly easy for us to determine if a resource has been modified due to the way that the operator applies updates to the resource.
We deemed it safer for the status to suggest that something might change and then have nothing change, rather than for the status to say everything is in sync and then have the controller update the resources it manages because of a manual change somewhere.

I will investigate how we could determine earlier if a change will need to be made or not and see if we can short circuit before updating the status.
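
One way to picture the short circuit described in this comment is sketched below in Python (the operator itself is written in Go, and this is not its actual code). The sketch assumes the operator can compare desired and current operand versions before touching the status. `Condition`, `needs_update`, and `reconcile` are hypothetical names, as is the message text used during a rollout.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Condition:
    """Simplified stand-in for a ClusterOperator status condition."""
    type: str
    status: str
    reason: str = ""
    message: str = ""
    last_transition_time: str = ""


def needs_update(current: Optional[Condition], desired: Condition) -> bool:
    """Return True only if a meaningful field differs.

    last_transition_time is ignored so a no-op reconcile never causes a write.
    """
    if current is None:
        return True
    return (current.type, current.status, current.reason, current.message) != (
        desired.type, desired.status, desired.reason, desired.message)


def reconcile(current, desired_versions, current_versions, now):
    """Return the condition to write, or None when no write is needed."""
    rolling_out = desired_versions != current_versions
    desired = Condition(
        type="Progressing",
        status="True" if rolling_out else "False",
        reason="SyncingResources" if rolling_out else "",
        # Hypothetical message text, used only while a rollout is pending.
        message=f"Progressing towards {desired_versions}" if rolling_out else "",
    )
    if not needs_update(current, desired):
        return None  # short circuit: leave the stored status untouched
    # Keep the old transition time unless the status value actually flipped.
    if current is not None and current.status == desired.status:
        transition = current.last_transition_time
    else:
        transition = now
    return replace(desired, last_transition_time=transition)
```

In this sketch a reconcile with matching versions returns `None`, so the caller would skip the status update entirely. Only a pending rollout, or a missing condition, leads to a write.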

Comment 2 Joel Speed 2020-07-29 16:32:43 UTC
Going to need to spend some time looking into this; will do so next sprint.

Comment 3 Joel Speed 2020-08-18 10:47:42 UTC
Didn't get a chance to look into this during this sprint, as higher-priority work took precedence. Will try to look into it next sprint.

Comment 6 Milind Yadav 2020-09-14 05:18:15 UTC
[miyadav@miyadav ~]$ oc project openshift-machine-api
Now using project "openshift-machine-api" on server "https://api.o46-miyadav-0914.qe.devcluster.openshift.com:6443".
[miyadav@miyadav ~]$ oc get clusteroperator/machine-api -w
NAME          VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
machine-api   4.6.0-0.nightly-2020-09-12-230035   True        False         False      41m


Status didn't change. Moving to VERIFIED.
