Bug 1855839 - machine-api-operator keeps updating clusteroperator/machine-api in stable state
Summary: machine-api-operator keeps updating clusteroperator/machine-api in stable state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 4.6.0
Assignee: Joel Speed
QA Contact: Milind Yadav
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-07-10 16:24 UTC by Tomáš Nožička
Modified: 2020-10-27 16:14 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The status was set to Progressing on every reconcile, even when no updates were due to be rolled out.
Consequence: The status flickered continuously between two states.
Fix: Only update the status when changes are due to be rolled out.
Result: The status is now stable.
Clone Of:
Environment:
Last Closed: 2020-10-27 16:13:54 UTC
Target Upstream Version:
Embargoed:




Links
GitHub openshift/machine-api-operator pull 695 (closed): "Bug 1855839: Ensure Progressing condition is stable when no updates to apply", last updated 2020-10-27 00:24:12 UTC
Red Hat Product Errata RHBA-2020:4196, last updated 2020-10-27 16:14:21 UTC

Description Tomáš Nožička 2020-07-10 16:24:16 UTC
Description of problem:

machine-api-operator keeps updating clusteroperator/machine-api while the cluster is in a stable state. The Progressing condition keeps oscillating between

  - lastTransitionTime: "2020-07-09T11:03:20Z"
    status: "False"
    type: Progressing

and

  - lastTransitionTime: "2020-07-09T11:03:20Z"
    message: 'Running resync for operator: 4.5.0-0.ci-2020-07-08-231102'
    reason: SyncingResources
    status: "False"
    type: Progressing

in a loop.

The operator state isn't changing, so this looks like an accidental update that shouldn't be needed.


Version-Release number of selected component (if applicable):
Client Version: v4.2.0-alpha.0-670-g9060d2f
Server Version: 4.5.0-0.ci-2020-07-08-231102
Kubernetes Version: v1.18.3


How reproducible:
always


Steps to Reproduce:
1. Start with a stable cluster.
2. Watch the clusteroperator: $ oc get clusteroperators/machine-api -w -o yaml

Actual results:
The clusteroperator keeps being updated back and forth between two states that carry no new information.


Expected results:
The operator doesn't update the resource while the cluster is in a stable state.


Additional info:

Comment 1 Joel Speed 2020-07-21 10:54:51 UTC
This flickering of the status is currently by design. We update the status at the beginning of each reconcile, whether or not the configuration has drifted.
Because of the way the operator applies updates to the resources it manages, it is not easy for us to determine whether a resource has been modified.
We deemed it safer for the status to suggest that something might change and then have nothing change, rather than for the status to report that everything is in sync and then have the controller update the resources it manages because of a manual change somewhere.

I will investigate how we could determine earlier whether a change needs to be made, and see if we can short-circuit before updating the status.

Comment 2 Joel Speed 2020-07-29 16:32:43 UTC
Going to need to spend some time looking into this; will do so next sprint.

Comment 3 Joel Speed 2020-08-18 10:47:42 UTC
Didn't get a chance to look into this during this sprint, as higher-priority work took precedence; will try to look into this next sprint.

Comment 6 Milind Yadav 2020-09-14 05:18:15 UTC
[miyadav@miyadav ~]$ oc project openshift-machine-api
Now using project "openshift-machine-api" on server "https://api.o46-miyadav-0914.qe.devcluster.openshift.com:6443".
[miyadav@miyadav ~]$ oc get clusteroperator/machine-api -w
NAME          VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
machine-api   4.6.0-0.nightly-2020-09-12-230035   True        False         False      41m

..
..
..


Status didn't change. Moving to VERIFIED.

Comment 8 errata-xmlrpc 2020-10-27 16:13:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

