1855839 – machine-api-operator keeps updating clusteroperator/machine-api in stable state

Summary: machine-api-operator keeps updating clusteroperator/machine-api in stable state

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cloud Compute
Sub Component:
Version:	4.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	low
Target Milestone:	---
Target Release:	4.6.0
Assignee:	Joel Speed
QA Contact:	Milind Yadav
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-07-10 16:24 UTC by Tomáš Nožička
Modified:	2020-10-27 16:14 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: The status was set to progressing on every reconcile, even when no updates were due to be rolled out. Consequence: The status flickered continuously between two states. Fix: Only update the status when changes are due to be rolled out. Result: The status is now stable.
Clone Of:
Environment:
Last Closed:	2020-10-27 16:13:54 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift machine-api-operator pull 695	0	None	closed	Bug 1855839: Ensure Progressing condition is stable when no updates to apply	2020-10-27 00:24:12 UTC
Red Hat Product Errata	RHBA-2020:4196	0	None	None	None	2020-10-27 16:14:21 UTC

Description Tomáš Nožička 2020-07-10 16:24:16 UTC

Description of problem:

machine-api-operator keeps updating clusteroperator/machine-api while the cluster is in a stable state. The Progressing condition keeps oscillating between

  - lastTransitionTime: "2020-07-09T11:03:20Z"
    status: "False"
    type: Progressing

and

  - lastTransitionTime: "2020-07-09T11:03:20Z"
    message: 'Running resync for operator: 4.5.0-0.ci-2020-07-08-231102'
    reason: SyncingResources
    status: "False"
    type: Progressing

in a loop.

The operator state isn't changing, so this looks like an accidental update that we shouldn't need.
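
To make the difference between the two observed conditions concrete, here is a minimal Go sketch. The `Condition` struct and the `changedFields` helper are hypothetical simplifications for illustration, not the real OpenShift API types. It shows that the two values differ only in `reason` and `message`, so every flip is a real write to the resource even though `status` stays "False".

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a ClusterOperator
// status condition; it is not the real OpenShift API type.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// changedFields lists the fields that differ between two conditions.
func changedFields(a, b Condition) []string {
	var out []string
	if a.Type != b.Type {
		out = append(out, "type")
	}
	if a.Status != b.Status {
		out = append(out, "status")
	}
	if a.Reason != b.Reason {
		out = append(out, "reason")
	}
	if a.Message != b.Message {
		out = append(out, "message")
	}
	return out
}

func main() {
	// The two values quoted in the report above.
	idle := Condition{Type: "Progressing", Status: "False"}
	resync := Condition{
		Type:    "Progressing",
		Status:  "False",
		Reason:  "SyncingResources",
		Message: "Running resync for operator: 4.5.0-0.ci-2020-07-08-231102",
	}
	fmt.Println(changedFields(idle, resync)) // prints [reason message]
}
```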


Version-Release number of selected component (if applicable):
Client Version: v4.2.0-alpha.0-670-g9060d2f
Server Version: 4.5.0-0.ci-2020-07-08-231102
Kubernetes Version: v1.18.3


How reproducible:
always


Steps to Reproduce:
1. Start with a stable cluster.
2. $ oc get clusteroperators/machine-api -w -o yaml

Actual results:
The clusteroperator keeps updating back and forth with dummy info.


Expected results:
The operator doesn't update the resource while the cluster is in a stable state.


Additional info:

Comment 1 Joel Speed 2020-07-21 10:54:51 UTC

This flickering of the status is currently by design. We update the status at the beginning of each reconcile, whether or not the configuration has drifted.
Because of the way the operator applies updates to the resource, it is not particularly easy for us to determine whether a resource has been modified.
We deemed it safer for the status to suggest that something might change and then have nothing change, rather than for the status to say everything is in sync and then have the controller update the resources it manages because of a manual change somewhere.

I will investigate how we could determine earlier if a change will need to be made or not and see if we can short circuit before updating the status.
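
The short circuit described above could look roughly like the following Go sketch. Everything here is an assumption for illustration: `statusWriter`, `reconcileAlways`, and `reconcileIfNeeded` are hypothetical names, and drift detection is reduced to a string comparison, which is exactly the part that is hard in the real operator. It is not the code from the actual fix.

```go
package main

import "fmt"

// statusWriter is a hypothetical stand-in for the client that persists the
// ClusterOperator status; it only stores a reason and counts writes.
type statusWriter struct {
	reason string
	writes int
}

// set persists a new reason only if it differs from the stored one.
func (w *statusWriter) set(reason string) {
	if w.reason == reason {
		return
	}
	w.reason = reason
	w.writes++
}

// reconcileAlways mirrors the behaviour described in this comment: announce a
// resync up front, then report the settled state, whether or not anything drifted.
func reconcileAlways(w *statusWriter) {
	w.set("SyncingResources")
	// ... apply resources here ...
	w.set("")
}

// reconcileIfNeeded checks for drift first and returns before touching the
// status when there is nothing to roll out.
func reconcileIfNeeded(w *statusWriter, desired, observed string) {
	if desired == observed { // assumption: drift is cheap to detect
		return
	}
	w.set("SyncingResources")
	// ... apply resources here ...
	w.set("")
}

func main() {
	always, guarded := &statusWriter{}, &statusWriter{}
	for i := 0; i < 3; i++ {
		reconcileAlways(always)
		reconcileIfNeeded(guarded, "v1", "v1")
	}
	fmt.Println(always.writes, guarded.writes) // prints 6 0
}
```

With no drift, the unguarded loop writes the status twice per reconcile, while the guarded one never writes. The trade-off raised in this comment still applies: the guard is only as trustworthy as the drift check behind it.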

Comment 2 Joel Speed 2020-07-29 16:32:43 UTC

I'm going to need to spend some time looking into this; I will do so next sprint.

Comment 3 Joel Speed 2020-08-18 10:47:42 UTC

I didn't get a chance to look into this during this sprint because higher-priority work took precedence. I will try to look into it next sprint.

Comment 6 Milind Yadav 2020-09-14 05:18:15 UTC

[miyadav@miyadav ~]$ oc project openshift-machine-api
Now using project "openshift-machine-api" on server "https://api.o46-miyadav-0914.qe.devcluster.openshift.com:6443".
[miyadav@miyadav ~]$ oc get clusteroperator/machine-api -w
NAME          VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
machine-api   4.6.0-0.nightly-2020-09-12-230035   True        False         False      41m

..
..
..


Status didn't change. Moving to VERIFIED.

Comment 8 errata-xmlrpc 2020-10-27 16:13:54 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
