Origin recently began watching ClusterOperator conditions for surprising behavior [1]. That's turned up things like [2,3]:

[bz-Networking] clusteroperator/network should not change condition/Progressing
Run #0: Failed 0s

2 unexpected clusteroperator state transitions during e2e test run

network was Progressing=false, but became Progressing=true at 2021-03-16 18:58:24.146588772 +0000 UTC -- DaemonSet "openshift-sdn/ovs" update is rolling out (6 out of 7 updated)
network was Progressing=true, but became Progressing=false at 2021-03-16 19:01:38.792711425 +0000 UTC --

Per the API docs [4], however, Progressing is for:

> Progressing indicates that the operator is actively rolling out new code, propagating config changes, or otherwise moving from one steady state to another. Operators should not report progressing when they are reconciling a previously known state.

That makes "my operand DaemonSet is not completely reconciled right now" a bit complicated, because you need to remember whether this is the first attempt at reconciling the current configuration or a later attempt at reconciling it. In this case, the 18:58 disruption seems to have been a new node coming up:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-serial-4.8/1371878957158240256/artifacts/e2e-aws-serial/e2e.log | grep 18:58: | head -n2
Mar 16 18:58:23.542 I node/ip-10-0-138-230.us-west-2.compute.internal reason/Starting Starting kubelet.
Mar 16 18:58:23.870 I node/ip-10-0-138-230.us-west-2.compute.internal reason/NodeHasSufficientPID Node ip-10-0-138-230.us-west-2.compute.internal status is now: NodeHasSufficientPID

From the MachineSet scaling test:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-serial-4.8/1371878957158240256/artifacts/e2e-aws-serial/e2e.log | grep 'e2e-test/\|Starting kubelet' | grep -1 'Starting kubelet'
Mar 16 18:54:17.923 I e2e-test/"[sig-cluster-lifecycle][Feature:Machines][Serial] Managed cluster should grow and decrease when scaling different machineSets simultaneously [Suite:openshift/conformance/serial]" started
Mar 16 18:58:23.542 I node/ip-10-0-138-230.us-west-2.compute.internal reason/Starting Starting kubelet.
Mar 16 18:58:27.023 I node/ip-10-0-238-216.us-west-2.compute.internal reason/Starting Starting kubelet.
Mar 16 19:00:50.519 I e2e-test/"[sig-cluster-lifecycle][Feature:Machines][Serial] Managed cluster should grow and decrease when scaling different machineSets simultaneously [Suite:openshift/conformance/serial]" finishedStatus/Passed

One possibility for distinguishing between "I just bumped the DaemonSet" (ideally Progressing=True) and "it's reacting to the cluster shifting under it" (ideally Progressing=False) would be storing the version string (and possibly status.observedGeneration, to account for config changes) in the ClusterOperator's status.versions [5], with names keyed by operand.
So moving from the current:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-serial-4.8/1371878957158240256/artifacts/e2e-aws-serial/clusteroperators.json | jq '.items[] | select(.metadata.name == "network").status.versions'
[
  {
    "name": "operator",
    "version": "4.8.0-0.nightly-2021-03-16-173612"
  }
]

To something like:

[
  {
    "name": "operator",
    "version": "4.8.0-0.nightly-2021-03-16-173612"
  },
  {
    "name": "ovs",
    "version": "4.8.0-0.nightly-2021-03-16-173612 generation 1"
  },
  ...other operands...
]

[1]: https://github.com/openshift/origin/pull/25918#event-4423357757
[2]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-blocking#release-openshift-ocp-installer-e2e-aws-serial-4.8
[3]: https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-serial-4.8/1371878957158240256
[4]: https://github.com/openshift/api/blob/8356aa4d4afb94790d3ad58c4debe0e1bdabcbe9/config/v1/types_cluster_operator.go#L147-L151
[5]: https://github.com/openshift/api/blob/8356aa4d4afb94790d3ad58c4debe0e1bdabcbe9/config/v1/types_cluster_operator.go#L43-L47
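For illustration, a minimal Go sketch of how an operator could maintain per-operand entries like that and consult them when setting Progressing. The helper names (operandVersion, progressing, recordLeveled) and the "generation N" encoding are assumptions for this sketch, not actual cluster-network-operator code; only the configv1 and appsv1 types are real:

// Sketch: distinguish "rolling out new code/config" from "reconciling a
// previously known state" by recording per-operand versions in
// ClusterOperator status.versions. Hypothetical helpers, not operator code.
package sketch

import (
	"fmt"

	configv1 "github.com/openshift/api/config/v1"
	appsv1 "k8s.io/api/apps/v1"
)

// operandVersion encodes the value proposed above: the release version plus
// the DaemonSet's metadata.generation, so config changes that bump the spec
// produce a new value while status-only churn (e.g. a new node) does not.
func operandVersion(release string, ds *appsv1.DaemonSet) string {
	return fmt.Sprintf("%s generation %d", release, ds.Generation)
}

// progressing reports whether the operator should set Progressing=True for
// this operand: only when the target differs from the last leveled value.
func progressing(co *configv1.ClusterOperator, operand, target string) bool {
	for _, v := range co.Status.Versions {
		if v.Name == operand {
			return v.Version != target
		}
	}
	return true // operand never leveled before: this is the first rollout
}

// recordLeveled stores the target once the rollout completes, so later
// reconciles of the same version and generation stay Progressing=False.
func recordLeveled(co *configv1.ClusterOperator, operand, target string) {
	for i, v := range co.Status.Versions {
		if v.Name == operand {
			co.Status.Versions[i].Version = target
			return
		}
	}
	co.Status.Versions = append(co.Status.Versions,
		configv1.OperandVersion{Name: operand, Version: target})
}

Under a scheme like that, the 18:58 flap above would not happen: the new node changes the ovs DaemonSet's status (6 out of 7 updated) but not its metadata.generation, so the recorded "ovs" entry still matches the target and Progressing stays False.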
Clayton points out that you should also be able to compare your current, leveled 'operator' version with your desired version to decide if you are updating. And then... do something to see if you bumped the DaemonSet due to a config change. If there are no operator-config knobs that feed into the DaemonSet config, then great :). If there are, you could always record something about the most-recently-leveled config generation(s) or hashes or whatever under ClusterOperator's status.versions. I dunno; gets a bit fiddly.
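To make that concrete, here's a minimal sketch of the comparison Clayton describes, assuming a made-up "operator-config" entry in status.versions that records a hash of the most recently leveled rendered config (again, hypothetical names, not actual operator code):

// Sketch: Progressing=True only while moving between release versions, or
// while an operator-config change that feeds the DaemonSet is unleveled.
package sketch

import configv1 "github.com/openshift/api/config/v1"

// leveled returns the version recorded under name in status.versions, or ""
// if that name has never been leveled.
func leveled(co *configv1.ClusterOperator, name string) string {
	for _, v := range co.Status.Versions {
		if v.Name == name {
			return v.Version
		}
	}
	return ""
}

// stillProgressing is true while the leveled 'operator' entry trails the
// desired release, or while the hash of the rendered config differs from
// the hash recorded when the config was last leveled.
func stillProgressing(co *configv1.ClusterOperator, desiredRelease, renderedConfigHash string) bool {
	return leveled(co, "operator") != desiredRelease ||
		leveled(co, "operator-config") != renderedConfigHash
}

The fiddly part is exactly the one called out above: choosing what to hash, or which generations to record, for "the config that feeds the DaemonSet". If no operator-config knobs feed it, the second comparison disappears entirely.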
This seems to be NOTABUG, since the API documentation has changed (https://github.com/openshift/api/pull/935) and Progressing=True is now considered normal when a new node is added. See also the similar bug for Storage, https://bugzilla.redhat.com/show_bug.cgi?id=1940286 (resolved as NOTABUG).
Based on the previous comment, closing this as NOTABUG.