Bug 1988992

Summary: Worker machine object updated too many times [Azure]
Product: OpenShift Container Platform Reporter: Jan Chaloupka <jchaloup>
Component: Cloud ComputeAssignee: Mike Fedosin <mfedosin>
Cloud Compute sub component: Other Providers QA Contact: sunzhaohua <zhsun>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: lakshmi.ravichandran1, mfedosin, suryadav
Version: 4.9Keywords: Reopened
Target Milestone: ---   
Target Release: 4.9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-10-29 12:26:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jan Chaloupka 2021-08-02 08:50:13 UTC
The recent CI runs of https://testgrid.k8s.io/redhat-openshift-ocp-release-4.9-informing#periodic-ci-openshift-release-master-ci-4.9-e2e-azure-serial fail with (https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-e2e-azure-serial/1421911764836028416):

```
1 events happened too frequently

event happened 21 times, something is wrong: ns/openshift-machine-api machine/ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l - reason/Updated Updated machine "ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l"
```

The failures originates from "[sig-arch] events should not repeat pathologically" test introduced in https://github.com/openshift/origin/pull/26323. The test fails if the same event is repeated too many times in a row (at least 20 occurrences).

It might be a valid failure. Or, at least motivation to decrease the number of "Updated machine" events.

Checking machine controller logs from https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-e2e-azure-serial/1421911764836028416/artifacts/e2e-azure-serial/gather-extra/artifacts/pods/openshift-machine-api_machine-api-controllers-d8cbc5f4d-m7jhp_machine-controller.log:
```
I0801 20:10:35.665776       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l","uid":"1413a2eb-c904-4abe-97e8-1c806bb38240","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"49186"} "reason"="Updated"
I0801 20:10:40.718697       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l","uid":"1413a2eb-c904-4abe-97e8-1c806bb38240","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"49199"} "reason"="Updated"
I0801 20:10:42.773245       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l","uid":"1413a2eb-c904-4abe-97e8-1c806bb38240","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"49236"} "reason"="Updated"
I0801 20:10:55.379793       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l","uid":"1413a2eb-c904-4abe-97e8-1c806bb38240","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"49236"} "reason"="Updated"
I0801 20:11:25.684447       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l","uid":"1413a2eb-c904-4abe-97e8-1c806bb38240","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"49236"} "reason"="Updated"
I0801 20:11:55.961125       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l","uid":"1413a2eb-c904-4abe-97e8-1c806bb38240","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"49236"} "reason"="Updated"
I0801 20:12:26.296860       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l","uid":"1413a2eb-c904-4abe-97e8-1c806bb38240","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"49236"} "reason"="Updated"
I0801 20:12:26.611072       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l","uid":"1413a2eb-c904-4abe-97e8-1c806bb38240","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"49967"} "reason"="Updated"
I0801 20:12:56.635267       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l","uid":"1413a2eb-c904-4abe-97e8-1c806bb38240","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"49967"} "reason"="Updated"
I0801 20:13:26.959965       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l","uid":"1413a2eb-c904-4abe-97e8-1c806bb38240","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"49967"} "reason"="Updated"
I0801 20:13:57.334110       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l","uid":"1413a2eb-c904-4abe-97e8-1c806bb38240","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"49967"} "reason"="Updated"
I0801 20:14:27.681824       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l","uid":"1413a2eb-c904-4abe-97e8-1c806bb38240","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"49967"} "reason"="Updated"
I0801 20:14:58.058939       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l","uid":"1413a2eb-c904-4abe-97e8-1c806bb38240","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"49967"} "reason"="Updated"
I0801 20:15:22.036123       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l","uid":"1413a2eb-c904-4abe-97e8-1c806bb38240","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"51350"} "reason"="Updated"
I0801 20:15:22.355123       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l","uid":"1413a2eb-c904-4abe-97e8-1c806bb38240","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"51409"} "reason"="Updated"
I0801 20:15:28.349501       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l","uid":"1413a2eb-c904-4abe-97e8-1c806bb38240","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"51409"} "reason"="Updated"
I0801 20:16:12.400193       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l","uid":"1413a2eb-c904-4abe-97e8-1c806bb38240","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"52207"} "reason"="Updated"
I0801 20:17:33.264669       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l","uid":"1413a2eb-c904-4abe-97e8-1c806bb38240","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"52207"} "reason"="Updated"
I0801 20:27:02.482935       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l","uid":"1413a2eb-c904-4abe-97e8-1c806bb38240","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"52207"} "reason"="Updated"
I0801 20:37:23.248666       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l","uid":"1413a2eb-c904-4abe-97e8-1c806bb38240","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"52207"} "reason"="Updated"
I0801 20:47:46.701092       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-c8k19g5q-52e47-6kzkj-worker-centralus1-xwg7l","uid":"1413a2eb-c904-4abe-97e8-1c806bb38240","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"52207"} "reason"="Updated"
```

and the corresponding timestamps:
```
20:10:35.665776
20:10:40.718697
20:10:42.773245
20:10:55.379793
20:11:25.684447
20:11:55.961125
20:12:26.296860
20:12:26.611072
20:12:56.635267
20:13:26.959965
20:13:57.334110
20:14:27.681824
20:14:58.058939
20:15:22.036123
20:15:22.355123
20:15:28.349501
20:16:12.400193
20:17:33.264669
20:27:02.482935
20:37:23.248666
20:47:46.701092
```

Please let us know if the amount of machine object updates is acceptable (and thus we need to update the test to ignore the events for the machine controller or increase the threshold) or if the amount can be reduced.

Comment 2 sunzhaohua 2021-08-11 06:49:52 UTC
Verified
clusterversion: 4.9.0-0.nightly-2021-08-07-175228

$ oc logs machine-api-controllers-86cbb699b6-cqqs7 -c machine-controller | grep "Updated machine"
$  

After delete a machine, logs show "Updated machine"
$  oc delete machine zhsun-811-px5nm-worker-centralus1-jwdww
machine.machine.openshift.io "zhsun-811-px5nm-worker-centralus1-jwdww" deleted
$  oc logs machine-api-controllers-86cbb699b6-cqqs7 -c machine-controller | grep "Updated machine"
I0811 06:35:41.473296       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"zhsun-811-px5nm-worker-centralus1-6wc42\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"zhsun-811-px5nm-worker-centralus1-6wc42","uid":"336d9822-243d-48b6-9453-c8f1525478b6","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"98150"} "reason"="Updated"
I0811 06:36:42.145008       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Updated machine \"zhsun-811-px5nm-worker-centralus1-6wc42\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"zhsun-811-px5nm-worker-centralus1-6wc42","uid":"336d9822-243d-48b6-9453-c8f1525478b6","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"98280"} "reason"="Updated"

Comment 5 errata-xmlrpc 2021-10-18 17:43:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

Comment 6 Surender Yadav 2021-10-25 15:41:13 UTC
We can see similar failure with recent CI upgrade runs of OCP 4.9 to 4.10 and 4.8 to 4.9 for s390x. 


: [sig-arch] events should not repeat pathologically expand_less	0s
2 events happened too frequently

event happened 34 times, something is wrong: ns/openshift-machine-api machine/libvirt-s390x-0-0-708-wltbm-worker-0-ck7tg - reason/Updated Updated Machine libvirt-s390x-0-0-708-wltbm-worker-0-ck7tg
event happened 34 times, something is wrong: ns/openshift-machine-api machine/libvirt-s390x-0-0-708-wltbm-worker-0-trzd6 - reason/Updated Updated Machine libvirt-s390x-0-0-708-wltbm-worker-0-trzd6

1) 4.9 to 4.10 upgrade --> https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-multiarch-master-nightly-4.10-upgrade-from-nightly-4.9-ocp-remote-libvirt-s390x/1452575786342027264

2) 4.8 to 4.9 upgrade --> https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-multiarch-master-nightly-4.9-upgrade-from-nightly-4.8-ocp-remote-libvirt-s390x/1452288892060307456

Search results --> https://search.ci.openshift.org/?search=reason%2FUpdated+Updated+Machine&maxAge=48h&context=1&type=bug%2Bjunit&name=s390x&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Comment 7 Mike Fedosin 2021-10-29 12:26:40 UTC
Folks, I think the issue you see is not related to Azure platform. The bug for Azure has been fixed in cluster-api-provider-azure in 4.9 by this commit: https://github.com/openshift/cluster-api-provider-azure/pull/227

Since you see a similar issue for libvirt, I recommend to create another bz and specify Libvirt platform in the description, so it can be fixed in https://github.com/openshift/cluster-api-provider-libvirt