Bug 2061833
Summary: | A single worker can be assigned to multiple baremetal hosts | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Stephen Benjamin <stbenjam> |
Component: | Bare Metal Hardware Provisioning | Assignee: | Steven Hardy <shardy> |
Bare Metal Hardware Provisioning sub component: | cluster-api-provider | QA Contact: | Eldar Weiss <eweiss> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | bfournie, dgoodwin, eweiss, zbitter |
Version: | 4.11 | Keywords: | Triaged |
Target Milestone: | --- | ||
Target Release: | 4.11.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: |
job=periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-ovn-ipv6=all
|
|
Last Closed: | 2022-08-10 10:52:44 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Stephen Benjamin
2022-03-08 15:31:59 UTC
2022-03-07T22:28:10.310571787Z 2022/03/07 22:28:10 Creating machine ostest-rbt8c-worker-0-cpkx7 2022-03-07T22:28:10.310862634Z 2022/03/07 22:28:10 Host 'ostest-worker-0' matched hostSelector for Machine 'ostest-rbt8c-worker-0-cpkx7' 2022-03-07T22:28:10.310862634Z 2022/03/07 22:28:10 Host 'ostest-worker-1' matched hostSelector for Machine 'ostest-rbt8c-worker-0-cpkx7' 2022-03-07T22:28:10.310862634Z 2022/03/07 22:28:10 Host 'ostest-worker-2' matched hostSelector for Machine 'ostest-rbt8c-worker-0-cpkx7' 2022-03-07T22:28:10.310862634Z 2022/03/07 22:28:10 3 hosts available while choosing host for machine 'ostest-rbt8c-worker-0-cpkx7' 2022-03-07T22:28:10.310862634Z 2022/03/07 22:28:10 Associating machine ostest-rbt8c-worker-0-cpkx7 with host ostest-worker-2 2022-03-07T22:28:10.328955287Z 2022/03/07 22:28:10 setting annotation for ostest-rbt8c-worker-0-cpkx7 to metal3.io/BareMetalHost="openshift-machine-api/ostest-worker-2" 2022-03-07T22:28:10.328955287Z 2022/03/07 22:28:10 setting annotation for ostest-rbt8c-worker-0-cpkx7 to machine.openshift.io/instance-state="available" 2022-03-07T22:28:10.338793000Z W0307 22:28:10.338744 1 controller.go:366] ostest-rbt8c-worker-0-cpkx7: failed to create machine: requeue in: 0s 2022-03-07T22:28:10.338793000Z I0307 22:28:10.338772 1 controller.go:460] Actuator returned requeue-after error: requeue in: 0s 2022-03-07T22:28:10.697696323Z 2022/03/07 22:28:10 Creating machine ostest-rbt8c-worker-0-cpkx7 2022-03-07T22:28:10.698005837Z 2022/03/07 22:28:10 Host 'ostest-worker-2' matched hostSelector for Machine 'ostest-rbt8c-worker-0-cpkx7' 2022-03-07T22:28:10.698005837Z 2022/03/07 22:28:10 Host 'ostest-worker-0' matched hostSelector for Machine 'ostest-rbt8c-worker-0-cpkx7' 2022-03-07T22:28:10.698005837Z 2022/03/07 22:28:10 Host 'ostest-worker-1' matched hostSelector for Machine 'ostest-rbt8c-worker-0-cpkx7' 2022-03-07T22:28:10.698005837Z 2022/03/07 22:28:10 3 hosts available while choosing host for machine 'ostest-rbt8c-worker-0-cpkx7' 2022-03-07T22:28:10.698021319Z 2022/03/07 22:28:10 Associating machine ostest-rbt8c-worker-0-cpkx7 with host ostest-worker-1 2022-03-07T22:28:10.720600833Z 2022/03/07 22:28:10 setting annotation for ostest-rbt8c-worker-0-cpkx7 to metal3.io/BareMetalHost="openshift-machine-api/ostest-worker-1" 2022-03-07T22:28:10.720600833Z 2022/03/07 22:28:10 setting annotation for ostest-rbt8c-worker-0-cpkx7 to machine.openshift.io/instance-state="available" 2022-03-07T22:28:10.725309867Z W0307 22:28:10.725253 1 controller.go:366] ostest-rbt8c-worker-0-cpkx7: failed to create machine: failed to update machine annotation: Operation cannot be fulfilled on machines.machine.openshift.io "ostest-rbt8c-worker-0-cpkx7": the object has been modified; please apply your changes to the latest version and try again 2022-03-07T22:28:10.725339030Z E0307 22:28:10.725305 1 controller.go:304] controller-runtime/manager/controller/machine_controller "msg"="Reconciler error" "error"="failed to update machine annotation: Operation cannot be fulfilled on machines.machine.openshift.io \"ostest-rbt8c-worker-0-cpkx7\": the object has been modified; please apply your changes to the latest version and try again" "name"="ostest-rbt8c-worker-0-cpkx7" "namespace"="openshift-machine-api" It looks like we're requeuing before the cache has been updated with either the Consumer ref in the BMH ostest-worker-2 or the annotation on the Machine, and thus it is selecting another Machine and also claiming that one. We fail to update the annotation, but by then it is too late. It's not clear to me what could possibly have changed recently to cause this? The easiest fix is probably to add a delay at https://github.com/openshift/cluster-api-provider-baremetal/blob/master/pkg/cloud/baremetal/actuators/machine/actuator.go#L672 We don't actually want to requeue at all - we should just let the reconcile be triggered when the change to the annotations shows up in the cache - but we have to return RequeueAfterError{} to prevent the controller deciding we have completed the Create phase prematurely. PR sitting for a few weeks: https://github.com/openshift/cluster-api-provider-baremetal/pull/167 needs an approver. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069 |