Description of problem: On deleting a machine, machine and node are deleted successfully, but machine controller logged an error: ``` Failed to remove finalizer from machine "jhou-49x9c-worker-1-8j84h": Operation cannot be fulfilled on machines.machine.openshift.io "jhou-49x9c-worker-1-8j84h": StorageError: invalid object, Code: 4, Key: /kubernetes.io/machine.openshift.io/machines/openshift-machine-api/jhou-49x9c-worker-1-8j84h, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 361ad86f-54c0-4ad2-a2bf-336b0bed55f3, UID in object meta: ``` Version-Release number of selected component (if applicable): 4.5.0-0.nightly-2020-04-21-103613 How reproducible: Always Steps to Reproduce: 1. Delete a machine, or scale down a machinset 2. Monitor machine controller log Actual results: Machine and node are successfully deleted, but there is an error in machine controller: ``` I0424 08:23:34.499466 1 controller.go:428] Machine "jhou-49x9c-worker-1-8j84h" going into phase "Deleting" I0424 08:23:34.517717 1 controller.go:227] Reconciling machine "jhou-49x9c-worker-1-8j84h" triggers delete I0424 08:23:34.542105 1 info.go:20] cordoned node "jhou-49x9c-worker-1-8j84h" I0424 08:23:34.668164 1 info.go:16] ignoring DaemonSet-managed pods: tuned-sdj49, dns-default-cfsb7, node-ca-kpxhv, machine-config-daemon-tq79j, node-exporter-hd552, multus-2g82b, ovs-7cz2l, sdn-ql2wv I0424 08:23:34.668196 1 info.go:20] drained node "jhou-49x9c-worker-1-8j84h" I0424 08:23:34.668212 1 controller.go:368] drain successful for machine "jhou-49x9c-worker-1-8j84h" I0424 08:23:34.680932 1 machineservice.go:230] Cloud provider CA cert not provided, using system trust bundle I0424 08:23:34.804892 1 machineservice.go:230] Cloud provider CA cert not provided, using system trust bundle I0424 08:23:50.088208 1 controller.go:254] Deleting node "jhou-49x9c-worker-1-8j84h" for machine "jhou-49x9c-worker-1-8j84h" I0424 08:23:50.144537 1 controller.go:268] Machine "jhou-49x9c-worker-1-8j84h" deletion successful I0424 08:23:50.144622 1 controller.go:164] Reconciling Machine "jhou-49x9c-worker-1-8j84h" I0424 08:23:50.144631 1 controller.go:376] Machine "jhou-49x9c-worker-1-8j84h" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster I0424 08:23:50.144641 1 controller.go:227] Reconciling machine "jhou-49x9c-worker-1-8j84h" triggers delete I0424 08:23:50.179136 1 controller.go:343] Could not find node from noderef, it may have already been deleted: jhou-49x9c-worker-1-8j84h I0424 08:23:50.191768 1 machineservice.go:230] Cloud provider CA cert not provided, using system trust bundle I0424 08:23:55.298947 1 machineservice.go:230] Cloud provider CA cert not provided, using system trust bundle I0424 08:24:00.558353 1 actuator.go:314] Skipped deleting jhou-49x9c-worker-1-8j84h that is already deleted. I0424 08:24:00.558514 1 controller.go:254] Deleting node "jhou-49x9c-worker-1-8j84h" for machine "jhou-49x9c-worker-1-8j84h" I0424 08:24:00.558576 1 controller.go:397] Node "jhou-49x9c-worker-1-8j84h" not found E0424 08:24:00.686310 1 controller.go:264] Failed to remove finalizer from machine "jhou-49x9c-worker-1-8j84h": Operation cannot be fulfilled on machines.machine.openshift.io "jhou-49x9c-worker-1-8j84h": StorageError: invalid object, Code: 4, Key: /kubernetes.io/machine.openshift.io/machines/openshift-machine-api/jhou-49x9c-worker-1-8j84h, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 361ad86f-54c0-4ad2-a2bf-336b0bed55f3, UID in object meta: ``` Expected results: Should not expect error in the log Additional info:
@Jianwei Could you confirm for me whether or not the Machine doesn't actually get deleted? It looks to me like the cache is going stale and the controller is reconciling a second time after already removing the finalizer, so it gets a conflict when it tries to update the finalizer for a second time. If that's the case, I'm not really sure what we can do in that case apart from either not using a cached client or detecting this error in particular and ignoring it/not logging it
Hi Joel, as I checked from the openshift-machine-api namespace, they were deleted, but I didn't check whether they were deleted from the cloud provider side(openstack console). I'm trying to reproduce this today, sorry but I can not reproduce it until now. Lowering the priority and if we can't reproduce this in next few days, we can close this one.
Closing as per https://bugzilla.redhat.com/show_bug.cgi?id=1827589#c2 Please reopen if still relevant