Description of problem:
I have a 4.5.6 OCP cluster deployed with 3 masters and 5 workers. If I scale down the default worker machineset:

# oc scale machineset/ostest-worker-0 --replicas=4 -n openshift-machine-api

...I see that an associated BMH is deprovisioned:

# oc get bmh -A
openshift-machine-api worker-4 OK ready ipmi://10.0.1.2:6254 libvirt false

...but the associated node remains stuck like so:

# oc get nodes
ostest-worker-4 NotReady,SchedulingDisabled worker 2d17h v1.18.3+002a51f

Version-Release number of selected component (if applicable):
4.5.6

How reproducible:
100%

Steps to Reproduce:
1. Deploy a 4.5.6 OCP cluster
2. Scale down the default worker machineset
3. See that the node(s) removed are not actually deleted

Actual results:
Deprovisioned node is not deleted

Expected results:
Deprovisioned node is deleted

Additional info:
This looks just like https://bugzilla.redhat.com/show_bug.cgi?id=1869318, we probably need to get it backported to 4.5. @Zane, could you have a look at this? The 4.6 PR didn't cleanly apply, so we can probably use this BZ to backport it for you (if these are indeed the same problem).
It *looks* like the same bug, but the patch that added a finalizer isn't present in the release-4.5 branch. Can you do "oc edit node ostest-worker-4" and confirm whether there is a DeletionTimestamp and what finalizers, if any, are present? (Note that this information is hidden in "oc describe", so don't bother looking there.)
Also check the status of the Machine - the most likely cause is bug 1863010, which was fixed last week. If that's the case you will likely see that the Machine still exists and is in the Deleting phase, and that's why the Node has not been deleted.
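The two checks above (deletion metadata on the Node, phase of the Machine) can be sketched as a small script over the JSON that "oc get node/<name> -o json" and "oc get machine/<name> -o json" return. This is only a rough illustration: the helper names, the sample timestamp, and the finalizer string are hypothetical, and it assumes the Machine phase is reported under status.phase.

```python
import json


def deletion_state(obj):
    """Return (deletionTimestamp, finalizers) from a Kubernetes object dict.

    Both fields live under metadata; either may be absent.
    """
    meta = obj.get("metadata", {})
    return meta.get("deletionTimestamp"), meta.get("finalizers", [])


def machine_phase(machine):
    """Return the Machine phase, assuming it is stored at status.phase."""
    return machine.get("status", {}).get("phase")


# Hypothetical sample objects standing in for real "oc ... -o json" output.
node = json.loads('{"metadata": {"name": "ostest-worker-2"}}')
machine = json.loads("""
{
  "metadata": {
    "name": "ostest-worker-0-bhdbk",
    "deletionTimestamp": "2020-08-20T00:00:00Z",
    "finalizers": ["example.com/hypothetical-finalizer"]
  },
  "status": {"phase": "Deleting"}
}
""")

node_ts, node_finalizers = deletion_state(node)
print("node deletionTimestamp:", node_ts, "finalizers:", node_finalizers)
print("machine phase:", machine_phase(machine))
```

A Node with no deletionTimestamp alongside a Machine that stays in the Deleting phase would match the situation described for bug 1863010.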
I don't see any DeletionTimestamp or finalizers on the node itself (I reproduced using a different node):

# oc get node/ostest-worker-2 -o yaml | grep -i delet
# oc get node/ostest-worker-2 -o yaml | grep -i final

I used "oc edit" as well and scanned through it manually. It does, however, appear that the associated machine is stuck in the Deleting phase:

# oc get machines -A
NAMESPACE               NAME                    PHASE      TYPE   REGION   ZONE   AGE
openshift-machine-api   ostest-worker-0-bhdbk   Deleting                          3d
That's completely consistent with bug 1863010. I'll close as a duplicate, but feel free to reopen if you reproduce this in a build that has the fix.

*** This bug has been marked as a duplicate of bug 1863010 ***