Description of problem:
With a MachineHealthCheck created for the workers machineset, if we delete a worker machine on openstack (openstack server delete <<UUID>>), it is not recreated.

$ oc get nodes,machines,machineset,MachineHealthCheck -A
NAME                               STATUS     ROLES    AGE   VERSION
node/mrnd-tst-6dlcb-master-0       Ready      master   43m   v1.18.3+a637491
node/mrnd-tst-6dlcb-master-1       Ready      master   43m   v1.18.3+a637491
node/mrnd-tst-6dlcb-master-2       Ready      master   43m   v1.18.3+a637491
node/mrnd-tst-6dlcb-worker-9wnc2   Ready      worker   30m   v1.18.3+a637491
node/mrnd-tst-6dlcb-worker-bgqgq   NotReady   worker   27m   v1.18.3+a637491

NAMESPACE               NAME                                                       PHASE     TYPE           REGION      ZONE   AGE
openshift-machine-api   machine.machine.openshift.io/mrnd-tst-6dlcb-master-0       Running   ci.m1.xlarge   regionOne   nova   43m
openshift-machine-api   machine.machine.openshift.io/mrnd-tst-6dlcb-master-1       Running   ci.m1.xlarge   regionOne   nova   43m
openshift-machine-api   machine.machine.openshift.io/mrnd-tst-6dlcb-master-2       Running   ci.m1.xlarge   regionOne   nova   43m
openshift-machine-api   machine.machine.openshift.io/mrnd-tst-6dlcb-worker-9wnc2   Running   ci.m1.xlarge   regionOne   nova   36m
openshift-machine-api   machine.machine.openshift.io/mrnd-tst-6dlcb-worker-bgqgq   Failed    ci.m1.xlarge   regionOne   nova   36m

NAMESPACE               NAME                                                    DESIRED   CURRENT   READY   AVAILABLE   AGE
openshift-machine-api   machineset.machine.openshift.io/mrnd-tst-6dlcb-worker   2         2         1       1           43m

NAMESPACE               NAME                                                             MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
openshift-machine-api   machinehealthcheck.machine.openshift.io/openstack-health-check   40%            2                  1

Related log from machine-api-controllers:
I0603 15:26:45.455332 1 controller.go:165] mrnd-tst-6dlcb-worker-bgqgq: reconciling Machine
I0603 15:26:45.497169 1 utils.go:99] Cloud provider CA cert not provided, using system trust bundle
I0603 15:26:50.751532 1 controller.go:420] mrnd-tst-6dlcb-worker-bgqgq: going into phase "Failed"
I0603 15:26:50.771680 1 controller.go:165] mrnd-tst-6dlcb-worker-bgqgq: reconciling Machine
W0603 15:26:50.771710 1 controller.go:262] mrnd-tst-6dlcb-worker-bgqgq: machine has gone "Failed" phase. It won't reconcile

Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-06-03-105031

How reproducible:

Steps to Reproduce:
1. Install cluster IPI on OSP
2. Create a MachineHealthCheck for workers machineset
3. Destroy worker instance on openstack

Actual results:
Worker is not recreated

Expected results:
Worker is recreated by the machineset

Additional info:
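For reference, a MachineHealthCheck along the lines of the one in step 2 might look like the sketch below. It is not the exact manifest used for this report: the name openstack-health-check, the namespace, the machineset name mrnd-tst-6dlcb-worker and the maxUnhealthy value of 40% are taken from the output above, while the selector labels, the unhealthy conditions and their timeouts are assumptions.

```yaml
# Hypothetical reproduction manifest; selector labels and timeouts are assumed.
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: openstack-health-check          # name as shown in the oc output
  namespace: openshift-machine-api
spec:
  selector:
    matchLabels:
      # Assumed labels; check the labels actually set on the worker machines.
      machine.openshift.io/cluster-api-machine-role: worker
      machine.openshift.io/cluster-api-machine-type: worker
      machine.openshift.io/cluster-api-machineset: mrnd-tst-6dlcb-worker
  unhealthyConditions:
  # Assumed conditions: treat a node as unhealthy once Ready has been
  # False or Unknown for 300 seconds.
  - type: Ready
    status: "False"
    timeout: 300s
  - type: Ready
    status: Unknown
    timeout: 300s
  maxUnhealthy: 40%                     # value as shown in the oc output
```

With 2 expected machines and 1 currently healthy, 50% of the targeted machines are unhealthy, which exceeds the 40% maxUnhealthy limit, so this particular check may also hold back remediation on its own in a two-worker cluster.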
The fix has been merged, so I am moving this bug to ON_QA: https://github.com/openshift/cluster-api-provider-openstack/pull/101
Verified on 4.6.0-0.nightly-2020-06-20-011219
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196