Description of problem:
Delete is triggered when MHC has "healthchecking.openshift.io/strategy: reboot" annotation [Azure Set up]

Version-Release number of selected component (if applicable):
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-03-01-215047   True        False         8h      Cluster version is 4.4.0-0.nightly-2020-03-01-215047

Steps to Reproduce:

1. Create a mhc

---
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  creationTimestamp: "2020-02-14T09:47:08Z"
  generation: 1
  name: "<User defined Name>"
  namespace: openshift-machine-api
  resourceVersion: "71059"
  selfLink: /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machinehealthchecks/mhc-miyadav-1402-drlvf-worker-us-east-2c
  uid: ef74b735-e58e-4c24-aa69-015d90998b77
spec:
  maxUnhealthy: 3
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: "<Your Cluster Name>"
      machine.openshift.io/cluster-api-machine-role: worker
      machine.openshift.io/cluster-api-machine-type: worker
      machine.openshift.io/cluster-api-machineset: "<Your Machine Set>"
  unhealthyConditions:
  - status: "False"
    timeout: 300s
    type: Ready
  - status: Unknown
    timeout: 300s
    type: Ready

Result: MHC created successfully

2. Annotate 'reboot' remediation strategy to the mhc

oc annotate mhc NAME healthchecking.openshift.io/strategy=reboot

Result: annotation done successfully

3. Go to cloud provider console, stop the instance of the node

Result: instance stopped successfully

4. oc get machine <machine-name> -o=jsonpath="{.metadata.annotations}"

Actual results:
Getting map[machine.openshift.io/instance-state:Updating]

Expected results:
Should be reboot instead of Updating and machine should not get deleted

Additional info:

oc describe mhc mhc1
Name:         mhc1
Namespace:    openshift-machine-api
Labels:       <none>
Annotations:  healthchecking.openshift.io/strategy: reboot
API Version:  machine.openshift.io/v1beta1
Kind:         MachineHealthCheck
Metadata:
  Creation Timestamp:  2020-03-02T09:43:22Z
  Generation:          1
  Resource Version:    159714
  Self Link:           /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machinehealthchecks/mhc1
  UID:                 29ed6db5-bf78-4349-a8f1-536029b9a394
Spec:
  Max Unhealthy:  1
  Selector:
    Match Labels:
      machine.openshift.io/cluster-api-cluster:       zhsun-b6sbk
      machine.openshift.io/cluster-api-machine-role:  worker
      machine.openshift.io/cluster-api-machine-type:  worker
      machine.openshift.io/cluster-api-machineset:    zhsun-b6sbk-worker-centralus2
  Unhealthy Conditions:
    Status:   False
    Timeout:  300s
    Type:     Ready
    Status:   Unknown
    Timeout:  300s
    Type:     Ready
Status:
  Current Healthy:    1
  Expected Machines:  1
Events:
  Type     Reason                 Age                  From                           Message
  ----     ------                 ----                 ----                           -------
  Warning  RemediationRestricted  9m4s (x26 over 15m)  machinehealthcheck-controller  Remediation restricted due to exceeded number of unhealthy machines (total: 2, unhealthy: 2, maxUnhealthy: 1)

Logs:

I0302 09:57:23.000322 1 machinehealthcheck_controller.go:268] Reconciling openshift-machine-api/mhc1/zhsun-b6sbk-worker-centralus2-6mm5n/zhsun-b6sbk-worker-centralus2-6mm5n: health checking
I0302 09:57:24.657182 1 machinehealthcheck_controller.go:268] Reconciling openshift-machine-api/mhc1/zhsun-b6sbk-worker-centralus2-6mm5n/zhsun-b6sbk-worker-centralus2-6mm5n: health checking
I0302 09:57:24.664357 1 machinehealthcheck_controller.go:268] Reconciling openshift-machine-api/mhc1/zhsun-b6sbk-worker-centralus2-6mm5n/zhsun-b6sbk-worker-centralus2-6mm5n: health checking
I0302 09:57:34.675665 1 machinehealthcheck_controller.go:268] Reconciling openshift-machine-api/mhc1/zhsun-b6sbk-worker-centralus2-6mm5n/zhsun-b6sbk-worker-centralus2-6mm5n: health checking
I0302 09:57:34.699337 1 machinehealthcheck_controller.go:204] Reconciling openshift-machine-api/mhc1/zhsun-b6sbk-worker-centralus2-6mm5n/zhsun-b6sbk-worker-centralus2-6mm5n: meet unhealthy criteria, triggers remediation
I0302 09:57:34.699372 1 machinehealthcheck_controller.go:428] openshift-machine-api/mhc1/zhsun-b6sbk-worker-centralus2-6mm5n/zhsun-b6sbk-worker-centralus2-6mm5n: start remediation logic
I0302 09:57:34.699383 1 machinehealthcheck_controller.go:452] openshift-machine-api/mhc1/zhsun-b6sbk-worker-centralus2-6mm5n/zhsun-b6sbk-worker-centralus2-6mm5n: deleting
I0302 09:57:34.716910 1 machinehealthcheck_controller.go:268] Reconciling openshift-machine-api/mhc1/zhsun-b6sbk-worker-centralus2-6mm5n/zhsun-b6sbk-worker-centralus2-6mm5n: health checking
I0302 09:57:34.736539 1 machinehealthcheck_controller.go:204] Reconciling openshift-machine-api/mhc1/zhsun-b6sbk-worker-centralus2-6mm5n/zhsun-b6sbk-worker-centralus2-6mm5n: meet unhealthy criteria, triggers remediation
I0302 09:57:34.736545 1 machinehealthcheck_controller.go:428] openshift-machine-api/mhc1/zhsun-b6sbk-worker-centralus2-6mm5n/zhsun-b6sbk-worker-centralus2-6mm5n: start remediation logic
I0302 09:57:34.736551 1 machinehealthcheck_controller.go:452] openshift-machine-api/mhc1/zhsun-b6sbk-worker-centralus2-6mm5n/zhsun-b6sbk-worker-centralus2-6mm5n: deleting
I0302 09:57:38.963250 1 machinehealthcheck_controller.go:268] Reconciling openshift-machine-api/mhc1/zhsun-b6sbk-worker-centralus2-6mm5n/zhsun-b6sbk-worker-centralus2-6mm5n: health checking
I0302 09:57:38.974295 1 machinehealthcheck_controller.go:204] Reconciling openshift-machine-api/mhc1/zhsun-b6sbk-worker-centralus2-6mm5n/zhsun-b6sbk-worker-centralus2-6mm5n: meet unhealthy criteria, triggers remediation
I0302 09:57:38.974301 1 machinehealthcheck_controller.go:428] openshift-machine-api/mhc1/zhsun-b6sbk-worker-centralus2-6mm5n/zhsun-b6sbk-worker-centralus2-6mm5n: start remediation logic
I0302 09:57:38.974309 1 machinehealthcheck_controller.go:452] openshift-machine-api/mhc1/zhsun-b6sbk-worker-centralus2-6mm5n/zhsun-b6sbk-worker-centralus2-6mm5n: deleting
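
For readers puzzling over the RemediationRestricted event in the describe output, here is a minimal Python sketch of the kind of maxUnhealthy gate that event implies. It is an illustration only, not the machine-api-operator code: the function names `remediation_allowed` and `restriction_message` are hypothetical, and the comparison (remediate only while the unhealthy count does not exceed maxUnhealthy) is an assumption inferred from the event text. Only integer maxUnhealthy values are modelled.

```python
def remediation_allowed(unhealthy: int, max_unhealthy: int) -> bool:
    """Assumed gate: remediation proceeds only while the number of
    unhealthy machines does not exceed maxUnhealthy."""
    return unhealthy <= max_unhealthy


def restriction_message(total: int, unhealthy: int, max_unhealthy: int) -> str:
    """Rebuild the event text quoted in the report from its three counters."""
    return (
        "Remediation restricted due to exceeded number of unhealthy machines "
        f"(total: {total}, unhealthy: {unhealthy}, maxUnhealthy: {max_unhealthy})"
    )


if __name__ == "__main__":
    # Counters taken from the event in this report.
    total, unhealthy, max_unhealthy = 2, 2, 1
    if not remediation_allowed(unhealthy, max_unhealthy):
        print(restriction_message(total, unhealthy, max_unhealthy))
```

Under the same assumption, one unhealthy machine with maxUnhealthy of 1 passes the gate, which would fit the later log lines where remediation starts and the machine is deleted.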
Annotation was renamed to `host.metal3.io/external-remediation` https://github.com/openshift/machine-api-operator/pull/476/files#diff-614d58186947ca2e4e215d42c496d72eR31
Description of problem:
Delete is triggered when MHC has "healthchecking.openshift.io/strategy: reboot" annotation [Azure Set up]

Version-Release number of selected component (if applicable):
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-03-12-041748   True        False         153m    Cluster version is 4.5.0-0.nightly-2020-03-12-041748

Steps to Reproduce:

1. Create a mhc

---
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  creationTimestamp: "2020-02-14T09:47:08Z"
  generation: 1
  name: "<User defined Name>"
  namespace: openshift-machine-api
  resourceVersion: "71059"
  selfLink: /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machinehealthchecks/mhc-miyadav-1402-drlvf-worker-us-east-2c
  uid: ef74b735-e58e-4c24-aa69-015d90998b77
spec:
  maxUnhealthy: 3
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: "<Your Cluster Name>"
      machine.openshift.io/cluster-api-machine-role: worker
      machine.openshift.io/cluster-api-machine-type: worker
      machine.openshift.io/cluster-api-machineset: "<Your Machine Set>"
  unhealthyConditions:
  - status: "False"
    timeout: 300s
    type: Ready
  - status: Unknown
    timeout: 300s
    type: Ready

Result: MHC created successfully

2. Annotate 'reboot' remediation strategy to the mhc

oc annotate mhc NAME machine.openshift.io/remediation-strategy=external-baremetal

Result: annotation done successfully

3. Go to cloud provider console, stop the instance of the node

Result: instance stopped successfully

4. oc get machine <machine-name> -o jsonpath="{.metadata.annotations}"

Actual results:
map[host.metal3.io/external-remediation: machine.openshift.io/instance-state:Running

Expected results:
Remediation should trigger, but should not delete the machine. Instead, should add an annotation "host.metal3.io/external-remediation" to the machine
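
The expected behaviour above can be sketched as a small decision function. This is a hedged Python illustration, not the controller's actual Go implementation: the function `remediate` and its return shape are hypothetical, and the only facts taken from this thread are the two annotation keys, the `external-baremetal` value, and the empty annotation value seen in the `oc get machine` output.

```python
from typing import Dict, Tuple

# Annotation names and value as they appear in this report.
REMEDIATION_STRATEGY_ANNOTATION = "machine.openshift.io/remediation-strategy"
EXTERNAL_BAREMETAL = "external-baremetal"
EXTERNAL_REMEDIATION_ANNOTATION = "host.metal3.io/external-remediation"


def remediate(
    mhc_annotations: Dict[str, str],
    machine_annotations: Dict[str, str],
) -> Tuple[str, Dict[str, str]]:
    """Hypothetical model of the remediation choice.

    Returns the action taken and the machine annotations afterwards.
    """
    updated = dict(machine_annotations)  # never mutate the caller's dict
    if mhc_annotations.get(REMEDIATION_STRATEGY_ANNOTATION) == EXTERNAL_BAREMETAL:
        # External remediation: mark the machine instead of deleting it.
        # The empty value mirrors the output shown in the report.
        updated[EXTERNAL_REMEDIATION_ANNOTATION] = ""
        return "annotate", updated
    # Assumed default when the strategy annotation is absent or unrecognized.
    return "delete", updated


if __name__ == "__main__":
    action, annotations = remediate(
        {REMEDIATION_STRATEGY_ANNOTATION: EXTERNAL_BAREMETAL},
        {"machine.openshift.io/instance-state": "Running"},
    )
    print(action, annotations)
```

Under this model, an MHC that carries only the older "healthchecking.openshift.io/strategy: reboot" annotation falls through to the delete branch, which would be consistent with the deletion seen in the original report.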
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409