Description of problem:
Since MHC has no control over how the machine is remediated, it would be better not to imply that it will (only) be via reboot. Update the annotation, variables, functions, and logging as appropriate.

Version-Release number of selected component (if applicable): 4.3
For testing, the steps are: create an MHC -> annotate the strategy -> stop the instance from the provider console -> monitor the MHC.

Expected: if the MHC is annotated with machine.openshift.io/remediation-strategy=external-baremetal, the machine will not be deleted and remediated by the healthcheck controller.

Need more info: do the above steps suffice?
-- Expecting the below steps to cover the testing for the change --

Version:
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-03-08-213224   True        False         6h45m   Cluster version is 4.4.0-0.nightly-2020-03-08-213224

Steps:

1. Create an MHC using the YAML below:
---
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: mh1
  namespace: openshift-machine-api
spec:
  maxUnhealthy: 3
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: <Your cluster>
      machine.openshift.io/cluster-api-machine-role: worker
      machine.openshift.io/cluster-api-machine-type: worker
      machine.openshift.io/cluster-api-machineset: <Your machineset>
  unhealthyConditions:
  - status: "False"
    timeout: 300s
    type: Ready
  - status: Unknown
    timeout: 300s
    type: Ready

2. Annotate the MHC:
oc annotate mhc <mhc name> healthchecking.openshift.io/strategy=machine.openshift.io/remediation-strategy=external-baremetal

3. Terminate the machine of the machineset being monitored by the MHC using the IaaS console (AWS in this case).

Actual: Machine remediation did not happen and the machine stays in the Failed state.
Expected: No remediation should take place.
(In reply to Milind Yadav from comment #4)
> -- Expecting the below steps to cover the testing for the change --
>
> version :
> NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
> version   4.4.0-0.nightly-2020-03-08-213224   True        False         6h45m   Cluster version is 4.4.0-0.nightly-2020-03-08-213224
>
> Steps :
>
> 1.Create mhc use below yaml :
> ---
> apiVersion: machine.openshift.io/v1beta1
> kind: MachineHealthCheck
> metadata:
>   name: mh1
>   namespace: openshift-machine-api
> spec:
>   maxUnhealthy: 3
>   selector:
>     matchLabels:
>       machine.openshift.io/cluster-api-cluster: <Your cluster>
>       machine.openshift.io/cluster-api-machine-role: worker
>       machine.openshift.io/cluster-api-machine-type: worker
>       machine.openshift.io/cluster-api-machineset: <Your machineset>
>   unhealthyConditions:
>   - status: "False"
>     timeout: 300s
>     type: Ready
>   - status: Unknown
>     timeout: 300s
>     type: Ready
>
> 2.Annotate mhc :
> oc annotate mhc <mhc name> healthchecking.openshift.io/strategy=machine.openshift.io/remediation-strategy=external-baremetal

This looks wrong. I think you want:

oc annotate mhc <mhc name> machine.openshift.io/remediation-strategy=external-baremetal

> 3.Terminate the machine of the machineset being monitored by mhc using the
> IAAS console (AWS in this)
>
> Actual : Machine remediation did not happen and it stays in Failed state
> Expected : No remediation should take place

Was the 'host.metal3.io/external-remediation' annotation added to the machine associated with the failed node?
I cannot check the annotation on the node, as the node died after the instance that contained it was terminated.

Do you mean whether the annotation 'host.metal3.io/external-remediation' was added to the machine that is showing the Failed status?

If so, then no, it wasn't. The annotations on the machine were:

annotations:
  machine.openshift.io/instance-state: running
(In reply to Milind Yadav from comment #6)
> I cannot check annotation at the node as , node died after the Instance
> that was containing it got terminated .

It should be on the Machine, not the node. If the Node got deleted, then you've tested the default remediation strategy (deletion), not the baremetal one.

> Do you mean the annotation 'host.metal3.io/external-remediation' was added or
> not on the machine that is showing failed status ?
>
> Then , no , it wasnt , the annotation was
>
> annotations:
>   machine.openshift.io/instance-state: running

I would recommend retesting with:

oc annotate mhc <mhc name> machine.openshift.io/remediation-strategy=external-baremetal
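A minimal sketch of how the corrected annotate command could be assembled for review before it is run. The helper name `annotate_cmd` is hypothetical, the MHC name `mh1` and the `openshift-machine-api` namespace are taken from the YAML in comment #4, and adding `-n` is an assumption (the commands in this thread omit it). The helper only prints the command; it does not contact the cluster.

```shell
#!/bin/sh
# Sketch only: builds the corrected annotate command as text.
# Annotation key and value are the ones suggested in this thread.
STRATEGY_KEY="machine.openshift.io/remediation-strategy"
STRATEGY_VALUE="external-baremetal"

# annotate_cmd MHC_NAME (hypothetical helper)
# Prints the oc command instead of executing it, so it can be checked first.
annotate_cmd() {
    printf 'oc annotate mhc %s -n openshift-machine-api %s=%s\n' \
        "$1" "$STRATEGY_KEY" "$STRATEGY_VALUE"
}

# mh1 is the MHC name from the YAML in comment #4.
annotate_cmd mh1
```

Once the printed command looks right, it can be pasted into a shell that is logged in to the cluster.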
@Andrew, I think this is what you expected and it is correct. I will update the annotation value as you suggested, thanks. The case is still VERIFIED.

In the validation steps, step 2 (Annotate mhc) is updated from:

> oc annotate mhc <mhc name> healthchecking.openshift.io/strategy=machine.openshift.io/remediation-strategy=external-baremetal

to:

oc annotate mhc <mhc name> machine.openshift.io/remediation-strategy=external-baremetal

[miyadav@miyadav bug1800425]$ oc describe machine aiyengar-1103-6nfzf-worker-us-east-2c-q8p6j
Name:         aiyengar-1103-6nfzf-worker-us-east-2c-q8p6j
Namespace:    openshift-machine-api
Labels:       machine.openshift.io/cluster-api-cluster=aiyengar-1103-6nfzf
              machine.openshift.io/cluster-api-machine-role=worker
              machine.openshift.io/cluster-api-machine-type=worker
              machine.openshift.io/cluster-api-machineset=aiyengar-1103-6nfzf-worker-us-east-2c
              machine.openshift.io/instance-type=m4.large
              machine.openshift.io/region=us-east-2
              machine.openshift.io/zone=us-east-2c
Annotations:  host.metal3.io/external-remediation:
              machine.openshift.io/instance-state: running
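For repeat runs, the check on the Machine's annotations could be scripted rather than read off `oc describe`. The sketch below assumes the annotations have been printed one `key=value` per line; the helper name `has_annotation` is hypothetical and not part of any tool. The sample input mirrors the two annotations shown in the output above.

```shell
#!/bin/sh
# has_annotation KEY (hypothetical helper)
# Reads "key=value" lines on stdin and succeeds if KEY is present,
# even when its value is empty.
has_annotation() {
    key="$1"
    while IFS= read -r line; do
        case "$line" in
            "$key="*) return 0 ;;
        esac
    done
    return 1
}

# Sample input mirroring the annotations in the describe output above.
printf '%s\n' \
    'host.metal3.io/external-remediation=' \
    'machine.openshift.io/instance-state=running' \
    | has_annotation host.metal3.io/external-remediation \
    && echo "annotation present"
```

Against a live cluster, the input could come from something like `oc get machine <machine name> -n openshift-machine-api -o go-template='{{range $k, $v := .metadata.annotations}}{{$k}}={{$v}}{{"\n"}}{{end}}'`; the namespace here is an assumption based on the output above.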