Description of problem:
The default MHC machine-api-termination-handler does not pick up spot instances. After creating spot instances, the MHC's total targets is still 0.

Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-12-09-112139   True        False         3d1h    Cluster version is 4.7.0-0.nightly-2020-12-09-112139

How reproducible:
always

Steps to Reproduce:
1. Create a spot instance with "preemptible: true"
2. Check mhc machine-api-termination-handler
3.

Actual results:
The default mhc machine-api-termination-handler total targets is 0.

$ oc get ds
NAME                              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                  AGE
machine-api-termination-handler   1         1         1       1            1           machine.openshift.io/interruptible-instance=   3d3h

$ oc get machine
NAME                              PHASE     TYPE            REGION        ZONE            AGE
zhsungcp11-bjhl5-master-0         Running   n1-standard-4   us-central1   us-central1-a   3d3h
zhsungcp11-bjhl5-master-1         Running   n1-standard-4   us-central1   us-central1-b   3d3h
zhsungcp11-bjhl5-master-2         Running   n1-standard-4   us-central1   us-central1-c   3d3h
zhsungcp11-bjhl5-worker-a-5k5cd   Running   n1-standard-4   us-central1   us-central1-a   3d3h
zhsungcp11-bjhl5-worker-b-vwv2r   Running   n1-standard-4   us-central1   us-central1-b   3d3h
zhsungcp11-bjhl5-worker-c-54m7p   Running   n1-standard-4   us-central1   us-central1-c   163m

$ oc get mhc
NAME                              MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
machine-api-termination-handler   100%           0                  0

$ oc logs -f machine-api-controllers-6dddcb4fff-fjsxv -c machine-healthcheck-controller
I1211 14:11:05.851755 1 machinehealthcheck_controller.go:153] Reconciling openshift-machine-api/machine-api-termination-handler
I1211 14:11:05.851803 1 machinehealthcheck_controller.go:171] Reconciling openshift-machine-api/machine-api-termination-handler: finding targets
I1211 14:11:05.851880 1 machinehealthcheck_controller.go:228] Remediations are allowed for openshift-machine-api/machine-api-termination-handler: total targets: 0, max unhealthy: 100%, unhealthy targets: 0
I1211 14:11:05.859989 1 machinehealthcheck_controller.go:263] Reconciling openshift-machine-api/machine-api-termination-handler: no more targets meet unhealthy criteria

$ oc edit machine zhsungcp11-bjhl5-worker-c-54m7p
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  annotations:
    machine.openshift.io/instance-state: RUNNING
  creationTimestamp: "2020-12-14T02:56:43Z"
  finalizers:
  - machine.machine.openshift.io
  generateName: zhsungcp11-bjhl5-worker-c-
  generation: 2
  labels:
    machine.openshift.io/cluster-api-cluster: zhsungcp11-bjhl5
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
    machine.openshift.io/cluster-api-machineset: zhsungcp11-bjhl5-worker-c
    machine.openshift.io/instance-type: n1-standard-4
    machine.openshift.io/region: us-central1
    machine.openshift.io/zone: us-central1-c
  name: zhsungcp11-bjhl5-worker-c-54m7p
...
spec:
  metadata:
    labels:
      machine.openshift.io/interruptible-instance: ""
  providerID: gce://openshift-qe/us-central1-c/zhsungcp11-bjhl5-worker-c-54m7p

Expected results:
The MHC's total targets equals the number of spot instances.

Additional info:
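One possible reading of the output above, sketched in Python. This is only an illustration under an assumption the report does not confirm: that the MHC selects targets by matching machine.openshift.io/interruptible-instance against the Machine's own metadata.labels, while on the affected build the label is present only under spec.metadata.labels. The count_targets function and the dictionary shapes are hypothetical and are not taken from the controller's code.

```python
# Hypothetical sketch: count the Machines a label selector would match.
# Assumption: the selector is evaluated against metadata.labels only,
# never against spec.metadata.labels.

INTERRUPTIBLE = "machine.openshift.io/interruptible-instance"
ROLE = "machine.openshift.io/cluster-api-machine-role"


def count_targets(machines, match_labels):
    """Return how many machines carry every key/value pair of
    match_labels in their own metadata.labels."""
    total = 0
    for machine in machines:
        labels = machine.get("metadata", {}).get("labels", {})
        if all(labels.get(key) == value for key, value in match_labels.items()):
            total += 1
    return total


# Shape from the "oc edit machine" output: the interruptible label
# appears only under spec.metadata.labels.
spec_only = {
    "metadata": {"labels": {ROLE: "worker"}},
    "spec": {"metadata": {"labels": {INTERRUPTIBLE: ""}}},
}

# Shape where the label sits on the Machine object itself.
on_machine = {
    "metadata": {"labels": {ROLE: "worker", INTERRUPTIBLE: ""}},
    "spec": {"metadata": {"labels": {INTERRUPTIBLE: ""}}},
}

selector = {INTERRUPTIBLE: ""}
print(count_targets([spec_only], selector))
print(count_targets([on_machine], selector))
```

Under that assumption, the first shape yields no targets and the second yields one, which would be consistent with an EXPECTEDMACHINES value of 0 for a spot instance whose label is not on the Machine itself.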
Verified on AWS, GCP and Azure
clusterversion: 4.7.0-0.nightly-2021-01-12-203716

$ oc get machine zhsunaws113-9ksps-worker-us-east-2c-2gv7h --show-labels
NAME                                        PHASE     TYPE       REGION      ZONE         AGE   LABELS
zhsunaws113-9ksps-worker-us-east-2c-2gv7h   Running   m5.large   us-east-2   us-east-2c   22m   machine.openshift.io/cluster-api-cluster=zhsunaws113-9ksps,machine.openshift.io/cluster-api-machine-role=worker,machine.openshift.io/cluster-api-machine-type=worker,machine.openshift.io/cluster-api-machineset=zhsunaws113-9ksps-worker-us-east-2c,machine.openshift.io/instance-type=m5.large,machine.openshift.io/interruptible-instance=,machine.openshift.io/region=us-east-2,machine.openshift.io/zone=us-east-2c

[szh@bogon aws]$ oc get mhc
NAME                              MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
machine-api-termination-handler   100%           1                  1

$ oc get machine zhsungcp113-zgxg5-worker-c-928pf --show-labels
NAME                               PHASE     TYPE            REGION        ZONE            AGE   LABELS
zhsungcp113-zgxg5-worker-c-928pf   Running   n1-standard-4   us-central1   us-central1-c   22m   machine.openshift.io/cluster-api-cluster=zhsungcp113-zgxg5,machine.openshift.io/cluster-api-machine-role=worker,machine.openshift.io/cluster-api-machine-type=worker,machine.openshift.io/cluster-api-machineset=zhsungcp113-zgxg5-worker-c,machine.openshift.io/instance-type=n1-standard-4,machine.openshift.io/interruptible-instance=,machine.openshift.io/region=us-central1,machine.openshift.io/zone=us-central1-c

[szh@bogon gcp]$ oc get mhc
NAME                              MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
machine-api-termination-handler   100%           1                  1

$ oc get machine zhsunazure-tpj2g-worker-northcentralus1-2q8kv --show-labels
NAME                                            PHASE     TYPE              REGION           ZONE   AGE     LABELS
zhsunazure-tpj2g-worker-northcentralus1-2q8kv   Running   Standard_D2s_v3   northcentralus          9m35s   machine.openshift.io/cluster-api-cluster=zhsunazure-tpj2g,machine.openshift.io/cluster-api-machine-role=worker,machine.openshift.io/cluster-api-machine-type=worker,machine.openshift.io/cluster-api-machineset=zhsunazure-tpj2g-worker-northcentralus1,machine.openshift.io/instance-type=Standard_D2s_v3,machine.openshift.io/interruptible-instance=,machine.openshift.io/region=northcentralus

[szh@bogon azure]$ oc get mhc
NAME                              MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
machine-api-termination-handler   100%           1                  1
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633