Bug 1914837 - Machine API Termination Handlers should be tested
Summary: Machine API Termination Handlers should be tested
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Joel Speed
QA Contact: sunzhaohua
URL:
Whiteboard:
Depends On: 1907286
Blocks:
Reported: 2021-01-11 10:15 UTC by Joel Speed
Modified: 2023-08-22 23:33 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of: 1907286
Environment:
Last Closed: 2021-10-18 17:29:03 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-api-actuator-pkg pull 202 0 None closed Bug 1914837: Re-enable termination handler tests 2021-07-12 09:50:37 UTC
Red Hat Product Errata RHSA-2021:3759 0 None None None 2021-10-18 17:29:49 UTC

Description Joel Speed 2021-01-11 10:15:50 UTC
+++ This bug was initially created as a clone of Bug #1907286 +++

Description of problem:
The default MHC machine-api-termination-handler does not watch spot instances: after creating some spot instances, the MHC's total targets is still 0.

Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-12-09-112139   True        False         3d1h    Cluster version is 4.7.0-0.nightly-2020-12-09-112139

How reproducible:
always

Steps to Reproduce:
1. Create a spot instance with "preemptible: true"
2. Check mhc machine-api-termination-handler

Actual results:
The default mhc machine-api-termination-handler total targets is 0.

$ oc get ds
NAME                              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                  AGE
machine-api-termination-handler   1         1         1       1            1           machine.openshift.io/interruptible-instance=   3d3h

$ oc get machine
NAME                              PHASE     TYPE            REGION        ZONE            AGE
zhsungcp11-bjhl5-master-0         Running   n1-standard-4   us-central1   us-central1-a   3d3h
zhsungcp11-bjhl5-master-1         Running   n1-standard-4   us-central1   us-central1-b   3d3h
zhsungcp11-bjhl5-master-2         Running   n1-standard-4   us-central1   us-central1-c   3d3h
zhsungcp11-bjhl5-worker-a-5k5cd   Running   n1-standard-4   us-central1   us-central1-a   3d3h
zhsungcp11-bjhl5-worker-b-vwv2r   Running   n1-standard-4   us-central1   us-central1-b   3d3h
zhsungcp11-bjhl5-worker-c-54m7p   Running   n1-standard-4   us-central1   us-central1-c   163m


$ oc get mhc
NAME                              MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
machine-api-termination-handler   100%           0                  0

$ oc logs -f machine-api-controllers-6dddcb4fff-fjsxv -c machine-healthcheck-controller
I1211 14:11:05.851755       1 machinehealthcheck_controller.go:153] Reconciling openshift-machine-api/machine-api-termination-handler
I1211 14:11:05.851803       1 machinehealthcheck_controller.go:171] Reconciling openshift-machine-api/machine-api-termination-handler: finding targets
I1211 14:11:05.851880       1 machinehealthcheck_controller.go:228] Remediations are allowed for openshift-machine-api/machine-api-termination-handler: total targets: 0,  max unhealthy: 100%, unhealthy targets: 0
I1211 14:11:05.859989       1 machinehealthcheck_controller.go:263] Reconciling openshift-machine-api/machine-api-termination-handler: no more targets meet unhealthy criteria

$ oc edit machine zhsungcp11-bjhl5-worker-c-54m7p
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  annotations:
    machine.openshift.io/instance-state: RUNNING
  creationTimestamp: "2020-12-14T02:56:43Z"
  finalizers:
  - machine.machine.openshift.io
  generateName: zhsungcp11-bjhl5-worker-c-
  generation: 2
  labels:
    machine.openshift.io/cluster-api-cluster: zhsungcp11-bjhl5
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
    machine.openshift.io/cluster-api-machineset: zhsungcp11-bjhl5-worker-c
    machine.openshift.io/instance-type: n1-standard-4
    machine.openshift.io/region: us-central1
    machine.openshift.io/zone: us-central1-c
  name: zhsungcp11-bjhl5-worker-c-54m7p
...
spec:
  metadata:
    labels:
      machine.openshift.io/interruptible-instance: ""
  providerID: gce://openshift-qe/us-central1-c/zhsungcp11-bjhl5-worker-c-54m7p
  
    
Expected results:
The total target number is equal to the spot instance number.
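
The expected result can be checked mechanically. The following is a minimal sketch, not the project's own test: it assumes the output of `oc get machine -o json` and `oc get mhc machine-api-termination-handler -o json` has already been parsed into Python dicts, that spot Machines carry the `machine.openshift.io/interruptible-instance` label under `spec.metadata.labels` (as in the Machine shown above), and that the MHC reports its target count in `status.expectedMachines`. The helper names and the trimmed sample objects are hypothetical.

```python
INTERRUPTIBLE_LABEL = "machine.openshift.io/interruptible-instance"


def count_interruptible(machines):
    """Count Machines whose spec.metadata.labels carries the interruptible label."""
    count = 0
    for machine in machines.get("items", []):
        labels = machine.get("spec", {}).get("metadata", {}).get("labels") or {}
        if INTERRUPTIBLE_LABEL in labels:
            count += 1
    return count


def mhc_matches(mhc, machines):
    """Return True when the MHC's reported target count equals the spot Machine count."""
    expected = mhc.get("status", {}).get("expectedMachines", 0)
    return expected == count_interruptible(machines)


# Trimmed, hypothetical objects shaped like the output in this report.
machines = {"items": [
    {"metadata": {"name": "zhsungcp11-bjhl5-worker-b-vwv2r"}, "spec": {}},
    {"metadata": {"name": "zhsungcp11-bjhl5-worker-c-54m7p"},
     "spec": {"metadata": {"labels": {INTERRUPTIBLE_LABEL: ""}}}},
]}
mhc = {"status": {"expectedMachines": 0, "currentHealthy": 0}}

print(count_interruptible(machines), mhc_matches(mhc, machines))  # prints: 1 False
```

With one spot Machine and an MHC reporting 0 expected machines, the check fails, which matches the actual results above.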

Additional info:

Comment 1 Joel Speed 2021-02-01 11:58:13 UTC
We are still having issues with the GCP version of the termination handler during testing. Needs more investigation.

Comment 2 Joel Speed 2021-02-25 14:05:35 UTC
Still not sure how to resolve this. Our testing breaks the GCP handler because of how DNS works on GCP, so we can't re-enable the tests until we work out a way around this.

Comment 3 Joel Speed 2021-03-19 14:29:48 UTC
Still haven't had time to get into this; hopefully I will have time next sprint.

Comment 4 Joel Speed 2021-04-19 16:14:56 UTC
Still no time to investigate how to test GCP effectively. Our current approach relies on overriding the localhost binding of the metadata API, but all DNS traffic also takes this route.
An alternative approach may be to configure a proxy that intercepts the traffic and configure the termination handlers to observe that proxy. Not sure if that will work either, though.
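
A rough illustration of the "observe a separate endpoint" idea, as a sketch only. It is written in Python rather than the handler's own language, and it assumes the handler can be pointed at a configurable base URL, which this report does not confirm. The fake server mimics the GCP metadata `instance/preempted` endpoint, which returns `TRUE` or `FALSE` and requires the `Metadata-Flavor: Google` header. `FakeMetadataHandler`, `start_fake_metadata_server`, and `is_preempted` are hypothetical names.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

PREEMPTED_PATH = "/computeMetadata/v1/instance/preempted"


class FakeMetadataHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the GCP metadata server, for tests only."""

    preempted = False  # flipped by a test to simulate a preemption notice

    def do_GET(self):
        # Mirror the real server's requirement for the Metadata-Flavor header.
        if self.headers.get("Metadata-Flavor") != "Google":
            self.send_error(403)
            return
        if self.path != PREEMPTED_PATH:
            self.send_error(404)
            return
        body = b"TRUE" if FakeMetadataHandler.preempted else b"FALSE"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep test output quiet


def start_fake_metadata_server():
    """Start the fake server on an ephemeral localhost port; return (server, base_url)."""
    server = HTTPServer(("127.0.0.1", 0), FakeMetadataHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, "http://127.0.0.1:%d" % server.server_address[1]


def is_preempted(base_url):
    """Ask the metadata endpoint at base_url whether the instance is preempted."""
    req = urllib.request.Request(base_url + PREEMPTED_PATH,
                                 headers={"Metadata-Flavor": "Google"})
    # Ignore any proxy settings from the environment so the request goes direct.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req, timeout=5) as resp:
        return resp.read().decode().strip() == "TRUE"
```

Because only the one HTTP path is served on its own port, nothing else (DNS included) has to be rerouted. A test would start the server, confirm `is_preempted(base_url)` is false, set `FakeMetadataHandler.preempted = True`, and then expect the handler to mark the node for termination.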

Comment 8 sunzhaohua 2021-07-15 09:36:59 UTC
Moving to verified, as this is an e2e test change and does not affect functionality.

Comment 11 errata-xmlrpc 2021-10-18 17:29:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
