Description of Problem:
The DaemonSet machine-api-termination-handler could not create Pods.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-11-26-221840

How Reproducible:
Always

Steps to Reproduce:
1. Create a spot instance
2. Check daemonset machine-api-termination-handler
3.

Actual results:
$ oc get ds
NAME                              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                  AGE
machine-api-termination-handler   0         0         0       0            0           machine.openshift.io/interruptible-instance=   4h49m

$ oc get node --show-labels |grep machine.openshift.io/interruptible-instance=
ip-10-0-213-198.us-east-2.compute.internal   Ready   worker   48m   v1.19.2+ad738ba   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.large,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2c,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-213-198,kubernetes.io/os=linux,machine.openshift.io/interruptible-instance=,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m5.large,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2c,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2c

$ oc describe ds machine-api-termination-handler
Events:
  Type     Reason        Age                   From                  Message
  ----     ------        ----                  ----                  -------
  Warning  FailedCreate  2m15s (x20 over 22m)  daemonset-controller  Error creating: pods "machine-api-termination-handler-" is forbidden: unable to validate against any security context constraint: [provider restricted: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.volumes[2]: Invalid value: "secret": secret volumes are not allowed to be used]

Expected results:
DaemonSet machine-api-termination-handler can create Pods on nodes with the label "machine.openshift.io/interruptible-instance=".

Additional info:
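
The FailedCreate message above packs five separate security context constraint violations into one line. As a reading aid only, here is a minimal Python sketch that splits such a message into one entry per rejected field. The regular expression and the helper name parse_violations are assumptions made for illustration, based solely on the message format seen in this report; they are not part of oc or of any OpenShift API.

```python
import re

# Message copied from the FailedCreate event in this report.
MESSAGE = (
    'pods "machine-api-termination-handler-" is forbidden: unable to validate '
    'against any security context constraint: [provider restricted: '
    '.spec.securityContext.hostNetwork: Invalid value: true: '
    'Host network is not allowed to be used '
    'spec.volumes[0]: Invalid value: "hostPath": '
    'hostPath volumes are not allowed to be used '
    'spec.volumes[1]: Invalid value: "hostPath": '
    'hostPath volumes are not allowed to be used '
    'spec.containers[0].securityContext.hostNetwork: Invalid value: true: '
    'Host network is not allowed to be used '
    'spec.volumes[2]: Invalid value: "secret": '
    'secret volumes are not allowed to be used]'
)

# Hypothetical pattern: "<field>: Invalid value: <value>: <reason>", where the
# reason runs until the next "spec." field, the closing bracket, or the end.
VIOLATION = re.compile(
    r'\.?(spec\.[\w.\[\]]+): Invalid value: "?(\w+)"?: (.+?)(?=\s+\.?spec\.|\]|$)'
)


def parse_violations(message):
    """Return a list of (field, value, reason) tuples found in the message."""
    return [match.groups() for match in VIOLATION.finditer(message)]


violations = parse_violations(MESSAGE)
for field, value, reason in violations:
    print(f"{field} = {value}: {reason}")
```

Listed this way, the pod spec is rejected under the "restricted" provider for host networking, two hostPath volumes, and a secret volume.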
Failed to verify: the machine-api-termination-handler pod is stuck in CrashLoopBackOff status.

$ oc get po
NAME                                           READY   STATUS             RESTARTS   AGE
cluster-autoscaler-operator-6c655f5d8c-5xjrj   2/2     Running            0          38m
cluster-baremetal-operator-77c9666dc4-4bdv7    1/1     Running            0          38m
machine-api-controllers-549cf9c695-v6r9h       7/7     Running            0          42m
machine-api-operator-5b4fcbdbb7-j7xgf          2/2     Running            0          38m
machine-api-termination-handler-z7g49          0/1     CrashLoopBackOff   6          7m2s

$ oc logs -f machine-api-termination-handler-z7g49
I1130 02:45:09.603797 1 cert_rotation.go:137] Starting client certificate rotation controller
I1130 02:45:09.703793 1 request.go:581] Throttling request took 76.347045ms, request: GET:https://api-int.zhsunaws30.qe.devcluster.openshift.com:6443/apis/apiextensions.k8s.io/v1beta1?timeout=32s
E1130 02:45:12.012929 1 main.go:70] "msg"="Error starting termination handler" "error"="error fetching machine for node (\"ip-10-0-219-126.us-east-2.compute.internal\"): error listing machines: machines.machine.openshift.io is forbidden: User \"system:node:ip-10-0-219-126.us-east-2.compute.internal\" cannot list resource \"machines\" in API group \"machine.openshift.io\" in the namespace \"openshift-machine-api\""
It was known that these weren't going to work initially; we are completely reimplementing the model of how these work. I've attached the three implementation PRs, and this should be ready to verify once they are all merged.
The PR associated with this bug has all the necessary labels and is currently waiting on CI to pass.
Verified on AWS, GCP and Azure.
clusterversion: 4.7.0-0.nightly-2020-12-09-112139

$ oc get ds
NAME                              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                  AGE
machine-api-termination-handler   1         1         1       1            1           machine.openshift.io/interruptible-instance=   4h56m

$ oc get po
NAME                                           READY   STATUS    RESTARTS   AGE
cluster-autoscaler-operator-6c74c8db9c-tkvj2   2/2     Running   0          4h34m
cluster-baremetal-operator-6769569bd7-gxfkd    1/1     Running   0          4h34m
machine-api-controllers-d9c44bd94-7xqtt        7/7     Running   0          4h32m
machine-api-operator-78d65b4969-7hgwh          2/2     Running   0          4h32m
machine-api-termination-handler-gj8pf          1/1     Running   0          6m28s
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633