Bug 1823308 - [aws] The DaemonSet machine-api-termination-handler couldn’t allocate any Pod due to SCC
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Joel Speed
QA Contact: sunzhaohua
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-04-13 08:38 UTC by sunzhaohua
Modified: 2020-07-13 17:27 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The service account for the machine-api-termination-handler was not assigned an SCC, but the DaemonSet requires host networking.
Consequence: The DaemonSet could not create pods.
Fix: Grant the service account permission to use the hostnetwork SCC.
Result: The DaemonSet can now create pods and behaves as expected.
Clone Of:
Environment:
Last Closed: 2020-07-13 17:27:22 UTC
Target Upstream Version:
Embargoed:


Links
GitHub openshift/machine-api-operator pull 555 (closed): BUG 1823308: Allow machine-api-termination-handler to use hostnetwork SCC (last updated 2020-11-27 05:57:29 UTC)
Red Hat Product Errata RHBA-2020:2409 (last updated 2020-07-13 17:27:45 UTC)

Description sunzhaohua 2020-04-13 08:38:36 UTC
Description of problem:
The machine-api-termination-handler DaemonSet cannot create any Pods because security context constraint (SCC) admission rejects them.

Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-04-12-180647

How reproducible:
Always

Steps to Reproduce:
1. Create a spot instance
2. Check the machine-api-termination-handler DaemonSet.
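For context on step 1: in the machine API used here, an AWS spot instance is requested through a MachineSet whose providerSpec contains `spotMarketOptions`. The fragment below is a trimmed sketch, assuming a hypothetical MachineSet name and omitting the selector, labels, and every other required providerSpec field (AMI, instance type, subnet, and so on); it is not the manifest used in this report.

```yaml
# Hypothetical, trimmed MachineSet fragment: only the fields relevant to
# spot instances are shown; a real MachineSet needs a full spec.
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: example-spot-us-east-2c      # hypothetical name
  namespace: openshift-machine-api
spec:
  replicas: 1
  template:
    spec:
      providerSpec:
        value:
          # An empty object requests a spot instance; with no maxPrice set,
          # the price cap defaults to the on-demand price.
          spotMarketOptions: {}
```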

Actual results:
$ oc get ds
NAME                              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                  AGE
machine-api-termination-handler   0         0         0       0            0           machine.openshift.io/interruptible-instance=   148m

$ oc get node --show-labels |grep machine.openshift.io/interruptible-instance=
ip-10-0-166-52.us-east-2.compute.internal Ready worker 50m v1.17.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.large,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2c,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-166-52,kubernetes.io/os=linux,machine.openshift.io/interruptible-instance=,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m4.large,node.openshift.io/os_id=rhcos,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2c

$ oc describe ds machine-api-termination-handler
Events:
  Type     Reason        Age                   From                  Message
  ----     ------        ----                  ----                  -------
  Warning  FailedCreate  47s (x60 over 4h26m)  daemonset-controller  Error creating: pods "machine-api-termination-handler-" is forbidden: unable to validate against any security context constraint: [provider restricted: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used]
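The event shows the `restricted` SCC refusing a pod that requests host networking. The fragment below is a hypothetical, trimmed sketch of the relevant part of such a DaemonSet, not the shipped manifest: the selector label, container name, and image are placeholders, while the service account, namespace, node selector, and `hostNetwork: true` come from this report.

```yaml
# Hypothetical sketch: shows only the fields that interact with SCC admission.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: machine-api-termination-handler
  namespace: openshift-machine-api
spec:
  selector:
    matchLabels:
      app: termination-handler        # hypothetical label
  template:
    metadata:
      labels:
        app: termination-handler      # must match the selector above
    spec:
      # SCC admission checks which SCCs this service account may use.
      serviceAccountName: machine-api-termination-handler
      # Schedule only onto spot (interruptible) nodes.
      nodeSelector:
        machine.openshift.io/interruptible-instance: ""
      # The restricted SCC forbids this, hence the FailedCreate events.
      hostNetwork: true
      containers:
      - name: termination-handler                          # hypothetical
        image: example.invalid/termination-handler:latest  # placeholder
```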


Expected results:
The machine-api-termination-handler DaemonSet creates Pods on nodes with the label "machine.openshift.io/interruptible-instance=".

Additional info:

After adding the ServiceAccount to an SCC with sufficient privileges, the DaemonSet creates its Pod:
$ oc adm policy add-scc-to-user privileged system:serviceaccount:machine-api-termination-handler:machine-api-termination-handler
securitycontextconstraints.security.openshift.io/privileged added to: ["system:serviceaccount:machine-api-termination-handler:machine-api-termination-handler"]

$ oc get ds
NAME                              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                  AGE
machine-api-termination-handler   1         1         1       1            1           machine.openshift.io/interruptible-instance=   6h29m

Comment 1 Joel Speed 2020-04-14 10:58:11 UTC
In case anyone is looking at this: the following uses the wrong namespace for the service account; it should be openshift-machine-api.

> After add the ServiceAccount to a scc with enough privileges. 
> $ oc adm policy add-scc-to-user privileged system:serviceaccount:machine-api-termination-handler:machine-api-termination-handler
> securitycontextconstraints.security.openshift.io/privileged added to: ["system:serviceaccount:machine-api-termination-handler:machine-api-termination-handler"]

We can also use the hostnetwork SCC rather than the privileged one, since it grants fewer privileges:

$ oc adm policy add-scc-to-user hostnetwork system:serviceaccount:openshift-machine-api:machine-api-termination-handler

I will look into how to install this by default.
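One way to grant this by default, instead of running `oc adm policy` by hand, is an RBAC rule that allows the service account the `use` verb on the `hostnetwork` SCC. The sketch below assumes a ClusterRole/ClusterRoleBinding pair with hypothetical names; it illustrates the mechanism and is not necessarily what the linked pull request ships.

```yaml
# Hypothetical names: lets the termination handler's service account
# use the hostnetwork SCC, and only that SCC.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: machine-api-termination-handler-hostnetwork   # hypothetical
rules:
- apiGroups:
  - security.openshift.io
  resources:
  - securitycontextconstraints
  resourceNames:
  - hostnetwork          # restrict the grant to this single SCC
  verbs:
  - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: machine-api-termination-handler-hostnetwork   # hypothetical
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: machine-api-termination-handler-hostnetwork
subjects:
- kind: ServiceAccount
  name: machine-api-termination-handler
  namespace: openshift-machine-api
```

Scoping the rule with `resourceNames` keeps the grant limited to `hostnetwork`, in line with the preference above for the less privileged SCC over `privileged`.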

Comment 4 sunzhaohua 2020-04-15 03:29:47 UTC
Verified
clusterversion: 4.5.0-0.nightly-2020-04-14-221451

$ oc get ds
NAME                              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                  AGE
machine-api-termination-handler   1         1         1       1            1           machine.openshift.io/interruptible-instance=   33m

$ oc get po
NAME                                           READY   STATUS    RESTARTS   AGE
cluster-autoscaler-operator-5996c77467-msjg6   2/2     Running   0          33m
machine-api-controllers-58cdd794bf-gz46c       4/4     Running   0          34m
machine-api-operator-6f857c9fb7-v9xml          2/2     Running   0          35m
machine-api-termination-handler-5s8sf          1/1     Running   0          5m1s

Comment 5 errata-xmlrpc 2020-07-13 17:27:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

