Bug 1823308

Summary: [aws] The DaemonSet machine-api-termination-handler couldn’t allocate any Pod due to SCC
Product: OpenShift Container Platform Reporter: sunzhaohua <zhsun>
Component: Cloud Compute Assignee: Joel Speed <jspeed>
Cloud Compute sub component: Other Providers QA Contact: sunzhaohua <zhsun>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium    
Version: 4.5   
Target Milestone: ---   
Target Release: 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The service account for the machine-api-termination-handler was not assigned an SCC, but its pods require host networking.
Consequence: The DaemonSet could not create pods.
Fix: Grant the service account permission to use the hostnetwork SCC.
Result: The DaemonSet can now create pods and behaves as expected.
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-07-13 17:27:22 UTC Type: Bug

Description sunzhaohua 2020-04-13 08:38:36 UTC
Description of problem:
The DaemonSet machine-api-termination-handler cannot create any Pods because the Pod spec fails security context constraint (SCC) validation.

Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-04-12-180647

How reproducible:
Always

Steps to Reproduce:
1. Create a spot instance
2. Check daemonset machine-api-termination-handler

Actual results:
$ oc get ds
NAME                              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                  AGE
machine-api-termination-handler   0         0         0       0            0           machine.openshift.io/interruptible-instance=   148m

$ oc get node --show-labels |grep machine.openshift.io/interruptible-instance=
ip-10-0-166-52.us-east-2.compute.internal Ready worker 50m v1.17.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.large,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2c,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-166-52,kubernetes.io/os=linux,machine.openshift.io/interruptible-instance=,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m4.large,node.openshift.io/os_id=rhcos,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2c

$ oc describe ds machine-api-termination-handler
Events:
  Type     Reason        Age                   From                  Message
  ----     ------        ----                  ----                  -------
  Warning  FailedCreate  47s (x60 over 4h26m)  daemonset-controller  Error creating: pods "machine-api-termination-handler-" is forbidden: unable to validate against any security context constraint: [provider restricted: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used]


Expected results:
The DaemonSet machine-api-termination-handler should create Pods on nodes with the label "machine.openshift.io/interruptible-instance=".

Additional info:

After adding the ServiceAccount to an SCC with sufficient privileges:
$ oc adm policy add-scc-to-user privileged system:serviceaccount:machine-api-termination-handler:machine-api-termination-handler
securitycontextconstraints.security.openshift.io/privileged added to: ["system:serviceaccount:machine-api-termination-handler:machine-api-termination-handler"]

$ oc get ds
NAME                              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                  AGE
machine-api-termination-handler   1         1         1       1            1           machine.openshift.io/interruptible-instance=   6h29m

Comment 1 Joel Speed 2020-04-14 10:58:11 UTC
In case anyone is looking at this: the command quoted below uses the wrong namespace for the service account. It should be openshift-machine-api.

> After adding the ServiceAccount to an SCC with sufficient privileges:
> $ oc adm policy add-scc-to-user privileged system:serviceaccount:machine-api-termination-handler:machine-api-termination-handler
> securitycontextconstraints.security.openshift.io/privileged added to: ["system:serviceaccount:machine-api-termination-handler:machine-api-termination-handler"]

We can also use the hostnetwork SCC rather than the privileged one, since it grants fewer privileges:

$ oc adm policy add-scc-to-user hostnetwork system:serviceaccount:openshift-machine-api:machine-api-termination-handler

I will look into how to install this by default.
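
The namespace mix-up above comes down to how a service account is addressed as a user: system:serviceaccount:<namespace>:<name>. A minimal shell sketch of the workaround, assuming a logged-in oc session with cluster-admin rights. The helper names sa_subject and grant_hostnetwork are hypothetical and not part of oc:

```shell
# Build the user name that refers to a service account.
# Format: system:serviceaccount:<namespace>:<name>
sa_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Grant the hostnetwork SCC to a service account (needs a live cluster).
# $1 = namespace, $2 = service account name
grant_hostnetwork() {
  oc adm policy add-scc-to-user hostnetwork "$(sa_subject "$1" "$2")"
}

# Usage against a cluster:
#   grant_hostnetwork openshift-machine-api machine-api-termination-handler
```

Building the subject from the namespace and name separately makes it harder to repeat the mistake of putting the service account name in the namespace position.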

Comment 4 sunzhaohua 2020-04-15 03:29:47 UTC
Verified
clusterversion: 4.5.0-0.nightly-2020-04-14-221451

$ oc get ds
NAME                              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                  AGE
machine-api-termination-handler   1         1         1       1            1           machine.openshift.io/interruptible-instance=   33m

$ oc get po
NAME                                           READY   STATUS    RESTARTS   AGE
cluster-autoscaler-operator-5996c77467-msjg6   2/2     Running   0          33m
machine-api-controllers-58cdd794bf-gz46c       4/4     Running   0          34m
machine-api-operator-6f857c9fb7-v9xml          2/2     Running   0          35m
machine-api-termination-handler-5s8sf          1/1     Running   0          5m1s
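
One way to script this check is to compare the DaemonSet's desired and ready counts. A rough sketch, assuming the DaemonSet lives in the openshift-machine-api namespace as noted in comment 1 and that at least one spot instance node exists. ds_ready and check_handler are hypothetical helper names:

```shell
# Succeeds only when at least one pod is desired and every desired pod is ready.
# $1 = desired count, $2 = ready count
ds_ready() {
  [ "${1:-0}" -gt 0 ] && [ "${1:-0}" -eq "${2:-0}" ]
}

# Read the desired and ready counts from the DaemonSet status (needs a live cluster).
check_handler() {
  set -- $(oc get ds machine-api-termination-handler -n openshift-machine-api \
    -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}')
  ds_ready "$1" "$2"
}

# Usage against a cluster:
#   check_handler && echo "termination handler is running"
```

A desired count of 0 is treated as a failure here because that is what the original report showed while a labeled node was present.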

Comment 5 errata-xmlrpc 2020-07-13 17:27:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409