Bug 1768858
| Summary: | Failed to create pod on windows node when project is not "default" | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | gaoshang <sgao> |
| Component: | Windows Containers | Assignee: | Sebastian Soto <ssoto> |
| Status: | CLOSED DEFERRED | QA Contact: | gaoshang <sgao> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.3.0 | CC: | anusaxen, aos-bugs, aravindh, dhumphri, gmarkley, mifiedle, rgudimet, ssoto, sumehta, wewang |
| Target Milestone: | --- | Keywords: | TestBlocker |
| Target Release: | 4.4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-02-03 03:02:17 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

Description
gaoshang
2019-11-05 12:13:04 UTC
@gaoshang Can you try pulling the mcr.microsoft.com/windows/servercore:ltsc2019 image on the instance first, before the deployment?
Instructions are here: https://docs.google.com/document/d/1zAidTs8wbWHzamh4G3pwdaPomyT-doSbo9xTAcIK9no/edit#heading=h.9sbyxso0hjcb
This could be caused by the kubelet run timing out, given the size of the Docker image for Windows.

(In reply to sumehta from comment #1)
> @gaoshang Can you try pulling the
> mcr.microsoft.com/windows/servercore:ltsc2019 image on the instance first
> before the deployment?
> Instructions present here :
> https://docs.google.com/document/d/1zAidTs8wbWHzamh4G3pwdaPomyT-doSbo9xTAcIK9no/edit#heading=h.9sbyxso0hjcb
> This could be because of the timing out of kubelet run, given the size of
> docker image for windows

After pulling the mcr.microsoft.com/windows/servercore:ltsc2019 image on the instance first, the pod still cannot be created in a new project. The replicaset reports this error: "Error creating: pods "win-webserver-8648d6f7b8-" is forbidden: unable to validate against any security context constraint"

# oc get pod
No resources found.
# oc get all
NAME                    TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
service/win-webserver   LoadBalancer   172.30.46.81   <pending>     80:30685/TCP   41m

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/win-webserver   0/1     0            0           41m

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/win-webserver-8648d6f7b8   1         0         0       7m12s
# oc describe replicaset.apps/win-webserver-8648d6f7b8
Name:           win-webserver-8648d6f7b8
Namespace:      prosgao
Selector:       app=win-webserver,pod-template-hash=8648d6f7b8
Labels:         app=win-webserver
                pod-template-hash=8648d6f7b8
Annotations:    deployment.kubernetes.io/desired-replicas: 1
                deployment.kubernetes.io/max-replicas: 2
                deployment.kubernetes.io/revision: 1
Controlled By:  Deployment/win-webserver
Replicas:       0 current / 1 desired
Pods Status:    0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=win-webserver
           pod-template-hash=8648d6f7b8
  Containers:
   windowswebserver:
    Image:      mcr.microsoft.com/windows/servercore:ltsc2019
    Port:       <none>
    Host Port:  <none>
    Command:
      powershell.exe
      -command
      <#code used from https://gist.github.com/wagnerandrade/5424431#> ; $$listener = New-Object System.Net.HttpListener ; $$listener.Prefixes.Add('http://*:80/') ; $$listener.Start() ; $$callerCounts = @{} ; Write-Host('Listening at http://*:80/') ; while ($$listener.IsListening) { ;$$context = $$listener.GetContext() ;$$requestUrl = $$context.Request.Url ;$$clientIP = $$context.Request.RemoteEndPoint.Address ;$$response = $$context.Response ;Write-Host '' ;Write-Host('> {0}' -f $$requestUrl) ; ;$$count = 1 ;$$k=$$callerCounts.Get_Item($$clientIP) ;if ($$k -ne $$null) { $$count += $$k } ;$$callerCounts.Set_Item($$clientIP, $$count) ;$$ip=(Get-NetAdapter | Get-NetIpAddress); $$header='<html><body><H1>Windows Container Web Server</H1>' ;$$callerCountsString='' ;$$callerCounts.Keys | % { $$callerCountsString+='<p>IP {0} callerCount {1} ' -f $$ip[1].IPAddress,$$callerCounts.Item($$_) } ;$$footer='</body></html>' ;$$content='{0}{1}{2}' -f $$header,$$callerCountsString,$$footer ;Write-Output $$content ;$$buffer = [System.Text.Encoding]::UTF8.GetBytes($$content) ;$$response.ContentLength64 = $$buffer.Length ;$$response.OutputStream.Write($$buffer, 0, $$buffer.Length) ;$$response.Close() ;$$responseStatus = $$response.StatusCode ;Write-Host('< {0}' -f $$responseStatus) } ;
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type             Status  Reason
  ----             ------  ------
  ReplicaFailure   True    FailedCreate
Events:
  Type     Reason        Age                   From                   Message
  ----     ------        ----                  ----                   -------
  Warning  FailedCreate  5m35s (x21 over 27m)  replicaset-controller  Error creating: pods "win-webserver-8648d6f7b8-" is forbidden: unable to validate against any security context constraint: []

Update:
This bug also exists in OCP 4.3.0-0.nightly-2019-11-24-183610. I think it is an SCC-related issue that can be worked around with the following steps.

Version-Release number of selected component (if applicable):
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2019-11-24-183610   True        False         5m35s   Cluster version is 4.3.0-0.nightly-2019-11-24-183610

windows-machine-config-operator commit:
# git show
commit 1eb1f983774101b5077828fd2efb4dfb711d5886

1. Install OCP 4.3 and scale up a Windows node.
2. Create a new project and edit the restricted SCC, changing the following two sections from:
runAsUser:
  type: MustRunAsRange
seLinuxContext:
  type: MustRunAs
to:
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
# oc new-project prosgao
# oc edit scc restricted
# oc replace -f /tmp/oc-edit-hlxvr.yaml
3. Now the Windows pod can be created:
# oc create -f https://gist.githubusercontent.com/suhanime/683ee7b5a2f55c11e3a26a4223170582/raw/86376218c26eadc0e709607b9a3354f275c52132/WinWebServer.yaml
# oc get pod
NAME                             READY   STATUS    RESTARTS   AGE
win-webserver-79b64df8b9-5cgk6   1/1     Running   0          52s

Update:
Creating a Windows pod fails under the restricted security context constraints. Another workaround is to use the privileged SCC.
Steps:
1. Create a new project and add the privileged SCC to the project's default service account
# oc new-project winc
# oc adm policy add-scc-to-user privileged system:serviceaccount:winc:default
# oc get scc privileged -o yaml | grep users -A 5
users:
- system:admin
- system:serviceaccount:openshift-infra:build-controller
- system:serviceaccount:winc:default
2. In the deployment file WinWebServer.yaml, add a privileged securityContext
# cat WinWebServer.yaml | grep containers: -A 6
containers:
- name: windowswebserver
  image: mcr.microsoft.com/windows/servercore:ltsc2019
  imagePullPolicy: IfNotPresent
  securityContext:
    privileged: true
  command:
3. Create the deployment; the Windows pod now runs
# oc create -f WinWebServer.yaml
# oc get pods
NAME                            READY   STATUS    RESTARTS   AGE
win-webserver-7fd94cd8f-pzt2c   1/1     Running   0          5m22s

This looks to be hitting this error: https://github.com/docker/docker-ce/blob/58a1084222834a52f8e20e9641aa5b5fb927bef0/components/engine/daemon/oci_windows.go#L321

I've noticed that spinning up a pod in a namespace other than default causes these security options to be added to the pod container spec by default:
```
securityContext:
  capabilities:
    drop:
    - KILL
    - MKNOD
    - SETGID
    - SETUID
  runAsUser: 1000560000
```
and these to the pod spec:
```
securityContext:
  fsGroup: 1000560000
  seLinuxOptions:
    level: s0:c24,c4
```
The hostconfig (C:\ProgramData\docker\containers\<container_id>\hostconfig) for a pod hitting this error contains:
```
"SecurityOpt": [
"label=level:s0:c24,c4"
],
```
This is an invalid value on Windows; the only valid key is "credentialspec".
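
To make the rejection concrete, here is a minimal Python sketch of the kind of check described above. It is a simplified model written for illustration, not the Docker code linked in this comment: the function name `unsupported_security_opts` is hypothetical, and the case-insensitive key comparison is an assumption.

```python
# Simplified model of the Windows-side check described above; not Docker's code.
ALLOWED_KEYS = {"credentialspec"}  # the only valid key, per the analysis above


def unsupported_security_opts(security_opts):
    """Return the SecurityOpt entries that would be expected to be rejected."""
    rejected = []
    for opt in security_opts:
        # Each entry has the form key=value; the key decides whether it is accepted.
        key, sep, _value = opt.partition("=")
        # Assumption: keys are compared case-insensitively.
        if not sep or key.lower() not in ALLOWED_KEYS:
            rejected.append(opt)
    return rejected


# The entry from the failing pod's hostconfig has the key "label", so it is rejected.
print(unsupported_security_opts(["label=level:s0:c24,c4"]))
```

The SELinux level injected by the SCC ends up as a `label=` entry, which is why the container cannot start on a Windows node.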
The SELinux options are coming from the SCC attached to the project/namespace: https://docs.openshift.com/container-platform/4.2/authentication/managing-security-context-constraints.html

*** Bug 1785787 has been marked as a duplicate of this bug. ***

This bug can be worked around by disabling SCC in specific namespaces. This should not be used in production, and in general any namespace where this has been done should not be used to run Linux pods.
To skip SCC for a namespace, the label "openshift.io/run-level = 1" should be applied to the namespace. This applies to both Linux and Windows pods, and thus Linux pods should not be deployed into this namespace.
This information will be added to the development preview doc.
Long term, we may add a webhook that will mutate Windows pods to remove non-Windows options from the pod. This will make this a non-issue and remove the need for the above workaround.
Follow-up work will be tracked in https://issues.redhat.com/browse/WINC-213

@gaoshang please close this bug given we have a workaround. For GA, in the operator time frame, the feature we are adding will overcome this problem.

Sure, closed this bug and will follow up on WINC-213, thanks.
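
For illustration only, the long-term webhook idea raised in the comments above could mutate a pod spec roughly as follows. This Python sketch is not the implementation tracked in WINC-213. The function names are hypothetical, the list of fields to strip is an assumption based on the options this bug reports being injected, and identifying Windows pods by a `kubernetes.io/os: windows` nodeSelector is also an assumption.

```python
import copy

# Assumption: these are the fields to strip; they are the options this bug
# reports being injected into pods created outside the default namespace.
POD_LEVEL_FIELDS = ("seLinuxOptions", "fsGroup")
CONTAINER_LEVEL_FIELDS = ("capabilities", "runAsUser")


def is_windows_pod(pod_spec):
    # Assumption: Windows pods are identified by this nodeSelector entry.
    node_selector = pod_spec.get("nodeSelector") or {}
    return node_selector.get("kubernetes.io/os") == "windows"


def strip_linux_security_options(pod_spec):
    """Return a copy of pod_spec without the Linux-only security options."""
    if not is_windows_pod(pod_spec):
        return pod_spec  # leave Linux pods untouched
    mutated = copy.deepcopy(pod_spec)
    pod_ctx = mutated.get("securityContext") or {}
    for field in POD_LEVEL_FIELDS:
        pod_ctx.pop(field, None)
    for container in mutated.get("containers") or []:
        ctx = container.get("securityContext") or {}
        for field in CONTAINER_LEVEL_FIELDS:
            ctx.pop(field, None)
    return mutated


spec = {
    "nodeSelector": {"kubernetes.io/os": "windows"},
    "securityContext": {"fsGroup": 1000560000,
                        "seLinuxOptions": {"level": "s0:c24,c4"}},
    "containers": [{
        "name": "windowswebserver",
        "securityContext": {"runAsUser": 1000560000,
                            "capabilities": {"drop": ["KILL", "MKNOD"]}},
    }],
}
print(strip_linux_security_options(spec))
```

Working on a deep copy keeps the incoming object unchanged, which makes it easy to compare the pod before and after mutation.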