Bug 1942837 - [OCPv4.6] unable to deploy pod with unsafe sysctls
Summary: [OCPv4.6] unable to deploy pod with unsafe sysctls
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.6
Hardware: Unspecified
OS: Unspecified
medium
high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Peter Hunt
QA Contact: MinLi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-03-25 07:38 UTC by Angelo Gabrieli
Modified: 2024-10-01 17:46 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 22:55:38 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift openshift-docs pull 32935 0 None open Bug 1942837: sysctl tutorial: drop net.ipv4.route.min_pmtu 2021-05-27 19:13:49 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 22:56:10 UTC

Description Angelo Gabrieli 2021-03-25 07:38:45 UTC
Description of problem:
Following:

https://docs.openshift.com/container-platform/4.6/nodes/containers/nodes-containers-sysctls.html

pods are stuck in the ContainerCreating state:

[agabriel@agabriel OCPv4]$ oc get pod 
NAME                     READY   STATUS              RESTARTS   AGE
httpd-57f9f97d4b-5ppqf   0/1     ContainerCreating   0          20s
httpd-b45f5c85c-ch7ng    1/1     Running             0          23h
[agabriel@agabriel OCPv4]$ 

with the error:

7s          Warning   FailedCreatePodSandBox   pod/httpd-57f9f97d4b-5ppqf    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to cleanup [/var/run/netns/dccafd5f-b79d-453a-9eba-300b3d5676ee /var/run/ipcns/dccafd5f-b79d-453a-9eba-300b3d5676ee /var/run/utsns/dccafd5f-b79d-453a-9eba-300b3d5676ee] after pinns failure  exit status 1


Version-Release number of selected component (if applicable):
Openshift 4.6

How reproducible:


Steps to Reproduce:
1. follow https://docs.openshift.com/container-platform/4.6/nodes/containers/nodes-containers-sysctls.html

[agabriel@agabriel OCPv4]$ oc get machineconfig 99-worker-generated-kubelet -o jsonpath='{.metadata.ownerReferences}{"\n"}'
[{"apiVersion":"machineconfiguration.openshift.io/v1","blockOwnerDeletion":true,"controller":true,"kind":"KubeletConfig","name":"custom-kubelet","uid":"84c5d5bd-463d-435a-ae58-1c834e08d38a"}]
[agabriel@agabriel OCPv4]
[agabriel@agabriel OCPv4]$ oc get kubeletconfig custom-kubelet -o jsonpath='{.spec.kubeletConfig}{"\n"}'
{"allowedUnsafeSysctls":["kernel.msg*","net.ipv4.route.min_pmtu"]}
[agabriel@agabriel OCPv4]

2. create a new SCC, bind it to the ServiceAccount and assign it to the pod

[agabriel@agabriel OCPv4]$ oc create -f restricted_scc.yml 
securitycontextconstraints.security.openshift.io/restricted-syscts created
[agabriel@agabriel OCPv4]$ 
[agabriel@agabriel OCPv4]$ 
[agabriel@agabriel OCPv4]$ 
[agabriel@agabriel OCPv4]$ 
[agabriel@agabriel OCPv4]$ oc get scc
NAME                PRIV    CAPS         SELINUX     RUNASUSER          FSGROUP     SUPGROUP    PRIORITY     READONLYROOTFS   VOLUMES
anyuid              false   <no value>   MustRunAs   RunAsAny           RunAsAny    RunAsAny    10           false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
hostaccess          false   <no value>   MustRunAs   MustRunAsRange     MustRunAs   RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","hostPath","persistentVolumeClaim","projected","secret"]
hostmount-anyuid    false   <no value>   MustRunAs   RunAsAny           RunAsAny    RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","hostPath","nfs","persistentVolumeClaim","projected","secret"]
hostnetwork         false   <no value>   MustRunAs   MustRunAsRange     MustRunAs   MustRunAs   <no value>   false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
node-exporter       true    <no value>   RunAsAny    RunAsAny           RunAsAny    RunAsAny    <no value>   false            ["*"]
nonroot             false   <no value>   MustRunAs   MustRunAsNonRoot   RunAsAny    RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
privileged          true    ["*"]        RunAsAny    RunAsAny           RunAsAny    RunAsAny    <no value>   false            ["*"]
restricted          false   <no value>   MustRunAs   MustRunAsRange     MustRunAs   RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
restricted-syscts   false   <no value>   MustRunAs   MustRunAsRange     MustRunAs   RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
[agabriel@agabriel OCPv4]$ 
[agabriel@agabriel OCPv4]$ 
[agabriel@agabriel OCPv4]$ oc get scc restricted-syscts -o jsonpath='{.allowedUnsafeSysctls}{"\n"}'
["kernel.msg*","net.ipv4.route.min_pmtu","kernel.msgmax"]
[agabriel@agabriel OCPv4]$
[agabriel@agabriel OCPv4]$ 
[agabriel@agabriel OCPv4]$ 
[agabriel@agabriel OCPv4]$ oc create serviceaccount sysctls
serviceaccount/sysctls created
[agabriel@agabriel OCPv4]$ oc adm policy add-scc-to-user restricted-sysctls -z sysctls
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:restricted-sysctls added: "sysctls"
[agabriel@agabriel OCPv4]$ 
[agabriel@agabriel OCPv4]$ 
[agabriel@agabriel OCPv4]$ oc get all
NAME                        READY   STATUS    RESTARTS   AGE
pod/httpd-b45f5c85c-5rw59   1/1     Running   0          21h

NAME            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/httpd   ClusterIP   172.30.129.83   <none>        8080/TCP,8443/TCP   21h

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/httpd   1/1     0            1           21h

NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/httpd-65b69b845d   1         0         0       21h
replicaset.apps/httpd-7ddd77d5fb   0         0         0       21h
replicaset.apps/httpd-b45f5c85c    1         1         1       21h

NAME                                   IMAGE REPOSITORY                                               TAGS      UPDATED
imagestream.image.openshift.io/httpd   image-registry.openshift-image-registry.svc:5000/test3/httpd   2.4-el8   22 hours ago
[agabriel@agabriel OCPv4]$ 
[agabriel@agabriel OCPv4]$ 
[agabriel@agabriel OCPv4]$ oc patch deployment.apps/httpd --patch '{"spec":{"template":{"spec":{"serviceAccountName": "sysctls"}}}}'
deployment.apps/httpd patched
[agabriel@agabriel OCPv4]$

3.

Actual results:
The new pod with the unsafe sysctls is unable to start

Expected results:
The new pod with the unsafe sysctls is able to start


Additional info:
The SCC part is missing from the documentation. A documentation bug was already opened for that: https://bugzilla.redhat.com/show_bug.cgi?id=1893607
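
A quick way to cross-check the SCC name used in step 2, sketched in shell. In the transcript above the SCC is created as restricted-syscts, while the add-scc-to-user command names restricted-sysctls; whether that difference affected the outcome is not established in this report. The helper name scc_exists is hypothetical, and it assumes the same resource-name prefix that oc prints in the "created" message above.

```shell
# Hypothetical helper: reads `oc get scc -o name` output on stdin and
# succeeds only if an SCC with exactly the given name is listed.
# Assumes names are printed as
# securitycontextconstraints.security.openshift.io/<name>.
scc_exists() {
  grep -Fxq "securitycontextconstraints.security.openshift.io/$1"
}

# Intended use against a cluster (not run here):
#   oc get scc -o name | scc_exists restricted-sysctls || echo "no such SCC"
```

Once a pod has been admitted, its openshift.io/scc annotation should show which SCC was actually applied to it.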

Comment 1 Peter Hunt 2021-03-25 15:48:57 UTC
can I have the crio logs from the affected node?

Comment 3 Angelo Gabrieli 2021-03-26 08:23:43 UTC
crio logs uploaded (as a private attachment).
We also have the sosreport; please let me know if you need it.

Comment 4 Peter Hunt 2021-04-01 19:13:35 UTC
sorry, I don't think these logs are from the correct node. I don't find "failed to cleanup" in it

Comment 5 Angelo Gabrieli 2021-04-02 11:04:42 UTC
Hi Peter,

those logs come from the kubelet, for example:


Mar 25 13:28:02 worker2.example hyperkube[1757]: E0325 13:28:02.275454    1757 remote_runtime.go:113] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to cleanup [/var/run/netns/0c85093d-661e-40c8-bdd5-499a6c3a416f /var/run/ipcns/0c85093d-661e-40c8-bdd5-499a6c3a416f /var/run/utsns/0c85093d-661e-40c8-bdd5-499a6c3a416f] after pinns failure  exit status 1
Mar 25 13:28:02 worker2.example.com hyperkube[1757]: E0325 13:28:02.275509    1757 kuberuntime_sandbox.go:70] CreatePodSandbox for pod "base-pod_test-priv(0de8292b-6eae-43c7-b8f5-686285434920)" failed: rpc error: code = Unknown desc = failed to cleanup [/var/run/netns/0c85093d-661e-40c8-bdd5-499a6c3a416f /var/run/ipcns/0c85093d-661e-40c8-bdd5-499a6c3a416f /var/run/utsns/0c85093d-661e-40c8-bdd5-499a6c3a416f] after pinns failure  exit status 1
Mar 25 13:28:02 worker2.example.com hyperkube[1757]: E0325 13:28:02.275520    1757 kuberuntime_manager.go:741] createPodSandbox for pod "base-pod_test-priv(0de8292b-6eae-43c7-b8f5-686285434920)" failed: rpc error: code = Unknown desc = failed to cleanup [/var/run/netns/0c85093d-661e-40c8-bdd5-499a6c3a416f /var/run/ipcns/0c85093d-661e-40c8-bdd5-499a6c3a416f /var/run/utsns/0c85093d-661e-40c8-bdd5-499a6c3a416f] after pinns failure  exit status 1


I'm going to attach the sosreport
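
To see which daemon emitted the "after pinns failure" lines in a journal dump, a small filter like the following may help. It is only a sketch: the function name pinns_failure_sources is hypothetical, and it assumes the short journal format shown above (month, day, time, host, identifier[pid]:).

```shell
# Hypothetical helper: reads journal text on stdin and prints, once each,
# the syslog identifier (for example hyperkube or crio) of every line
# that mentions the pinns failure.
pinns_failure_sources() {
  awk '/after pinns failure/ {
         id = $5               # e.g. "hyperkube[1757]:"
         sub(/\[.*/, "", id)   # drop "[pid]:" and anything after it
         print id
       }' | sort -u
}

# Intended use on the affected node (not run here):
#   journalctl --no-pager | pinns_failure_sources
```

If only hyperkube shows up, the lines were logged by the kubelet, and the matching CRI-O entries may use different wording.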

Comment 7 Peter Hunt 2021-04-28 18:15:12 UTC
what sysctl are you trying to use exactly? It would be useful to me to have the pod spec of the failing pod creation

Comment 8 Angelo Gabrieli 2021-05-10 12:55:33 UTC
Those are the sysctls:


      securityContext:
        sysctls:
        - name: kernel.shm_rmid_forced
          value: "0"
        - name: net.ipv4.route.min_pmtu
          value: "552"
        - name: kernel.msgmax
          value: "65536"
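
For reference, the three names in this pod spec can be checked against the allowedUnsafeSysctls patterns from the KubeletConfig in the description. The sketch below is illustrative only: classify_sysctl is a hypothetical helper, and the "safe" list is an assumption taken from the upstream Kubernetes sysctl documentation of that period (kernel.shm_rmid_forced, net.ipv4.ip_local_port_range, net.ipv4.tcp_syncookies).

```shell
# Hypothetical helper: classify one sysctl name.
#   safe            in the assumed Kubernetes safe set, no allow-listing needed
#   allowed-unsafe  matches a pattern from the KubeletConfig in this report
#                   (kernel.msg* or net.ipv4.route.min_pmtu)
#   not-allowed     anything else
classify_sysctl() {
  case $1 in
    kernel.shm_rmid_forced|net.ipv4.ip_local_port_range|net.ipv4.tcp_syncookies)
      echo safe ;;
    kernel.msg*|net.ipv4.route.min_pmtu)
      echo allowed-unsafe ;;
    *)
      echo not-allowed ;;
  esac
}

# Classify the three sysctls from the pod spec above.
for name in kernel.shm_rmid_forced net.ipv4.route.min_pmtu kernel.msgmax; do
  printf '%s: %s\n' "$name" "$(classify_sysctl "$name")"
done
```

Matching the allow-list only covers admission; it says nothing about whether the kernel exposes the sysctl inside the pod's namespaces.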

Comment 10 Peter Hunt 2021-05-20 19:01:34 UTC
didn't have time this sprint, hopefully I will next

Comment 11 Peter Hunt 2021-05-27 19:16:50 UTC
I finally got a chance to look at this.

It is not clear this has *ever* worked. The error comes from the fact that min_pmtu is a host-only sysctl and is not available in a network namespace. One can see this by running `sudo unshare -n -- sysctl -a | grep min_pmtu` and observing that nothing is printed.

I am pretty sure this example was borrowed from the upstream kube docs, which have since been updated: https://github.com/kubernetes/website/pull/15248

Thus, I have submitted a fix to the documentation (attached). It should likely be backported to all supported versions
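
The check described above can be wrapped so that it gives a yes/no answer for one exact key. This is a sketch under assumptions: sysctl_listed is a hypothetical helper name, it expects the usual "key = value" lines that sysctl -a prints, and the result depends on the kernel running on the node.

```shell
# Hypothetical helper: reads `sysctl -a` style output on stdin and succeeds
# only if the exact key given as $1 is listed.
sysctl_listed() {
  awk -v key="$1" -F ' = ' '$1 == key { found = 1 } END { exit !found }'
}

# Intended use on a node, as root (not run here):
#   unshare -n -- sysctl -a 2>/dev/null | sysctl_listed net.ipv4.route.min_pmtu \
#     || echo "not visible in a new network namespace"
```

Comparing the whole key, rather than grepping for a substring, avoids false matches on similarly named sysctls.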

Comment 13 MinLi 2021-06-03 03:49:44 UTC
verified

Comment 16 errata-xmlrpc 2021-07-27 22:55:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

