Description of problem:

Following: https://docs.openshift.com/container-platform/4.6/nodes/containers/nodes-containers-sysctls.html pods are stuck on ContainerCreating state:

[agabriel@agabriel OCPv4]$ oc get pod
NAME                     READY   STATUS              RESTARTS   AGE
httpd-57f9f97d4b-5ppqf   0/1     ContainerCreating   0          20s
httpd-b45f5c85c-ch7ng    1/1     Running             0          23h

with the error:

7s Warning FailedCreatePodSandBox pod/httpd-57f9f97d4b-5ppqf Failed to create pod sandbox: rpc error: code = Unknown desc = failed to cleanup [/var/run/netns/dccafd5f-b79d-453a-9eba-300b3d5676ee /var/run/ipcns/dccafd5f-b79d-453a-9eba-300b3d5676ee /var/run/utsns/dccafd5f-b79d-453a-9eba-300b3d5676ee] after pinns failure exit status 1

Version-Release number of selected component (if applicable):
Openshift 4.6

How reproducible:

Steps to Reproduce:

1. follow https://docs.openshift.com/container-platform/4.6/nodes/containers/nodes-containers-sysctls.html

[agabriel@agabriel OCPv4]$ oc get machineconfig 99-worker-generated-kubelet -o jsonpath='{.metadata.ownerReferences}{"\n"}'
[{"apiVersion":"machineconfiguration.openshift.io/v1","blockOwnerDeletion":true,"controller":true,"kind":"KubeletConfig","name":"custom-kubelet","uid":"84c5d5bd-463d-435a-ae58-1c834e08d38a"}]

[agabriel@agabriel OCPv4]$ oc get kubeletconfig custom-kubelet -o jsonpath='{.spec.kubeletConfig}{"\n"}'
{"allowedUnsafeSysctls":["kernel.msg*","net.ipv4.route.min_pmtu"]}

2. create a new SCC, bind it to the ServiceAccount and assign it to the pod

[agabriel@agabriel OCPv4]$ oc create -f restricted_scc.yml
securitycontextconstraints.security.openshift.io/restricted-syscts created

[agabriel@agabriel OCPv4]$ oc get scc
NAME                PRIV    CAPS         SELINUX     RUNASUSER          FSGROUP     SUPGROUP    PRIORITY     READONLYROOTFS   VOLUMES
anyuid              false   <no value>   MustRunAs   RunAsAny           RunAsAny    RunAsAny    10           false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
hostaccess          false   <no value>   MustRunAs   MustRunAsRange     MustRunAs   RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","hostPath","persistentVolumeClaim","projected","secret"]
hostmount-anyuid    false   <no value>   MustRunAs   RunAsAny           RunAsAny    RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","hostPath","nfs","persistentVolumeClaim","projected","secret"]
hostnetwork         false   <no value>   MustRunAs   MustRunAsRange     MustRunAs   MustRunAs   <no value>   false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
node-exporter       true    <no value>   RunAsAny    RunAsAny           RunAsAny    RunAsAny    <no value>   false            ["*"]
nonroot             false   <no value>   MustRunAs   MustRunAsNonRoot   RunAsAny    RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
privileged          true    ["*"]        RunAsAny    RunAsAny           RunAsAny    RunAsAny    <no value>   false            ["*"]
restricted          false   <no value>   MustRunAs   MustRunAsRange     MustRunAs   RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]
restricted-syscts   false   <no value>   MustRunAs   MustRunAsRange     MustRunAs   RunAsAny    <no value>   false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]

[agabriel@agabriel OCPv4]$ oc get scc restricted-syscts -o jsonpath='{.allowedUnsafeSysctls}{"\n"}'
["kernel.msg*","net.ipv4.route.min_pmtu","kernel.msgmax"]

[agabriel@agabriel OCPv4]$ oc create serviceaccount sysctls
serviceaccount/sysctls created

[agabriel@agabriel OCPv4]$ oc adm policy add-scc-to-user restricted-sysctls -z sysctls
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:restricted-sysctls added: "sysctls"

[agabriel@agabriel OCPv4]$ oc get all
NAME                        READY   STATUS    RESTARTS   AGE
pod/httpd-b45f5c85c-5rw59   1/1     Running   0          21h

NAME            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/httpd   ClusterIP   172.30.129.83   <none>        8080/TCP,8443/TCP   21h

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/httpd   1/1     0            1           21h

NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/httpd-65b69b845d   1         0         0       21h
replicaset.apps/httpd-7ddd77d5fb   0         0         0       21h
replicaset.apps/httpd-b45f5c85c    1         1         1       21h

NAME                                   IMAGE REPOSITORY                                               TAGS      UPDATED
imagestream.image.openshift.io/httpd   image-registry.openshift-image-registry.svc:5000/test3/httpd   2.4-el8   22 hours ago

[agabriel@agabriel OCPv4]$ oc patch deployment.apps/httpd --patch '{"spec":{"template":{"spec":{"serviceAccountName": "sysctls"}}}}'
deployment.apps/httpd patched

3.

Actual results:
The new pod with the unsafe sysctls is unable to start

Expected results:
The new pod with the unsafe sysctls is able to start

Additional info:
The SCC part is missing from the documentation. A documentation bug was already opened for that: https://bugzilla.redhat.com/show_bug.cgi?id=1893607
Can I have the crio logs from the affected node?
crio.logs uploaded (as a private attachment). We also have the sosreport; please let me know if you need it.
Sorry, I don't think these logs are from the correct node. I can't find "failed to cleanup" in them.
Hi Peter, those logs come from the kubelet, for example:

Mar 25 13:28:02 worker2.example hyperkube[1757]: E0325 13:28:02.275454 1757 remote_runtime.go:113] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to cleanup [/var/run/netns/0c85093d-661e-40c8-bdd5-499a6c3a416f /var/run/ipcns/0c85093d-661e-40c8-bdd5-499a6c3a416f /var/run/utsns/0c85093d-661e-40c8-bdd5-499a6c3a416f] after pinns failure exit status 1
Mar 25 13:28:02 worker2.example.com hyperkube[1757]: E0325 13:28:02.275509 1757 kuberuntime_sandbox.go:70] CreatePodSandbox for pod "base-pod_test-priv(0de8292b-6eae-43c7-b8f5-686285434920)" failed: rpc error: code = Unknown desc = failed to cleanup [/var/run/netns/0c85093d-661e-40c8-bdd5-499a6c3a416f /var/run/ipcns/0c85093d-661e-40c8-bdd5-499a6c3a416f /var/run/utsns/0c85093d-661e-40c8-bdd5-499a6c3a416f] after pinns failure exit status 1
Mar 25 13:28:02 worker2.example.com hyperkube[1757]: E0325 13:28:02.275520 1757 kuberuntime_manager.go:741] createPodSandbox for pod "base-pod_test-priv(0de8292b-6eae-43c7-b8f5-686285434920)" failed: rpc error: code = Unknown desc = failed to cleanup [/var/run/netns/0c85093d-661e-40c8-bdd5-499a6c3a416f /var/run/ipcns/0c85093d-661e-40c8-bdd5-499a6c3a416f /var/run/utsns/0c85093d-661e-40c8-bdd5-499a6c3a416f] after pinns failure exit status 1

I'm going to attach the sosreport.
Which sysctls are you trying to use, exactly? It would help me to have the pod spec of the pod that fails to be created.
Those are the sysctls:

  securityContext:
    sysctls:
    - name: kernel.shm_rmid_forced
      value: "0"
    - name: net.ipv4.route.min_pmtu
      value: "552"
    - name: kernel.msgmax
      value: "65536"
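For reference, here is a rough sketch of how the requested names line up with the allowedUnsafeSysctls list from the description. It assumes that a pattern ending in `*` is a prefix match and anything else must match exactly. `sysctl_allowed` is a hypothetical helper written for this comment, not an oc or kubelet command.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc or the kubelet).
# Usage: sysctl_allowed NAME PATTERN...
# Succeeds if NAME matches one of the patterns. A pattern ending in "*"
# is treated as a prefix match (assumption); any other pattern must be
# identical to NAME.
sysctl_allowed() {
    name=$1
    shift
    for pattern in "$@"; do
        case $pattern in
            *\*)
                # Strip the trailing "*" and compare the prefix literally.
                prefix=${pattern%\*}
                case $name in
                    "$prefix"*) return 0 ;;
                esac
                ;;
            *)
                if [ "$name" = "$pattern" ]; then
                    return 0
                fi
                ;;
        esac
    done
    return 1
}

# Example, using the KubeletConfig patterns from the description
# (quote the patterns so the shell does not expand the "*"):
#   sysctl_allowed kernel.msgmax 'kernel.msg*' net.ipv4.route.min_pmtu
```

With those two patterns, kernel.msgmax and net.ipv4.route.min_pmtu both match, while kernel.shm_rmid_forced does not. A name passing this check only says it is on the allowlist; it says nothing about whether the kernel exposes that sysctl inside the pod's namespaces.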
didn't have time this sprint, hopefully I will next
I finally got a chance to look at this. It is not clear this has *ever* worked. The error comes from the fact that min_pmtu is a host-only sysctl and is not available in a network namespace. One can see this by running `sudo unshare -n -- sysctl -a | grep min_pmtu`. I am pretty sure this example was borrowed from the upstream kube docs, which have since been updated: https://github.com/kubernetes/website/pull/15248 Thus, I have submitted a fix to the documentation (attached). It should likely be backported to all supported versions.
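The same check can be scripted for several sysctls at once. This is only a sketch: `sysctl_to_path` and `check_sysctls` are hypothetical helper names, and it assumes a sysctl is usable in a namespace exactly when its /proc/sys entry exists for a process running in that namespace.

```shell
#!/bin/sh
# Hypothetical helpers (not part of oc, CRI-O, or pinns).

# Map a dotted sysctl name to its /proc/sys path,
# e.g. kernel.msgmax -> /proc/sys/kernel/msgmax.
# Limitation: names whose components themselves contain dots
# (such as VLAN interface names) are not handled.
sysctl_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# For each sysctl name given, print "NAME present" or "NAME missing",
# depending on whether its /proc/sys entry exists in the namespaces
# this function is running in.
check_sysctls() {
    for name in "$@"; do
        if [ -e "$(sysctl_to_path "$name")" ]; then
            printf '%s present\n' "$name"
        else
            printf '%s missing\n' "$name"
        fi
    done
}

# Example (assumes the functions are saved as ./check_sysctls.sh):
#   sudo unshare -n -i -- sh -c '. ./check_sysctls.sh; \
#       check_sysctls kernel.shm_rmid_forced net.ipv4.route.min_pmtu kernel.msgmax'
```

Running it under `unshare -n -i` gives the shell fresh network and IPC namespaces, so any name reported as missing there is one that cannot be set from inside a pod sandbox on that node.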
verified
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438