Description of problem:

This problem is similar to Red Hat solution #4262861, "Pods evicted due to lack of Ephemeral Storage" (https://access.redhat.com/solutions/4262861). The problem I am reporting was worked around by increasing the space in /var; however, the GPU node had to be restarted because it entered the "NotReady" state. Pods that require too much ephemeral storage should not cause the node they are scheduled on to enter the "NotReady" state.

We encountered this problem at Penguin Computing on an OCP 4.2 cluster with one GPU-enabled worker node (8 Nvidia V100s) running RHEL 7.6.

The first error occurred when I ran "oc create -f transformer-mlperf.yaml". (Note: the image for this pod is 4.6 GB.)

First error:

Warning  Evicted  61s  kubelet, node029.penguincomputing.com  The node was low on resource: ephemeral-storage. Container transformer was using 572Ki, which exceeds its request of 0.

I deleted the pod and re-ran the same command ("oc create -f transformer-mlperf.yaml") to see if the problem was repeatable. This time a different error was thrown (see "Second error" below) and the node entered the "NotReady" state.

Second error:

Warning  Failed  3m33s (x3 over 3m55s)  kubelet, asgnode029.tundra-lab.penguincomputing.com  Error: container create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"process_linux.go:413: running prestart hook 0 caused \\\"fork/exec /run/nvidia/driver/usr/bin/nvidia-container-toolkit: invalid argument\\\"\""

Warning  NetworkNotReady  2m48s (x17 over 3m20s)  kubelet, asgnode029.tundra-lab.penguincomputing.com  network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network

Increasing the size of /var solved the problem; however, the node should not have entered the "NotReady" state (or hit the second error shown above). Pods that require too much ephemeral storage should be evicted, not bring the node down.
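The "exceeds its request of 0" wording in the first error indicates that the container requested no ephemeral storage at all, so its scratch usage was never accounted against the pod. A minimal sketch of a resources stanza that makes the kubelet track and evict per pod (the image name and sizes below are illustrative assumptions, not the actual contents of penguin_transformer-mlperf.yaml):

apiVersion: v1
kind: Pod
metadata:
  name: transformer-mlperf
spec:
  containers:
  - name: transformer
    image: mlperf-transformer:latest    # placeholder for the ~4.6 GB image in the gist
    resources:
      requests:
        nvidia.com/gpu: 8
        ephemeral-storage: 2Gi          # assumed value; the scheduler now accounts for scratch space
      limits:
        nvidia.com/gpu: 8
        ephemeral-storage: 4Gi          # assumed value; only this pod is evicted if it exceeds 4Gi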
Version-Release number of selected component (if applicable):
OCP 4.2 (kubelet v1.14.6+c07e432da)

How reproducible:
100%

Steps to Reproduce:
1. Install OpenShift 4.2 on a bare-metal cluster. One worker node in the cluster must run RHEL 7.6 and have 8 Nvidia V100 GPUs. Node "node029" in our case.

[dfeddema@master ~]$ oc get nodes
NAME                        STATUS   ROLES    AGE   VERSION
node013                     Ready    worker   8d    v1.14.6+c07e432da
node014                     Ready    worker   8d    v1.14.6+c07e432da
node016.lab.computing.com   Ready    master   9d    v1.14.6+c07e432da
node017.lab.computing.com   Ready    master   9d    v1.14.6+c07e432da
node018.lab.computing.com   Ready    master   9d    v1.14.6+c07e432da
node024.lab.computing.com   Ready    worker   9d    v1.14.6+c07e432da
node025.lab.computing.com   Ready    worker   8d    v1.14.6+c07e432da
node026.lab.computing.com   Ready    worker   8d    v1.14.6+c07e432da
node029.lab.computing.com   Ready    worker   8d    v1.14.6+c07e432da

2. Set up the GPU node's filesystems as follows (note that /, which holds /var, is already 83% used):

# df
Filesystem       1K-blocks      Used  Available Use% Mounted on
/dev/sda2        104806400  86615808   18190592  83% /
devtmpfs         649374096         0  649374096   0% /dev
tmpfs            649385612        84  649385528   1% /dev/shm
tmpfs            649385612   3889952  645495660   1% /run
tmpfs            649385612         0  649385612   0% /sys/fs/cgroup
/dev/sda1           511720      9904     501816   2% /boot/efi
tmpfs            129877124         0  129877124   0% /run/user/1001
overlay          104806400  86615808   18190592  83% /run/nvidia/driver
tmpfs                65536         0      65536   0% /run/nvidia/driver/dev
shm                  65536         0      65536   0% /run/nvidia/driver/dev/shm

3. Follow these instructions to use GPUs with OpenShift 4.x:
https://docs.google.com/document/d/1dBVSaAgTB8H9GcWCFv5hOn-Z_piAfeYzpQLfJKJ76EQ/edit

4. Run "oc create -f transformer-mlperf.yaml". The file transformer-mlperf.yaml is here:
https://gist.githubusercontent.com/dfeddema/072f9d702d5f3e28572626ddfaea605d/raw/ced66dfc36169b4e51766c37818ef88725ce871b/penguin_transformer-mlperf.yaml

Actual results:
The pod is evicted for ephemeral-storage pressure on the first run; on the second run the node enters the "NotReady" state and has to be restarted.

Expected results:
The pod is evicted, but the node remains "Ready".

Additional info:
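One note on the df output in step 2: / (which holds /var and the container storage) is already 83% used, with roughly 17% available, and the kubelet's default hard-eviction threshold is nodefs.available<10%, so pulling the ~4.6 GB image drives the node toward that threshold. On OCP 4.x the thresholds can be tuned with a KubeletConfig; a sketch, assuming a custom-kubelet label has been added to the worker MachineConfigPool (and noting that a RHEL 7.6 worker may not be fully managed by the Machine Config Operator, so this is indicative only):

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: gpu-worker-eviction
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: gpu-worker    # assumed label, added to the worker pool by the admin
  kubeletConfig:
    evictionHard:
      nodefs.available: "10%"       # the upstream default, made explicit here
      imagefs.available: "15%"      # the upstream default, made explicit here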
Upstream issue: https://github.com/kubernetes/kubernetes/issues/78865
Upstream PR: https://github.com/kubernetes/kubernetes/pull/81516
Fixed with an ephemeral reservation in BZ 1800319. *** This bug has been marked as a duplicate of bug 1800319 ***
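The fix referenced above amounts to reserving ephemeral storage for system daemons so that pod scratch usage cannot starve the node itself. A sketch of that kind of reservation expressed as an OCP KubeletConfig (the name, label, and 1Gi figure are assumptions, not the exact change shipped for bug 1800319):

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: worker-ephemeral-reservation
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: ephemeral-reservation   # assumed label on the worker pool
  kubeletConfig:
    systemReserved:
      ephemeral-storage: 1Gi                  # assumed size; keeps headroom for the kubelet and OS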