Description of problem:

This problem is similar to Red Hat solution #4262861, "Pods evicted due to lack of Ephemeral Storage" (https://access.redhat.com/solutions/4262861). The problem I am reporting was worked around by increasing the space in /var; however, the GPU node had to be restarted because it entered the "NotReady" state. Pods that require too much ephemeral storage should not cause the node they are scheduled on to enter the "NotReady" state.

We encountered this problem at Penguin Computing on an OCP 4.2 cluster with one GPU-enabled worker node (8 Nvidia V100s) running RHEL 7.6.

The first error occurred when I ran "oc create -f transformer-mlperf.yaml". (Note: the image for this pod is 4.6 GB.)

First error:

Warning  Evicted  61s  kubelet, node029.penguincomputing.com  The node was low on resource: ephemeral-storage. Container transformer was using 572Ki, which exceeds its request of 0.

I deleted the pod and re-ran the same command ("oc create -f transformer-mlperf.yaml") to see if the problem was repeatable. This time a different error was thrown (see "Second error" below) and the node entered the "NotReady" state.

Second error:

Warning  Failed  3m33s (x3 over 3m55s)  kubelet, asgnode029.tundra-lab.penguincomputing.com  Error: container create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"process_linux.go:413: running prestart hook 0 caused \\\"fork/exec /run/nvidia/driver/usr/bin/nvidia-container-toolkit: invalid argument\\\"\""

Warning  NetworkNotReady  2m48s (x17 over 3m20s)  kubelet, asgnode029.tundra-lab.penguincomputing.com  network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network

Increasing the size of /var solved the problem; however, the node should not have entered the "NotReady" state (or hit the second error shown above). Pods that require too much ephemeral storage should be evicted, not bring the node down.
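The "exceeds its request of 0" wording in the first error indicates that the container requested no ephemeral storage at all, so its scratch usage was never accounted against the pod. A minimal sketch of a resources stanza that makes the kubelet track and evict per pod (the image name and sizes below are illustrative assumptions, not the actual contents of penguin_transformer-mlperf.yaml):

apiVersion: v1
kind: Pod
metadata:
  name: transformer-mlperf
spec:
  containers:
  - name: transformer
    image: mlperf-transformer:latest    # placeholder for the ~4.6 GB image in the gist
    resources:
      requests:
        nvidia.com/gpu: 8
        ephemeral-storage: 2Gi          # assumed value; the scheduler now accounts for scratch space
      limits:
        nvidia.com/gpu: 8
        ephemeral-storage: 4Gi          # assumed value; only this pod is evicted if it exceeds 4Gi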
Version-Release number of selected component (if applicable):
OCP 4.2 (kubelet v1.14.6+c07e432da)

How reproducible:
100%

Steps to Reproduce:
1. Install OpenShift 4.2 on a bare-metal cluster. One worker node in the cluster must run RHEL 7.6 and have 8 Nvidia V100 GPUs. Node "node029" in our case.

[dfeddema@master ~]$ oc get nodes
NAME                        STATUS   ROLES    AGE   VERSION
node013                     Ready    worker   8d    v1.14.6+c07e432da
node014                     Ready    worker   8d    v1.14.6+c07e432da
node016.lab.computing.com   Ready    master   9d    v1.14.6+c07e432da
node017.lab.computing.com   Ready    master   9d    v1.14.6+c07e432da
node018.lab.computing.com   Ready    master   9d    v1.14.6+c07e432da
node024.lab.computing.com   Ready    worker   9d    v1.14.6+c07e432da
node025.lab.computing.com   Ready    worker   8d    v1.14.6+c07e432da
node026.lab.computing.com   Ready    worker   8d    v1.14.6+c07e432da
node029.lab.computing.com   Ready    worker   8d    v1.14.6+c07e432da

2. Set up the GPU node's filesystems as follows (note that /, which holds /var, is already 83% used):

# df
Filesystem       1K-blocks      Used  Available Use% Mounted on
/dev/sda2        104806400  86615808   18190592  83% /
devtmpfs         649374096         0  649374096   0% /dev
tmpfs            649385612        84  649385528   1% /dev/shm
tmpfs            649385612   3889952  645495660   1% /run
tmpfs            649385612         0  649385612   0% /sys/fs/cgroup
/dev/sda1           511720      9904     501816   2% /boot/efi
tmpfs            129877124         0  129877124   0% /run/user/1001
overlay          104806400  86615808   18190592  83% /run/nvidia/driver
tmpfs                65536         0      65536   0% /run/nvidia/driver/dev
shm                  65536         0      65536   0% /run/nvidia/driver/dev/shm

3. Follow these instructions to use GPUs with OpenShift 4.x:
https://docs.google.com/document/d/1dBVSaAgTB8H9GcWCFv5hOn-Z_piAfeYzpQLfJKJ76EQ/edit

4. Run "oc create -f transformer-mlperf.yaml". The file transformer-mlperf.yaml is here:
https://gist.githubusercontent.com/dfeddema/072f9d702d5f3e28572626ddfaea605d/raw/ced66dfc36169b4e51766c37818ef88725ce871b/penguin_transformer-mlperf.yaml

Actual results:
The pod is evicted for ephemeral-storage pressure on the first run; on the second run the node enters the "NotReady" state and has to be restarted.

Expected results:
The pod is evicted, but the node remains "Ready".

Additional info:
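One note on the df output in step 2: / (which holds /var and the container storage) is already 83% used, with roughly 17% available, and the kubelet's default hard-eviction threshold is nodefs.available<10%, so pulling the ~4.6 GB image drives the node toward that threshold. On OCP 4.x the thresholds can be tuned with a KubeletConfig; a sketch, assuming a custom-kubelet label has been added to the worker MachineConfigPool (and noting that a RHEL 7.6 worker may not be fully managed by the Machine Config Operator, so this is indicative only):

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: gpu-worker-eviction
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: gpu-worker    # assumed label, added to the worker pool by the admin
  kubeletConfig:
    evictionHard:
      nodefs.available: "10%"       # the upstream default, made explicit here
      imagefs.available: "15%"      # the upstream default, made explicit here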
Upstream issue: https://github.com/kubernetes/kubernetes/issues/78865
Upstream PR: https://github.com/kubernetes/kubernetes/pull/81516
Fixed with an ephemeral reservation in BZ 1800319. *** This bug has been marked as a duplicate of bug 1800319 ***
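The fix referenced above amounts to reserving ephemeral storage for system daemons so that pod scratch usage cannot starve the node itself. A sketch of that kind of reservation expressed as an OCP KubeletConfig (the name, label, and 1Gi figure are assumptions, not the exact change shipped for bug 1800319):

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: worker-ephemeral-reservation
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: ephemeral-reservation   # assumed label on the worker pool
  kubeletConfig:
    systemReserved:
      ephemeral-storage: 1Gi                  # assumed size; keeps headroom for the kubelet and OS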