Bug 1422049

Summary: EmptyDir could lead to memory exhaustion
Product: OpenShift Container Platform
Reporter: Sergi Jimenez Romero <sjr>
Component: Node
Assignee: Seth Jennings <sjenning>
Status: CLOSED ERRATA
QA Contact: Qixuan Wang <qixuan.wang>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.3.0
CC: aos-bugs, decarr, fanlong_meng, jokerman, mmccomas, qixuan.wang, sjenning, sreber, wmeng
Target Milestone: ---
Target Release: 3.7.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: A design limitation in previous versions meant that memory-backed volumes were not counted against the pod's cumulative memory limit. Consequence: A user could exhaust memory on the node by creating a large file in a memory-backed volume, regardless of the memory limit. Fix: Pod-level cgroups were added to, among other things, enforce limits on memory-backed volumes. Result: Memory-backed volume sizes are now bounded by cumulative pod memory limits.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-11-28 21:52:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Embargoed:

Description Sergi Jimenez Romero 2017-02-14 11:24:49 UTC
Description of problem:

OpenShift allows users to create in-memory EmptyDir volumes for their pods (by setting the option "medium: Memory"), which translates into a tmpfs file system mounted inside the container. However, the API does not allow limiting the size of these tmpfs filesystems, which defaults to half of the node's RAM (the usual Linux default). This could lead to memory exhaustion on the node where the pods are running.
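The half-of-RAM default can be seen on any Linux host by comparing total memory against the size of a stock tmpfs mount. This is a minimal illustration using /dev/shm as the example mount; note that some environments (e.g. container runtimes) override its size explicitly.

```shell
# Total physical RAM on the host, in MiB.
awk '/^MemTotal/ {printf "RAM total (MiB):  %d\n", $2/1024}' /proc/meminfo

# Size of the tmpfs mount, in MiB. With no explicit size= mount option,
# the kernel sizes a tmpfs at half of physical RAM -- the default an
# in-memory EmptyDir inherits, since the API exposes no size knob.
df -m /dev/shm | awk 'NR==2 {print "tmpfs size (MiB): " $2}'
```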



Version-Release number of selected component (if applicable):

3.3.0

How reproducible:
Always

Steps to Reproduce:
1. Start pod with EmptyDir (medium: memory) and set a memory limit.
2. Use oc rsh to enter the pod and use dd to create a file on the EmptyDir larger than the pod's memory limit.
3. The pod will be restarted.
4. Repeat 1-2.
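The steps above can be sketched as an oc session. This is a sketch against a running cluster: the pod name busybox and the dd sizes are illustrative, and busybox.yaml is assumed to define the pod from step 1 (in-memory EmptyDir at /mnt, 1Gi memory limit).

```shell
# 1. Create the pod with an in-memory EmptyDir and a 1Gi memory limit.
oc create -f busybox.yaml

# 2. Write a file larger than the pod's memory limit into the
#    tmpfs-backed EmptyDir (2000 MiB > 1Gi limit).
oc rsh busybox dd if=/dev/zero of=/mnt/zero bs=1M count=2000

# 3. The container is restarted, but the tmpfs volume outlives the
#    container, so repeating step 2 keeps adding pinned pages on the node.
oc get pod busybox -o jsonpath='{.status.containerStatuses[0].restartCount}'
```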

Actual results:
The EmptyDir keeps all the files across container restarts and, based on kernel documentation [1][2], that could potentially lead to memory exhaustion on the host.

Expected results:

- It should be possible to set a fixed size for an in-memory EmptyDir.
- It should be possible to limit the size of all the in-memory EmptyDir volumes defined in each user pod via limits (as RAM and CPU can currently be limited).
- It should be possible to limit how many GB a user can allocate for in-memory EmptyDir volumes via quota.

Additional info:

Comment 5 Derek Carr 2017-02-14 22:06:22 UTC
To address this issue, we need pod level cgroup hierarchy planned in Kubernetes 1.6.

Comment 11 Seth Jennings 2017-05-05 17:14:24 UTC
I just verified that having pod cgroups enabled on the node in 3.6 (it is enabled by default) enforces the memory limit with respect to memory-backed EmptyDirs.

[root@test ~]# cat busybox.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  containers:
  - name: busybox
    image: busybox
    resources:
      limits:
        memory: 1Gi
        cpu: 1
    command:
    - dd
    - if=/dev/zero
    - of=/mnt/zero
    - bs=1M
    - count=2000
    volumeMounts:
    - name: myvol
      mountPath: /mnt
  terminationGracePeriodSeconds: 0
  volumes:
  - name: myvol
    emptyDir:
      medium: Memory
[root@test ~]# oc create -f busybox.yaml 
pod "busybox" created
[root@test ~]# oc describe pod | grep -A 5 "Last State"
    Last State:		Terminated
      Reason:		OOMKilled
      Exit Code:	137
      Started:		Fri, 05 May 2017 17:10:42 +0000
      Finished:		Fri, 05 May 2017 17:10:42 +0000
    Ready:		False
# mount | grep myvol
tmpfs on /var/lib/origin/openshift.local.volumes/pods/c296d664-31b5-11e7-a96c-fa163e71bc65/volumes/kubernetes.io~empty-dir/myvol type tmpfs (rw,relatime,seclabel)
[root@test ~]# cd /var/lib/origin/openshift.local.volumes/pods/c296d664-31b5-11e7-a96c-fa163e71bc65/volumes/kubernetes.io~empty-dir/myvol
[root@test myvol]# ls -alh
total 1023M
drwxrwsrwt. 2 root       1000040000    60 May  5 17:10 .
drwxr-xr-x. 3 root       root          19 May  5 17:10 ..
-rw-r--r--. 1 1000040000 1000040000 1023M May  5 17:11 zero

Even though the pod tries to write a 2Gi file, it is OOMKilled when the file reaches 1Gi in size, i.e. the memory limit set on the container.
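The limit holds because tmpfs pages written to the volume are charged to the pod-level memory cgroup. This can be inspected on the node; the sketch below assumes the cgroup v1 layout and the default kubepods hierarchy, and the pod UID placeholder must be filled in from oc get pod.

```shell
# Pod-level memory cgroup created by the kubelet (cgroup v1 layout;
# a Guaranteed-QoS pod lands directly under kubepods).
# Fill in <pod-uid> from: oc get pod busybox -o jsonpath='{.metadata.uid}'
POD_CG=/sys/fs/cgroup/memory/kubepods/pod<pod-uid>

# Cumulative pod memory limit (1Gi here) that now also bounds
# memory-backed EmptyDir usage.
cat "$POD_CG/memory.limit_in_bytes"

# Charged usage, which includes tmpfs pages written to the EmptyDir.
cat "$POD_CG/memory.usage_in_bytes"
```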

Comment 12 Seth Jennings 2017-05-08 13:41:09 UTC
Upstream PR:
https://github.com/kubernetes/kubernetes/pull/41349

Comment 13 Seth Jennings 2017-05-19 18:48:00 UTC
Included in Origin 1.6.1 rebase:
https://github.com/openshift/origin/pull/13653

Comment 14 Qixuan Wang 2017-05-24 06:56:07 UTC
Tested on OCP3.6 (openshift v3.6.79, kubernetes v1.6.1+5115d708d7, etcd 3.1.0)

EmptyDir won't exhaust memory. Move the bug to VERIFIED, thanks.

Comment 21 errata-xmlrpc 2017-11-28 21:52:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188

Comment 22 fanlong 2018-12-12 19:24:30 UTC
You can use sizeLimit; I have verified it already.
Even though df -h inside the container shows the EmptyDir as 128G, you can only use space up to the sizeLimit.


apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  containers:
  - name: busybox
    image: gcr.io/google_containers/busybox:1.24
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        memory: 1Gi
        cpu: 1
    command: ['sh', '-c', 'echo Hello Kubernetes!>/test-pd/mfltest.txt && sleep 3600' ]
    ports:
    - containerPort: 80
    volumeMounts:
    - mountPath: /test-pd  
      name: test-volume
  volumes:
  - name: test-volume
    emptyDir:
      medium: Memory
      sizeLimit: "1M" 


After entering the container, you can verify by typing: dd if=/dev/zero of=/test-pd/zero bs=1M count=10
The container exits.

If you have further questions, please let me know. fanlong_meng