Bug 1349311 - Using EmptyDir as storage option for openshift pods leads to filling up openshift node storage space
Summary: Using EmptyDir as storage option for openshift pods leads to filling up openshift node storage space
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.2.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Seth Jennings
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-06-23 08:15 UTC by Elvir Kuric
Modified: 2019-12-16 05:58 UTC
CC List: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Using hostPath for storage could lead to running out of disk space. Consequence: The OpenShift node's root disk could become full and unusable. Fix: Added support for pod eviction based on disk space. Result: If a pod using hostPath uses too much space, it may be evicted from the node.
Clone Of:
Environment:
Last Closed: 2017-04-12 19:05:48 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Product Errata RHBA-2017:0884 (normal, SHIPPED_LIVE): Red Hat OpenShift Container Platform 3.5 RPM Release Advisory, last updated 2017-04-12 22:50:07 UTC

Description Elvir Kuric 2016-06-23 08:15:54 UTC
Description of problem:

If EmptyDir is used as the storage option for OpenShift pods and a pod writes intensively to it, the space under /var/lib/origin/openshift.local.volumes/pods fills up without any limit. This eventually fills the OpenShift node's file system, and normal operation of the node is affected.


Version-Release number of selected component (if applicable):

I noticed this with the packages below:

atomic-openshift-clients-3.2.1.3-1.git.0.dfa4ad6.el7.x86_64
tuned-profiles-atomic-openshift-node-3.2.1.3-1.git.0.dfa4ad6.el7.x86_64
atomic-openshift-3.2.1.3-1.git.0.dfa4ad6.el7.x86_64
atomic-openshift-sdn-ovs-3.2.1.3-1.git.0.dfa4ad6.el7.x86_64
atomic-openshift-master-3.2.1.3-1.git.0.dfa4ad6.el7.x86_64
atomic-openshift-node-3.2.1.3-1.git.0.dfa4ad6.el7.x86_64

How reproducible:

always 


Steps to Reproduce:
1. Create pod(s) with EmptyDir as the storage option (a minimal example manifest is sketched below)
2. Write data inside the pod
3. Watch space usage under /var/lib/origin/openshift.local.volumes/pods
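
A minimal reproducer manifest, for illustration only; the pod name, image, and write command are hypothetical and not taken from this report:

# reproducer.yaml (hypothetical): a pod that writes continuously into an
# emptyDir volume. The volume is backed by the node's disk under
# /var/lib/origin/openshift.local.volumes/pods/<pod-uid>/.
apiVersion: v1
kind: Pod
metadata:
  name: emptydir-filler
spec:
  containers:
  - name: writer
    image: busybox
    # Append zeroes to a file on the emptyDir volume; with no quota on the
    # volume, this grows until the node's filesystem is full.
    command: ["sh", "-c", "dd if=/dev/zero of=/data/fill bs=1M; sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    emptyDir: {}

$ oc create -f reproducer.yaml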


Actual results:

Space usage under /var/lib/origin/openshift.local.volumes/pods grows without bound. If /var is not a separate partition on the OpenShift node, / fills up to 100% and services such as atomic-openshift-node and docker stop functioning.
df output from an affected system:

# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda2       10G   10G   20K 100% /
devtmpfs        3.9G     0  3.9G   0% /dev
tmpfs           3.7G     0  3.7G   0% /dev/shm
tmpfs           3.7G  377M  3.4G  10% /run
tmpfs           3.7G     0  3.7G   0% /sys/fs/cgroup
tmpfs           757M     0  757M   0% /run/user/0


Expected results:

The above behavior should be prevented: once the / file system fills up, the openshift-node service no longer functions properly and the OpenShift cluster can become degraded.


Additional info:

Once this critical state is reached and the affected openshift-node stops responding to the master, the pods on that node are rescheduled to a new node. The process then repeats: the new node's / (specifically /var/lib/origin/openshift.local.volumes/pods) fills up as well, then the next node's, and so on.

Comment 1 Andy Goldstein 2016-06-27 20:23:10 UTC
The out of disk eviction work that Derek is doing should help here (and might be sufficient to close this out).
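
For context: the eviction work referenced here surfaces on OCP 3.x nodes as kubelet eviction thresholds, configurable via kubeletArguments in /etc/origin/node/node-config.yaml. A sketch; the threshold values below are illustrative assumptions, not values taken from this bug:

# /etc/origin/node/node-config.yaml (excerpt; values are illustrative)
kubeletArguments:
  eviction-hard:
  - "nodefs.available<10%"      # hard-evict pods when node fs free space drops below 10%
  eviction-soft:
  - "nodefs.available<15%"      # soft threshold, enforced after the grace period below
  eviction-soft-grace-period:
  - "nodefs.available=1m30s"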

Comment 2 Bradley Childs 2016-08-03 13:27:26 UTC
Upstream:

https://github.com/kubernetes/kubernetes/pull/27199

Comment 3 Bradley Childs 2016-08-08 17:23:45 UTC
@Hou Jianwei The disk pressure changes were merged; can you verify that this resolves the usability issue?

Comment 4 Andy Goldstein 2016-08-08 18:05:44 UTC
This is not in Origin yet.

Comment 7 Troy Dawson 2016-10-18 16:20:31 UTC
This has been merged into ose and is in OSE v3.4.0.12 or newer.

Comment 9 Wenqi He 2016-11-09 10:43:42 UTC
I have tested this on the version below; this is fixed:
oc v3.4.0.23+24b1a58
kubernetes v1.4.0+776c994

Usage of / stayed limited after I wrote a large file:

bash-4.3$ dd if=/dev/zero of=/tmp/test bs=3072M count=1
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB) copied, 119.747 s, 17.9 MB/s

On the node:
$ df -h 
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        10G  6.6G  3.5G  66% /

So I will update the status to VERIFIED. Thanks.

Comment 12 Derek Carr 2016-11-28 22:17:28 UTC
Upstream tracking issue:
https://github.com/kubernetes/kubernetes/issues/35406

Comment 15 Seth Jennings 2017-01-25 20:55:53 UTC
Upstream PR merged
https://github.com/kubernetes/kubernetes/pull/37228

Origin PR opened
https://github.com/openshift/origin/pull/12669

Comment 16 Troy Dawson 2017-01-31 20:16:18 UTC
This has been merged into ocp and is in OCP v3.5.0.12 or newer.

Comment 17 DeShuai Ma 2017-02-03 09:32:27 UTC
Verified on openshift v3.5.0.14+20b49d0
When a pod is terminated, the kubelet should remove its disk-backed emptyDir volume.

Steps:
1. Create Failed/Succeeded pods with a host-disk-backed emptyDir volume
$ oc create -f https://raw.githubusercontent.com/mdshuai/testfile-openshift/master/k8s/terminatedpods/emtydir-host.yaml

2. On the node, make sure the disk-backed emptyDir volume is removed when the pod becomes Failed/Succeeded
# ls /var/lib/origin/openshift.local.volumes/pods/${pod.uid}/volumes/kubernetes.io~empty-dir/${volumeName}
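
The referenced emtydir-host.yaml is not reproduced in this bug; a minimal stand-in that yields a Succeeded pod with a disk-backed emptyDir volume might look like this (pod, container, and volume names are hypothetical):

# Hypothetical stand-in for emtydir-host.yaml: the pod writes to a
# disk-backed emptyDir volume and then exits, ending in phase Succeeded.
apiVersion: v1
kind: Pod
metadata:
  name: emptydir-terminated
spec:
  restartPolicy: Never          # let the pod terminate rather than restart
  containers:
  - name: writer
    image: busybox
    command: ["sh", "-c", "dd if=/dev/zero of=/data/file bs=1M count=100"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    emptyDir: {}                # default medium is the node's disk, not tmpfs

With the fix, the ls in step 2 should fail with "No such file or directory" once the pod has terminated, since the kubelet removes the volume directory.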

Comment 19 errata-xmlrpc 2017-04-12 19:05:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0884

