Bug 1379850

Summary: OpenShift is unable to remove volume if empty_dir fills volume.
Product: OpenShift Container Platform Reporter: Ryan Howe <rhowe>
Component: Storage    Assignee: Hemant Kumar <hekumar>
Status: CLOSED CURRENTRELEASE QA Contact: Jianwei Hou <jhou>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.2.1    CC: aos-bugs, rhowe
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-10-13 15:51:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Ryan Howe 2016-09-27 20:59:53 UTC
Description of problem:

If the volumeDirectory on a node becomes full, OpenShift is unable to delete the volumes. 


Version-Release number of selected component (if applicable):
3.2

How reproducible:
100%

Steps to Reproduce:
1. Mount an LV at the node's volumeDirectory:
/dev/mapper/rhelos-empty--dir on /var/lib/origin/openshift.local.volumes type xfs (rw,relatime,seclabel,attr2,inode64,grpquota)
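
For reference, the LV can be created and mounted with something like the following (VG/LV names inferred from the device-mapper name above; the size is an assumption):

# lvcreate -L 1.2G -n empty-dir rhelos
# mkfs.xfs /dev/rhelos/empty-dir
# mount -o grpquota /dev/rhelos/empty-dir /var/lib/origin/openshift.local.volumes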


2. Schedule a pod with an emptyDir volume to the node and fill the volume (a minimal example pod spec is sketched after the dd output below). 

> oc rsh docker-registry-27-a02uc 
sh-4.2$ dd if=/dev/zero of=/registry/test.txt count=1024 bs=1048576
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 3.32775 s, 323 MB/s
sh-4.2$ dd if=/dev/zero of=/registry/test1.txt count=1024 bs=1048576
dd: error writing '/registry/test1.txt': No space left on device
166+0 records in
165+0 records out
173813760 bytes (174 MB) copied, 1.53395 s, 113 MB/s
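
For reference, in the run above the docker-registry pod was used; per the kubelet logs below, its registry-storage volume is an emptyDir mounted at /registry in the container. A hypothetical minimal emptyDir pod would look something like this (name and image are placeholders):

oc create -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: emptydir-test
spec:
  containers:
  - name: writer
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    emptyDir: {}
EOF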


3. Confirm the volume is 100% full:

# df 
/dev/mapper/rhelos-empty--dir   1251328 1251308        20 100% /var/lib/origin/openshift.local.volumes

4. Remove the pod.
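
For example, using the pod name from the rsh session in step 2:

> oc delete pod docker-registry-27-a02uc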

Actual results:
The pod cannot be removed and the volume is not cleaned up. 


Expected results:
The pod is removed and the volume files are deleted. 

Additional info:

When I manually remove the file, the pod gets deleted, the volume gets deleted, and any new pods get scheduled. 
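
Roughly, the manual cleanup was along these lines (path reconstructed from the kubelet logs below; test.txt is the file written in step 2):

# rm /var/lib/origin/openshift.local.volumes/pods/cc9ae449-84ee-11e6-a3c6-fa163e46177a/volumes/kubernetes.io~empty-dir/registry-storage/test.txt

Once space is freed, the kubelet can tear down the volume and the pod deletion completes.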



LOGS: 


Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.272307   20522 volumes.go:234] Making a volume.Cleaner for volume kubernetes.io~empty-dir/registry-storage of pod cc9ae449-84ee-11e6-a3c6-fa163e46177a
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.272412   20522 volumes.go:316] Used volume plugin "kubernetes.io/empty-dir" to unmount cc9ae449-84ee-11e6-a3c6-fa163e46177a/kubernetes.io~empty-dir
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.272473   20522 volumes.go:234] Making a volume.Cleaner for volume kubernetes.io~secret/registry-token-u4ftc of pod cc9ae449-84ee-11e6-a3c6-fa163e46177a
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.272560   20522 volumes.go:316] Used volume plugin "kubernetes.io/secret" to unmount cc9ae449-84ee-11e6-a3c6-fa163e46177a/kubernetes.io~secret
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: W0927 16:22:40.272629   20522 kubelet.go:1995] Orphaned volume "cc9ae449-84ee-11e6-a3c6-fa163e46177a/registry-storage" found, tearing down volume
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: W0927 16:22:40.291447   20522 mount.go:99] could not determine device for path: "/var/lib/origin/openshift.local.volumes/pods/cc9ae449-84ee-11e6-a3c6-fa163e46177a/volumes/kubernetes.io~empty-dir/registry-s
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.291591   20522 empty_dir_linux.go:39] Determining mount medium of /var/lib/origin/openshift.local.volumes/pods/cc9ae449-84ee-11e6-a3c6-fa163e46177a/volumes/kubernetes.io~empty-dir/registry-s
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.291643   20522 nsenter_mount.go:180] findmnt command: nsenter [--mount=/rootfs/proc/1/ns/mnt -- /bin/findmnt -o target --noheadings --target /var/lib/origin/openshift.local.volumes/pods/cc9a
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.321301   20522 nsenter_mount.go:193] IsLikelyNotMountPoint findmnt output: /var/lib/origin/openshift.local.volumes
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.321443   20522 empty_dir_linux.go:49] Statfs_t of /var/lib/origin/openshift.local.volumes/pods/cc9ae449-84ee-11e6-a3c6-fa163e46177a/volumes/kubernetes.io~empty-dir/registry-storage: {Type:14
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: E0927 16:22:40.322276   20522 kubelet.go:2006] Could not tear down volume "cc9ae449-84ee-11e6-a3c6-fa163e46177a/registry-storage": mkdir /var/lib/origin/openshift.local.volumes/pods/cc9ae449-84ee-11e6-a3c6
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: W0927 16:22:40.322392   20522 kubelet.go:1995] Orphaned volume "cc9ae449-84ee-11e6-a3c6-fa163e46177a/registry-token-u4ftc" found, tearing down volume
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.335078   20522 secret.go:223] Tearing down volume registry-token-u4ftc for pod cc9ae449-84ee-11e6-a3c6-fa163e46177a at /var/lib/origin/openshift.local.volumes/pods/cc9ae449-84ee-11e6-a3c6-fa
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.335238   20522 empty_dir_linux.go:39] Determining mount medium of /var/lib/origin/openshift.local.volumes/pods/cc9ae449-84ee-11e6-a3c6-fa163e46177a/volumes/kubernetes.io~secret/registry-toke
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.335303   20522 nsenter_mount.go:180] findmnt command: nsenter [--mount=/rootfs/proc/1/ns/mnt -- /bin/findmnt -o target --noheadings --target /var/lib/origin/openshift.local.volumes/pods/cc9a
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.348579   20522 nsenter_mount.go:193] IsLikelyNotMountPoint findmnt output: /var/lib/origin/openshift.local.volumes/pods/cc9ae449-84ee-11e6-a3c6-fa163e46177a/volumes/kubernetes.io~secret/regi
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.348716   20522 empty_dir_linux.go:49] Statfs_t of /var/lib/origin/openshift.local.volumes/pods/cc9ae449-84ee-11e6-a3c6-fa163e46177a/volumes/kubernetes.io~secret/registry-token-u4ftc: {Type:1
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.348849   20522 nsenter_mount.go:150] Unmount command: nsenter [--mount=/rootfs/proc/1/ns/mnt -- /bin/umount /var/lib/origin/openshift.local.volumes/pods/cc9ae449-84ee-11e6-a3c6-fa163e46177a/
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.372686   20522 kubelet.go:1917] Orphaned pod "cc9ae449-84ee-11e6-a3c6-fa163e46177a" found, but volumes are not cleaned up; err: <nil>, volumes: [0xc2099ce7a0]

Comment 1 Hemant Kumar 2016-10-03 18:20:57 UTC
Correct me if I am wrong, but since an emptyDir volume is backed by the volume on which the node itself is running (i.e. the root partition of the node is shared as an emptyDir within the pod), filling the emptyDir volume directory from the pod means filling the root directory of the node itself. Isn't that problematic to begin with? You never want to fill the root directory of your Linux system, since that makes all bets off.
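
For what it's worth, which filesystem actually backs a given emptyDir can be checked on the node, for example (pod UID and path layout taken from the kubelet logs in the description):

# df /var/lib/origin/openshift.local.volumes/pods/cc9ae449-84ee-11e6-a3c6-fa163e46177a/volumes/kubernetes.io~empty-dir/registry-storage

In the reproduction above this resolves to the dedicated rhelos-empty--dir LV from step 1 rather than the node's root partition.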

Comment 2 Hemant Kumar 2016-10-03 18:40:04 UTC
I am wondering if you mean emptyDir or hostPath volume type. Please confirm.

Comment 3 Hemant Kumar 2016-10-03 19:58:17 UTC
Spoke with @rhowe on IRC and discussed this in person.

I think this bug has been fixed in openshift-1.3, because I couldn't reproduce this in that version:

# fallocate -l 400M /opt2/baz.img
fallocate: fallocate failed: No space left on device
# df -h
Filesystem                                                                                         Size  Used Avail Use% Mounted on
/dev/mapper/docker-252:1-1705139-2a1520170d8bb8a97de7884bf64cc39f2973a1ed14021e8c7a0394768b00f736   10G  630M  9.4G   7% /
tmpfs                                                                                             1001M     0 1001M   0% /dev
tmpfs                                                                                             1001M     0 1001M   0% /sys/fs/cgroup
/dev/vdb1                                                                                          2.0G  2.0G     0 100% /opt2
/dev/vda1                                                                                           40G  2.8G   35G   8% /run/secrets
shm                                                                                                 64M     0   64M   0% /dev/shm
tmpfs                                                                                             1001M   16K 1001M   1% /run/secrets/kubernetes.io/serviceaccount
# 
[vagrant@os3 ~]$ oc get pods
NAME                         READY     STATUS    RESTARTS   AGE
deployment-example-1-mm3lh   1/1       Running   0          1h
mysql-foo-1-itw33            1/1       Running   0          19m
nginx-brick-pod              1/1       Running   0          2m
[vagrant@os3 ~]$ oc delete pod nginx-brick-pod
pod "nginx-brick-pod" deleted
[vagrant@os3 ~]$ oc get pods
NAME                         READY     STATUS    RESTARTS   AGE
deployment-example-1-mm3lh   1/1       Running   0          1h
mysql-foo-1-itw33            1/1       Running   0          19m


Having said that, I will try to find the commit that fixed this bug, so that we are in the clear.

Comment 4 Hemant Kumar 2016-10-13 15:51:38 UTC
I am closing this as fixed since I can no longer reproduce it with the latest version.