Description of problem:
If the volumeDirectory for a node gets filled, OpenShift is unable to delete the volumes.

Version-Release number of selected component (if applicable):
3.2

How reproducible:
100%

Steps to Reproduce:
1. Mount an LV to the volumeDirectory:
/dev/mapper/rhelos-empty--dir on /var/lib/origin/openshift.local.volumes type xfs (rw,relatime,seclabel,attr2,inode64,grpquota)
2. Schedule a pod with an emptyDir volume to the node and fill the volume (a minimal sketch of such a pod is included after the logs below):
> oc rsh docker-registry-27-a02uc
sh-4.2$ dd if=/dev/zero of=/registry/test.txt count=1024 bs=1048576
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 3.32775 s, 323 MB/s
sh-4.2$ dd if=/dev/zero of=/registry/test1.txt count=1024 bs=1048576
dd: error writing '/registry/test1.txt': No space left on device
166+0 records in
165+0 records out
173813760 bytes (174 MB) copied, 1.53395 s, 113 MB/s
3. Confirm the filesystem is 100% full:
# df
/dev/mapper/rhelos-empty--dir 1251328 1251308 20 100% /var/lib/origin/openshift.local.volumes
4. Remove the pod.

Actual results:
The pod is unable to be removed and the volume will not be cleaned up.

Expected results:
The pod is removed and the volume files are deleted.

Additional info:
When I manually remove the file, the pod gets deleted, the volume gets deleted, and any new pods get scheduled.

LOGS:
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.272307 20522 volumes.go:234] Making a volume.Cleaner for volume kubernetes.io~empty-dir/registry-storage of pod cc9ae449-84ee-11e6-a3c6-fa163e46177a
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.272412 20522 volumes.go:316] Used volume plugin "kubernetes.io/empty-dir" to unmount cc9ae449-84ee-11e6-a3c6-fa163e46177a/kubernetes.io~empty-dir
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.272473 20522 volumes.go:234] Making a volume.Cleaner for volume kubernetes.io~secret/registry-token-u4ftc of pod cc9ae449-84ee-11e6-a3c6-fa163e46177a
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.272560 20522 volumes.go:316] Used volume plugin "kubernetes.io/secret" to unmount cc9ae449-84ee-11e6-a3c6-fa163e46177a/kubernetes.io~secret
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: W0927 16:22:40.272629 20522 kubelet.go:1995] Orphaned volume "cc9ae449-84ee-11e6-a3c6-fa163e46177a/registry-storage" found, tearing down volume
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: W0927 16:22:40.291447 20522 mount.go:99] could not determine device for path: "/var/lib/origin/openshift.local.volumes/pods/cc9ae449-84ee-11e6-a3c6-fa163e46177a/volumes/kubernetes.io~empty-dir/registry-s
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.291591 20522 empty_dir_linux.go:39] Determining mount medium of /var/lib/origin/openshift.local.volumes/pods/cc9ae449-84ee-11e6-a3c6-fa163e46177a/volumes/kubernetes.io~empty-dir/registry-s
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.291643 20522 nsenter_mount.go:180] findmnt command: nsenter [--mount=/rootfs/proc/1/ns/mnt -- /bin/findmnt -o target --noheadings --target /var/lib/origin/openshift.local.volumes/pods/cc9a
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.321301 20522 nsenter_mount.go:193] IsLikelyNotMountPoint findmnt output: /var/lib/origin/openshift.local.volumes
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.321443 20522 empty_dir_linux.go:49] Statfs_t of /var/lib/origin/openshift.local.volumes/pods/cc9ae449-84ee-11e6-a3c6-fa163e46177a/volumes/kubernetes.io~empty-dir/registry-storage: {Type:14
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: E0927 16:22:40.322276 20522 kubelet.go:2006] Could not tear down volume "cc9ae449-84ee-11e6-a3c6-fa163e46177a/registry-storage": mkdir /var/lib/origin/openshift.local.volumes/pods/cc9ae449-84ee-11e6-a3c6
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: W0927 16:22:40.322392 20522 kubelet.go:1995] Orphaned volume "cc9ae449-84ee-11e6-a3c6-fa163e46177a/registry-token-u4ftc" found, tearing down volume
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.335078 20522 secret.go:223] Tearing down volume registry-token-u4ftc for pod cc9ae449-84ee-11e6-a3c6-fa163e46177a at /var/lib/origin/openshift.local.volumes/pods/cc9ae449-84ee-11e6-a3c6-fa
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.335238 20522 empty_dir_linux.go:39] Determining mount medium of /var/lib/origin/openshift.local.volumes/pods/cc9ae449-84ee-11e6-a3c6-fa163e46177a/volumes/kubernetes.io~secret/registry-toke
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.335303 20522 nsenter_mount.go:180] findmnt command: nsenter [--mount=/rootfs/proc/1/ns/mnt -- /bin/findmnt -o target --noheadings --target /var/lib/origin/openshift.local.volumes/pods/cc9a
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.348579 20522 nsenter_mount.go:193] IsLikelyNotMountPoint findmnt output: /var/lib/origin/openshift.local.volumes/pods/cc9ae449-84ee-11e6-a3c6-fa163e46177a/volumes/kubernetes.io~secret/regi
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.348716 20522 empty_dir_linux.go:49] Statfs_t of /var/lib/origin/openshift.local.volumes/pods/cc9ae449-84ee-11e6-a3c6-fa163e46177a/volumes/kubernetes.io~secret/registry-token-u4ftc: {Type:1
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.348849 20522 nsenter_mount.go:150] Unmount command: nsenter [--mount=/rootfs/proc/1/ns/mnt -- /bin/umount /var/lib/origin/openshift.local.volumes/pods/cc9ae449-84ee-11e6-a3c6-fa163e46177a/
Sep 27 16:22:40 node-1.openshift.com atomic-openshift-node[20483]: I0927 16:22:40.372686 20522 kubelet.go:1917] Orphaned pod "cc9ae449-84ee-11e6-a3c6-fa163e46177a" found, but volumes are not cleaned up; err: <nil>, volumes: [0xc2099ce7a0]
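For reference, step 2 above boils down to any pod that mounts an emptyDir and writes into it until the filesystem backing /var/lib/origin/openshift.local.volumes is full. A minimal sketch of such a pod follows; the pod name, image, and mount path are placeholders of mine, not the actual docker-registry pod used above:

oc create -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: empty-dir-fill-test
spec:
  containers:
  - name: writer
    image: registry.access.redhat.com/rhel7
    command: ["sleep", "3600"]
    volumeMounts:
    - name: scratch
      mountPath: /registry
  volumes:
  - name: scratch
    emptyDir: {}
EOF

# Fill the emptyDir from inside the pod until the backing LV runs out of space:
oc rsh empty-dir-fill-test dd if=/dev/zero of=/registry/fill.img bs=1048576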
Correct me if I am wrong, but since an emptyDir volume is backed by the volume on which the node itself is running (i.e. the node's root partition is shared as an emptyDir within the pod), filling the emptyDir volume directory from the pod means filling the root directory of the node itself. Isn't that problematic to begin with? That is, you never want to fill the root directory of your Linux system, since that makes all bets off?
I am wondering whether you mean the emptyDir or the hostPath volume type. Please confirm.
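To make sure we are talking about the same thing, here is the difference in pod-spec terms (the volume names below are only illustrative): an emptyDir volume is created by the kubelet under the node's volumeDirectory (/var/lib/origin/openshift.local.volumes by default), while a hostPath volume mounts an existing path on the node directly.

# emptyDir: the kubelet allocates a per-pod directory under the volumeDirectory
volumes:
- name: scratch
  emptyDir: {}

# hostPath: the pod mounts an existing node directory as-is
volumes:
- name: node-dir
  hostPath:
    path: /mnt/data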
Spoke with @rhowe on IRC and discussed this in person. I think this bug has been fixed in openshift-1.3, because I couldn't reproduce it in that version:

# fallocate -l 400M /opt2/baz.img
fallocate: fallocate failed: No space left on device
# df -h
Filesystem                                                                                          Size  Used Avail Use% Mounted on
/dev/mapper/docker-252:1-1705139-2a1520170d8bb8a97de7884bf64cc39f2973a1ed14021e8c7a0394768b00f736   10G  630M  9.4G   7% /
tmpfs                                                                                              1001M     0 1001M   0% /dev
tmpfs                                                                                              1001M     0 1001M   0% /sys/fs/cgroup
/dev/vdb1                                                                                           2.0G  2.0G     0 100% /opt2
/dev/vda1                                                                                            40G  2.8G   35G   8% /run/secrets
shm                                                                                                  64M     0   64M   0% /dev/shm
tmpfs                                                                                              1001M   16K 1001M   1% /run/secrets/kubernetes.io/serviceaccount
#

[vagrant@os3 ~]$ oc get pods
NAME                         READY     STATUS    RESTARTS   AGE
deployment-example-1-mm3lh   1/1       Running   0          1h
mysql-foo-1-itw33            1/1       Running   0          19m
nginx-brick-pod              1/1       Running   0          2m
[vagrant@os3 ~]$ oc delete pod nginx-brick-pod
pod "nginx-brick-pod" deleted
[vagrant@os3 ~]$ oc get pods
NAME                         READY     STATUS    RESTARTS   AGE
deployment-example-1-mm3lh   1/1       Running   0          1h
mysql-foo-1-itw33            1/1       Running   0          19m

Having said that, I will try and find the commit that fixed this bug, so that we are in the clear.
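For anyone re-verifying on the node side, the cleanup can also be checked by confirming that the pod's directory under the volumeDirectory disappears after deletion, roughly as follows (pod name reused from above; the UID is a placeholder to be read from the pod's metadata):

# Grab the pod UID before deleting it:
oc get pod nginx-brick-pod -o yaml | grep uid
# Delete the pod, then on the node confirm its volume directory is gone:
ls /var/lib/origin/openshift.local.volumes/pods/<pod-uid>/volumes/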
I am closing this as fixed since I can no longer reproduce it with the latest version.