Description of problem:
Some /dev/rbdX devices are not unmapped after the pods using them are removed.

Version-Release number of selected component (if applicable):
atomic-openshift-sdn-ovs-3.5.0.10-1.git.0.e377fa2.el7.x86_64
atomic-openshift-node-3.5.0.10-1.git.0.e377fa2.el7.x86_64
atomic-openshift-3.5.0.10-1.git.0.e377fa2.el7.x86_64
tuned-profiles-atomic-openshift-node-3.5.0.10-1.git.0.e377fa2.el7.x86_64
atomic-openshift-clients-3.5.0.10-1.git.0.e377fa2.el7.x86_64
atomic-openshift-master-3.5.0.10-1.git.0.e377fa2.el7.x86_64

# uname -a
Linux gprfc074.sbu.lab.eng.bos.redhat.com 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux

[root@gprfc074 ~]# rpm -qa | grep ceph
libcephfs1-10.2.3-13.el7cp.x86_64
ceph-common-10.2.3-13.el7cp.x86_64
python-cephfs-10.2.3-13.el7cp.x86_64

How reproducible:
Always.

Steps to Reproduce:
This test was run on a 3-node OCP cluster.
1. Use dynamic storage provisioning for Ceph.
2. Create at least 400 pods, each with one RBD volume mounted inside the pod (400 pods = 400 RBDs).
3. Delete the pods rapidly, e.g.:

# for m in $(oc get pods | grep pod- | awk '{print $1}'); do oc delete pod $m; done

Actual results:
Some /dev/rbdX devices remain mapped even after the pods are removed.

Expected results:
Once the pods using the RBD devices are removed, all RBD devices are unmapped from the OCP node.

Additional info:
These RBDs are not unmapped even after some time; they remain in /dev/ and stay allocated on the Ceph side for as long as they are not cleaned up manually on the nodes (umount, rbd unmap /dev/rbdX) - see the cleanup sketch after the output below.

If a "sleep 1" is added between pod deletions, the issue cannot be reproduced, e.g.:

# for m in $(oc get pods | grep pod- | awk '{print $1}'); do oc delete pod $m; sleep 1; done

Output from the system when RBDs are not unmapped properly - the view below is visible across all OCP nodes:
# mount | grep rbd
/dev/rbd0 on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/rbd/rbd/kube-image-kubernetes-dynamic-pvc-19f84a58-ea00-11e6-a775-90b11c188fc7 type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)
/dev/rbd5 on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/rbd/rbd/kube-image-kubernetes-dynamic-pvc-25252b38-ea00-11e6-a775-90b11c188fc7 type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)
/dev/rbd22 on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/rbd/rbd/kube-image-kubernetes-dynamic-pvc-4b2065f8-ea00-11e6-a775-90b11c188fc7 type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)
/dev/rbd36 on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/rbd/rbd/kube-image-kubernetes-dynamic-pvc-6a75bb2f-ea00-11e6-a775-90b11c188fc7 type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)
/dev/rbd37 on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/rbd/rbd/kube-image-kubernetes-dynamic-pvc-6cb2301d-ea00-11e6-a775-90b11c188fc7 type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)
/dev/rbd45 on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/rbd/rbd/kube-image-kubernetes-dynamic-pvc-7e9d1264-ea00-11e6-a775-90b11c188fc7 type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)
/dev/rbd54 on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/rbd/rbd/kube-image-kubernetes-dynamic-pvc-92c4aec3-ea00-11e6-a775-90b11c188fc7 type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)
/dev/rbd60 on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/rbd/rbd/kube-image-kubernetes-dynamic-pvc-a033e8ef-ea00-11e6-a775-90b11c188fc7 type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)
/dev/rbd76 on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/rbd/rbd/kube-image-kubernetes-dynamic-pvc-c426b9b9-ea00-11e6-a775-90b11c188fc7 type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)

[root@gprfc074 nginx]# rbd showmapped
id pool image                                                        snap device
0  kube kubernetes-dynamic-pvc-19f84a58-ea00-11e6-a775-90b11c188fc7 -    /dev/rbd0
22 kube kubernetes-dynamic-pvc-4b2065f8-ea00-11e6-a775-90b11c188fc7 -    /dev/rbd22
36 kube kubernetes-dynamic-pvc-6a75bb2f-ea00-11e6-a775-90b11c188fc7 -    /dev/rbd36
37 kube kubernetes-dynamic-pvc-6cb2301d-ea00-11e6-a775-90b11c188fc7 -    /dev/rbd37
45 kube kubernetes-dynamic-pvc-7e9d1264-ea00-11e6-a775-90b11c188fc7 -    /dev/rbd45
5  kube kubernetes-dynamic-pvc-25252b38-ea00-11e6-a775-90b11c188fc7 -    /dev/rbd5
54 kube kubernetes-dynamic-pvc-92c4aec3-ea00-11e6-a775-90b11c188fc7 -    /dev/rbd54
60 kube kubernetes-dynamic-pvc-a033e8ef-ea00-11e6-a775-90b11c188fc7 -    /dev/rbd60
76 kube kubernetes-dynamic-pvc-c426b9b9-ea00-11e6-a775-90b11c188fc7 -    /dev/rbd76

[root@gprfc074 nginx]# ls -l /dev/ | grep rbd
drwxr-xr-x. 3 root root     60 Feb  3 06:01 rbd
brw-rw----. 1 root disk 252, 0 Feb  3 06:01 rbd0
brw-rw----. 1 root disk 230, 0 Feb  3 06:02 rbd22
brw-rw----. 1 root disk 216, 0 Feb  3 06:03 rbd36
brw-rw----. 1 root disk 215, 0 Feb  3 06:03 rbd37
brw-rw----. 1 root disk 207, 0 Feb  3 06:04 rbd45
brw-rw----. 1 root disk 247, 0 Feb  3 06:01 rbd5
brw-rw----. 1 root disk 198, 0 Feb  3 06:04 rbd54
brw-rw----. 1 root disk 192, 0 Feb  3 06:05 rbd60
brw-rw----. 1 root disk 176, 0 Feb  3 06:06 rbd76
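Cleanup sketch: until the kubelet unmaps these devices reliably, the stale mappings can be cleared manually on each affected node. The script below is only an illustration of the manual steps mentioned above (umount, rbd unmap); it assumes that every image still reported by "rbd showmapped" on the node is no longer used by any running pod, so review the list before running it.

#!/bin/bash
# Illustrative cleanup sketch - NOT part of the product.
# Assumption: all RBD devices still mapped on this node are leftovers
# from already-deleted pods.
rbd showmapped | awk 'NR > 1 {print $5}' | while read dev; do
    # If the kubelet left the volume mounted, unmount it first.
    mountpoint=$(mount | awk -v d="$dev" '$1 == d {print $3}')
    if [ -n "$mountpoint" ]; then
        umount "$mountpoint"
    fi
    # Unmap the RBD device from the node; it then disappears from /dev/
    # and from "rbd showmapped" on the Ceph side.
    rbd unmap "$dev"
done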
http://tracker.ceph.com/issues/18768 discusses slowness of RBD volume deletion in Jewel and its root cause.