Bug 1418988 - rbd devices are not unmapped once pods are deleted
Summary: rbd devices are not unmapped once pods are deleted
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.5.0
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: hchen
QA Contact: Jianwei Hou
URL:
Whiteboard: aos-scalability-35
Depends On:
Blocks:
 
Reported: 2017-02-03 11:22 UTC by Elvir Kuric
Modified: 2018-05-25 14:51 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-01-11 19:37:57 UTC
Target Upstream Version:
Embargoed:



Description Elvir Kuric 2017-02-03 11:22:39 UTC
Description of problem:

Some /dev/rbdX devices are not unmapped after the pods using them are removed.

Version-Release number of selected component (if applicable):

atomic-openshift-sdn-ovs-3.5.0.10-1.git.0.e377fa2.el7.x86_64
atomic-openshift-node-3.5.0.10-1.git.0.e377fa2.el7.x86_64
atomic-openshift-3.5.0.10-1.git.0.e377fa2.el7.x86_64
tuned-profiles-atomic-openshift-node-3.5.0.10-1.git.0.e377fa2.el7.x86_64
atomic-openshift-clients-3.5.0.10-1.git.0.e377fa2.el7.x86_64
atomic-openshift-master-3.5.0.10-1.git.0.e377fa2.el7.x86_64

# uname -a 
Linux gprfc074.sbu.lab.eng.bos.redhat.com 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
[root@gprfc074 ~]# rpm -qa | grep ceph
libcephfs1-10.2.3-13.el7cp.x86_64
ceph-common-10.2.3-13.el7cp.x86_64
python-cephfs-10.2.3-13.el7cp.x86_64


How reproducible:

always 

Steps to Reproduce:

This test was run on a 3-node OCP cluster.

1. Use dynamic storage provisioning for Ceph RBD.
2. Create at least 400 pods, each with one RBD volume mounted inside the pod (400 pods = 400 RBDs); a sketch of the objects used is shown after these steps.
3. Delete the pods quickly, e.g.:

# for m in $(oc get pods | grep pod- |  awk '{print $1}');do  oc delete pod $m ; done
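
A minimal sketch of the objects created in steps 1-2, assuming dynamic provisioning is already configured; the storage class name (ceph-rbd), the pvc-N/pod-N names, and the nginx image are illustrative assumptions, not taken from the original test:

# sketch: create 400 pods, each bound to its own dynamically provisioned RBD-backed PVC
for i in $(seq 1 400); do
  oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-$i
  annotations:
    volume.beta.kubernetes.io/storage-class: ceph-rbd   # assumed storage class name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-$i
spec:
  containers:
  - name: app
    image: nginx                       # any image that keeps the pod running
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: pvc-$i
EOF
done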



Actual results:

Some /dev/rbdX devices remain mapped even after the pods are removed.


Expected results:

Once the pods using RBD devices are removed, all RBD devices should be unmapped from the OCP node.

Additional info:
These RBDs are not unmapped even after some time; they stay in /dev/ and remain mapped on the Ceph side for as long as they are not cleaned up manually on the nodes (umount, rbd unmap /dev/rbdX); a cleanup sketch follows.
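
A minimal per-node cleanup sketch, assuming the stale devices are no longer referenced by any running pod (the awk field index matches the rbd showmapped output shown further below):

# unmount (if still mounted) and unmap every RBD device mapped on this node
for dev in $(rbd showmapped | awk 'NR>1 {print $5}'); do
  umount "$dev" 2>/dev/null || true   # ignore devices that were already unmounted
  rbd unmap "$dev"                    # releases the mapping on the node and on the Ceph side
done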

If a "sleep 1" is added between pod deletions, the issue cannot be reproduced, e.g.:
# for m in $(oc get pods | grep pod- |  awk '{print $1}');do  oc delete pod $m ; sleep 1; done


Output from the system when RBDs are not unmapped properly; this view is visible across all OCP nodes.

# mount | grep rbd
/dev/rbd0 on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/rbd/rbd/kube-image-kubernetes-dynamic-pvc-19f84a58-ea00-11e6-a775-90b11c188fc7 type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)
/dev/rbd5 on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/rbd/rbd/kube-image-kubernetes-dynamic-pvc-25252b38-ea00-11e6-a775-90b11c188fc7 type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)
/dev/rbd22 on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/rbd/rbd/kube-image-kubernetes-dynamic-pvc-4b2065f8-ea00-11e6-a775-90b11c188fc7 type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)
/dev/rbd36 on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/rbd/rbd/kube-image-kubernetes-dynamic-pvc-6a75bb2f-ea00-11e6-a775-90b11c188fc7 type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)
/dev/rbd37 on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/rbd/rbd/kube-image-kubernetes-dynamic-pvc-6cb2301d-ea00-11e6-a775-90b11c188fc7 type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)
/dev/rbd45 on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/rbd/rbd/kube-image-kubernetes-dynamic-pvc-7e9d1264-ea00-11e6-a775-90b11c188fc7 type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)
/dev/rbd54 on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/rbd/rbd/kube-image-kubernetes-dynamic-pvc-92c4aec3-ea00-11e6-a775-90b11c188fc7 type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)
/dev/rbd60 on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/rbd/rbd/kube-image-kubernetes-dynamic-pvc-a033e8ef-ea00-11e6-a775-90b11c188fc7 type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)
/dev/rbd76 on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/rbd/rbd/kube-image-kubernetes-dynamic-pvc-c426b9b9-ea00-11e6-a775-90b11c188fc7 type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)



[root@gprfc074 nginx]# rbd showmapped 
id pool image                                                       snap device     
0  kube kubernetes-dynamic-pvc-19f84a58-ea00-11e6-a775-90b11c188fc7 -    /dev/rbd0  
22 kube kubernetes-dynamic-pvc-4b2065f8-ea00-11e6-a775-90b11c188fc7 -    /dev/rbd22 
36 kube kubernetes-dynamic-pvc-6a75bb2f-ea00-11e6-a775-90b11c188fc7 -    /dev/rbd36 
37 kube kubernetes-dynamic-pvc-6cb2301d-ea00-11e6-a775-90b11c188fc7 -    /dev/rbd37 
45 kube kubernetes-dynamic-pvc-7e9d1264-ea00-11e6-a775-90b11c188fc7 -    /dev/rbd45 
5  kube kubernetes-dynamic-pvc-25252b38-ea00-11e6-a775-90b11c188fc7 -    /dev/rbd5  
54 kube kubernetes-dynamic-pvc-92c4aec3-ea00-11e6-a775-90b11c188fc7 -    /dev/rbd54 
60 kube kubernetes-dynamic-pvc-a033e8ef-ea00-11e6-a775-90b11c188fc7 -    /dev/rbd60 
76 kube kubernetes-dynamic-pvc-c426b9b9-ea00-11e6-a775-90b11c188fc7 -    /dev/rbd76 


[root@gprfc074 nginx]# ls -l /dev/ |grep rbd 
drwxr-xr-x.  3 root root          60 Feb  3 06:01 rbd
brw-rw----.  1 root disk    252,   0 Feb  3 06:01 rbd0
brw-rw----.  1 root disk    230,   0 Feb  3 06:02 rbd22
brw-rw----.  1 root disk    216,   0 Feb  3 06:03 rbd36
brw-rw----.  1 root disk    215,   0 Feb  3 06:03 rbd37
brw-rw----.  1 root disk    207,   0 Feb  3 06:04 rbd45
brw-rw----.  1 root disk    247,   0 Feb  3 06:01 rbd5
brw-rw----.  1 root disk    198,   0 Feb  3 06:04 rbd54
brw-rw----.  1 root disk    192,   0 Feb  3 06:05 rbd60
brw-rw----.  1 root disk    176,   0 Feb  3 06:06 rbd76

Comment 7 Ben England 2018-05-25 14:51:05 UTC
http://tracker.ceph.com/issues/18768 discusses the slowness of RBD volume deletion in Jewel and its root cause.

