+++ This bug was initially created as a clone of Bug #1316095 +++

Description of problem:
EBS volumes stay in the "in-use" status after the pods associated with them are removed.

Version-Release number of selected component (if applicable):
OSE v3.1.1.911 with the awsElasticBlockStore storage plugin

How reproducible:
Create 50+ pods, each using one EBS volume, and once the pods are created start deleting the pods, PVs, PVCs and EBS volumes. The create and delete operations were close together in time, e.g. create_pods; sleep 30; delete_pods.

Actual results:
EBS volumes remain in the "in-use" status even though the pods that used them were deleted.

Expected results:
After the pods are deleted, the EBS volumes should move to the "available" state, from which they can be removed/deleted.

Additional info:
In the AWS web interface, the volumes remain in the "in-use" state after the pods are deleted, as shown in the attached photo. This status does not allow the volumes to be removed; they must be detached first before they can be deleted. While the devices are "in-use" (after the pods are deleted), they are not visible on the Amazon instance (= OSE node) in the fdisk and /proc/partitions output.

--- Additional comment from Jan Safranek on 2016-03-09 20:29:12 CST ---

I'll look at it.

--- Additional comment from Jan Safranek on 2016-03-16 17:05:22 CST ---

The first part, raising the limit to 39, has been merged into Kubernetes 1.2. Admins can adjust the limit by setting the environment variable "KUBE_MAX_PD_VOLS" for the scheduler process (openshift-master); however, kubelet will refuse to attach more than 39 volumes anyway. 'oc describe pod' will show a clear message that too many volumes are attached and the pod cannot be started.
https://github.com/kubernetes/kubernetes/pull/22942

The second part, allowing kubelet to attach more than 39 volumes, is still open and I'm working on it. Tracked here:
https://github.com/kubernetes/kubernetes/issues/22994

--- Additional comment from Jeremy Eder on 2016-03-16 19:31:55 CST ---

I assume that update belonged in https://bugzilla.redhat.com/show_bug.cgi?id=1315995

--- Additional comment from Jan Safranek on 2016-03-16 20:43:32 CST ---

Oops, sorry, too many open windows... scratch comment #2.
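For reference, the KUBE_MAX_PD_VOLS workaround mentioned above amounts to something like the following minimal sketch; the sysconfig path and unit name are assumptions about a particular installation (they differ between Origin and OSE), not confirmed for this setup:

# Assumption: the scheduler (openshift-master) reads its environment from this
# sysconfig file; adjust the path and unit name to match the installation.
echo 'KUBE_MAX_PD_VOLS=50' >> /etc/sysconfig/openshift-master
systemctl restart openshift-master

# If a pod still cannot start because too many volumes are attached,
# the events shown here should say so explicitly.
oc describe pod <pod-name>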
I saw it happen in Elvir's environment; unfortunately, OpenShift does not log enough in these code paths to tell what is going wrong. I'm trying hard to reproduce it with more logging enabled, but it's tedious (starting 50 pods takes a long time).
I've tried on an OSE setup where the fix from https://github.com/openshift/ose/commit/27d9951039933065f416acac3a248eb39536ee5a is applied:

openshift v3.1.1.6-29-g9a3b53e
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2

I created EBS volumes, PVs, PVCs and 20 pods, slept 120 seconds, then deleted the pods, PVs and PVCs and created the pods again. I tried this several times and could not reproduce the issue.
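For context, a sketch of the kind of PV used in such a test; the object name, volume ID and size below are placeholders, not the actual objects from this run:

# Placeholder PV bound to one EBS volume; a matching PVC and a pod mounting
# that claim would be created the same way for each of the 20 volumes.
oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ebs-pv-1
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  awsElasticBlockStore:
    volumeID: vol-0123456789abcdef0
    fsType: ext4
EOF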
(In reply to Hou Jianwei from comment #2)
> I've tried on an OSE setup where the fix of
> https://github.com/openshift/ose/commit/27d9951039933065f416acac3a248eb39536ee5a
> is applied:
>
> openshift v3.1.1.6-29-g9a3b53e
> kubernetes v1.1.0-origin-1107-g4c8e6f4
> etcd 2.1.2
>
> I tried to create ebs volumes, pv, pvc, 20 pods, sleep 120, then delete
> these pods, pv, pvc and then create these pods again. Tried several times.
> Can not reproduce it.

Can you try to create more pods across more nodes, e.g. 40+ (or 50+) pods spread across 3 or 4 nodes? A rough sketch of such a run is below.
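A minimal sketch of what that scale-up could look like; pod-template.yaml, its NAME placeholder and the pod count are assumptions for illustration, not an actual script from this report:

# Hypothetical template-driven loop: create 50 pods (each bound to its own
# PVC), let the scheduler spread them over the nodes, then tear them down.
for i in $(seq 1 50); do
  sed "s/NAME/ebs-test-$i/g" pod-template.yaml | oc create -f -
done

sleep 30

for i in $(seq 1 50); do
  oc delete pod "ebs-test-$i"
done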
Elvir, can you also try with the version Hou is using? The fix that removed the cache (https://github.com/openshift/ose/commit/27d9951039933065f416acac3a248eb39536ee5a) improved stability considerably. Previously the cache could get out of sync and kubelet would not know which devices needed to be detached.
Elvir has confirmed that this bug cannot be reproduced with the latest version of Origin. For OSE 3.1, please update https://bugzilla.redhat.com/show_bug.cgi?id=1316095
1. I created 26 EBS volumes and attached/detached them to the instance with the aws CLI (see the sketch below), keeping that attach/detach cycle running.
2. Created 26 PVs, PVCs and pods, slept 120 s, then deleted the pods, PVs and PVCs and created the pods again.

I repeated step 2 several times and could not reproduce the issue.

openshift v3.2.0.7
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5
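For completeness, a sketch of the attach/detach cycle described in step 1; the volume ID, instance ID and device name below are placeholders, not the actual IDs used in the test:

# Attach a volume, wait until AWS reports it in-use, then detach it again and
# wait for it to become available; the IDs and device name are placeholders.
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0abcdef1234567890 --device /dev/sdf
aws ec2 wait volume-in-use --volume-ids vol-0123456789abcdef0

aws ec2 detach-volume --volume-id vol-0123456789abcdef0
aws ec2 wait volume-available --volume-ids vol-0123456789abcdef0

# The state reported here should be "available" once nothing holds the volume.
aws ec2 describe-volumes --volume-ids vol-0123456789abcdef0 \
    --query 'Volumes[0].State' --output text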
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1064