Red Hat Bugzilla – Bug 1316095
AWS volumes remain in "in-use" status after deleting the OSE pods that used them - OSE v126.96.36.1991
Last modified: 2016-06-07 18:47:12 EDT
Created attachment 1134490 [details]
Description of problem:
EBS volumes stay in "in use" status after removing the pods associated with them
Version-Release number of selected component (if applicable):
OSE v188.8.131.521 with awsElasticBlockStore storage plugin
Create 50+ pods where each pod uses one EBS volume, and once the pods are created, start deleting the pods, pv, pvc, and EBS volumes. The create and delete operations were close in time, e.g. create_pods; sleep 30; delete_pods
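The reproduction loop above might be scripted roughly like this. This is only a sketch: the pod/PVC names are assumptions, and the actual `oc` calls are left as comments so the pacing of the sequence is visible.

```shell
#!/bin/sh
# Sketch of the reproduction sequence from this report: create ~50 pods
# that each claim one EBS-backed PVC, wait briefly, then tear everything
# down. Names and counts are illustrative, not taken from the report.
N=50

create_pods() {
  i=1
  while [ "$i" -le "$N" ]; do
    # oc create -f "pod-with-pvc-$i.yaml"   # hypothetical pod definition
    echo "create pod-$i with pvc-$i"
    i=$((i + 1))
  done
}

delete_pods() {
  # oc delete pods --all
  # oc delete pvc --all
  # oc delete pv --all
  echo "delete pods, pvc, pv"
}

create_pods
# sleep 30    # create and delete are deliberately close in time
delete_pods
```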
EBS volumes remain in "in use" status even though the pods which used them were deleted.
After deleting the pods, it is expected that the EBS volumes move to the "available" state, from which they can be removed/deleted.
In the AWS web interface, after deleting the pods the volumes remain in the "in-use" state, as shown in the attached photo. This status does not allow the volumes to be removed; it is necessary to detach them first in order to delete them.
While the devices are "in use" (after deleting the pods), they are not visible on the Amazon instance ( = OSE node ).
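For volumes stuck "in-use", the state can be inspected and the volume detached from the AWS side. `describe-volumes`, `detach-volume`, and `delete-volume` are real `aws` CLI subcommands, but the volume ids below are placeholders and the actual calls are commented out in this sketch.

```shell
#!/bin/sh
# Inspect volumes still reported as in-use (real aws CLI subcommand):
#   aws ec2 describe-volumes --filters Name=status,Values=in-use
# Then detach and delete each stuck volume. The ids are placeholders.
VOLUME_IDS="vol-0aaa vol-0bbb"
detached=0
for v in $VOLUME_IDS; do
  # aws ec2 detach-volume --volume-id "$v"
  # aws ec2 delete-volume --volume-id "$v"
  echo "detach + delete $v"
  detached=$((detached + 1))
done
echo "$detached volumes handled"
```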
I'll look at it.
The first part, raising the limit to 39, has been merged into Kubernetes 1.2. Admins can adjust the limit by setting the env. variable "KUBE_MAX_PD_VOLS" in the scheduler process (openshift-master); however, the kubelet will refuse to attach more than 39 volumes anyway. 'oc describe pod' will show a clear message that too many volumes are attached and the pod can't be started.
The second part, allowing kubelet to attach more than 39 volumes, is still open and I'm working on it. Tracked here: https://github.com/kubernetes/kubernetes/issues/22994
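As a concrete example of the first part: the scheduler limit can be raised through the environment. KUBE_MAX_PD_VOLS is the variable named above; where exactly it is set (e.g. a sysconfig file read by the openshift-master service) is deployment-specific and assumed here.

```shell
# Raise the scheduler's attached-EBS-volume limit to 39 for the
# openshift-master (scheduler) process. KUBE_MAX_PD_VOLS is the variable
# from the comment above; note the kubelet still enforces its own
# 39-volume cap regardless of this setting.
export KUBE_MAX_PD_VOLS=39
echo "scheduler PD limit: $KUBE_MAX_PD_VOLS"
```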
I assume that update belonged in https://bugzilla.redhat.com/show_bug.cgi?id=1315995
Oops, sorry, too many open windows... scratch comment #2.
> Create 50+ pods where each pod uses one EBS volume and once pods are created
> start to delete pods, pv,pvc and EBS volumes.Create and delete operation
> were timely close, eg. create_pods; sleep 30; delete_pods
OSE 3.1 currently supports only 11 EBS volumes. This may get raised to 39. Can you reproduce this issue with < 11 volumes? If not, then this should be marked as a duplicate of #1315995.
I am surprised you are not getting "Value (/dev/XYZ) for parameter device is invalid." errors. Are all 50 volumes getting attached and the pods running? Or are you just cycling through the pods n at a time, where n < 11?
(In reply to Sami Wagiaalla from comment #6)
> > Create 50+ pods where each pod uses one EBS volume and once pods are created
> > start to delete pods, pv,pvc and EBS volumes.Create and delete operation
> > were timely close, eg. create_pods; sleep 30; delete_pods
> OSE 3.1 currently supports only 11 EBS volumes. This may get raised to 39.
> Can you reproduce this issue with < 11 volumes ? If not then this should be
> marked as a duplicate of #1315995
I did not try < 11 volumes; in my tests I tried to reach the max values.
> I am surprised you are not getting "Value (/dev/XYZ) for parameter device is
> invalid." errors.
I am getting this error when exceeding the limit of 11 EBSes - but during the create phase. This issue is visible after the pods/pv/pvc are removed.
> Are all 50 volumes getting attached and the pods are
> running ? Or are you just cycling through the pods n at a time where n < 11 ?
If more pods are started, asking for > 11 EBS volumes, then not all EBSes are attached - only up to the limit per OSE node (11 EBSes). Only the pods which start get an EBS device (one EBS per pod). The other EBSes are not visible on the OSE side; in the Amazon console they are marked as "available".
This BZ is for the case when the pods, pv, and pvc are deleted (oc get pv,pods,pvc shows nothing): even though the pods/pvc/pv are gone, the EBSes previously used by the pods are still "in use" and attached to the EC2 instance used for the OSE node. The attachment to this BZ illustrates it: the EBSes show "in use" even though no pods are using them.
Elvir, I tried hard to reproduce this bug, but OpenShift always worked as expected, detaching all volumes. Sometimes I ended up with a very confused AWS - it refused to attach any volumes to a node - but that's not what you see.
Can you try with the newest OpenShift build? atomic-openshift-184.108.40.206 does not have the buggy AWS attachment cache; it should help.
If you can still reproduce it, can you please share with us how you installed the systems (i.e. what AWS AMI you use, your ansible inventory + any additional steps you did)?
I do not have atomic-openshift-220.127.116.11 installed; I will do it today/tomorrow, test with these bits, and follow up. Thank you.
I tested with the packages below (38 pods per OSE node)
and cannot reproduce the issue with these packages.
I tried the same approach as before (which led to the issue):
create pods, sleep 300s, delete pods, sleep 30s; delete pv; delete pvc, sleep 120s, delete EBS volumes... it worked fine, as expected.
Previously the last step, removing the EBS volumes, was problematic and failing; now it is not.
Thanks a lot for the test! I'm marking it as fixed.
I have verified this on Origin; it works well with the max pod number (39), and all the volumes can be deleted after being released. Changing the status to VERIFIED.