+++ This bug was initially created as a clone of Bug #1316095 +++

Description of problem:
EBS volumes stay in the "in-use" status after the pods associated with them are removed.

Version-Release number of selected component (if applicable):
OSE v3.1.1.911 with the awsElasticBlockStore storage plugin

How reproducible:
Create 50+ pods, each using one EBS volume, and once the pods are created start deleting the pods, PVs, PVCs and EBS volumes. The create and delete operations were close together in time, e.g. create_pods; sleep 30; delete_pods.

Actual results:
EBS volumes remain in the "in-use" status even though the pods that used them were deleted.

Expected results:
After the pods are deleted, the EBS volumes should move to the "available" state, from which they can be removed/deleted.

Additional info:
In the AWS web interface, the volumes remain in the "in-use" state after the pods are deleted, as shown in the attached photo. This status does not allow the volumes to be removed; they must be detached first before they can be deleted. While the devices are "in-use" (after the pods are deleted), they are not visible on the Amazon instance (= OSE node) in the fdisk and /proc/partitions output.

--- Additional comment from Jan Safranek on 2016-03-09 20:29:12 CST ---

I'll look at it.

--- Additional comment from Jan Safranek on 2016-03-16 17:05:22 CST ---

The first part, raising the limit to 39, has been merged into Kubernetes 1.2. Admins can adjust the limit by setting the environment variable "KUBE_MAX_PD_VOLS" for the scheduler process (openshift-master); however, kubelet will refuse to attach more than 39 volumes anyway. 'oc describe pod' will show a clear message that too many volumes are attached and the pod cannot be started.
https://github.com/kubernetes/kubernetes/pull/22942

The second part, allowing kubelet to attach more than 39 volumes, is still open and I'm working on it. Tracked here:
https://github.com/kubernetes/kubernetes/issues/22994

--- Additional comment from Jeremy Eder on 2016-03-16 19:31:55 CST ---

I assume that update belonged in https://bugzilla.redhat.com/show_bug.cgi?id=1315995

--- Additional comment from Jan Safranek on 2016-03-16 20:43:32 CST ---

Oops, sorry, too many open windows... scratch comment #2.
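For reference, the KUBE_MAX_PD_VOLS workaround mentioned above amounts to something like the following minimal sketch; the sysconfig path and unit name are assumptions about a particular installation (they differ between Origin and OSE), not confirmed for this setup:

# Assumption: the scheduler (openshift-master) reads its environment from this
# sysconfig file; adjust the path and unit name to match the installation.
echo 'KUBE_MAX_PD_VOLS=50' >> /etc/sysconfig/openshift-master
systemctl restart openshift-master

# If a pod still cannot start because too many volumes are attached,
# the events shown here should say so explicitly.
oc describe pod <pod-name>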
I saw it happen in Elvir's environment; unfortunately, OpenShift does not log enough in these code paths to tell what is going wrong. I'm trying hard to reproduce it with more logging enabled, but it's tedious (starting 50 pods takes a long time).
I've tried on an OSE setup where the fix from https://github.com/openshift/ose/commit/27d9951039933065f416acac3a248eb39536ee5a is applied:

openshift v3.1.1.6-29-g9a3b53e
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2

I created EBS volumes, PVs, PVCs and 20 pods, slept 120 seconds, then deleted the pods, PVs and PVCs and created the pods again. I tried this several times and could not reproduce the issue.
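For context, a sketch of the kind of PV used in such a test; the object name, volume ID and size below are placeholders, not the actual objects from this run:

# Placeholder PV bound to one EBS volume; a matching PVC and a pod mounting
# that claim would be created the same way for each of the 20 volumes.
oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ebs-pv-1
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  awsElasticBlockStore:
    volumeID: vol-0123456789abcdef0
    fsType: ext4
EOF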
(In reply to Hou Jianwei from comment #2)
> I've tried on an OSE setup where the fix of
> https://github.com/openshift/ose/commit/27d9951039933065f416acac3a248eb39536ee5a
> is applied:
>
> openshift v3.1.1.6-29-g9a3b53e
> kubernetes v1.1.0-origin-1107-g4c8e6f4
> etcd 2.1.2
>
> I tried to create ebs volumes, pv, pvc, 20 pods, sleep 120, then delete
> these pods, pv, pvc and then create these pods again. Tried several times.
> Can not reproduce it.

Can you try to create more pods across more nodes, e.g. 40+ (or 50+) pods spread across 3 or 4 nodes? A rough sketch of such a run is below.
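A minimal sketch of what that scale-up could look like; pod-template.yaml, its NAME placeholder and the pod count are assumptions for illustration, not an actual script from this report:

# Hypothetical template-driven loop: create 50 pods (each bound to its own
# PVC), let the scheduler spread them over the nodes, then tear them down.
for i in $(seq 1 50); do
  sed "s/NAME/ebs-test-$i/g" pod-template.yaml | oc create -f -
done

sleep 30

for i in $(seq 1 50); do
  oc delete pod "ebs-test-$i"
done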
Elvir, can you also try with the version Hou is using? The fix that removed the cache (https://github.com/openshift/ose/commit/27d9951039933065f416acac3a248eb39536ee5a) improved stability considerably. Previously the cache could get out of sync and kubelet would not know which devices needed to be detached.
Elvir has confirmed that this bug cannot be reproduced with the latest version of Origin. For OSE 3.1, please update https://bugzilla.redhat.com/show_bug.cgi?id=1316095
1. I created 26 EBS volumes and attached/detached them to the instance with the aws CLI (see the sketch below), keeping that attach/detach cycle running.
2. Created 26 PVs, PVCs and pods, slept 120 s, then deleted the pods, PVs and PVCs and created the pods again.

I repeated step 2 several times and could not reproduce the issue.

openshift v3.2.0.7
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5
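For completeness, a sketch of the attach/detach cycle described in step 1; the volume ID, instance ID and device name below are placeholders, not the actual IDs used in the test:

# Attach a volume, wait until AWS reports it in-use, then detach it again and
# wait for it to become available; the IDs and device name are placeholders.
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0abcdef1234567890 --device /dev/sdf
aws ec2 wait volume-in-use --volume-ids vol-0123456789abcdef0

aws ec2 detach-volume --volume-id vol-0123456789abcdef0
aws ec2 wait volume-available --volume-ids vol-0123456789abcdef0

# The state reported here should be "available" once nothing holds the volume.
aws ec2 describe-volumes --volume-ids vol-0123456789abcdef0 \
    --query 'Volumes[0].State' --output text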
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1064