Bug 1489603 - Volume unmounted but not being detached from node
Summary: Volume unmounted but not being detached from node
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.6.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.9.0
Assignee: Hemant Kumar
QA Contact: Chao Yang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-09-07 21:12 UTC by Hemant Kumar
Modified: 2018-03-28 14:06 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-28 14:06:20 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:0489 0 None None None 2018-03-28 14:06:48 UTC

Description Hemant Kumar 2017-09-07 21:12:29 UTC
I am seeing many instances of volumes being unmounted but not detached from the node on various OpenShift clusters. We need to find out why this is happening.

Comment 5 Hemant Kumar 2017-09-14 19:57:58 UTC
As I stated above, the root cause of this bug was:

1. The user created a pod with a volume, but the volume was stuck in the "attaching" state for more than 1 hour.
2. The AttachDetach Controller gave up after a certain time, and the volume was not added to the actual_state_of_World of the A/D Controller.
3. The attach eventually succeeded, but the A/D Controller no longer knew about the volume.

The main point is that the volume should not have been stuck in the attaching state for such a long time. We have to work with Amazon to find a solution for that problem.
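
The sequence in steps 1 to 3 can be illustrated with a toy model. This is only a minimal sketch, not the controller's real code: `ToyAttachDetachController`, `ATTACH_TIMEOUT_S`, and all method names are hypothetical, and the timeout value is an assumption chosen to match the "more than 1 hour" in the report.

```python
from dataclasses import dataclass, field

# Hypothetical timeout; the real controller's value is not stated in this report.
ATTACH_TIMEOUT_S = 3600


@dataclass
class ToyAttachDetachController:
    """Toy model of an attach/detach controller (illustration only)."""
    actual_state: set = field(default_factory=set)  # volumes believed to be attached
    pending: dict = field(default_factory=dict)     # volume -> time attach was requested

    def request_attach(self, volume, now):
        self.pending[volume] = now

    def attach_finished(self, volume, now):
        """Called when the cloud reports that the attach completed."""
        started = self.pending.pop(volume)
        if now - started <= ATTACH_TIMEOUT_S:
            self.actual_state.add(volume)
        # Otherwise the controller has already given up: the volume is
        # attached in the cloud but never recorded, so it is "dangling".

    def detach_for_deleted_pod(self, volume):
        """Return True if a detach is issued for the volume."""
        if volume in self.actual_state:
            self.actual_state.remove(volume)
            return True
        return False  # unknown volume: nothing to detach


def simulate(attach_duration_s):
    """Attach one volume, then delete its pod; report whether it is detached."""
    ctrl = ToyAttachDetachController()
    ctrl.request_attach("vol-1", now=0)
    ctrl.attach_finished("vol-1", now=attach_duration_s)
    return ctrl.detach_for_deleted_pod("vol-1")
```

In this model, `simulate(60)` issues a detach, while an attach that takes longer than the timeout leaves the volume attached with no detach ever issued, which matches the symptom reported here.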

Comment 9 David Caldwell 2017-10-16 13:04:34 UTC
Hey guys, 

Any updates on this issue? 

Is there a workaround?

Thanks,

David.

Comment 10 Hemant Kumar 2017-10-16 18:01:14 UTC
Each instance of this problem can be caused by a different underlying problem. Can you give some more details about the customer's problem?

The bug I opened is caused by a volume being stuck in the "attaching" state for too long, with the user deleting the pod while waiting for it to come up. The volume attach eventually succeeds, but because it finishes outside the expiry window of the attach/detach controller, the controller doesn't know about the volume and hence never detaches it.

I am not sure whether the incident you linked is the same as what I outlined above. The symptoms may look similar from the outside while the root cause is different.

I would request that you open a new bug with the following details:

1. PV and PVC YAML
2. Output of describe for the PV and PVC
3. Node logs from where this happened
4. Controller logs from the same time period

Comment 11 Hemant Kumar 2017-12-20 01:48:15 UTC
We have opened a PR against OpenShift 3.8 that will cause all dangling volumes to correct themselves - https://github.com/openshift/origin/pull/17544

The specific commit that includes the fix is - https://github.com/openshift/origin/pull/17544/commits/2885375c4d0f1738dc45a013e11d64d638f0f050
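
The general idea of letting dangling volumes correct themselves can be sketched as a reconciliation step. This is an illustration under assumptions, not the code in the PR: `reconcile_dangling` and its arguments are hypothetical names, and volumes are modelled as plain strings.

```python
def reconcile_dangling(cloud_attached, actual_state, desired_state):
    """Return the volumes to detach after adopting dangling ones.

    cloud_attached: set of volumes the cloud provider reports as attached
    actual_state:   set of volumes the controller believes are attached (mutated)
    desired_state:  set of volumes that some pod still needs
    """
    # Adopt volumes that are attached in the cloud but unknown to the controller.
    dangling = cloud_attached - actual_state
    actual_state |= dangling
    # Anything attached but no longer desired can now be detached normally.
    return sorted(actual_state - desired_state)
```

Once a dangling volume is recorded in the controller's view of the world, the ordinary detach path can handle it like any other volume that no pod needs.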

Comment 13 Hemant Kumar 2018-01-18 22:55:46 UTC
Yes, that is fine. The fix has been merged in 3.9. Moving to MODIFIED.

Comment 15 Chao Yang 2018-01-24 07:41:06 UTC
This passed on:
oc v3.9.0-0.23.0
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-14-251.ec2.internal:443
openshift v3.9.0-0.23.0
kubernetes v1.9.1+a0ce1bc657


1. Make sure the pod is in ContainerCreating because the volume could not be attached
[root@ip-172-18-14-251 ~]# oc get pods
NAME      READY     STATUS              RESTARTS   AGE
mypod1    0/1       ContainerCreating   0          1h
2. Let the pod become Running
[root@ip-172-18-14-251 ~]# oc get pods
NAME      READY     STATUS    RESTARTS   AGE
mypod1    1/1       Running   0          1h
3. Delete the pod, then check that the volume is detached and becomes available
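
The check in step 3 could be scripted roughly as follows. This is a sketch that assumes the cluster runs on EC2 and that the input has the shape of an EC2 DescribeVolumes response (a `Volumes` list whose entries carry `VolumeId`, `State`, and `Attachments`); the helper name `is_detached_and_available` is hypothetical, and fetching the response is left out.

```python
def is_detached_and_available(describe_volumes_response, volume_id):
    """Return True if the volume is in state 'available' with no attachments.

    describe_volumes_response is assumed to be a dict shaped like an
    EC2 DescribeVolumes response.
    """
    for vol in describe_volumes_response.get("Volumes", []):
        if vol.get("VolumeId") == volume_id:
            return vol.get("State") == "available" and not vol.get("Attachments")
    return False  # volume not present in the response
```

A volume that still shows `in-use`, or that still lists an attachment, would fail this check and indicate that the detach did not happen.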

Comment 18 errata-xmlrpc 2018-03-28 14:06:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489

