Bug 1489603
| Summary: | Volume unmounted but not being detached from node | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Hemant Kumar <hekumar> |
| Component: | Storage | Assignee: | Hemant Kumar <hekumar> |
| Status: | CLOSED ERRATA | QA Contact: | Chao Yang <chaoyang> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.6.1 | CC: | aos-bugs, aos-storage-staff, bchilds, dcaldwel, hekumar, lxia, tlarsson |
| Target Milestone: | --- | | |
| Target Release: | 3.9.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-03-28 14:06:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

Description
Hemant Kumar
2017-09-07 21:12:29 UTC

As I have stated above, the root cause of this bug was:

1. The user created a pod with a volume, but the volume was stuck in the "attaching" state for more than 1 hour.
2. The AttachDetach Controller gave up after a certain time, and the volume was not added to the actual_state_of_World of the A/D Controller.
3. The attach eventually succeeded, but the A/D controller no longer knew about the volume.

Obviously the main thing is that the volume shouldn't have been stuck in the attaching state for such a long time. We have to work with Amazon to find a solution for that problem.

Hey guys,

Any updates on this issue? Is there a workaround?

Thanks,
David.

Each instance of this problem is caused by a different underlying problem. Can you give some more details about the customer's problem?

The bug I opened is caused by a volume being stuck in the "attaching" state too long, and the user then deleting the pod while waiting for it to come up. The volume attach eventually succeeds, but because the attach finishes outside the expiry window of the attach/detach controller, the controller doesn't know about the volume and hence it never gets detached.

I am not sure the incident you linked is the same as what I outlined above. The symptoms may look similar from outside while the root cause is different. I would request that you open a new bug with the following details:

1. PV and PVC YAML
2. Output of describe pv and pvc
3. Node logs from where this happened
4. Controller log for the same time period

We have opened a PR against OpenShift 3.8 which will cause all dangling volumes to correct themselves: https://github.com/openshift/origin/pull/17544

The specific commit that includes the fix is: https://github.com/openshift/origin/pull/17544/commits/2885375c4d0f1738dc45a013e11d64d638f0f050

Yes, that is fine. The fix has been merged in 3.9. Moving to modified.

This passed on:

    oc v3.9.0-0.23.0
    kubernetes v1.9.1+a0ce1bc657
    features: Basic-Auth GSSAPI Kerberos SPNEGO

    Server https://ip-172-18-14-251.ec2.internal:443
    openshift v3.9.0-0.23.0
    kubernetes v1.9.1+a0ce1bc657

1. Make sure the pod is in ContainerCreating because the volume could not attach:

       [root@ip-172-18-14-251 ~]# oc get pods
       NAME      READY     STATUS              RESTARTS   AGE
       mypod1    0/1       ContainerCreating   0          1h

2. Let the pod reach Running:

       [root@ip-172-18-14-251 ~]# oc get pods
       NAME      READY     STATUS    RESTARTS   AGE
       mypod1    1/1       Running   0          1h

3. Delete the pod and check that the volume is detached and becomes available.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489
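
For readers who want to see the failure mode from this report in miniature, here is a rough Python sketch. It is not the Kubernetes or OpenShift code: the `Controller` class, its method names, the 3600-second wait window, and the volume and node names are all made up for illustration. The report also does not describe how the linked PR corrects dangling volumes, so the `reconcile_dangling` step is only one plausible shape of such a correction.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Toy stand-in for the attach/detach controller (hypothetical names)."""

    attach_timeout_s: int  # assumed wait window, not the real value
    # (volume, node) pairs the controller believes are attached.
    actual_state: set = field(default_factory=set)

    def record_attach(self, volume, node, elapsed_s):
        """Track the attachment only if it finished inside the wait window."""
        if elapsed_s <= self.attach_timeout_s:
            self.actual_state.add((volume, node))
            return True
        return False  # controller gave up and does not track the volume

    def volumes_to_detach(self, desired_state):
        """Tracked attachments that no pod needs any more."""
        return self.actual_state - desired_state

    def reconcile_dangling(self, cloud_attachments):
        """Adopt attachments the cloud reports but the controller lost."""
        dangling = cloud_attachments - self.actual_state
        self.actual_state |= dangling
        return dangling


ctrl = Controller(attach_timeout_s=3600)

# The cloud finishes the attach late, after the controller stopped waiting.
cloud = {("vol-1", "node-a")}
tracked = ctrl.record_attach("vol-1", "node-a", elapsed_s=4000)

# The user has deleted the pod, so nothing is desired any more.
desired = set()
before_fix = ctrl.volumes_to_detach(desired)

# A reconciliation pass notices the untracked attachment and adopts it.
dangling = ctrl.reconcile_dangling(cloud)
after_fix = ctrl.volumes_to_detach(desired)

print(tracked, before_fix, dangling, after_fix)
```

In this toy model the volume is attached in the cloud but missing from `actual_state`, so `before_fix` is empty and nothing would ever detach it. Once the attachment is adopted, the ordinary detach path picks it up.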