I am seeing many instances of volumes being unmounted but not detached from the node on various OpenShift clusters. We need to find out why this is happening.
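For anyone triaging this, a rough way to spot the symptom (a sketch; the node name, instance ID, and volume ID are placeholders, not taken from an affected cluster) is to compare what AWS says is attached to an instance against what the corresponding node object reports:

# Volumes AWS believes are attached to the instance backing the node
aws ec2 describe-volumes \
    --filters Name=attachment.instance-id,Values=<instance-id> \
    --query 'Volumes[].VolumeId'

# Volumes the attach/detach controller believes are attached to the node
oc get node <node-name> -o jsonpath='{.status.volumesAttached}'

A volume that shows up on the AWS side but is missing from the node status is a candidate for the dangling state described below.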
As I stated above, the root cause of this bug was:

1. A user created a pod with a volume, but the volume was stuck in the "attaching" state for more than 1 hour.
2. The AttachDetach controller gave up after a certain time, so the volume was never added to the actual_state_of_world of the A/D controller.
3. The attach eventually succeeded, but the A/D controller no longer knew about this volume.

Obviously the main issue is that the volume should not have been stuck in the attaching state for so long. We have to work with Amazon to find a solution for that problem.
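To line this sequence up with log evidence (a sketch; the exact log messages differ between versions, and the systemd unit name assumes an RPM-based OCP 3.x master, so treat the grep pattern as a starting point, not something authoritative): look in the master controllers journal around the incident for the attach timing out and, later, succeeding without the controller recording it.

# Run on a master; the unit name varies by install type
journalctl -u atomic-openshift-master-controllers \
    --since "<incident-start>" --until "<incident-end>" | grep -iE 'attach|aws-ebs'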
Hey guys,

Any updates on this issue? Is there a workaround?

Thanks,
David
Each instance of this problem can be caused by a different underlying problem. Can you give some more details about the customer's problem?

The bug I opened is caused by a volume being stuck in the "attaching" state for too long, after which the user deletes the pod while waiting for it to come up. The volume attach eventually succeeds, but because the attach finishes outside the expiry window of the attach/detach controller, the controller doesn't know about the volume and hence it never gets detached.

I am not sure the incident you linked is the same as what I outlined above. The symptoms may look similar from the outside, but the root cause can be different. I would request you to open a new bug with the following details (the commands sketched below should gather most of this):

1. PV and PVC YAML
2. Output of describe for the PV and PVC
3. Node logs from where this happened
4. Controller logs for the same time period
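In case it helps, here is roughly how that information can be collected (a sketch; the PV/PVC/namespace names are placeholders and the systemd unit names assume an RPM-based OCP 3.x install):

# 1. PV and PVC YAML
oc get pv <pv-name> -o yaml
oc get pvc <pvc-name> -n <namespace> -o yaml

# 2. Describe output
oc describe pv <pv-name>
oc describe pvc <pvc-name> -n <namespace>

# 3. Node logs from the affected node
journalctl -u atomic-openshift-node --since "<incident-start>" > node.log

# 4. Controller logs for the same time period (run on a master)
journalctl -u atomic-openshift-master-controllers --since "<incident-start>" > controller.log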
We have opened a PR against OpenShift 3.8 that will cause all dangling volumes to correct themselves: https://github.com/openshift/origin/pull/17544

The specific commit that includes the fix is https://github.com/openshift/origin/pull/17544/commits/2885375c4d0f1738dc45a013e11d64d638f0f050
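For anyone who needs an interim workaround before picking up that fix, one option (a hedged suggestion, not something validated in this bug: first confirm via describe-volumes that the volume is truly dangling and that nothing on the node still uses it) is to detach the volume from the AWS side:

# Confirm the stale attachment first
aws ec2 describe-volumes --volume-ids <vol-id> --query 'Volumes[0].Attachments'

# Detach it manually; add --force only if a normal detach hangs
aws ec2 detach-volume --volume-id <vol-id>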
Yes, that is fine. The fix has been merged in 3.9. Moving to MODIFIED.
This has passed verification on:

oc v3.9.0-0.23.0
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-14-251.ec2.internal:443
openshift v3.9.0-0.23.0
kubernetes v1.9.1+a0ce1bc657

1. Make sure the pod is stuck in ContainerCreating because the volume could not attach:

[root@ip-172-18-14-251 ~]# oc get pods
NAME      READY     STATUS              RESTARTS   AGE
mypod1    0/1       ContainerCreating   0          1h

2. Let the pod become Running:

[root@ip-172-18-14-251 ~]# oc get pods
NAME      READY     STATUS    RESTARTS   AGE
mypod1    1/1       Running   0          1h

3. Delete the pod and check that the volume is detached and becomes available (verification commands are sketched below).
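For completeness, step 3 can be verified roughly like this (a sketch; the node name and volume ID are placeholders):

oc delete pod mypod1

# The node should no longer list the volume as attached
oc get node <node-name> -o jsonpath='{.status.volumesAttached}'

# The EBS volume state should return to "available"
aws ec2 describe-volumes --volume-ids <vol-id> --query 'Volumes[0].State'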
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0489