Bug 1446788

Summary: Volume failed to detach even after unmount is successful on the node
Product: OpenShift Container Platform
Reporter: Hemant Kumar <hekumar>
Component: Storage
Assignee: Hemant Kumar <hekumar>
Status: CLOSED ERRATA
QA Contact: Chao Yang <chaoyang>
Severity: medium
Priority: unspecified
Version: 3.5.0
CC: aos-bugs, eparis, jhou, sjenning, smunilla, vgoyal
Hardware: All   
OS: Unspecified   
Doc Type: Bug Fix
Doc Text:
Cause: OpenShift does not attempt to detach volumes for pods that have completed or terminated but have not been deleted from the API server.
Consequence: Volumes can be left attached to old nodes, which prevents the volume from being reused in other pods.
Fix: Support was added for detaching volumes of pods that are completed or terminated.
Result: Volumes belonging to terminated or completed pods are now detached automatically, and users are free to reuse those volumes in other pods.
Clone Of:
: 1450215
Last Closed: 2017-08-10 05:21:25 UTC
Type: Bug
Bug Blocks: 1450215    

Description Hemant Kumar 2017-04-28 22:17:23 UTC
Description of problem:

While investigating a container that was stuck in the ContainerCreating state, I noticed that the volume it was trying to use was attached to another node.

When I logged into that node, I saw that the volume was indeed attached but completely unmounted. The last line of the node log looks like this:

operation_executor.go:1267] UnmountDevice succeeded for volume "kubernetes.io/aws-ebs/aws://us-east-2a/vol-03b5d7dbf226280fa" (spec.Name: "pvc-66da286a-2bb2-11e7-9954-02e52a0be43d").

Back on the controller, I could see that no detach had been attempted for this volume, and as a result the pod was stuck in the ContainerCreating state.

The controller had an uptime of 21 hours, so it had not been restarted since the volume was attached to the node in question.

Comment 6 Hemant Kumar 2017-05-01 22:05:17 UTC
I think the "device is busy" error was confusing and wasn't really the source of the problem in this case. The root cause is that Kubernetes does not detach the volumes of a terminated pod. I have opened an upstream bug to track this: https://github.com/kubernetes/kubernetes/issues/45191
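The root cause can be illustrated with a simplified model of how a controller decides which attachments are still needed. This is a hypothetical Python sketch, loosely modelled on the attach/detach controller's comparison of a desired state against an actual state; it is not the Kubernetes code, and the names `Pod`, `desired_attachments`, `volumes_to_detach`, and the node name `node-a` are assumptions made for illustration. With `detach_terminated=False` a completed pod keeps its volume until the pod object is deleted (the behaviour reported here); with `detach_terminated=True` a pod in a terminal phase releases its volumes (the behaviour the fix aims for).

```python
from dataclasses import dataclass, field

# Pod phases after which no container in the pod will run again.
TERMINAL_PHASES = {"Succeeded", "Failed"}


@dataclass
class Pod:
    name: str
    node: str
    phase: str                                   # e.g. "Running", "Succeeded", "Failed"
    volumes: list = field(default_factory=list)
    deleted: bool = False                        # True once removed from the API server


def desired_attachments(pods, detach_terminated):
    """Return the (node, volume) pairs that pods still need attached.

    detach_terminated=False: a pod keeps its volumes until it is deleted.
    detach_terminated=True: completed or failed pods release their volumes
    even though the pod object still exists.
    """
    wanted = set()
    for pod in pods:
        if pod.deleted:
            continue
        if detach_terminated and pod.phase in TERMINAL_PHASES:
            continue
        for volume in pod.volumes:
            wanted.add((pod.node, volume))
    return wanted


def volumes_to_detach(attached, pods, detach_terminated=True):
    """Attached (node, volume) pairs that no pod needs any more."""
    return attached - desired_attachments(pods, detach_terminated)


if __name__ == "__main__":
    vol = "vol-03b5d7dbf226280fa"
    attached = {("node-a", vol)}
    pods = [Pod("test-pod", "node-a", "Succeeded", [vol])]
    print(volumes_to_detach(attached, pods, detach_terminated=False))  # set()
    print(volumes_to_detach(attached, pods))  # {('node-a', 'vol-03b5d7dbf226280fa')}
```

In this model the only difference between the two behaviours is whether a terminal phase counts as "no longer needing the volume"; a pod that is still Running keeps its attachment either way.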

Comment 9 Hemant Kumar 2017-05-04 00:40:27 UTC
I have opened a PR that fixes this - https://github.com/kubernetes/kubernetes/pull/45286

Comment 11 Hemant Kumar 2017-05-15 13:55:09 UTC
PR opened https://github.com/openshift/origin/pull/14191

Comment 14 Chao Yang 2017-06-27 07:18:55 UTC
Test passed on a containerized environment.
oc version
oc v3.6.121
kubernetes v1.6.1+5115d708d7

1. Create the PVC:
{
  "kind": "PersistentVolumeClaim",
  "apiVersion": "v1",
  "metadata": {
    "name": "ebsc",
    "annotations": {
        "volume.beta.kubernetes.io/storage-class": "gp2"
    }
  },
  "spec": {
    "accessModes": [
      "ReadWriteOnce"
    ],
    "resources": {
      "requests": {
        "storage": "1Gi"
      }
    }
  }
}
2. Create the pod:
kind: Pod
apiVersion: v1
metadata:
  name: test-pod
spec:
  containers:
  - name: test-pod
    image: gcr.io/google_containers/busybox:1.24
    command:
      - "/bin/sh"
    args:
      - "-c"
      - "touch /mnt/SUCCESS && exit 0 || exit 1"
    volumeMounts:
      - name: ebs-pvc
        mountPath: "/mnt"
  restartPolicy: "Never"
  volumes:
    - name: ebs-pvc
      persistentVolumeClaim:
        claimName: ebsc
After the pod reached the Completed state, the EBS volume showed as available in the AWS web console.
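The check above was done in the AWS web console. As a hedged alternative, the same two conditions could be checked from CLI output: the pod phase from `oc get pod test-pod -o json` and the volume state from `aws ec2 describe-volumes --volume-ids <volume-id>`. The helper names below are hypothetical, and the sketch assumes only the standard shape of those JSON outputs (`status.phase` for the pod; `Volumes[].VolumeId`, `State`, and `Attachments` for EBS).

```python
import json


def pod_completed(pod_json: str) -> bool:
    """True if `oc get pod <name> -o json` output reports phase Succeeded."""
    status = json.loads(pod_json).get("status", {})
    return status.get("phase") == "Succeeded"


def volume_detached(volumes_json: str, volume_id: str) -> bool:
    """True if `aws ec2 describe-volumes` output shows the volume as
    "available" with no attachments; raises KeyError if it is absent."""
    for volume in json.loads(volumes_json).get("Volumes", []):
        if volume.get("VolumeId") == volume_id:
            return volume.get("State") == "available" and not volume.get("Attachments")
    raise KeyError(volume_id)
```

A passing run of the test above would correspond to both functions returning True for `test-pod` and the EBS volume bound to `ebsc`. The volume is checked both by `State` and by an empty `Attachments` list, so a volume that is still detaching is not reported as free.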

Comment 16 errata-xmlrpc 2017-08-10 05:21:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716