Bug 1645260 - vSphere Cloud provider: detach volume when node is not present/ powered off
Summary: vSphere Cloud provider: detach volume when node is not present/ powered off
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: 3.10.z
Assignee: Hemant Kumar
QA Contact: Wenqi He
URL:
Whiteboard:
Depends On: 1619514
Blocks: 1645258
 
Reported: 2018-11-01 18:30 UTC by Hemant Kumar
Modified: 2018-12-13 17:09 UTC
CC List: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1619514
Environment:
Last Closed: 2018-12-13 17:09:08 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Product Errata RHBA-2018:3750 (last updated 2018-12-13 17:09:16 UTC)

Comment 1 Hemant Kumar 2018-11-01 18:34:02 UTC
https://github.com/openshift/origin/pull/21409

Comment 7 Hemant Kumar 2018-11-29 15:07:43 UTC
Hmm, small modifications. 

1. It is better to create a deployment rather than a plain pod for verifying this bug, because a pod backed by a deployment will be recreated elsewhere when the node it was running on is shut down (a minimal example is sketched below).
2. You will notice that the newly created pod (on the new node) is stuck in the ContainerCreating state while the old pod (on the shut-down node) is in the Unknown state. At this point you must force delete the old pod. Typically this is done with:

oc delete pod <pod> --force=true --grace-period=0

3. This should allow the new pod to start running on the new node.
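
For illustration, a minimal deployment along those lines might look like the following (a sketch only; the deployment name, mount path, and claim name mirror the verification outputs below rather than the exact manifest used):

oc create -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vsphere
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vsphere
  template:
    metadata:
      labels:
        app: vsphere
    spec:
      containers:
      - name: hello-openshift
        image: aosqe/hello-openshift
        volumeMounts:
        - name: data              # hypothetical volume name
          mountPath: /mnt/data    # hypothetical mount path
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: azpvc        # PVC name from the verification below
EOF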

Comment 8 Wenqi He 2018-11-30 02:52:04 UTC
Thanks HK!
I have verified this bug with the following version:

openshift v3.10.79
kubernetes v1.10.0+b81c8f8

1. Create sc, pvc and dc (example manifests are sketched after the output below)
# oc get dc
NAME               REVISION   DESIRED   CURRENT   TRIGGERED BY
vsphere            1          1         1         config

# oc get pvc
NAME      STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
azpvc     Bound     pvc-6696150c-f3af-11e8-8288-0050569f4627   1Gi        RWO            standard       18h

# oc get pods
NAME              READY     STATUS    RESTARTS   AGE
vsphere-1-4kwj7   1/1       Running   0          2m

# oc get pods vsphere-1-4kwj7 -o yaml | grep node
  nodeName: ocp310.node1.vsphere.local
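
For reference, the sc and pvc above could be created with manifests along these lines (a sketch assuming the in-tree vSphere provisioner and thin disk format; the actual parameters used here may have differed):

# oc create -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/vsphere-volume   # in-tree vSphere provisioner
parameters:
  diskformat: thin                          # assumed disk format
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azpvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 1Gi
EOF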

2. Power off the node and delete the pod

# oc get nodes
NAME                          STATUS     ROLES     AGE       VERSION
ocp310.master.vsphere.local   Ready      master    49d       v1.10.0+b81c8f8
ocp310.node1.vsphere.local    NotReady   compute   49d       v1.10.0+b81c8f8

# oc delete pod vsphere-1-4kwj7

# oc get pods
NAME              READY     STATUS              RESTARTS   AGE
vsphere-1-4kwj7   1/1       Terminating         0          5m
vsphere-1-vh4zl   0/1       ContainerCreating   0          10s
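
Before the force delete, the attach-detach controller still considers the disk attached to the powered-off node. One way to confirm this (a sketch; the exact jsonpath output format may vary) is to inspect the node status:

# oc get node ocp310.node1.vsphere.local -o jsonpath='{.status.volumesAttached}'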

3. Then force delete the pod
# oc delete pod vsphere-1-4kwj7 --force=true --grace-period=0

# oc get pods
NAME              READY     STATUS              RESTARTS   AGE
vsphere-1-vh4zl   0/1       ContainerCreating   0          4m

# oc get pods
NAME              READY     STATUS    RESTARTS   AGE
vsphere-1-vh4zl   1/1       Running   0          11m

# oc describe pods vsphere-1-vh4zl
Events:
  Type     Reason                  Age              From                                  Message
  ----     ------                  ----             ----                                  -------
  Normal   Scheduled               11m              default-scheduler                     Successfully assigned vsphere-1-vh4zl to ocp310.master.vsphere.local
  Warning  FailedAttachVolume      11m              attachdetach-controller               Multi-Attach error for volume "pvc-6696150c-f3af-11e8-8288-0050569f4627" Volume is already used by pod(s) vsphere-1-4kwj7
  Warning  FailedMount             2m (x4 over 9m)  kubelet, ocp310.master.vsphere.local  Unable to mount volumes for pod "vsphere-1-vh4zl_default(ed1be1a9-f448-11e8-8288-0050569f4627)": timeout expired waiting for volumes to attach or mount for pod "default"/"vsphere-1-vh4zl". list of unmounted volumes=[azure]. list of unattached volumes=[azure default-token-t9wrs]
  Normal   SuccessfulAttachVolume  1m               attachdetach-controller               AttachVolume.Attach succeeded for volume "pvc-6696150c-f3af-11e8-8288-0050569f4627"
  Normal   Pulled                  1m               kubelet, ocp310.master.vsphere.local  Container image "aosqe/hello-openshift" already present on machine
  Normal   Created                 1m               kubelet, ocp310.master.vsphere.local  Created container
  Normal   Started                 1m               kubelet, ocp310.master.vsphere.local  Started container
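
To double-check on the vSphere side that the disk really was detached from the powered-off VM before being attached to the master, a govc query along these lines could be used (a sketch assuming govc is configured against the vCenter and that the VM name matches the node name, which may not hold in every environment):

# govc device.ls -vm ocp310.node1.vsphere.local | grep -i disk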

This fix is awesome: even when the node is shut down, the disk can still be detached and attached to a new node. Thanks!

Comment 10 errata-xmlrpc 2018-12-13 17:09:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3750

