Description of problem:
A shut-down node cannot be brought back, because vSphere volumes are never detached from it and the same volumes get attached to a new node in the meantime.

Version-Release number of selected component (if applicable):
latest 3.9/rhel-7.5

How reproducible:

Steps to Reproduce:
1. Create a 2-VM OpenShift cluster on vSphere.
2. Create some deployments on the cluster with vSphere persistent volumes. Make sure the slave node gets some pods.
3. Now shut down the slave node.
4. The node API object of the shut-down node gets removed and all pods from it are migrated.
5. Try to bring back the old node.

What happens:
The old node refuses to start (https://gist.github.com/gnufied/40fe436dd885311e8ee520ac67bd84ad) because the volumes never get detached from the old node, and those same volumes get attached to a new node:

Aug 24 11:52:06 vim-master.lan atomic-openshift-master-controllers[23519]: E0824 11:52:06.691814   23519 attacher.go:260] Error checking if volume ("[datastore1] kubevols/kubernetes-dynamic-pvc-0e972db5-a7b1-11e8-a954-00505694a8ab.vmdk") is already attached to current node ("vim-node.lan"). Will continue and try detach anyway. err=No VM found
Aug 24 11:52:06 vim-master.lan atomic-openshift-master-controllers[23519]: E0824 11:52:06.691828   23519 attacher.go:274] Error detaching volume "[datastore1] kubevols/kubernetes-dynamic-pvc-0e972db5-a7b1-11e8-a954-00505694a8ab.vmdk": No VM found
Aug 24 11:52:06 vim-master.lan atomic-openshift-master-controllers[23519]: E0824 11:52:06.691846   23519 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/vsphere-volume/[datastore1] kubevols/kubernetes-dynamic-pvc-0e972db5-a7b1-11e8-a954-00505694a8ab.vmdk\"" failed. No retries permitted until 2018-08-24 11:52:07.191836175 -0400 EDT m=+3859.550119407 (durationBeforeRetry 500ms). Error: "DetachVolume.Detach failed for volume \"pvc-0e972db5-a7b1-11e8-a954-00505694a8ab\" (UniqueName: \"kubernetes.io/vsphere-volume/[datastore1] kubevols/kubernetes-dynamic-pvc-0e972db5-a7b1-11e8-a954-00505694a8ab.vmdk\") on node \"vim-node.lan\" : No VM found"
Aug 24 11:52:07 vim-master.lan atomic-openshift-master-controllers[23519]: W0824 11:52:07.192523   23519 reconciler.go:235] attacherDetacher.DetachVolume started for volume "pvc-0e972db5-a7b1-11e8-a954-00505694a8ab" (UniqueName: "kubernetes.io/vsphere-volume/[datastore1] kubevols/kubernetes-dynamic-pvc-0e972db5-a7b1-11e8-a954-00505694a8ab.vmdk") on node "vim-node.lan" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching
Aug 24 11:52:07 vim-master.lan atomic-openshift-master-controllers[23519]: E0824 11:52:07.192729   23519 attacher.go:260] Error checking if volume ("[datastore1] kubevols/kubernetes-dynamic-pvc-0e972db5-a7b1-11e8-a954-00505694a8ab.vmdk") is already attached to current node ("vim-node.lan"). Will continue and try detach anyway. err=No VM found
Aug 24 11:52:07 vim-master.lan atomic-openshift-master-controllers[23519]: E0824 11:52:07.192743   23519 attacher.go:274] Error detaching volume "[datastore1] kubevols/kubernetes-dynamic-pvc-0e972db5-a7b1-11e8-a954-00505694a8ab.vmdk": No VM found
Aug 24 11:52:07 vim-master.lan atomic-openshift-master-controllers[23519]: E0824 11:52:07.192759   23519 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/vsphere-volume/[datastore1] kubevols/kubernetes-dynamic-pvc-0e972db5-a7b1-11e8-a954-00505694a8ab.vmdk\"" failed. No retries permitted until 2018-08-24 11:52:08.192751187 -0400 EDT m=+3860.551034418 (durationBeforeRetry 1s). Error: "DetachVolume.Detach failed for volume \"pvc-0e972db5-a7b1-11e8-a954-00505694a8ab\" (UniqueName: \"kubernetes.io/vsphere-volume/[datastore1] kubevols/kubernetes-dynamic-pvc-0e972db5-a7b1-11e8-a954-00505694a8ab.vmdk\") on node \"vim-node.lan\" : No VM found"

Actual results:
Can't resume a shut-down node.

Expected results:
Should be able to resume a shut-down node.

Additional info:
The main bug here is that detaching from a shut-down node never actually works. That is why we can't resume a shut-down node.
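The reproduction steps above can be sketched as a sequence of commands. This is only an illustrative outline, not the exact setup used in the report: the StorageClass name `vsphere-standard`, the PVC/deployment name `vol-test`, and the image are hypothetical, and the Deployment API version is the one current in the 3.9 (Kubernetes 1.9) era.

```shell
# Sketch only -- assumes an `oc login` against the 3.9 cluster and a working
# vSphere cloud provider with a StorageClass named "vsphere-standard"
# (hypothetical name; substitute your own).

# Steps 1-2: a PVC backed by a vSphere volume, and a deployment that mounts it.
cat <<'EOF' | oc create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vol-test
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
  storageClassName: vsphere-standard
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: vol-test
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: vol-test
    spec:
      containers:
      - name: app
        image: registry.access.redhat.com/rhel7
        command: ["sleep", "infinity"]
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: vol-test
EOF

# Step 3: shut down the node that got the pod (from vCenter, or on the node
# itself with `systemctl poweroff`).

# Step 4: watch the node object disappear and the pod get rescheduled.
oc get nodes -w
oc get pods -o wide -w

# Step 5: power the VM back on from vCenter; the node fails to rejoin, and
# the master controller log shows the "No VM found" detach errors above.
```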
Possibly related bug: https://github.com/kubernetes/kubernetes/pull/67825. It looks like vSphere volumes do not support multi-attach, yet that flag is enabled, which causes the same volume to be attached in a new place without first being detached from the old place.
Opened https://github.com/openshift/origin/pull/21025 to backport the fix to OpenShift.
This failed my test on v3.9.45: after the node is shut down, it cannot be started unless the volume is manually removed from it. Steps: 1. Prepare two nodes; create some deployments on node a. 2. Shut down node a. Pods are rescheduled to node b and the volume is attached to node b. 3. Start node a. Node a cannot be started.
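The manual volume removal mentioned above can be done against vCenter, for example with govc. This is a hedged workaround sketch, not a verified procedure: the VM name is taken from the logs in this report, and the device label `disk-1000-1` is an example; check the actual device list and flag behavior against your vCenter before removing anything.

```shell
# Workaround sketch, assuming govc is configured via GOVC_URL, GOVC_USERNAME,
# and GOVC_PASSWORD for the vCenter that owns the node VMs.

# List the devices on the powered-off node to find the stale PVC disk:
govc device.ls -vm vim-node.lan

# Detach the disk from the VM but keep the backing .vmdk in the datastore
# (-keep), so the data and the new attachment on node b are untouched.
# "disk-1000-1" is an example label taken from the device listing above:
govc device.remove -vm vim-node.lan -keep disk-1000-1

# The node VM should now power on and rejoin the cluster:
govc vm.power -on vim-node.lan
```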