Description of problem:
When a VM is reported as no longer present in the cloud provider and is deleted by the node controller, there is no attempt to detach the respective volumes. For example, if a VM is powered off and its pods are migrated to other nodes, then in the case of vSphere the VM cannot be started again, because it still holds mount points to volumes that are now attached to other VMs.

Please check:
https://github.com/kubernetes/kubernetes/pull/40118
https://github.com/kubernetes/kubernetes/issues/33061

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
The VM fails to start because the volume is still attached.

Expected results:
Volumes are detached when the VM is no longer present or is powered off.

Additional info:
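As a rough sketch of how to confirm the stale attachment (node and PV names here are placeholders, not taken from this report), the volumes the attach/detach controller still considers attached to the powered-off node can be compared against the PV's backing VMDK path:

# oc get node <node-name> -o jsonpath='{.status.volumesAttached}'
# oc get pv <pv-name> -o jsonpath='{.spec.vsphereVolume.volumePath}'

If the PV's volumePath still appears in the powered-off node's volumesAttached list while the replacement pod is scheduled elsewhere, the VM is holding the disk and cannot be powered back on.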
Can you please post the controller logs?
*** Bug 1622245 has been marked as a duplicate of this bug. ***
Tested with the below OCP version:
openshift v3.9.55
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.16

Prepare a pod with persistent volumes.

# oc get pods -n test
NAME                               READY     STATUS    RESTARTS   AGE
cakephp-mysql-persistent-1-build   1/1       Running   0          3m
mysql-1-zzfch                      1/1       Running   0          3m

# oc get pvc -n test
NAME      STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mysql     Bound     pvc-a080ce24-f490-11e8-a6dc-0050569f5322   1Gi        RWO            standard       4m

# oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM        STORAGECLASS   REASON    AGE
pvc-a080ce24-f490-11e8-a6dc-0050569f5322   1Gi        RWO            Delete           Bound     test/mysql   standard                 4m

# oc get pods -n test mysql-1-zzfch -o yaml | grep -i nodename
  nodeName: qe-lxia-39-node-registry-router-1

Then shut down the node with "shutdown -h".
Wait (about 1 minute) for the node to become NotReady.
Wait (about 2 minutes) for the pod to become Unknown; a new pod is also created and stays in ContainerCreating.

Delete the old pod:
# oc delete pod -n test mysql-1-zzfch --grace-period=0 --force

The new pod eventually becomes Running after some time (8 minutes in my case).
# oc get pod -n test
NAME            READY     STATUS    RESTARTS   AGE
mysql-1-l5fqs   1/1       Running   0          12m

Though the events show a mount failure at first, the volume is finally mounted successfully and the pod becomes Running. The commands after this comment can be used to follow the attach/detach progress.

Events:
  Type     Reason                 Age              From                     Message
  ----     ------                 ----             ----                     -------
  Normal   Scheduled              10m              default-scheduler        Successfully assigned mysql-1-l5fqs to openshift-195
  Warning  FailedAttachVolume     10m              attachdetach-controller  Multi-Attach error for volume "pvc-a080ce24-f490-11e8-a6dc-0050569f5322" Volume is already used by pod(s) mysql-1-zzfch
  Normal   SuccessfulMountVolume  10m              kubelet, openshift-195   MountVolume.SetUp succeeded for volume "default-token-zspdf"
  Warning  FailedMount            3m (x3 over 8m)  kubelet, openshift-195   Unable to mount volumes for pod "mysql-1-l5fqs_test(5aef15d8-f492-11e8-a6dc-0050569f5322)": timeout expired waiting for volumes to attach/mount for pod "test"/"mysql-1-l5fqs". list of unattached/unmounted volumes=[mysql-data]
  Normal   SuccessfulMountVolume  1m               kubelet, openshift-195   MountVolume.SetUp succeeded for volume "pvc-a080ce24-f490-11e8-a6dc-0050569f5322"
  Normal   Pulled                 1m               kubelet, openshift-195   Container image "registry.access.redhat.com/rhscl/mysql-57-rhel7@sha256:75665d5efd7f051fa8b308207fac269b2d8cae0848007dcad4a6ffdcddf569cb" already present on machine
  Normal   Created                1m               kubelet, openshift-195   Created container
  Normal   Started                1m               kubelet, openshift-195   Started container
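For reference, the roughly 8-minute delay seen above is consistent with the attach/detach controller waiting for its force-detach timeout (about 6 minutes after the old pod is removed) before detaching the volume from the unreachable node. The progress can be followed with commands like the following (namespace and pod name are the ones used in this verification; the grep filters are only illustrative):

# oc get events -n test --sort-by=.lastTimestamp | grep -i volume
# oc describe pod -n test mysql-1-l5fqs | grep -A 10 Events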
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3748