Bug 1645260
| Summary: | vSphere Cloud provider: detach volume when node is not present/powered off | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Hemant Kumar <hekumar> |
| Component: | Storage | Assignee: | Hemant Kumar <hekumar> |
| Status: | CLOSED ERRATA | QA Contact: | Wenqi He <wehe> |
| Severity: | low | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.10.0 | CC: | aos-bugs, aos-storage-staff, bbennett, bchilds, fshaikh, hekumar, hgomes, jkaur, jokerman, lxia, misalunk, mmccomas, sgarciam, tmanor |
| Target Milestone: | --- | Keywords: | NeedsTestCase |
| Target Release: | 3.10.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1619514 | Environment: | |
| Last Closed: | 2018-12-13 17:09:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1619514 | | |
| Bug Blocks: | 1645258 | | |
Comment 1
Hemant Kumar
2018-11-01 18:34:02 UTC
Hmm, small modifications:

1. It is better to create a deployment rather than a plain pod for verifying this bug, because a pod backed by a deployment will be migrated when the node it was running on is shut down.
2. You will notice that the newly created pod (on the new node) is stuck in the ContainerCreating state while the old pod (on the shut-down node) is in the Unknown state. At this point you must force delete the old pod. Typically this can be done with: `oc delete pod <pod> --force=true --grace-period=0`
3. This should allow the new pod to reach the Running state on the new node.

Thanks HK! I have verified this bug on the version below:

openshift v3.10.79
kubernetes v1.10.0+b81c8f8

1. Create the sc, pvc, and dc (a sketch of the manifests is included at the end of this comment):

```
# oc get dc
NAME      REVISION   DESIRED   CURRENT   TRIGGERED BY
vsphere   1          1         1         config

# oc get pvc
NAME    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
azpvc   Bound    pvc-6696150c-f3af-11e8-8288-0050569f4627   1Gi        RWO            standard       18h

# oc get pods
NAME              READY   STATUS    RESTARTS   AGE
vsphere-1-4kwj7   1/1     Running   0          2m

# oc get pods vsphere-1-4kwj7 -o yaml | grep node
  nodeName: ocp310.node1.vsphere.local
```

2. Power off the node and delete the pod:

```
# oc get nodes
NAME                          STATUS     ROLES     AGE   VERSION
ocp310.master.vsphere.local   Ready      master    49d   v1.10.0+b81c8f8
ocp310.node1.vsphere.local    NotReady   compute   49d   v1.10.0+b81c8f8

# oc delete pod vsphere-1-4kwj7
# oc get pods
NAME              READY   STATUS              RESTARTS   AGE
vsphere-1-4kwj7   1/1     Terminating         0          5m
vsphere-1-vh4zl   0/1     ContainerCreating   0          10s
```

3. Then force delete the old pod:

```
# oc delete pod vsphere-1-4kwj7 --force=true --grace-period=0
# oc get pods
NAME              READY   STATUS              RESTARTS   AGE
vsphere-1-vh4zl   0/1     ContainerCreating   0          4m

# oc get pods
NAME              READY   STATUS    RESTARTS   AGE
vsphere-1-vh4zl   1/1     Running   0          11m

# oc describe pods vsphere-1-vh4zl
Events:
  Type     Reason                  Age              From                                  Message
  ----     ------                  ----             ----                                  -------
  Normal   Scheduled               11m              default-scheduler                     Successfully assigned vsphere-1-vh4zl to ocp310.master.vsphere.local
  Warning  FailedAttachVolume      11m              attachdetach-controller               Multi-Attach error for volume "pvc-6696150c-f3af-11e8-8288-0050569f4627" Volume is already used by pod(s) vsphere-1-4kwj7
  Warning  FailedMount             2m (x4 over 9m)  kubelet, ocp310.master.vsphere.local  Unable to mount volumes for pod "vsphere-1-vh4zl_default(ed1be1a9-f448-11e8-8288-0050569f4627)": timeout expired waiting for volumes to attach or mount for pod "default"/"vsphere-1-vh4zl". list of unmounted volumes=[azure]. list of unattached volumes=[azure default-token-t9wrs]
  Normal   SuccessfulAttachVolume  1m               attachdetach-controller               AttachVolume.Attach succeeded for volume "pvc-6696150c-f3af-11e8-8288-0050569f4627"
  Normal   Pulled                  1m               kubelet, ocp310.master.vsphere.local  Container image "aosqe/hello-openshift" already present on machine
  Normal   Created                 1m               kubelet, ocp310.master.vsphere.local  Created container
  Normal   Started                 1m               kubelet, ocp310.master.vsphere.local  Started container
```

This fix is great: even when the node is shut down, the disk can still be detached and attached to a new node. Thanks!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3750
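For reference, step 1 of the verification above only shows the resulting objects, not the manifests that created them. A minimal sketch of what they might look like follows, reusing the object names, storage class, size, and image visible in the output; the provisioner and `diskformat` parameter are the standard in-tree vSphere values, while the volume name, mount path, and labels are hypothetical:

```yaml
# Hypothetical reconstruction of the sc/pvc/dc used in the verification steps.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: standard
provisioner: kubernetes.io/vsphere-volume   # in-tree vSphere provisioner
parameters:
  diskformat: thin
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: azpvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 1Gi
---
kind: DeploymentConfig
apiVersion: apps.openshift.io/v1
metadata:
  name: vsphere
spec:
  replicas: 1
  selector:
    app: vsphere          # DC selector is a plain label map
  triggers:
    - type: ConfigChange   # matches the "config" trigger shown by `oc get dc`
  template:
    metadata:
      labels:
        app: vsphere
    spec:
      containers:
        - name: hello-openshift
          image: aosqe/hello-openshift
          volumeMounts:
            - name: vsphere-vol        # hypothetical volume name
              mountPath: /mnt/data     # hypothetical mount path
      volumes:
        - name: vsphere-vol
          persistentVolumeClaim:
            claimName: azpvc
```

Using a DeploymentConfig (rather than a bare pod) is what allows the replacement pod to be scheduled onto another node after the original node is powered off, which is the scenario this fix targets.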