Bug 1434709
| Summary: | vSphere disks don't get mounted on containerized OCP | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jianwei Hou <jhou> |
| Component: | Storage | Assignee: | Matthew Wong <mawong> |
| Status: | CLOSED NOTABUG | QA Contact: | Jianwei Hou <jhou> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 3.5.0 | CC: | aos-bugs, bchilds, eparis, jhou, ronald.l.gavlin.civ |
| Target Milestone: | --- | Keywords: | UpcomingRelease |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-05-15 14:07:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
@jianwei, I tried to reproduce on newer versions:

openshift v3.5.5.13
kubernetes v1.5.2+43a9be4

and mounting dynamically provisioned volumes worked: the pod entered the Running state and could write to them.

However, I did find a different bug: when the pod is deleted and then the PVC is deleted, the PV enters the Failed state because the disk cannot be detached and then deleted. Can you try this again on the newer 3.5? I'll open a new bug for the detach/delete issue.
PV Failed state:
25s 25s 1 {persistentvolume-controller } Warning VolumeFailedDelete Error caused by file /vmfs/volumes/57a232db-dca8824d-d9cd-44a842af585d/kubevols/kubernetes-dynamic-pvc-242a3c0f-3431-11e7-abad-005056a5d7df.vmd
node logs:
May 08 16:07:35 mawong atomic-openshift-node[32763]: E0508 16:07:35.592937 353 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/vsphere-volume/[ose3-vmware] kubevols/kubernetes-dynamic-pvc-e46bca08-3431-11e7-abad-005056a5d7df.vmdk\"" failed. No retries permitted until 2017-05-08 16:09:35.592905908 -0500 CDT (durationBeforeRetry 2m0s). Error: UnmountDevice failed for volume "kubernetes.io/vsphere-volume/[ose3-vmware] kubevols/kubernetes-dynamic-pvc-e46bca08-3431-11e7-abad-005056a5d7df.vmdk" (spec.Name: "pvc-e46bca08-3431-11e7-abad-005056a5d7df") with: remove /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/vsphere-volume/mounts/[ose3-vmware] kubevols/kubernetes-dynamic-pvc-e46bca08-3431-11e7-abad-005056a5d7df.vmdk: device or resource busy
May 08 16:07:35 mawong atomic-openshift-node[32763]: W0508 16:07:35.592860 353 util.go:84] Warning: "/var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/vsphere-volume/mounts/[ose3-vmware] kubevols/kubernetes-dynamic-pvc-e46bca08-3431-11e7-abad-005056a5d7df.vmdk" is not a mountpoint, deleting
May 08 16:07:35 mawong atomic-openshift-node[32763]: I0508 16:07:35.592853 353 nsenter_mount.go:202] IsLikelyNotMountPoint: /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/vsphere-volume/mounts/[ose3-vmware] kubevols/kubernetes-dynamic-pvc-e46bca08-3431-11e7-abad-005056a5d7df.vmdk is not a mount point
May 08 16:07:35 mawong atomic-openshift-node[32763]: I0508 16:07:35.592837 353 nsenter_mount.go:196] IsLikelyNotMountPoint findmnt output for path /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/vsphere-volume/mounts/[ose3-vmware] kubevols/kubernetes-dynamic-pvc-e46bca08-3431-11e7-abad-005056a5d7df.vmdk: /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/vsphere-volume/mounts/[ose3-vmware]:
May 08 16:07:35 mawong atomic-openshift-node[32763]: I0508 16:07:35.582799 353 nsenter_mount.go:183] findmnt command: nsenter [--mount=/rootfs/proc/1/ns/mnt -- /bin/findmnt -o target,fstype --noheadings --first-only --target /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/vsphere-volume/mounts/[ose3-vmware] kubevols/kubernetes-dynamic-pvc-e46bca08-3431-11e7-abad-005056a5d7df.vmdk]
May 08 16:07:35 mawong atomic-openshift-node[32763]: W0508 16:07:35.582773 353 mount.go:136] could not determine device for path: "/var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/vsphere-volume/mounts/[ose3-vmware] kubevols/kubernetes-dynamic-pvc-e46bca08-3431-11e7-abad-005056a5d7df.vmdk"
May 08 16:07:35 mawong atomic-openshift-node[32763]: I0508 16:07:35.582300 353 reconciler.go:341] UnmountDevice operation started for volume "kubernetes.io/vsphere-volume/[ose3-vmware] kubevols/kubernetes-dynamic-pvc-e46bca08-3431-11e7-abad-005056a5d7df.vmdk" (spec.Name: "pvc-e46bca08-3431-11e7-abad-005056a5d7df")
mount on host:
/dev/sde on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/vsphere-volume/mounts/[ose3-vmware] kubevols/kubernetes-dynamic-pvc-e46bca08-3431-11e7-abad-005056a5d7df.vmdk type ext4 (rw,relatime,seclabel,data=ordered)
mounts in container:
/dev/sde on /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/vsphere-volume/mounts/[ose3-vmware] kubevols/kubernetes-dynamic-pvc-e46bca08-3431-11e7-abad-005056a5d7df.vmdk type ext4 (rw,relatime,seclabel,data=ordered)
/dev/sde on /rootfs/var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/vsphere-volume/mounts/[ose3-vmware] kubevols/kubernetes-dynamic-pvc-e46bca08-3431-11e7-abad-005056a5d7df.vmdk type ext4 (rw,relatime,seclabel,data=ordered)
@mawong thank you. I had upgraded my cluster to the latest v3.5.5.13-1, but I still couldn't get my pods running. The provisioned VMDK was attached to my node and the log didn't report any errors; it just kept printing the following:

```
May  9 14:55:26 node1 journal: I0509 14:55:26.702346    6220 attacher.go:139] Checking VMDK "[datastore1] kubevols/kubernetes-dynamic-pvc-f9adb6ec-347e-11e7-9c55-0050569fb058.vmdk" is attached
May  9 14:55:26 node1 atomic-openshift-node: I0509 14:55:26.702346    6220 attacher.go:139] Checking VMDK "[datastore1] kubevols/kubernetes-dynamic-pvc-f9adb6ec-347e-11e7-9c55-0050569fb058.vmdk" is attached
```

OK, my bad: the first time I tried it I had enable-controller-attach-detach set to true, but the default is false. However, the result is the same for me: I tried with false, and it still worked and my pod is running. (The PV fails at deletion time, but again, that's a different bug.) Do you see "AttachVolume.Attach succeeded" or "AttachVolume.Attach failed" in the node logs? Could you attach the full kubelet/node logs covering the pod creation time and the attach and mount attempts? Thanks.

So I dug through the logs and tried it myself on your env. The controller thinks it successfully attached the volume and returns a devicePath like /dev/disk/by-id/wwn-0x6000c29487801c9f215cffef219bc97d, but that devicePath does not exist on the node. The node is told by the controller that the volume is attached, but since the devicePath does not actually exist, the node fails when it tries to verify at mount time that the VMDK is attached, hence the repeated "Checking VMDK...is attached":

```
May 11 00:46:02 node1.c.vsphere.local atomic-openshift-node[6160]: E0511 00:46:02.945067    6220 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/vsphere-volume/[datastore1] kubevols/kubernetes-dynamic-pvc-1616e018-3541-11e7-9c55-0050569fb058.vmdk\"" failed. No retries permitted until 2017-05-11 00:48:02.945024277 +0800 CST (durationBeforeRetry 2m0s). Error: MountVolume.WaitForAttach failed for volume "kubernetes.io/vsphere-volume/[datastore1] kubevols/kubernetes-dynamic-pvc-1616e018-3541-11e7-9c55-0050569fb058.vmdk" (spec.Name: "pvc-1616e018-3541-11e7-9c55-0050569fb058") pod "1d943175-3541-11e7-9c55-0050569fb058" (UID: "1d943175-3541-11e7-9c55-0050569fb058") with: Could not find attached VMDK "[datastore1] kubevols/kubernetes-dynamic-pvc-1616e018-3541-11e7-9c55-0050569fb058.vmdk". Timeout waiting for mount paths to be created.
```

I don't know what the difference between my vSphere env and yours is that causes this. vCenter (or whatever API and UI) insists the disk is attached when, from the node's point of view, it isn't. I think this needs looking at by upstream VMware people. I've opened an upstream issue: https://github.com/kubernetes/kubernetes/issues/45679 . @jianwei please assist with providing any other environment information you can if I missed anything or they ask for it, thank you!

(In reply to Matthew Wong from comment #8)
> I've opened an upstream issue
> https://github.com/kubernetes/kubernetes/issues/45679 . @jianwei please
> assist with providing any other environment information you can if I missed
> anything or they ask for it, thank you!

Thank you, I'll keep updating the issue.

"looks like you are missing disk.EnableUUID = "TRUE" on the node VM." Can you check this? I'm not sure how to do it myself, thanks!

(In reply to Matthew Wong from comment #10)
> "looks like you are missing disk.EnableUUID = "TRUE" on the node VM." Can
> you check this? I'm not sure how to do it myself, thanks!

Thank you, that has fixed the problem. I have replied at https://github.com/kubernetes/kubernetes/issues/45679#issuecomment-301100081
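For anyone hitting this later: the fix the upstream issue pointed to is setting disk.EnableUUID on each node VM, so the guest OS sees a consistent SCSI disk UUID and the /dev/disk/by-id/wwn-* devicePath actually appears on the node. A minimal sketch of the setting as it appears among the VM's advanced configuration parameters (how you apply it — the vSphere client UI under VM Options > Advanced > Configuration Parameters, editing the .vmx, or the govc CLI — depends on your tooling):

```
disk.EnableUUID = "TRUE"
```

With govc this can typically be applied as `govc vm.change -vm <node-vm> -e disk.enableUUID=1` (the VM name is a placeholder and flag syntax may vary by govc version), and the node VM generally needs to be powered off and back on for the guest to pick the setting up.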
Description of problem:
In a containerized OCP cluster, vSphere disks are successfully provisioned but don't get mounted. This is not reproducible on an RPM-installed OCP.

Version-Release number of selected component (if applicable):
v3.5.0.55

How reproducible:
Always

Steps to Reproduce:
1. Create a StorageClass:

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: vsphere
provisioner: kubernetes.io/vsphere-volume
```

2. Create a PVC:

```json
{
    "kind": "PersistentVolumeClaim",
    "apiVersion": "v1",
    "metadata": {
        "name": "vsphere",
        "annotations": {
            "volume.beta.kubernetes.io/storage-class": "vsphere"
        }
    },
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {
            "requests": {
                "storage": "1Gi"
            }
        }
    }
}
```

3. Once the PVC is bound, create a Pod:

```
oc get pvc
NAME      STATUS    VOLUME                                     CAPACITY   ACCESSMODES   AGE
vsphere   Bound     pvc-9baba382-0ed0-11e7-8a82-0050569fb058   1Gi        RWO           48m
```

```json
{
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "vsphere",
        "labels": {
            "name": "vsphere"
        }
    },
    "spec": {
        "containers": [{
            "name": "vsphere",
            "image": "aosqe/hello-openshift",
            "imagePullPolicy": "IfNotPresent",
            "volumeMounts": [{
                "mountPath": "/mnt/vsphere",
                "name": "vsphere"
            }]
        }],
        "securityContext": {
            "fsGroup": 123456,
            "supplementalGroups": [111, 112, 113],
            "seLinuxOptions": {
                "level": "s0"
            }
        },
        "volumes": [{
            "name": "vsphere",
            "persistentVolumeClaim": {
                "claimName": "vsphere"
            }
        }]
    }
}
```

Actual results:
After step 3, the pod is stuck in the 'ContainerCreating' state.
```
oc describe pod vsphere
Name:            vsphere
Namespace:       jhou
Security Policy: storagescc
Node:            node1.c.vsphere.local/10.66.144.139
Start Time:      Wed, 22 Mar 2017 15:25:01 +0800
Labels:          name=vsphere
Status:          Pending
IP:
Controllers:     <none>
Containers:
  vsphere:
    Container ID:
    Image:          aosqe/hello-openshift
    Image ID:
    Port:
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Volume Mounts:
      /mnt/vsphere from vsphere (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-lzwkl (ro)
    Environment Variables:  <none>
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  vsphere:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  vsphere
    ReadOnly:   false
  default-token-lzwkl:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-lzwkl
QoS Class:   BestEffort
Tolerations: <none>
Events:
  FirstSeen LastSeen Count From                            SubObjectPath Type    Reason      Message
  --------- -------- ----- ----                            ------------- ------- ------      -------
  48m       48m      1     {default-scheduler }                          Normal  Scheduled   Successfully assigned vsphere to node1.c.vsphere.local
  46m       2m       21    {kubelet node1.c.vsphere.local}               Warning FailedMount Unable to mount volumes for pod "vsphere_jhou(a91ac559-0ed0-11e7-8a82-0050569fb058)": timeout expired waiting for volumes to attach/mount for pod "jhou"/"vsphere". list of unattached/unmounted volumes=[vsphere]
  46m       2m       21    {kubelet node1.c.vsphere.local}               Warning FailedSync  Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "jhou"/"vsphere". list of unattached/unmounted volumes=[vsphere]
```

Expected results:
Pod should be 'Running'.

Additional info: