Description of problem: Delete a volumeattachment used by a Pod will cause errors Version-Release number of selected component (if applicable): openshift v3.10.0-0.22.0 kubernetes v1.10.0+b81c8f8 # uname -a Linux host-172-16-120-56 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux # cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.5 (Maipo) # oc exec csi-attacher-cinderplugin-0 -c csi-attacher -- rpm -qa|grep attacher csi-attacher-0.2.0-2.git27299be.el7.x86_64 How reproducible: Always Steps to Reproduce: 1. Deployment cinder csi driver on OCP3.10 2. Create a storageclass 3. Create a PVC 4. Create a Pod 5. When Pod is Running, delete volumeattachment used by pod, volume is detached successfully, "oc get volumeattachment" command can not get the va. Actual results: 1. Delete Pod and recreate Pod Pod will stuck in "ContainerCreating" status, and no volumeattachment is created. oc describe pod gets event: Warning FailedMount 19s kubelet, host-172-16-120-79 Unable to mount volumes for pod "web-server_mytest(ee7f3ff0-43a8-11e8-8b58-fa163e20ed22)": timeout expired waiting for volumes to attach or mount for pod "mytest"/"web-server". list of unmounted volumes=[mypvc]. list of unattached volumes=[mypvc default-token-4mnd9] 2. Delete Pod, PVC, recreate PVC and Pod. Pod will stuck in "ContainerCreating" status, and get events: Warning FailedMount 14s kubelet, host-172-16-120-79 MountVolume.SetUp failed for volume "kubernetes-dynamic-pv-5fc40cce43a911e8" : rpc error: code = DeadlineExceeded desc = context deadline exceeded volumeattachment is created: # oc get volumeattachment -o yaml apiVersion: v1 items: - apiVersion: storage.k8s.io/v1beta1 kind: VolumeAttachment metadata: creationTimestamp: 2018-04-19T07:46:57Z finalizers: - external-attacher/csi-cinderplugin name: csi-79b66fcd646966bea869a443f801f84649c0b0d2b129ccf38972f1dbb0a74705 namespace: "" resourceVersion: "181917" selfLink: /apis/storage.k8s.io/v1beta1/volumeattachments/csi-79b66fcd646966bea869a443f801f84649c0b0d2b129ccf38972f1dbb0a74705 uid: d65bd911-43a5-11e8-8b58-fa163e20ed22 spec: attacher: csi-cinderplugin nodeName: host-172-16-120-79 source: persistentVolumeName: kubernetes-dynamic-pv-d060a33c43a511e8 status: attached: true attachmentMetadata: DevicePath: /dev/vdb kind: List metadata: resourceVersion: "" selfLink: "" But on the node, volume is attached to /dev/vdc: # fdisk -l|grep vd Disk /dev/vda: 42.9 GB, 42949672960 bytes, 83886080 sectors /dev/vda1 * 2048 1026047 512000 83 Linux /dev/vda2 1026048 83886046 41429999+ 8e Linux LVM Disk /dev/vdc: 1073 MB, 1073741824 bytes, 2097152 sectors Expected results: When volumeattachment is used, it should not be deleted? Master Log: Node Log (of failed PODs): PV Dump: # oc export pv kubernetes-dynamic-pv-5fc40cce43a911e8 apiVersion: v1 kind: PersistentVolume metadata: annotations: pv.kubernetes.io/provisioned-by: csi-cinderplugin creationTimestamp: null finalizers: - kubernetes.io/pv-protection - external-attacher/csi-cinderplugin name: kubernetes-dynamic-pv-5fc40cce43a911e8 spec: accessModes: - ReadWriteOnce capacity: storage: 1Gi claimRef: apiVersion: v1 kind: PersistentVolumeClaim name: pvc1 namespace: mytest resourceVersion: "185433" uid: 5f3b2aca-43a9-11e8-8b58-fa163e20ed22 csi: driver: csi-cinderplugin volumeAttributes: availability: nova storage.kubernetes.io/csiProvisionerIdentity: 1524125179002-8081-csi-cinderplugin volumeHandle: 0051ee4d-d18a-4152-af25-1d8a6896d487 persistentVolumeReclaimPolicy: Delete storageClassName: csi-provisioner status: {} PVC Dump: # oc export pvc pvc1 -n mytest apiVersion: v1 kind: PersistentVolumeClaim metadata: annotations: control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"8c50a7cb-43a8-11e8-b277-0a580a810025","leaseDurationSeconds":15,"acquireTime":"2018-04-19T08:12:16Z","renewTime":"2018-04-19T08:12:18Z","leaderTransitions":0}' pv.kubernetes.io/bind-completed: "yes" pv.kubernetes.io/bound-by-controller: "yes" volume.beta.kubernetes.io/storage-provisioner: csi-cinderplugin creationTimestamp: null finalizers: - kubernetes.io/pvc-protection name: pvc1 spec: accessModes: - ReadWriteOnce resources: requests: storage: 1Gi storageClassName: csi-provisioner volumeName: kubernetes-dynamic-pv-5fc40cce43a911e8 status: {} StorageClass Dump (if StorageClass used by PV/PVC): # oc export sc csi-provisioner apiVersion: storage.k8s.io/v1 kind: StorageClass metadata: creationTimestamp: null name: csi-provisioner provisioner: csi-cinderplugin reclaimPolicy: Delete volumeBindingMode: Immediate Additional info: # oc logs csi-nodeplugin-cinderplugin-v95qp -c cinder -n csi-storage|grep vd fsck.ext2: Bad magic number in super-block while trying to open /dev/vdb /dev/vdb: Mounting arguments: -t ext4 -o rw,defaults /dev/vdb /var/lib/origin/openshift.local.volumes/pods/2bf3292a-43a5-11e8-8b58-fa163e20ed22/volumes/kubernetes.io~csi/kubernetes-dynamic-pv-24adcd3543a511e8/mount Output: mount: wrong fs type, bad option, bad superblock on /dev/vdb, I0419 07:42:31.420903 1 mount_linux.go:532] Disk "/dev/vdb" appears to be unformatted, attempting to format as type: "ext4" with options: [-F /dev/vdb] I0419 07:42:32.395545 1 mount_linux.go:536] Disk successfully formatted (mkfs): ext4 - /dev/vdb /var/lib/origin/openshift.local.volumes/pods/2bf3292a-43a5-11e8-8b58-fa163e20ed22/volumes/kubernetes.io~csi/kubernetes-dynamic-pv-24adcd3543a511e8/mount E0419 07:48:07.008324 1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created. E0419 07:48:22.574978 1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created. E0419 07:48:38.575257 1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created. E0419 07:48:55.612635 1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created. E0419 07:49:14.608354 1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created. E0419 07:49:37.647315 1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created. E0419 07:50:08.730815 1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created. E0419 07:51:59.764854 1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created. E0419 07:54:16.859373 1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created. E0419 07:56:33.887760 1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created. E0419 07:59:47.004532 1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created. E0419 08:00:02.541557 1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created. E0419 08:00:18.658403 1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created. E0419 08:00:35.695261 1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created. E0419 08:00:54.782978 1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created. E0419 08:01:17.854629 1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created. E0419 08:01:48.926211 1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created.
Qin, how did you install OpenShift on OpenStack? My installation in Jenkins failed with "Node start failed." Node logs shows this: 22933 server.go:233] failed to run Kubelet: failed to create kubelet: failed to get the addresses of the current instance from the cloud https://openshift-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/Launch%20Environment%20Flexy/35908/console
There are several bugs in Kubernetes CSI implementation: - Attach/Detach controller does not recover when user deletes VolumeAttachment manually: https://github.com/kubernetes/kubernetes/pull/63295 - Attach/Detach controller does not report errors when attach fails: https://github.com/kubernetes/kubernetes/pull/63303 And there are some bugs in Cinder CSI driver itself: - CSI driver trusts OpenStack's DevicePath: https://github.com/kubernetes/cloud-provider-openstack/issues/150 Cinder driver is not much usable without fixing that.
Origin PR: https://github.com/openshift/origin/pull/19816
I should make that clearer, I fixed only the bugs in Kubernetes components: > - Attach/Detach controller does not recover when user deletes > VolumeAttachment manually: > https://github.com/kubernetes/kubernetes/pull/63295 > > - Attach/Detach controller does not report errors when attach fails: > https://github.com/kubernetes/kubernetes/pull/63303 I did not fix the driver itself, this is still open: > > - CSI driver trusts OpenStack's DevicePath: > https://github.com/kubernetes/cloud-provider-openstack/issues/150 > > > Cinder driver is not much usable without fixing that. What issue exactly are you hitting?
Ah, I misunderstood CSI driver PR, I saw a comment said fixed, but did not notice it was been fixed in "internal" Cidner volume plugin.
Qin, I checked the system you left for me and I've discovered another bug. I filled it as #1588453 so we can track its fix independently.
Verified in openshift: oc v3.10.0-0.64.0 openshift v3.10.0-0.64.0 kubernetes v1.10.0+b81c8f8 csi-provisioner-0.2.0-2.el7.x86_64 # uname -a Linux qe-piqin-master-etcd-nfs-1 3.10.0-862.3.2.el7.x86_64 #1 SMP Tue May 15 18:22:15 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux # cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.5 (Maipo)