Bug 1569404

Summary: Delete a volumeattachment used by a Pod will cause errors
Product: OpenShift Container Platform Reporter: Qin Ping <piqin>
Component: StorageAssignee: Jan Safranek <jsafrane>
Status: CLOSED CURRENTRELEASE QA Contact: Qin Ping <piqin>
Severity: high Docs Contact:
Priority: high    
Version: 3.10.0CC: aos-bugs, aos-storage-staff, bchilds, jsafrane, piqin
Target Milestone: ---   
Target Release: 3.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
undefined
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-12-20 21:12:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Qin Ping 2018-04-19 08:52:02 UTC
Description of problem:
Delete a volumeattachment used by a Pod will cause errors

Version-Release number of selected component (if applicable):
openshift v3.10.0-0.22.0
kubernetes v1.10.0+b81c8f8

# uname -a
Linux host-172-16-120-56 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.5 (Maipo)

# oc exec csi-attacher-cinderplugin-0 -c csi-attacher -- rpm -qa|grep attacher
csi-attacher-0.2.0-2.git27299be.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Deployment cinder csi driver on OCP3.10
2. Create a storageclass
3. Create a PVC
4. Create a Pod
5. When Pod is Running, delete volumeattachment used by pod, volume is detached successfully, "oc get volumeattachment" command can not get the va.

Actual results:
1. Delete Pod and recreate Pod
Pod will stuck in "ContainerCreating" status, and no volumeattachment is created.
oc describe pod gets event:
Warning  FailedMount            19s   kubelet, host-172-16-120-79  Unable to mount volumes for pod "web-server_mytest(ee7f3ff0-43a8-11e8-8b58-fa163e20ed22)": timeout expired waiting for volumes to attach or mount for pod "mytest"/"web-server". list of unmounted volumes=[mypvc]. list of unattached volumes=[mypvc default-token-4mnd9]

2. Delete Pod, PVC, recreate PVC and Pod.
Pod will stuck in "ContainerCreating" status, and get events:
Warning  FailedMount             14s   kubelet, host-172-16-120-79  MountVolume.SetUp failed for volume "kubernetes-dynamic-pv-5fc40cce43a911e8" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
volumeattachment is created:
# oc get volumeattachment -o yaml
apiVersion: v1
items:
- apiVersion: storage.k8s.io/v1beta1
  kind: VolumeAttachment
  metadata:
    creationTimestamp: 2018-04-19T07:46:57Z
    finalizers:
    - external-attacher/csi-cinderplugin
    name: csi-79b66fcd646966bea869a443f801f84649c0b0d2b129ccf38972f1dbb0a74705
    namespace: ""
    resourceVersion: "181917"
    selfLink: /apis/storage.k8s.io/v1beta1/volumeattachments/csi-79b66fcd646966bea869a443f801f84649c0b0d2b129ccf38972f1dbb0a74705
    uid: d65bd911-43a5-11e8-8b58-fa163e20ed22
  spec:
    attacher: csi-cinderplugin
    nodeName: host-172-16-120-79
    source:
      persistentVolumeName: kubernetes-dynamic-pv-d060a33c43a511e8
  status:
    attached: true
    attachmentMetadata:
      DevicePath: /dev/vdb
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

But on the node, volume is attached to /dev/vdc:
# fdisk -l|grep vd
Disk /dev/vda: 42.9 GB, 42949672960 bytes, 83886080 sectors
/dev/vda1   *        2048     1026047      512000   83  Linux
/dev/vda2         1026048    83886046    41429999+  8e  Linux LVM
Disk /dev/vdc: 1073 MB, 1073741824 bytes, 2097152 sectors

Expected results:
When volumeattachment is used, it should not be deleted?

Master Log:

Node Log (of failed PODs):

PV Dump:
# oc export pv kubernetes-dynamic-pv-5fc40cce43a911e8
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: csi-cinderplugin
  creationTimestamp: null
  finalizers:
  - kubernetes.io/pv-protection
  - external-attacher/csi-cinderplugin
  name: kubernetes-dynamic-pv-5fc40cce43a911e8
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: pvc1
    namespace: mytest
    resourceVersion: "185433"
    uid: 5f3b2aca-43a9-11e8-8b58-fa163e20ed22
  csi:
    driver: csi-cinderplugin
    volumeAttributes:
      availability: nova
      storage.kubernetes.io/csiProvisionerIdentity: 1524125179002-8081-csi-cinderplugin
    volumeHandle: 0051ee4d-d18a-4152-af25-1d8a6896d487
  persistentVolumeReclaimPolicy: Delete
  storageClassName: csi-provisioner
status: {}


PVC Dump:
# oc export pvc pvc1 -n mytest
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"8c50a7cb-43a8-11e8-b277-0a580a810025","leaseDurationSeconds":15,"acquireTime":"2018-04-19T08:12:16Z","renewTime":"2018-04-19T08:12:18Z","leaderTransitions":0}'
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: csi-cinderplugin
  creationTimestamp: null
  finalizers:
  - kubernetes.io/pvc-protection
  name: pvc1
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: csi-provisioner
  volumeName: kubernetes-dynamic-pv-5fc40cce43a911e8
status: {}


StorageClass Dump (if StorageClass used by PV/PVC):
# oc export sc csi-provisioner 
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: null
  name: csi-provisioner
provisioner: csi-cinderplugin
reclaimPolicy: Delete
volumeBindingMode: Immediate


Additional info:
# oc logs csi-nodeplugin-cinderplugin-v95qp -c cinder -n csi-storage|grep vd
fsck.ext2: Bad magic number in super-block while trying to open /dev/vdb
/dev/vdb: 
Mounting arguments: -t ext4 -o rw,defaults /dev/vdb /var/lib/origin/openshift.local.volumes/pods/2bf3292a-43a5-11e8-8b58-fa163e20ed22/volumes/kubernetes.io~csi/kubernetes-dynamic-pv-24adcd3543a511e8/mount
Output: mount: wrong fs type, bad option, bad superblock on /dev/vdb,
I0419 07:42:31.420903       1 mount_linux.go:532] Disk "/dev/vdb" appears to be unformatted, attempting to format as type: "ext4" with options: [-F /dev/vdb]
I0419 07:42:32.395545       1 mount_linux.go:536] Disk successfully formatted (mkfs): ext4 - /dev/vdb /var/lib/origin/openshift.local.volumes/pods/2bf3292a-43a5-11e8-8b58-fa163e20ed22/volumes/kubernetes.io~csi/kubernetes-dynamic-pv-24adcd3543a511e8/mount
E0419 07:48:07.008324       1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created.
E0419 07:48:22.574978       1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created.
E0419 07:48:38.575257       1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created.
E0419 07:48:55.612635       1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created.
E0419 07:49:14.608354       1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created.
E0419 07:49:37.647315       1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created.
E0419 07:50:08.730815       1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created.
E0419 07:51:59.764854       1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created.
E0419 07:54:16.859373       1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created.
E0419 07:56:33.887760       1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created.
E0419 07:59:47.004532       1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created.
E0419 08:00:02.541557       1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created.
E0419 08:00:18.658403       1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created.
E0419 08:00:35.695261       1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created.
E0419 08:00:54.782978       1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created.
E0419 08:01:17.854629       1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created.
E0419 08:01:48.926211       1 utils.go:100] GRPC error: Could not find attached Cinder disk /dev/vdb. Timeout waiting for mount paths to be created.

Comment 1 Jan Safranek 2018-04-19 10:16:21 UTC
Qin, how did you install OpenShift on OpenStack? My installation in Jenkins failed with "Node start failed."

Node logs shows this:
22933 server.go:233] failed to run Kubelet: failed to create kubelet: failed to get the addresses of the current instance from the cloud

https://openshift-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/Launch%20Environment%20Flexy/35908/console

Comment 3 Jan Safranek 2018-05-02 11:37:44 UTC
There are several bugs in Kubernetes CSI implementation:

- Attach/Detach controller does not recover when user deletes VolumeAttachment manually: https://github.com/kubernetes/kubernetes/pull/63295

- Attach/Detach controller does not report errors when attach fails: https://github.com/kubernetes/kubernetes/pull/63303


And there are some bugs in Cinder CSI driver itself:

- CSI driver trusts OpenStack's DevicePath: https://github.com/kubernetes/cloud-provider-openstack/issues/150


Cinder driver is not much usable without fixing that.

Comment 4 Jan Safranek 2018-05-23 12:51:42 UTC
Origin PR: https://github.com/openshift/origin/pull/19816

Comment 7 Jan Safranek 2018-06-06 15:02:57 UTC
I should make that clearer, I fixed only the bugs in Kubernetes components:

> - Attach/Detach controller does not recover when user deletes
> VolumeAttachment manually:
> https://github.com/kubernetes/kubernetes/pull/63295
> 
> - Attach/Detach controller does not report errors when attach fails:
> https://github.com/kubernetes/kubernetes/pull/63303


I did not fix the driver itself, this is still open:

> 
> - CSI driver trusts OpenStack's DevicePath:
> https://github.com/kubernetes/cloud-provider-openstack/issues/150
> 
> 
> Cinder driver is not much usable without fixing that.

What issue exactly are you hitting?

Comment 8 Qin Ping 2018-06-07 05:40:05 UTC
Ah, I misunderstood CSI driver PR, I saw a comment said fixed, but did not notice it was been fixed in "internal" Cidner volume plugin.

Comment 9 Jan Safranek 2018-06-07 11:46:30 UTC
Qin, I checked the system you left for me and I've discovered another bug. I filled it as #1588453 so we can track its fix independently.

Comment 10 Qin Ping 2018-06-12 08:49:50 UTC
Verified in openshift:
oc v3.10.0-0.64.0
openshift v3.10.0-0.64.0
kubernetes v1.10.0+b81c8f8

csi-provisioner-0.2.0-2.el7.x86_64

# uname -a
Linux qe-piqin-master-etcd-nfs-1 3.10.0-862.3.2.el7.x86_64 #1 SMP Tue May 15
18:22:15 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.5 (Maipo)