Bug 1730339 - Potential volume leak in CSI external provisioner
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.2.0
Assignee: Jan Safranek
QA Contact: Chao Yang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-07-16 13:48 UTC by Jan Safranek
Modified: 2019-10-16 06:32 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:29:44 UTC
Target Upstream Version:
Embargoed:



Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2019:2922  0        None      None    None     2019-10-16 06:32:03 UTC

Description Jan Safranek 2019-07-16 13:48:43 UTC
CSI external provisioner may leak volumes after timeouts from a CSI driver. See https://github.com/kubernetes-csi/external-provisioner/issues/311 for details.

It's complicated to reproduce with a testing mock driver and almost impossible with a real driver (unless the driver takes a really long time to provision a volume).

1. Set up a CSI driver.
2. Create a PVC to dynamically provision a PV.
3. The CSI driver must time out while provisioning a PV (!)
4. While the volume is being provisioned, delete the PVC.

Actual result:
The provisioner stops provisioning the volume when the PVC is deleted. The driver finishes provisioning, but no PV is created for the volume -> the volume is orphaned in the storage backend.

Expected result:
external-provisioner continues provisioning the volume even after the PVC is deleted. Eventually, a PV is created for the volume and it is immediately deleted. As a result, there is no orphan volume in the storage backend.
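
To illustrate the difference between the actual and the expected result, here is a toy model in Python. It is only a sketch and not the external-provisioner code: ToyBackend, remaining_volumes and the keep_going flag are hypothetical names, and the "driver" is simulated by a counter of calls that time out before one succeeds. The model assumes the driver finishes any operation it has started, even when the call itself timed out.

```python
class ToyBackend:
    """Hypothetical storage backend whose create call may time out."""

    def __init__(self, timeouts):
        self.timeouts = timeouts  # calls that time out before one succeeds
        self.started = set()      # operations the driver is still working on
        self.volumes = set()      # volumes that exist in the backend

    def create_volume(self, name):
        """Idempotent create; returns False when the call times out."""
        if name in self.volumes:
            return True
        self.started.add(name)
        if self.timeouts > 0:
            self.timeouts -= 1
            return False
        self.started.discard(name)
        self.volumes.add(name)
        return True

    def delete_volume(self, name):
        self.started.discard(name)
        self.volumes.discard(name)

    def settle(self):
        """The driver finishes whatever it started, even after a timeout."""
        self.volumes |= self.started
        self.started.clear()


def remaining_volumes(timeouts, pvc_deleted_after, keep_going):
    """Volumes left in the backend once everything has settled.

    The PVC is deleted while attempt number `pvc_deleted_after` is running.
    keep_going=False models the leak (stop retrying once the PVC is gone),
    keep_going=True models the expected behaviour (finish, then clean up).
    """
    backend = ToyBackend(timeouts)
    name = "pvc-example"
    attempts = 0
    while True:
        pvc_exists = attempts < pvc_deleted_after
        if not pvc_exists and not keep_going:
            break  # nobody tracks the in-flight volume any more
        attempts += 1
        if backend.create_volume(name):
            if attempts >= pvc_deleted_after:
                # A PV would be created, found Released and reclaimed (Delete).
                backend.delete_volume(name)
            break
    backend.settle()
    return sorted(backend.volumes)


print(remaining_volumes(2, 1, keep_going=False))  # leaky behaviour
print(remaining_volumes(2, 1, keep_going=True))   # expected behaviour
```

With two timeouts and the PVC deleted during the first attempt, the first call leaves the volume behind in the backend and the second one leaves nothing.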

Fix: https://github.com/kubernetes-csi/external-provisioner/pull/312
We need the fix in 4.2.

Comment 1 Jan Safranek 2019-07-16 14:25:30 UTC
OCP 4.2 / OKD PR: https://github.com/openshift/csi-external-provisioner/pull/13

Comment 3 Chao Yang 2019-08-13 08:17:42 UTC
Verified on 4.2.0-0.nightly-2019-08-08-002434.

Dynamically created a 10000Gi CSI EBS volume, then deleted the PVC.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: test
parameters:
  fsType: ext4
  csi.storage.k8s.io/provisioner-secret-name: aws-creds
  csi.storage.k8s.io/provisioner-secret-namespace: kube-system
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10000Gi
  storageClassName: 'test'


oc create -f pvc-foo.yaml;oc delete pvc test;oc get pv -w
persistentvolumeclaim/test created
persistentvolumeclaim "test" deleted
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM          STORAGECLASS   REASON   AGE
pvc-d19f75e9-bda1-11e9-b187-0abf16ed4052   10000Gi    RWO            Delete           Pending   default/test   test                    1s
pvc-d19f75e9-bda1-11e9-b187-0abf16ed4052   10000Gi    RWO            Delete           Released   default/test   test                    1s
pvc-d19f75e9-bda1-11e9-b187-0abf16ed4052   10000Gi    RWO            Delete           Terminating   default/test   test                    1s
pvc-d19f75e9-bda1-11e9-b187-0abf16ed4052   10000Gi    RWO            Delete           Terminating   default/test   test                    1s
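
For anyone repeating this check, the watch output can be reduced to the sequence of PV phases with a small script. This is a sketch only: pv_phases is a hypothetical helper, and it assumes the default `oc get pv` column layout shown above with an empty REASON column, so that STATUS is the fifth whitespace-separated field.

```python
def pv_phases(watch_output):
    """Return the STATUS values seen for each PV, in order, without repeats."""
    phases = {}
    for line in watch_output.splitlines():
        fields = line.split()
        # Skip the header, blank lines and "created"/"deleted" messages.
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        name, status = fields[0], fields[4]
        seen = phases.setdefault(name, [])
        if not seen or seen[-1] != status:
            seen.append(status)
    return phases


SAMPLE = """\
persistentvolumeclaim/test created
persistentvolumeclaim "test" deleted
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM          STORAGECLASS   REASON   AGE
pvc-d19f75e9-bda1-11e9-b187-0abf16ed4052   10000Gi    RWO            Delete           Pending   default/test   test                    1s
pvc-d19f75e9-bda1-11e9-b187-0abf16ed4052   10000Gi    RWO            Delete           Released   default/test   test                    1s
pvc-d19f75e9-bda1-11e9-b187-0abf16ed4052   10000Gi    RWO            Delete           Terminating   default/test   test                    1s
pvc-d19f75e9-bda1-11e9-b187-0abf16ed4052   10000Gi    RWO            Delete           Terminating   default/test   test                    1s
"""

print(pv_phases(SAMPLE))
# {'pvc-d19f75e9-bda1-11e9-b187-0abf16ed4052': ['Pending', 'Released', 'Terminating']}
```

A PV that goes Pending -> Released -> Terminating after the PVC was deleted matches the expected result from the description: the PV is created and then immediately deleted.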

Comment 4 errata-xmlrpc 2019-10-16 06:29:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

