Bug 1347212

Summary: Long time to delete the persist volume with reclaim policy "Delete"
Product: OpenShift Container Platform Reporter: Wenqi He <wehe>
Component: StorageAssignee: hchen
Status: CLOSED ERRATA QA Contact: Wenqi He <wehe>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.2.0CC: aos-bugs, bchilds, jhou, lxia, mawong, tdawson, wehe
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-09-27 09:34:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Wenqi He 2016-06-16 10:04:06 UTC
Description of problem:
It takes long time to delete the PV with reclaim policy of "Delete"

Version-Release number of selected component (if applicable):
openshift v3.2.1.2-1-g64665fc
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5

How reproducible:
Always

Steps to Reproduce:
1.Create a project
2.Create a pv with the reclaim policy of "Delete"
oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/persistent-volumes/cinder/pv-rwo-delete.json
3. Create a pvc
oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/persistent-volumes/cinder/pvc-rwo.json
4. Create a pod using this claim 
oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/persistent-volumes/cinder/pod.json
4. Enter the pod and create files on mounted dir
oc exec cinderpd -it sh
touch /mnt/cinder/wehe
5. Delete the pod and pvc
6  Check the PV status

Actual results:
The PV takes long time to be deleted 

NAME      CAPACITY ACCESSMODES  STATUS     CLAIM            REASON AGE
cin-npilt 5Gi      RWO          Released   npilt/cpvc-npilt        16m

and log with journalctl -u atomic-openshift-master | grep cin-npilt

Jun 16 04:25:13 openshift-119.lab.eng.nay.redhat.com atomic-openshift-master[37708]: I0616 04:25:13.779537   37708 persistentvolume_claim_binder_controller.go:345] PersistentVolume[cin-npilt] changing phase from Bound to Released
Jun 16 04:25:13 openshift-119.lab.eng.nay.redhat.com atomic-openshift-master[37708]: I0616 04:25:13.817090   37708 persistentvolume_provisioner_controller.go:228] PersistentVolume[cin-npilt] reconciling
Jun 16 04:25:13 openshift-119.lab.eng.nay.redhat.com atomic-openshift-master[37708]: I0616 04:25:13.818246   37708 persistentvolume_claim_binder_controller.go:194] Synchronizing PersistentVolume[cin-npilt], current phase: Released
Jun 16 04:25:13 openshift-119.lab.eng.nay.redhat.com atomic-openshift-master[37708]: I0616 04:25:13.818569   37708 persistentvolume_recycler_controller.go:158] Recycler: checking PersistentVolume[cin-npilt]
Jun 16 04:25:13 openshift-119.lab.eng.nay.redhat.com atomic-openshift-master[37708]: E0616 04:25:13.823908   37708 persistentvolume_provisioner_controller.go:124] Error reconciling volume cin-npilt: PersistentVolumeClaim[npilt/cpvc-npilt] not found in local cache
Jun 16 04:25:13 openshift-119.lab.eng.nay.redhat.com atomic-openshift-master[37708]: I0616 04:25:13.826134   37708 persistentvolume_recycler_controller.go:142] PersistentVolume[cin-npilt] not found in local cache, recycling
Jun 16 04:25:13 openshift-119.lab.eng.nay.redhat.com atomic-openshift-master[37708]: I0616 04:25:13.826158   37708 persistentvolume_recycler_controller.go:168] Reclaiming PersistentVolume[cin-npilt]
Jun 16 04:25:13 openshift-119.lab.eng.nay.redhat.com atomic-openshift-master[37708]: I0616 04:25:13.826168   37708 persistentvolume_recycler_controller.go:271] Deleting PersistentVolume[cin-npilt]
Jun 16 04:25:13 openshift-119.lab.eng.nay.redhat.com atomic-openshift-master[37708]: E0616 04:25:13.897948   37708 persistentvolume_recycler_controller.go:294] PersistentVolume[cin-npilt] failed deletion: Expected HTTP response code [202 204] when accessing [DELETE http://10.66.147.18:8776/v1/19fd3f342fbb48ff8de82bd8c49c7667/volumes/068fd9b7-b9c8-420c-ae16-6b3899100a35], but got 400 instead

Expected results:
The PV can be deleted quickly without errors in the log

Additional info:
From the log, I suspect it delete the volume firstly and unable to find it with the deletion of PV

Comment 1 Jianwei Hou 2016-06-21 03:11:16 UTC
I have another way to reproduce it. It also takes a long time to delete the volume.

1. Create a dynamic PVC which provisions and binds a PV
2. Create a Pod referencing this PVC
3. When the Pod is running, delete this PVC
4. Get Pod status, Pod is still 'Running', volume is still attached to the node. But PVC and PV are deleted.
5. Delete the Pod and wait to see whether the provisioned volume will be deleted from the IaaS console.

Result: Volume will eventually be deleted, but it takes a pretty long time(over 15 minutes).

Comment 2 hchen 2016-08-08 18:05:34 UTC
this is likely fixed in kubernetes 1.3 with the new persistent volume controller, where sync time is much shorter.

Comment 3 Matthew Wong 2016-08-08 21:50:16 UTC
Could not reproduce in kubernetes 1.3, pv was already deleted by the time I did get pv after deleting pod & pvc.

Comment 4 Wenqi He 2016-08-09 02:34:51 UTC
Verified with below version:
openshift v3.3.0.16
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

This bug is fixed, after I delete the pod and pvc, the pv is deleted by the time I deleted pvc. Thanks.

Comment 5 Bradley Childs 2016-08-15 15:37:32 UTC
Moving to verified per @Wenqi He comments.

Comment 7 errata-xmlrpc 2016-09-27 09:34:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933