Description of problem:
Sometimes, dynamically provisioned PVs/volumes on OpenStack are not automatically deleted after their PVCs are deleted. On the nodes, these Cinder volumes are all unmounted, but the OpenStack console still shows them as attached to the nodes.
Version-Release number of selected component (if applicable):
Reproduced 3 times recently on two OpenStack environments.
Steps to Reproduce:
1. Rapidly create a batch of dynamically provisioned PVs and Pods. I have seen 7 volumes stuck in 'attached' status, so I think at least 14 need to be created to reproduce.
2. Delete the Pods and PVCs.
3. List PVs
After step 3, many PVs are stuck in 'Released' status and are not being deleted:
NAME                                       CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS     CLAIM                      REASON   AGE
pvc-0799c71e-ee8b-11e6-b41f-fa163e86be01   1Gi        RWO           Delete          Released   9l2i0/dynamic-pvc1-9l2i0            1h
pvc-081d884a-ee8b-11e6-b41f-fa163e86be01   2Gi        RWX           Delete          Released   9l2i0/dynamic-pvc2-9l2i0            1h
pvc-08abe1a4-ee8b-11e6-b41f-fa163e86be01   3Gi        ROX           Delete          Released   9l2i0/dynamic-pvc3-9l2i0            1h
pvc-3db0c938-ee8d-11e6-b41f-fa163e86be01   1Gi        RWO           Delete          Released   htsdg/htsdg-mked7c3k                57m
pvc-48c38a94-ee8a-11e6-81ba-fa163e86be01   1Gi        RWO           Delete          Released   rs8ap/dynamic-pvc1-rs8ap            1h
pvc-49481acd-ee8a-11e6-81ba-fa163e86be01   2Gi        RWX           Delete          Released   rs8ap/dynamic-pvc2-rs8ap            1h
pvc-49c703d5-ee8a-11e6-81ba-fa163e86be01   3Gi        ROX           Delete          Released   rs8ap/dynamic-pvc3-rs8ap            1h
pvc-56ccafc2-ee90-11e6-b41f-fa163e86be01   1Gi        RWO           Delete          Released   ni9ml/pvcsc                         35m
pvc-67b7b6a8-ee8c-11e6-b41f-fa163e86be01   1Gi        RWO           Delete          Released   0udxv/0udxv-mddly5nl                1h
pvc-7825189a-ee90-11e6-b41f-fa163e86be01   1Gi        RWO           Delete          Released   jhou/cinderc                        34m
pvc-7d1d3e21-ee8c-11e6-b41f-fa163e86be01   1Gi        RWO           Delete          Released   34whw/34whw-0x66ulvo                1h
pvc-916d6432-ee8c-11e6-b41f-fa163e86be01   1Gi        RWO           Delete          Released   57lw3/57lw3-kxt1-gvl                1h
pvc-f514f8ed-ee8c-11e6-b41f-fa163e86be01   1Gi        RWO           Delete          Released   az5wb/az5wb-po48xolh
In the OpenStack console, these volumes were shown as 'Attached to xxx host on /dev/xxx', but on the nodes none of these volumes were attached. They should already have been unmounted.
Expected results: the PVs/volumes should be deleted.
Log file: https://drive.google.com/a/redhat.com/file/d/0B8NAD8stzqvnNzBxdWZOTFdvMzg/view?usp=sharing
Created attachment 1250268
I reproduced the bug simply by creating a PVC and a pod and then deleting both; no load was required. The attached log starts with the provisioning of the volume.
Cinder volume ID: 11a4faea-bfc7-4713-88b3-dec492480dba
At 07:57:25.475258, the pod is created.
At 07:59:03.266783, the pod is deleted (i.e., it probably enters the Terminating phase).
At 07:59:33.871363, the pod is actually deleted.
In the log, I can frequently see that the volume is marked as detached even though the pod is still running and the volume is still attached!
I0214 07:58:16.054214 77288 reconciler.go:107] Starting reconciling attached volumes still attached
I0214 07:58:16.154409 77288 reconciler.go:206] Volume "kubernetes.io/cinder/11a4faea-bfc7-4713-88b3-dec492480dba"/Node "host-8-175-110.host.centralci.eng.rdu2.redhat.com" is attached--touching.
I0214 07:58:16.235672 77288 openstack_volumes.go:122] 11a4faea-bfc7-4713-88b3-dec492480dba kubernetes-dynamic-pvc-1fa0e8b4-f2b5-11e6-a8bb-fa163ecb84eb [map[id:11a4faea-bfc7-4713-88b3-dec492480dba server_id:212f723e-513b-4f00-a275-38192ea79fa6 attachment_id:a46be665-35d2-410e-bd3f-e0fd8a100b23 host_name:<nil> volume_id:11a4faea-bfc7-4713-88b3-dec492480dba device:/dev/vdr]]
I0214 07:58:16.235705 77288 attacher.go:140] VolumesAreAttached: check volume "11a4faea-bfc7-4713-88b3-dec492480dba" (specName: "pvc-1fa0e8b4-f2b5-11e6-a8bb-fa163ecb84eb") is no longer attached
I0214 07:58:16.235718 77288 operation_executor.go:565] VerifyVolumesAreAttached determined volume "kubernetes.io/cinder/11a4faea-bfc7-4713-88b3-dec492480dba" (spec.Name: "pvc-1fa0e8b4-f2b5-11e6-a8bb-fa163ecb84eb") is no longer attached to node %!q(MISSING), therefore it was marked as detached.
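The log excerpt shows Cinder reporting an attachment with `server_id:212f723e-513b-4f00-a275-38192ea79fa6` at the same moment `VolumesAreAttached` concludes that the volume is "no longer attached".

The intended check can be sketched as follows. The sketch assumes a volume counts as attached to a node when one of its attachments carries that node's OpenStack server (instance) ID. The types and the function `isAttachedTo` are hypothetical. They do not reproduce the Kubernetes code or the change in upstream #39998.

The second call only illustrates how comparing against the wrong identifier yields a false "not attached". The report does not say which lookup actually went wrong.

```go
package main

import "fmt"

// attachment mirrors two of the fields printed in the openstack_volumes.go log line.
// Hypothetical type for illustration; not the actual Kubernetes or gophercloud type.
type attachment struct {
	ServerID string
	Device   string
}

// isAttachedTo reports whether any of the volume's attachments points at the
// given instance ID. Passing an identifier that is not the OpenStack server ID
// returns false even though the volume is attached.
func isAttachedTo(attachments []attachment, instanceID string) bool {
	for _, a := range attachments {
		if a.ServerID == instanceID {
			return true
		}
	}
	return false
}

func main() {
	// Attachment values taken from the log excerpt above.
	atts := []attachment{{ServerID: "212f723e-513b-4f00-a275-38192ea79fa6", Device: "/dev/vdr"}}

	// Matching server ID: the volume is reported as attached.
	fmt.Println(isAttachedTo(atts, "212f723e-513b-4f00-a275-38192ea79fa6"))
	// Hypothetical wrong lookup key (the node name from the reconciler log line):
	// the same attached volume is reported as not attached.
	fmt.Println(isAttachedTo(atts, "host-8-175-110.host.centralci.eng.rdu2.redhat.com"))
}
```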
The same happens also when the pod is finally deleted and the volume should be detached - it's "marked as detached", but not actually detached from the node.
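A greatly simplified model illustrates why a false "marked as detached" can leave the PV stuck in Released. Once the controller's cached state says the volume is detached, the controller never issues the real detach. The volume therefore stays in use on the Cinder side, and a later delete fails.

All types and method names here are hypothetical. The model assumes, as Cinder normally behaves, that an attached (in-use) volume cannot be deleted.

```go
package main

import "fmt"

// Hypothetical, greatly simplified model; none of these types exist in Kubernetes.

// cloud holds the real attachment state as seen by Cinder.
type cloud struct{ attached map[string]bool }

// controller holds the attach/detach controller's cached view of attachments.
type controller struct{ believesAttached map[string]bool }

// verify records the result of an attachment check in the cache. A false
// negative marks the volume detached without touching the cloud.
func (c *controller) verify(vol string, checkSaysAttached bool) {
	if !checkSaysAttached {
		delete(c.believesAttached, vol) // "marked as detached"
	}
}

// reconcileDetach issues a cloud detach only for volumes the cache still
// believes are attached.
func (c *controller) reconcileDetach(cl *cloud, vol string) {
	if c.believesAttached[vol] {
		cl.attached[vol] = false
		delete(c.believesAttached, vol)
	}
}

// deleteVolume models the cloud refusing to delete a volume that is still attached.
func (cl *cloud) deleteVolume(vol string) error {
	if cl.attached[vol] {
		return fmt.Errorf("volume %s is still attached", vol)
	}
	delete(cl.attached, vol)
	return nil
}

func main() {
	vol := "11a4faea-bfc7-4713-88b3-dec492480dba"
	cl := &cloud{attached: map[string]bool{vol: true}}
	ctrl := &controller{believesAttached: map[string]bool{vol: true}}

	ctrl.verify(vol, false)       // false negative, as in the log
	ctrl.reconcileDetach(cl, vol) // no-op: the cache already says detached
	fmt.Println(cl.deleteVolume(vol))
}
```

Running the sketch prints the delete error. This mirrors the symptom in the description: the PV remains in Released while the volume is still attached.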
It could be fixed by https://github.com/kubernetes/kubernetes/pull/39998/; let me check...
Confirmed: upstream #39998 fixed the issue for me. Backported to origin as https://bugzilla.redhat.com/show_bug.cgi?id=1420645
Jan meant https://github.com/openshift/origin/pull/12955
Oh, thanks for the correction.
The PR got merged and will be available in the next build.
Actually, the merge failed and needs someone to look at it.
When the merge succeeds, please move this back to MODIFIED.
Yeah, I'm sorry about that. It's merged now.
This has been merged into OCP and is in OCP v184.108.40.206 or newer.
Verified this is fixed on
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.