Bug 1465753 - Cinder volume not being unbound of the pvc when the previous pod is deleted
Status: CLOSED NOTABUG
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.4.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.7.0
Assigned To: Jan Safranek
QA Contact: Jianwei Hou
Whiteboard: UpcomingRelease
Depends On:
Blocks:
Reported: 2017-06-28 02:56 EDT by Nicolas Nosenzo
Modified: 2017-07-12 10:28 EDT (History)
3 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-07-12 10:28:28 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Nicolas Nosenzo 2017-06-28 02:56:36 EDT
Description of problem:

When redeploying the pod, the new container stays in ContainerCreating; the volume is not unbound from the PVC when the previous pod is deleted:

Jun 19 14:25:51 osbeta-rtp-master0 atomic-openshift-master-api: I0619 14:25:51.116038   89051 trace.go:61] Trace "Delete /api/v1/namespaces/fi-gf-or/persistentvolumeclaims/gv1-pvc-rhub1" (started 2017-06-19 14:25:50.716046621 -0400 EDT):
Jun 19 14:25:51 osbeta-rtp-master0 atomic-openshift-master-api: "Delete /api/v1/namespaces/fi-gf-or/persistentvolumeclaims/gv1-pvc-rhub1" [399.861166ms] [258.542µs] END
Jun 20 23:21:04 osbeta-rtp-master0 atomic-openshift-master-controllers: I0620 23:21:04.716009   73128 pv_controller.go:389] synchronizing PersistentVolume[pvc-dc15f42e-52cc-11e7-b8f8-fa163ee805fd]: phase: Failed, bound to: "fi-gf-or/gv1-pvc-rhub1 (uid: dc15f42e-52cc-11e7-b8f8-fa163ee805fd)", boundByController: true

Version-Release number of selected component (if applicable):
- oc v3.4.1.18
- OSP 8 - Liberty
- Cinder API versions in use (of v1, v2, v3): v1 and v2

How reproducible:
Sometimes

Steps to Reproduce:
1. Deploy a pod using the dynamic Cinder storage
2. Re-deploy the pod
3. Observe the status of the new pod
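
The steps above can be sketched with the oc CLI. The manifest file name rhub1-pod.yaml is hypothetical; the project, pod, and claim names are taken from the logs and comment 2:

```shell
NS=fi-gf-or
POD=rhub1

# 1. Deploy a pod whose volume comes from the dynamically provisioned
#    Cinder-backed PVC (claim definition is in comment 2).
oc create -f gv1-pvc-rhub1.yaml -n "$NS"
oc create -f rhub1-pod.yaml -n "$NS"

# 2. Re-deploy: delete the pod and create it again, reusing the same PVC.
oc delete pod "$POD" -n "$NS"
oc create -f rhub1-pod.yaml -n "$NS"

# 3. Watch the new pod; on affected clusters it can hang in ContainerCreating.
oc get pod "$POD" -n "$NS" -w
```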

Actual results:
Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "rhub1"/"fi-gf-or". list of unattached/unmounted volumes=[log]


Expected results:
Pod to be correctly deployed.

Additional info:


Comment 2 Nicolas Nosenzo 2017-06-28 04:40:36 EDT
The pvc definition:

$ cat gv1-pvc-rhub1.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    volume.alpha.kubernetes.io/storage-class: dynamic
  name: gv1-pvc-rhub1
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi
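
Note that volume.alpha.kubernetes.io/storage-class is the legacy alpha annotation for requesting a storage class. A sketch of creating this claim and watching it bind (the watch command is illustrative; the project name is assumed from the logs):

```shell
# Create the claim, then watch until STATUS changes from Pending to Bound.
oc create -f gv1-pvc-rhub1.yaml -n fi-gf-or
oc get pvc gv1-pvc-rhub1 -n fi-gf-or -w
```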
Comment 3 Jan Safranek 2017-06-28 12:07:38 EDT
> When redeploying the pod, container stands at ContainerCreating, the volume is not unbound of the pvc when the previous pod is deleted

This makes no sense to me. PVCs are not bound to pods, they're bound to PVs. Why would the customer unbind their PVC from a PV? How does it relate to redeploying a pod?

Can you please get some steps to reproduce from the customer? What did they create first, what second, how did they "redeploy" their pod? In the log in the support ticket I can see this:

Jun 20 23:27:34 osbeta-rtp-master0 atomic-openshift-master-controllers: I0620 23:27:34.737333   73128 pv_controller.go:436] synchronizing PersistentVolume[pvc-dc15f42e-52cc-11e7-b8f8-fa163ee805fd]: claim fi-gf-or/gv1-pvc-rhub1 has different UID, the old one must have been deleted

That means that they created a PVC and it got bound to a dynamically provisioned PV. Then they deleted the PVC and created a new one with the same name. While there is nothing really bad about this, it shows that there were some not-so-standard actions performed, probably trying to fix the problem.

In addition, this dynamically provisioned PV should have been automatically deleted when they deleted the first PVC. It is 'Failed' instead, meaning that deletion did not succeed, but I don't know why.
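
One way to check why the deletion failed (a sketch; the PV name is taken from the log lines above) is to read the PV's events and status message, where the provisioner usually records its error:

```shell
# Events and status message for the Failed PV.
oc describe pv pvc-dc15f42e-52cc-11e7-b8f8-fa163ee805fd

# Just the phase and failure message, via jsonpath.
oc get pv pvc-dc15f42e-52cc-11e7-b8f8-fa163ee805fd \
  -o jsonpath='{.status.phase}{"\t"}{.status.message}{"\n"}'
```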

To sum it up, there is something weird going on and we need further details.

Please get:

oc get pod -o yaml && oc describe pod (so we can see what pod uses what PVC and states of the pods)
oc get pvc -o yaml && oc describe pvc (so we can see all the PVCs referenced by the pods and their states)
oc get pv -o yaml && oc describe pv (so we can see PVs and their states)

And full logs from the master and from the node that can't run the problematic pod would be very helpful too - the snippet the customer attached shows that something odd is going on with the PVCs, but it does not show why the node can't run the pod.

The tarball with YAML files that the customer provided is nice, but it shows only the *initial* state of the system, not the *current* one - that's why we need oc get * -o yaml.
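
The dumps requested above can be collected in one pass, for example (project name assumed from the logs; output file name is arbitrary):

```shell
NS=fi-gf-or
{
  oc get pod -n "$NS" -o yaml;  oc describe pod -n "$NS"
  oc get pvc -n "$NS" -o yaml;  oc describe pvc -n "$NS"
  oc get pv -o yaml;            oc describe pv
} > storage-dump.txt
```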
