Created attachment 1438867 [details]
template

Description of problem:

When creating a DeploymentConfig whose pod uses a gluster PV provisioned by the default StorageClass, the PV is created in gluster but is not attached to the pod. Two minutes after pod creation, kubelet times out, and only then does the volume manager attach the PV to the pod.

Pod creation:

May  2 10:05:18 ocpdc1compute01p atomic-openshift-node: I0502 10:05:18.849615    4493 config.go:415] Receiving a new pod "fod-postgresql-test-1-pfpkd_test4(8de17442-4ddf-11e8-b3f1-005056a9d058)"
May  2 10:05:18 ocpdc1compute01p atomic-openshift-node: I0502 10:05:18.862066    4493 volume_manager.go:340] Waiting for volumes to attach and mount for pod "fod-postgresql-test-1-pfpkd_test4(8de17442-4ddf-11e8-b3f1-005056a9d058)"
May  2 10:05:18 ocpdc1compute01p atomic-openshift-node: E0502 10:05:18.891092    4493 desired_state_of_world_populator.go:272] Error processing volume "fod-postgresql-test" for pod "fod-postgresql-test-1-pfpkd_test4(8de17442-4ddf-11e8-b3f1-005056a9d058)": error processing PVC "test4"/"fod-postgresql-test": PVC test4/fod-postgresql-test has non-bound phase ("Pending") or empty pvc.Spec.VolumeName ("")
May  2 10:05:18 ocpdc1compute01p atomic-openshift-node: I0502 10:05:18.899146    4493 roundrobin.go:276] LoadBalancerRR: Setting endpoints for test4/glusterfs-dynamic-fod-postgresql-test: to [10.228.104.250:1 10.228.105.250:1 10.228.106.250:1]

The volume already exists in gluster, but it is not attached until the timeout two minutes later:

May  2 10:07:21 ocpdc1compute01p atomic-openshift-node: E0502 10:07:21.862466    4493 kubelet.go:1594] Unable to mount volumes for pod "fod-postgresql-test-1-pfpkd_test4(8de17442-4ddf-11e8-b3f1-005056a9d058)": timeout expired waiting for volumes to attach/mount for pod "test4"/"fod-postgresql-test-1-pfpkd". list of unattached/unmounted volumes=[fod-postgresql-test]; skipping pod
May  2 10:07:21 ocpdc1compute01p atomic-openshift-node: E0502 10:07:21.862524    4493 pod_workers.go:186] Error syncing pod 8de17442-4ddf-11e8-b3f1-005056a9d058 ("fod-postgresql-test-1-pfpkd_test4(8de17442-4ddf-11e8-b3f1-005056a9d058)"), skipping: timeout expired waiting for volumes to attach/mount for pod "test4"/"fod-postgresql-test-1-pfpkd". list of unattached/unmounted volumes=[fod-postgresql-test]
May  2 10:07:21 ocpdc1compute01p atomic-openshift-node: I0502 10:07:21.862566    4493 server.go:351] Event(v1.ObjectReference{Kind:"Pod", Namespace:"test4", Name:"fod-postgresql-test-1-pfpkd", UID:"8de17442-4ddf-11e8-b3f1-005056a9d058", APIVersion:"v1", ResourceVersion:"4006258", FieldPath:""}): type: 'Warning' reason: 'FailedMount' Unable to mount volumes for pod "fod-postgresql-test-1-pfpkd_test4(8de17442-4ddf-11e8-b3f1-005056a9d058)": timeout expired waiting for volumes to attach/mount for pod "test4"/"fod-postgresql-test-1-pfpkd". list of unattached/unmounted volumes=[fod-postgresql-test]

And then the volume is attached to the pod:

May  2 10:07:36 ocpdc1compute01p atomic-openshift-node: I0502 10:07:36.835447    4493 volume_manager.go:340] Waiting for volumes to attach and mount for pod "fod-postgresql-test-1-pfpkd_test4(8de17442-4ddf-11e8-b3f1-005056a9d058)"
May  2 10:07:37 ocpdc1compute01p atomic-openshift-node: I0502 10:07:37.206092    4493 server.go:351] Event(v1.ObjectReference{Kind:"Pod", Namespace:"test4", Name:"fod-postgresql-test-1-pfpkd", UID:"8de17442-4ddf-11e8-b3f1-005056a9d058", APIVersion:"v1", ResourceVersion:"4006258", FieldPath:""}): type: 'Normal' reason: 'SuccessfulMountVolume' MountVolume.SetUp succeeded for volume "pvc-8ae52dbf-4ddf-11e8-b3f1-005056a9d058"

Version-Release number of selected component (if applicable):
OpenShift 3.7.42-1

How reproducible:
Randomly; it happens on 40-50% of attempts.

Steps to Reproduce:
1. Install OpenShift with CNS
2. Create a new DC with a pod that uses CNS storage

Actual results:
The volume randomly fails to attach to the pod, and only succeeds after a 2-minute timeout.

Expected results:
The volume attaches successfully without any timeout.

Additional info:
Attached the template used to reproduce the problem.
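For context, a claim that exercises the default StorageClass looks roughly like the following. This is only an illustrative sketch (the names and size are made up; the actual template is attached to this bug): with no storageClassName set, the PVC is bound by whatever StorageClass is marked as the cluster default, which here is the CNS/gluster dynamic provisioner.

```yaml
# Illustrative only -- the real template is attached to this bug.
# Omitting storageClassName means the default StorageClass
# (CNS/gluster dynamic provisioning in this cluster) is used.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fod-postgresql-test
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```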
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
May  2 10:07:21 ocpdc1compute01p atomic-openshift-node: E0502 10:07:21.862466    4493 kubelet.go:1594] Unable to mount volumes for pod "fod-postgresql-test-1-pfpkd_test4(8de17442-4ddf-11e8-b3f1-005056a9d058)": timeout expired waiting for volumes to attach/mount for pod "test4"/"fod-postgresql-test-1-pfpkd". list of unattached/unmounted volumes=[fod-postgresql-test]; skipping pod
May  2 10:07:21 ocpdc1compute01p atomic-openshift-node: E0502 10:07:21.862524    4493 pod_workers.go:186] Error syncing pod 8de17442-4ddf-11e8-b3f1-005056a9d058 ("fod-postgresql-test-1-pfpkd_test4(8de17442-4ddf-11e8-b3f1-005056a9d058)"), skipping: timeout expired waiting for volumes to attach/mount for pod "test4"/"fod-postgresql-test-1-pfpkd". list of unattached/unmounted volumes=[fod-postgresql-test]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Do we have OCP logs and gluster logs from this timestamp? I would like to analyse the logs to find out why the mount failed.
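To narrow large node/gluster log files down to the failure window before attaching them, a simple timestamp filter on syslog-style lines can help. The awk pattern below is only a sketch, demonstrated on an inline sample rather than the real /var/log/messages; adjust the month, day, and time range to the actual incident.

```shell
# Keep only lines between 10:05:00 and 10:08:00 on May 2 from a
# syslog-style file; string comparison on the HH:MM:SS field is enough
# because the timestamps are fixed-width. Demonstrated on a sample file.
cat > /tmp/sample.log <<'EOF'
May  2 10:04:59 ocpdc1compute01p atomic-openshift-node: earlier noise
May  2 10:05:18 ocpdc1compute01p atomic-openshift-node: Receiving a new pod
May  2 10:07:21 ocpdc1compute01p atomic-openshift-node: timeout expired waiting for volumes
May  2 10:09:00 ocpdc1compute01p atomic-openshift-node: later noise
EOF
awk '$1=="May" && $2=="2" && $3>="10:05:00" && $3<="10:08:00"' /tmp/sample.log
```

The same filter applied to the node log and to glusterd.log should give comparable slices of both sides of the mount attempt.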
Created attachment 1480400 [details]
node logs
Created attachment 1480401 [details]
glusterd log