Created attachment 1438867 [details]
template

Description of problem:

When creating a DeploymentConfig whose pod uses a gluster PV provisioned by the default StorageClass, the PV is created in gluster but is not attached to the pod. Two minutes after pod creation, kubelet times out, and only then does the volume manager attach the PV to the pod.

Pod creation:

May  2 10:05:18 ocpdc1compute01p atomic-openshift-node: I0502 10:05:18.849615    4493 config.go:415] Receiving a new pod "fod-postgresql-test-1-pfpkd_test4(8de17442-4ddf-11e8-b3f1-005056a9d058)"
May  2 10:05:18 ocpdc1compute01p atomic-openshift-node: I0502 10:05:18.862066    4493 volume_manager.go:340] Waiting for volumes to attach and mount for pod "fod-postgresql-test-1-pfpkd_test4(8de17442-4ddf-11e8-b3f1-005056a9d058)"
May  2 10:05:18 ocpdc1compute01p atomic-openshift-node: E0502 10:05:18.891092    4493 desired_state_of_world_populator.go:272] Error processing volume "fod-postgresql-test" for pod "fod-postgresql-test-1-pfpkd_test4(8de17442-4ddf-11e8-b3f1-005056a9d058)": error processing PVC "test4"/"fod-postgresql-test": PVC test4/fod-postgresql-test has non-bound phase ("Pending") or empty pvc.Spec.VolumeName ("")
May  2 10:05:18 ocpdc1compute01p atomic-openshift-node: I0502 10:05:18.899146    4493 roundrobin.go:276] LoadBalancerRR: Setting endpoints for test4/glusterfs-dynamic-fod-postgresql-test: to [10.228.104.250:1 10.228.105.250:1 10.228.106.250:1]

The volume already exists in gluster, but it is not attached until the timeout two minutes later:

May  2 10:07:21 ocpdc1compute01p atomic-openshift-node: E0502 10:07:21.862466    4493 kubelet.go:1594] Unable to mount volumes for pod "fod-postgresql-test-1-pfpkd_test4(8de17442-4ddf-11e8-b3f1-005056a9d058)": timeout expired waiting for volumes to attach/mount for pod "test4"/"fod-postgresql-test-1-pfpkd". list of unattached/unmounted volumes=[fod-postgresql-test]; skipping pod
May  2 10:07:21 ocpdc1compute01p atomic-openshift-node: E0502 10:07:21.862524    4493 pod_workers.go:186] Error syncing pod 8de17442-4ddf-11e8-b3f1-005056a9d058 ("fod-postgresql-test-1-pfpkd_test4(8de17442-4ddf-11e8-b3f1-005056a9d058)"), skipping: timeout expired waiting for volumes to attach/mount for pod "test4"/"fod-postgresql-test-1-pfpkd". list of unattached/unmounted volumes=[fod-postgresql-test]
May  2 10:07:21 ocpdc1compute01p atomic-openshift-node: I0502 10:07:21.862566    4493 server.go:351] Event(v1.ObjectReference{Kind:"Pod", Namespace:"test4", Name:"fod-postgresql-test-1-pfpkd", UID:"8de17442-4ddf-11e8-b3f1-005056a9d058", APIVersion:"v1", ResourceVersion:"4006258", FieldPath:""}): type: 'Warning' reason: 'FailedMount' Unable to mount volumes for pod "fod-postgresql-test-1-pfpkd_test4(8de17442-4ddf-11e8-b3f1-005056a9d058)": timeout expired waiting for volumes to attach/mount for pod "test4"/"fod-postgresql-test-1-pfpkd". list of unattached/unmounted volumes=[fod-postgresql-test]

And then the volume is attached to the pod:

May  2 10:07:36 ocpdc1compute01p atomic-openshift-node: I0502 10:07:36.835447    4493 volume_manager.go:340] Waiting for volumes to attach and mount for pod "fod-postgresql-test-1-pfpkd_test4(8de17442-4ddf-11e8-b3f1-005056a9d058)"
May  2 10:07:37 ocpdc1compute01p atomic-openshift-node: I0502 10:07:37.206092    4493 server.go:351] Event(v1.ObjectReference{Kind:"Pod", Namespace:"test4", Name:"fod-postgresql-test-1-pfpkd", UID:"8de17442-4ddf-11e8-b3f1-005056a9d058", APIVersion:"v1", ResourceVersion:"4006258", FieldPath:""}): type: 'Normal' reason: 'SuccessfulMountVolume' MountVolume.SetUp succeeded for volume "pvc-8ae52dbf-4ddf-11e8-b3f1-005056a9d058"

Version-Release number of selected component (if applicable):
OpenShift 3.7.42-1

How reproducible:
Randomly; it happens on 40-50% of attempts.

Steps to Reproduce:
1. Install OpenShift with CNS
2. Create a new DC with a pod that uses CNS storage

Actual results:
The volume randomly fails to attach to the pod, and only succeeds after a 2-minute timeout.

Expected results:
The volume attaches successfully without any timeout.

Additional info:
Attached the template used to reproduce the problem.
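For context, a claim that exercises the default StorageClass looks roughly like the following. This is only an illustrative sketch (the names and size are made up; the actual template is attached to this bug): with no storageClassName set, the PVC is bound by whatever StorageClass is marked as the cluster default, which here is the CNS/gluster dynamic provisioner.

```yaml
# Illustrative only -- the real template is attached to this bug.
# Omitting storageClassName means the default StorageClass
# (CNS/gluster dynamic provisioning in this cluster) is used.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fod-postgresql-test
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```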
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
May  2 10:07:21 ocpdc1compute01p atomic-openshift-node: E0502 10:07:21.862466    4493 kubelet.go:1594] Unable to mount volumes for pod "fod-postgresql-test-1-pfpkd_test4(8de17442-4ddf-11e8-b3f1-005056a9d058)": timeout expired waiting for volumes to attach/mount for pod "test4"/"fod-postgresql-test-1-pfpkd". list of unattached/unmounted volumes=[fod-postgresql-test]; skipping pod
May  2 10:07:21 ocpdc1compute01p atomic-openshift-node: E0502 10:07:21.862524    4493 pod_workers.go:186] Error syncing pod 8de17442-4ddf-11e8-b3f1-005056a9d058 ("fod-postgresql-test-1-pfpkd_test4(8de17442-4ddf-11e8-b3f1-005056a9d058)"), skipping: timeout expired waiting for volumes to attach/mount for pod "test4"/"fod-postgresql-test-1-pfpkd". list of unattached/unmounted volumes=[fod-postgresql-test]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Do we have OCP logs and gluster logs from this timestamp? I would like to analyse the logs to find out why the mount failed.
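To narrow large node/gluster log files down to the failure window before attaching them, a simple timestamp filter on syslog-style lines can help. The awk pattern below is only a sketch, demonstrated on an inline sample rather than the real /var/log/messages; adjust the month, day, and time range to the actual incident.

```shell
# Keep only lines between 10:05:00 and 10:08:00 on May 2 from a
# syslog-style file; string comparison on the HH:MM:SS field is enough
# because the timestamps are fixed-width. Demonstrated on a sample file.
cat > /tmp/sample.log <<'EOF'
May  2 10:04:59 ocpdc1compute01p atomic-openshift-node: earlier noise
May  2 10:05:18 ocpdc1compute01p atomic-openshift-node: Receiving a new pod
May  2 10:07:21 ocpdc1compute01p atomic-openshift-node: timeout expired waiting for volumes
May  2 10:09:00 ocpdc1compute01p atomic-openshift-node: later noise
EOF
awk '$1=="May" && $2=="2" && $3>="10:05:00" && $3<="10:08:00"' /tmp/sample.log
```

The same filter applied to the node log and to glusterd.log should give comparable slices of both sides of the mount attempt.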
Created attachment 1480400 [details]
node logs
Created attachment 1480401 [details]
glusterd log