Description of problem:
When mounts fail for containers, there is not enough information to easily understand the cause.

Version-Release number of selected component (if applicable):
3.5

How reproducible:
100% for a FailedMount

Steps to Reproduce:
1. Start a pod that results in a FailedMount (see the sketch after this description)
2. Try to diagnose using "oc" commands

Actual results:
The mount fails, but with no clear reason why.

Expected results:
A clear explanation of why the mount failed.

Additional info:
This occurred for me with CNS (Gluster). I believe that https://github.com/kubernetes/kubernetes/pull/42006 may help:

"This fixes the problem of mount errors being eaten and not displayed to users again. Specifically errors caught in MountVolume.NewMounter (like missing endpoints, etc...)

Current behavior for any mount failure:

Events:
  FirstSeen  LastSeen  Count  From                SubObjectPath  Type     Reason       Message
  ---------  --------  -----  ----                -------------  ----     ------       -------
  12m        12m       1      default-scheduler                  Normal   Scheduled    Successfully assigned glusterfs-bb-pod1 to 127.0.0.1
  10m        1m        5      kubelet, 127.0.0.1                 Warning  FailedMount  Unable to mount volumes for pod "glusterfs-bb-pod1_default(67c9dfa7-f9f5-11e6-aee2-5254003a59cf)": timeout expired waiting for volumes to attach/mount for pod "default"/"glusterfs-bb-pod1". list of unattached/unmounted volumes=[glusterfsvol]
  10m        1m        5      kubelet, 127.0.0.1                 Warning  FailedSync   Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"glusterfs-bb-pod1". list of unattached/unmounted volumes=[glusterfsvol]

New behavior: for example on glusterfs, the endpoints were deliberately not created, and the correct message is now displayed:

Events:
  FirstSeen  LastSeen  Count  From                SubObjectPath  Type     Reason       Message
  ---------  --------  -----  ----                -------------  ----     ------       -------
  2m         2m        1      default-scheduler                  Normal   Scheduled    Successfully assigned glusterfs-bb-pod1 to 127.0.0.1
  54s        54s       1      kubelet, 127.0.0.1                 Warning  FailedMount  Unable to mount volumes for pod "glusterfs-bb-pod1_default(8edd2c25-fa09-11e6-92ae-5254003a59cf)": timeout expired waiting for volumes to attach/mount for pod "default"/"glusterfs-bb-pod1". With error timed out waiting for the condition. list of unattached/unmounted volumes=[glusterfsvol]
  54s        54s       1      kubelet, 127.0.0.1                 Warning  FailedSync   Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"glusterfs-bb-pod1". With error timed out waiting for the condition. list of unattached/unmounted volumes=[glusterfsvol]
  2m         6s        814    kubelet, 127.0.0.1                 Warning  FailedMount  MountVolume.NewMounter failed for volume "kubernetes.io/glusterfs/8edd2c25-fa09-11e6-92ae-5254003a59cf-glusterfsvol" (spec.Name: "glusterfsvol") pod "8edd2c25-fa09-11e6-92ae-5254003a59cf" (UID: "8edd2c25-fa09-11e6-92ae-5254003a59cf") with: endpoints "glusterfs-cluster" not found"
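A minimal reproduction sketch. The pod name, volume name, and endpoints name below match the event output quoted above; the image, mount path, and gluster volume path are assumptions for illustration only. The glusterfs endpoints object is deliberately not created, so the mount must fail:

# oc create -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: glusterfs-bb-pod1
spec:
  containers:
  - name: busybox
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: glusterfsvol
      mountPath: /mnt/gluster      # assumed mount path, for illustration
  volumes:
  - name: glusterfsvol
    glusterfs:
      endpoints: glusterfs-cluster # this endpoints object intentionally does not exist
      path: myvol                  # assumed gluster volume name, for illustration
EOF
# oc describe pod glusterfs-bb-pod1   # the Events section at the end lists the FailedMount entries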
This change is already in origin/master via the kube 1.7 rebase. The question is whether we should backport it to 3.6. There has been a lot of focus in 3.6 on reducing API server requests/load, and this generates yet another event per failing pod. It does, however, provide useful information to the user. Let me ask around and see what we want to do.
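For anyone weighing the event-volume concern, a rough way to gauge it on a cluster with a failing pod (a sketch; plain grep is used to stay version-agnostic):

# oc get events -n default | grep FailedMount

The COUNT column shows how many times each event recurred; in the new-behavior output above, the MountVolume.NewMounter event reached a count of 814 in about two minutes.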
Fixed in 3.7 via the Kubernetes 1.7 rebase, which includes https://github.com/kubernetes/kubernetes/pull/42006.
Checked with:

# oc version
oc v3.7.0-0.123.0
kubernetes v1.7.0+695f48a16f
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server xxx
openshift v3.7.0-0.123.0
kubernetes v1.7.0+695f48a16f

The error messages for FailedMount are clearer now; see the verification sketch below.
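To double-check on the fixed version, repeating the reproduction from the description and inspecting the pod's events should now surface the underlying error (a sketch; pod name as in the original report):

# oc describe pod glusterfs-bb-pod1 | grep -A2 FailedMount

The FailedMount event now ends with the root cause, e.g.: endpoints "glusterfs-cluster" not found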
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188