Bug 1484475 - Improve error messages for FailedMount
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Pod
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.7.0
Assigned To: Seth Jennings
QA Contact: weiwei jiang
Depends On:
Blocks: 1542093
Reported: 2017-08-23 13:09 EDT by Thom Carlin
Modified: 2018-02-05 09:56 EST
CC List: 3 users

See Also:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-11-28 17:07:41 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments


External Trackers
Tracker ID: Red Hat Product Errata RHSA-2017:3188
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update
Last Updated: 2017-11-28 21:34:54 EST

Description Thom Carlin 2017-08-23 13:09:59 EDT
Description of problem:

When mounts fail for containers, there is not enough information to easily understand what the cause is.

Version-Release number of selected component (if applicable):

3.5

How reproducible:

100% for a FailedMount

Steps to Reproduce:
1. Start a pod that results in a FailedMount
2. Try to diagnose using "oc" commands
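
A minimal sketch of the kind of diagnosis attempted in step 2, assuming the glusterfs-bb-pod1 pod from the events below in the default project (the pod name here is just an illustration):

Show the pod's events, where FailedMount and FailedSync warnings are recorded:
# oc describe pod glusterfs-bb-pod1 -n default

List recent events in the project for the same warnings:
# oc get events -n default

Check the pod's status and node placement:
# oc get pod glusterfs-bb-pod1 -n default -o wide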

Actual results:

The mount fails, but no clear reason for the failure is given.

Expected results:

Clear understanding of why the mount failed.

Additional info:

This occurred for me with CNS (Gluster).  I believe that https://github.com/kubernetes/kubernetes/pull/42006 may help:
"This fixes the problem of mount errors being eaten and not displayed to users again. Specifically erros caught in MountVolume.NewMounter (like missing endpoints, etc...)

Current behavior for any mount failure:

Events:
  FirstSeen    LastSeen    Count    From            SubObjectPath    Type        Reason        Message
  ---------    --------    -----    ----            -------------    --------    ------        -------
  12m        12m        1    default-scheduler            Normal        Scheduled    Successfully assigned glusterfs-bb-pod1 to 127.0.0.1
  10m        1m        5    kubelet, 127.0.0.1            Warning        FailedMount    Unable to mount volumes for pod "glusterfs-bb-pod1_default(67c9dfa7-f9f5-11e6-aee2-5254003a59cf)": timeout expired waiting for volumes to attach/mount for pod "default"/"glusterfs-bb-pod1". list of unattached/unmounted volumes=[glusterfsvol]
  10m        1m        5    kubelet, 127.0.0.1            Warning        FailedSync    Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"glusterfs-bb-pod1". list of unattached/unmounted volumes=[glusterfsvol]
New Behavior:

For example, on glusterfs, the endpoints were deliberately not created; the correct message is now displayed:

Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath	Type		Reason		Message
  ---------	--------	-----	----			-------------	--------	------		-------
  2m		2m		1	default-scheduler			Normal		Scheduled	Successfully assigned glusterfs-bb-pod1 to 127.0.0.1
  54s		54s		1	kubelet, 127.0.0.1			Warning		FailedMount	Unable to mount volumes for pod "glusterfs-bb-pod1_default(8edd2c25-fa09-11e6-92ae-5254003a59cf)": timeout expired waiting for volumes to attach/mount for pod "default"/"glusterfs-bb-pod1". With error timed out waiting for the condition. list of unattached/unmounted volumes=[glusterfsvol]
  54s		54s		1	kubelet, 127.0.0.1			Warning		FailedSync	Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"glusterfs-bb-pod1". With error timed out waiting for the condition. list of unattached/unmounted volumes=[glusterfsvol]
  2m		6s		814	kubelet, 127.0.0.1			Warning		FailedMount	MountVolume.NewMounter failed for volume "kubernetes.io/glusterfs/8edd2c25-fa09-11e6-92ae-5254003a59cf-glusterfsvol" (spec.Name: "glusterfsvol") pod "8edd2c25-fa09-11e6-92ae-5254003a59cf" (UID: "8edd2c25-fa09-11e6-92ae-5254003a59cf") with: endpoints "glusterfs-cluster" not found"
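
For reference, a minimal sketch (not part of the original report) of the Endpoints object whose absence triggers the new "endpoints "glusterfs-cluster" not found" message above; the IP address and port are illustrative placeholders, not values from this bug:

glusterfs-endpoints.yaml (hypothetical file name):

apiVersion: v1
kind: Endpoints
metadata:
  # Must match the endpoints name referenced by the glusterfs volume
  name: glusterfs-cluster
subsets:
  - addresses:
      # Placeholder address of a Gluster node
      - ip: 192.168.122.21
    ports:
      # The port value is required but not used for routing here
      - port: 1

Create it in the project the pod runs in:
# oc create -f glusterfs-endpoints.yaml -n default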
Comment 1 Seth Jennings 2017-08-25 15:43:05 EDT
This change is already in origin/master via the kube 1.7 rebase.  The question is whether we should pick it back to 3.6.

There has been a lot of focus in 3.6 on reducing API server requests/load, and this generates yet another event for each failing pod.  It does, however, provide useful information to the user.

Let me ask around and see what we want to do.
Comment 2 Seth Jennings 2017-08-25 15:56:33 EDT
Fixed in 3.7 via the kube 1.7 rebase, which includes https://github.com/kubernetes/kubernetes/pull/42006
Comment 3 weiwei jiang 2017-08-31 06:19:22 EDT
Checked with:
# oc version 
oc v3.7.0-0.123.0
kubernetes v1.7.0+695f48a16f
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server xxx
openshift v3.7.0-0.123.0
kubernetes v1.7.0+695f48a16f

The error messages for FailedMount are clearer now.
Comment 6 errata-xmlrpc 2017-11-28 17:07:41 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188
