1484475 – Improve error messages for FailedMount

Bug 1484475 - Improve error messages for FailedMount

Summary: Improve error messages for FailedMount

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Node
Sub Component:
Version:	3.5.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	3.7.0
Assignee:	Seth Jennings
QA Contact:	weiwei jiang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1724792

Reported:	2017-08-23 17:09 UTC by Thom Carlin
Modified:	2019-06-28 16:04 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-11-28 22:07:41 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2017:3188	0	normal	SHIPPED_LIVE	Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update	2017-11-29 02:34:54 UTC

Description Thom Carlin 2017-08-23 17:09:59 UTC

Description of problem:

When mounts fail for containers, there is not enough information to easily determine the cause.

Version-Release number of selected component (if applicable):

3.5

How reproducible:

100% for a FailedMount

Steps to Reproduce:
1. Start a pod that results in a FailedMount event
2. Try to diagnose using "oc" commands
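
Step 2 can be sketched roughly as follows. This is only a minimal sketch: the helper name failed_mount_events is hypothetical, and the pod name and namespace in the usage comments are assumptions borrowed from the example events later in this report.

```shell
# Hypothetical helper: keep only FailedMount lines from event output on stdin.
# "|| true" keeps the exit status at 0 when no FailedMount line is present.
failed_mount_events() {
  grep 'FailedMount' || true
}

# Assumed usage against a live cluster (pod and namespace are examples only):
#   oc describe pod glusterfs-bb-pod1 -n default
#   oc get events -n default | failed_mount_events
```

The filter only narrows the event list; whether the remaining messages name the actual cause depends on the kubelet version.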

Actual results:

The mount fails, but no clear reason is given.

Expected results:

Clear understanding of why the mount failed.

Additional info:

This occurred for me with CNS (Gluster).  I believe that https://github.com/kubernetes/kubernetes/pull/42006 may help:
"This fixes the problem of mount errors being eaten and not displayed to users again. Specifically, errors caught in MountVolume.NewMounter (like missing endpoints, etc.)

Current behavior for any mount failure:

Events:
  FirstSeen    LastSeen    Count    From            SubObjectPath    Type        Reason        Message
  ---------    --------    -----    ----            -------------    --------    ------        -------
  12m        12m        1    default-scheduler            Normal        Scheduled    Successfully assigned glusterfs-bb-pod1 to 127.0.0.1
  10m        1m        5    kubelet, 127.0.0.1            Warning        FailedMount    Unable to mount volumes for pod "glusterfs-bb-pod1_default(67c9dfa7-f9f5-11e6-aee2-5254003a59cf)": timeout expired waiting for volumes to attach/mount for pod "default"/"glusterfs-bb-pod1". list of unattached/unmounted volumes=[glusterfsvol]
  10m        1m        5    kubelet, 127.0.0.1            Warning        FailedSync    Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"glusterfs-bb-pod1". list of unattached/unmounted volumes=[glusterfsvol]

New behavior:

For example, on glusterfs with the endpoints deliberately not created, the correct message is now displayed:

Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath	Type		Reason		Message
  ---------	--------	-----	----			-------------	--------	------		-------
  2m		2m		1	default-scheduler			Normal		Scheduled	Successfully assigned glusterfs-bb-pod1 to 127.0.0.1
  54s		54s		1	kubelet, 127.0.0.1			Warning		FailedMount	Unable to mount volumes for pod "glusterfs-bb-pod1_default(8edd2c25-fa09-11e6-92ae-5254003a59cf)": timeout expired waiting for volumes to attach/mount for pod "default"/"glusterfs-bb-pod1". With error timed out waiting for the condition. list of unattached/unmounted volumes=[glusterfsvol]
  54s		54s		1	kubelet, 127.0.0.1			Warning		FailedSync	Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"glusterfs-bb-pod1". With error timed out waiting for the condition. list of unattached/unmounted volumes=[glusterfsvol]
  2m		6s		814	kubelet, 127.0.0.1			Warning		FailedMount	MountVolume.NewMounter failed for volume "kubernetes.io/glusterfs/8edd2c25-fa09-11e6-92ae-5254003a59cf-glusterfsvol" (spec.Name: "glusterfsvol") pod "8edd2c25-fa09-11e6-92ae-5254003a59cf" (UID: "8edd2c25-fa09-11e6-92ae-5254003a59cf") with: endpoints "glusterfs-cluster" not found"
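
To pull just the underlying cause out of such an event line, something like the following could be used. It is a sketch under the assumption that the message ends in "with: <cause>", as in the MountVolume.NewMounter event above; the function name mount_failure_cause is hypothetical, and other releases may word the message differently.

```shell
# Hypothetical helper: for each "MountVolume.<Step> failed ... with: <cause>"
# line on stdin, print only <cause>. Lines that do not match print nothing.
mount_failure_cause() {
  sed -n 's/.*MountVolume\.[A-Za-z]* failed.* with: \(.*\)$/\1/p'
}

# Assumed usage against a live cluster (namespace is an example only):
#   oc get events -n default | mount_failure_cause
```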

Comment 1 Seth Jennings 2017-08-25 19:43:05 UTC

This change is already in origin/master via the kube 1.7 rebase.  The question is whether we should backport it to 3.6.

There has been a lot of focus in 3.6 on reducing API server requests/load.  This change generates yet another event for each failing pod.  It does provide useful information to the user, however.

Let me ask around and see what we want to do.

Comment 2 Seth Jennings 2017-08-25 19:56:33 UTC

Fixed in 3.7 via the rebase, which includes https://github.com/kubernetes/kubernetes/pull/42006

Comment 3 weiwei jiang 2017-08-31 10:19:22 UTC

Checked with:
# oc version 
oc v3.7.0-0.123.0
kubernetes v1.7.0+695f48a16f
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server xxx
openshift v3.7.0-0.123.0
kubernetes v1.7.0+695f48a16f

The error messages for FailedMount are clearer now.

Comment 6 errata-xmlrpc 2017-11-28 22:07:41 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188
