1360236 – No obvious/verbose error message when NFS volume failed to mount

Bug 1360236 - No obvious/verbose error message when NFS volume failed to mount

Summary: No obvious/verbose error message when NFS volume failed to mount

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Storage
Sub Component:
Version:	3.3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Erin Boyd
QA Contact:	Liang Xia
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2016-07-26 10:10 UTC by Liang Xia
Modified:	2017-03-08 18:43 UTC (History)
CC List:	5 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-01-18 12:51:49 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2017:0066	0	normal	SHIPPED_LIVE	Red Hat OpenShift Container Platform 3.4 RPM Release Advisory	2017-01-18 17:23:26 UTC

Description Liang Xia 2016-07-26 10:10:00 UTC

Description of problem:
Since NFS can fail to mount with various reasons, it is really hard to 
debug without obvious/verbose error messages.

Version-Release number of selected component (if applicable):
openshift v3.3.0.10
kubernetes v1.3.0+57fb9ac
etcd 2.3.0+git

How reproducible:
Always

Steps to Reproduce:
1. Prepare a nfs server.
2. Create a PV using above nfs storage, but with invalid export path.
e.g, non-exist path
3. Create a PVC.
4. Check PV/PVC status.
5. Create a pod using above PVC.
6. Check the pod detail.
oc describe pod podname

Actual results:
No obvious/verbose error message why it failed to mount.

Expected results:
Show more obvious/verbose error message for easier debug.

Additional info:
# oc describe pod mypod-o4702 -n o4702
Name:            mypod-o4702
Namespace:        o4702
Security Policy:    restricted
Node:            ip-172-18-3-22.ec2.internal/172.18.3.22
Start Time:        Tue, 26 Jul 2016 03:24:43 -0400
Labels:            name=frontendhttp
Status:            Pending
IP:            
Controllers:        <none>
Containers:
  myfrontend:
    Container ID:    
    Image:        aosqe/nfs-web
    Image ID:        
    Port:        80/TCP
    State:        Waiting
      Reason:        ContainerCreating
    Ready:        False
    Restart Count:    0
    Volume Mounts:
      /mnt from nfs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-u8ut5 (ro)
    Environment Variables:    <none>
Conditions:
  Type        Status
  Initialized     True 
  Ready     False 
  PodScheduled     True 
Volumes:
  nfs:
    Type:    PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:    nfsc-o4702
    ReadOnly:    false
  default-token-u8ut5:
    Type:    Secret (a volume populated by a Secret)
    SecretName:    default-token-u8ut5
QoS Tier:    BestEffort
Events:
  FirstSeen    LastSeen    Count    From                    SubobjectPath    Type        Reason        Message
  ---------    --------    -----    ----                    -------------    --------    ------        -------
  3m        3m        1    {default-scheduler }                    Normal        Scheduled    Successfully assigned mypod-o4702 to ip-172-18-3-22.ec2.internal
  1m        1m        1    {kubelet ip-172-18-3-22.ec2.internal}            Warning        FailedMount    Unable to mount volumes for pod "mypod-o4702_o4702(06085133-5302-11e6-bc71-0e147d09b24f)": timeout expired waiting for volumes to attach/mount for pod "mypod-o4702"/"o4702". list of unattached/unmounted volumes=[nfs]
  1m        1m        1    {kubelet ip-172-18-3-22.ec2.internal}            Warning        FailedSync    Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "mypod-o4702"/"o4702". list of unattached/unmounted volumes=[nfs]

Comment 1 Bradley Childs 2016-08-03 14:09:50 UTC

https://github.com/kubernetes/kubernetes/pull/27778

Comment 2 Erin Boyd 2016-08-16 20:10:12 UTC

This will be fixed in the next release. There is already a LGTM on the current PR.

Comment 4 Eric Paris 2016-08-17 18:31:36 UTC

This is going to be fixed in kubernetes-1.4. We do not plan to backport to openshift-3.3. UpcomingReleasing

Comment 5 Erin Boyd 2016-10-26 19:35:30 UTC

please retest

Comment 6 Liang Xia 2016-10-28 05:42:18 UTC

Tried the same step as in #comment 0, still no obvious error message why it failed(the error message is the same as in #comment 0).

# openshift version
openshift v3.4.0.16+cc70b72
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

# oc describe pods mypod-od6vc -n od6vc
Name:			mypod-od6vc
Namespace:		od6vc
Security Policy:	restricted
Node:			qe-lxia-ocp34-node-registry-router-1/10.240.0.4
Start Time:		Fri, 28 Oct 2016 01:36:30 -0400
Labels:			name=frontendhttp
Status:			Pending
IP:			
Controllers:		<none>
Containers:
  myfrontend:
    Container ID:	
    Image:		aosqe/nfs-web
    Image ID:		
    Port:		80/TCP
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Volume Mounts:
      /mnt from nfs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-ia0ce (ro)
    Environment Variables:	<none>
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  nfs:
    Type:	PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:	nfsc-od6vc
    ReadOnly:	false
  default-token-ia0ce:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-ia0ce
QoS Class:	BestEffort
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From						SubobjectPath	Type		Reason		Message
  ---------	--------	-----	----						-------------	--------	------		-------
  3m		3m		1	{default-scheduler }						Normal		Scheduled	Successfully assigned mypod-od6vc to
qe-lxia-ocp34-node-registry-router-1
  1m		1m		1	{kubelet qe-lxia-ocp34-node-registry-router-1}			Warning		FailedMount	Unable to mount volumes for pod
"mypod-od6vc_od6vc(7ad6d6e8-9cd0-11e6-98fc-42010af00003)": timeout expired waiting for volumes to attach/mount for pod
"mypod-od6vc"/"od6vc". list of unattached/unmounted volumes=[nfs]
  1m		1m		1	{kubelet qe-lxia-ocp34-node-registry-router-1}			Warning		FailedSync	Error syncing pod, skipping: timeout
expired waiting for volumes to attach/mount for pod "mypod-od6vc"/"od6vc". list of unattached/unmounted volumes=[nfs]

Comment 7 Scott Creeley 2016-10-28 16:15:19 UTC

I just tested latest Origin and I can see the upstream #27778 PR working in OCP


Events:
  FirstSeen	LastSeen	Count	From			SubobjectPath	Type		Reason		Message
  ---------	--------	-----	----			-------------	--------	------		-------
  2m		2m		1	{default-scheduler }			Normal		Scheduled	Successfully assigned nfs-bb-pod2 to ose1.rhs
  44s		44s		1	{kubelet ose1.rhs}			Warning		FailedMount	Unable to mount volumes for pod
"nfs-bb-pod2_default(b2e7a914-9d28-11e6-aa36-525400f26441)": timeout expired waiting for volumes to attach/mount for pod
"nfs-bb-pod2"/"default". list of unattached/unmounted volumes=[nfsvol]
  44s		44s		1	{kubelet ose1.rhs}			Warning		FailedSync	Error syncing pod, skipping: timeout expired waiting for volumes
to attach/mount for pod "nfs-bb-pod2"/"default". list of unattached/unmounted volumes=[nfsvol]
  38s		38s		1	{kubelet ose1.rhs}			Warning		FailedMount	MountVolume.SetUp failed for volume
"kubernetes.io/nfs/b2e7a914-9d28-11e6-aa36-525400f26441-pv-nfs" (spec.Name: "pv-nfs") pod
"b2e7a914-9d28-11e6-aa36-525400f26441" (UID: "b2e7a914-9d28-11e6-aa36-525400f26441") with: mount failed: exit status 32
Mounting arguments: nfs1.rhs:/opt/data12
/etc/openshift/data/pods/b2e7a914-9d28-11e6-aa36-525400f26441/volumes/kubernetes.io~nfs/pv-nfs nfs []
Output: mount.nfs: Connection timed out

The true mount error from the provider might come after the generic "timeout expired waiting for volumes" - did you check it again a short time after you saw the original?

Comment 8 Scott Creeley 2016-10-28 16:27:49 UTC

Another example using latest Origin code more aligned with the test scenario that QE is using:



Events:
  FirstSeen	LastSeen	Count	From			SubobjectPath	Type		Reason		Message
  ---------	--------	-----	----			-------------	--------	------		-------
  12s		12s		1	{default-scheduler }			Normal		Scheduled	Successfully assigned nfs-bb-pod2 to ose1.rhs
  11s		3s		5	{kubelet ose1.rhs}			Warning		FailedMount	MountVolume.SetUp failed for volume
"kubernetes.io/nfs/1a23dbb1-9d2b-11e6-aa36-525400f26441-pv-nfs1" (spec.Name: "pv-nfs1") pod
"1a23dbb1-9d2b-11e6-aa36-525400f26441" (UID: "1a23dbb1-9d2b-11e6-aa36-525400f26441") with: mount failed: exit status 32
Mounting arguments: nfs1.rhs:/opt/data36
/etc/openshift/data/pods/1a23dbb1-9d2b-11e6-aa36-525400f26441/volumes/kubernetes.io~nfs/pv-nfs1 nfs []
Output: mount.nfs: access denied by server while mounting nfs1.rhs:/opt/data36

Comment 9 Troy Dawson 2016-10-28 20:04:09 UTC

This has been merged into ose and is in OSE v3.4.0.17 or newer.

Comment 11 Liang Xia 2016-10-31 07:18:42 UTC

Tried again today, I can get the truth error in the output.

Events:
  FirstSeen	LastSeen	Count	From						SubobjectPath	Type		Reason		Message
  ---------	--------	-----	----						-------------	--------	------		-------
  14m		14m		1	{default-scheduler }						Normal		Scheduled	Successfully assigned mypod-9xu-r to
qe-chaoyang-node-registry-router-1
  12m		56s		6	{kubelet qe-chaoyang-node-registry-router-1}			Warning		FailedMount	Unable to mount
volumes for pod "mypod-9xu-r_9xu-r(e0f1cefe-9f37-11e6-93b7-42010af0000f)": timeout expired waiting
for volumes to attach/mount for pod "mypod-9xu-r"/"9xu-r". list of unattached/unmounted
volumes=[nfs]
  12m		56s		6	{kubelet qe-chaoyang-node-registry-router-1}			Warning		FailedSync	Error syncing pod,
skipping: timeout expired waiting for volumes to attach/mount for pod "mypod-9xu-r"/"9xu-r". list of
unattached/unmounted volumes=[nfs]
  9m		37s		5	{kubelet qe-chaoyang-node-registry-router-1}			Warning		FailedMount	MountVolume.SetUp
failed for volume "kubernetes.io/nfs/e0f1cefe-9f37-11e6-93b7-42010af0000f-nfs-9xu-r" (spec.Name:
"nfs-9xu-r") pod "e0f1cefe-9f37-11e6-93b7-42010af0000f" (UID:
"e0f1cefe-9f37-11e6-93b7-42010af0000f") with: exit status 32

Comment 12 Erin Boyd 2016-11-11 17:56:52 UTC

No docs are needed for this change

Comment 14 errata-xmlrpc 2017-01-18 12:51:49 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066

Note You need to log in before you can comment on or make changes to this bug.