Bug 1387767 - nfs hangs / rpc issues

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	nfs
Sub Component:
Version:	3.7.11
Hardware:	All
OS:	All
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Niels de Vos
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-10-21 19:56 UTC by John
Modified:	2017-03-08 11:00 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2017-03-08 11:00:58 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Description John 2016-10-21 19:56:23 UTC

Description of problem:
A Gluster NFSv3 mount unpredictably stops responding for one client. The client logs the following in its /var/log/messages:

Oct 21 19:17:25 testvol kernel: nfs: server x.x.x.x not responding, timed out
Oct 21 19:17:28 testvol kernel: nfs: server x.x.x.x not responding, timed out
Oct 21 19:17:31 testvol kernel: nfs: server x.x.x.x not responding, timed out
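
The spacing of these client-side timeouts can be checked with a small script. This is only a sketch: the function name `timeout_gaps` is hypothetical, and the year is an assumption taken from the report date, because syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Matches the kernel NFS client timeout lines shown above.
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: "
    r"nfs: server (?P<server>\S+) not responding, timed out"
)

def timeout_gaps(lines, year=2016):
    """Return the gaps in seconds between consecutive timeout messages.

    `year` is an assumption (syslog omits it); 2016 matches the report date.
    """
    stamps = []
    for line in lines:
        m = TIMEOUT_RE.match(line)
        if m:
            stamps.append(
                datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
            )
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
```

For the three lines quoted above this yields two gaps of 3 seconds each.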

The Gluster server only shows the following error messages in the nfs.log:

[2016-10-21 19:17:44.843790] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0xe100e094, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.843806] E [MSGID: 112074] [nfs3.c:615:nfs3svc_submit_reply] 0-nfs-nfsv3: Reply submission failed
[2016-10-21 19:17:44.843919] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x8301e094, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844055] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3c01e094, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844174] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x5201e094, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844268] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3401e094, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844334] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x2a01e094, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844393] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x8e01e094, Program: NFS3, ProgVers: 3, Proc: 1) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844438] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x8f01e094, Program: NFS3, ProgVers: 3, Proc: 1) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:45.051784] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3c01e094, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:45.052042] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x8301e094, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:45.052202] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3401e094, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:45.052356] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x5201e094, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
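
To make the nfs.log excerpt easier to read, the failed replies can be grouped by NFSv3 procedure and by XID. The sketch below is illustrative only; `summarize` and the regular expression are hypothetical helpers written against the log format shown above, and the procedure names come from RFC 1813 (Proc 1 is GETATTR, Proc 7 is WRITE).

```python
import re
from collections import Counter

# NFSv3 procedure numbers from RFC 1813 (subset).
NFS3_PROCS = {
    0: "NULL", 1: "GETATTR", 2: "SETATTR", 3: "LOOKUP", 4: "ACCESS",
    5: "READLINK", 6: "READ", 7: "WRITE", 8: "CREATE", 21: "COMMIT",
}

# Matches the rpcsvc_submit_generic error lines from nfs.log.
SUBMIT_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] E \[rpcsvc\.c:\d+:rpcsvc_submit_generic\].*"
    r"XID: (?P<xid>0x[0-9a-f]+), Program: (?P<prog>\w+), "
    r"ProgVers: (?P<vers>\d+), Proc: (?P<proc>\d+)\)"
)

def summarize(lines):
    """Count failed replies per procedure and list XIDs seen more than once."""
    procs = Counter()
    xids = Counter()
    for line in lines:
        m = SUBMIT_RE.search(line)
        if not m:
            continue
        number = int(m["proc"])
        procs[NFS3_PROCS.get(number, f"proc {number}")] += 1
        xids[m["xid"]] += 1
    repeated = sorted(x for x, n in xids.items() if n > 1)
    return procs, repeated
```

Applied to the excerpt above, this counts 10 failed WRITE replies and 2 failed GETATTR replies. Four XIDs (0x3401e094, 0x3c01e094, 0x5201e094, 0x8301e094) appear twice, roughly 200 ms apart, which may indicate the same requests being processed again after the first reply could not be sent.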

volume info:
Volume Name: testvol
Type: Distribute
Volume ID: 1a149875-b248-4330-ae70-0238820d7bad
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.171.156.220:/gluster/testvol/brick1
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: on
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
auth.allow: x.x.x.x
nfs.disable: off
nfs.addr-namelookup: off
nfs.acl: off
nfs.rpc-auth-allow: x.x.x.x
nfs.trusted-sync: on
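
When comparing this volume against the ones that work, it can help to pull the reconfigured options out of the `volume info` text and look only at the NFS-related ones. A minimal sketch, assuming the plain-text layout shown above; `parse_reconfigured_options` is a hypothetical helper, not part of the Gluster CLI.

```python
def parse_reconfigured_options(text):
    """Return the 'Options Reconfigured' section of volume info as a dict."""
    options = {}
    in_options = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Options Reconfigured:":
            in_options = True
            continue
        # Every line after the section header is expected to be "key: value".
        if in_options and ": " in line:
            key, value = line.split(": ", 1)
            options[key] = value
    return options

def nfs_options(options):
    """Keep only the options whose key starts with 'nfs.'."""
    return {k: v for k, v in options.items() if k.startswith("nfs.")}
```

Diffing the resulting dictionaries for two volumes shows at a glance which settings differ.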



When this goes on for an extended period, the client is unable to keep the share mounted, and eventually the client's application locks up. This only happens with one volume on this system; all others are able to access the share at the time of these events. This client is more active than others, but the system is not under heavy load (always about 75% CPU idle, 50% free RAM, and disk IO rarely rises above 20%). Network connectivity has been ruled out by my network team as well. I gave them a new share with a single brick to rule out a lot of other possibilities.

Version-Release number of selected component (if applicable):
Gluster 3.7.11-2 running on CentOS 7.1.1503
Client is RHEL 6u6

How reproducible:

I cannot trigger it on demand, but it always recurs with this host (it is just a matter of time).

Steps to Reproduce:
1. 
2. 
3.

Actual results:


Expected results:
NFS session should stay established

Additional info:

Comment 1 Kaushal 2017-03-08 11:00:58 UTC

This bug is getting closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.