Description of problem:

A Gluster NFSv3 mount unpredictably stops responding for a client. The client receives the following in /var/log/messages:

Oct 21 19:17:25 testvol kernel: nfs: server x.x.x.x not responding, timed out
Oct 21 19:17:28 testvol kernel: nfs: server x.x.x.x not responding, timed out
Oct 21 19:17:31 testvol kernel: nfs: server x.x.x.x not responding, timed out

The Gluster server only shows the following error messages in the nfs.log:

[2016-10-21 19:17:44.843790] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0xe100e094, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.843806] E [MSGID: 112074] [nfs3.c:615:nfs3svc_submit_reply] 0-nfs-nfsv3: Reply submission failed
[2016-10-21 19:17:44.843919] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x8301e094, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844055] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3c01e094, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844174] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x5201e094, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844268] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3401e094, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844334] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x2a01e094, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844393] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x8e01e094, Program: NFS3, ProgVers: 3, Proc: 1) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:44.844438] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x8f01e094, Program: NFS3, ProgVers: 3, Proc: 1) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:45.051784] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3c01e094, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:45.052042] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x8301e094, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:45.052202] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3401e094, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2016-10-21 19:17:45.052356] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x5201e094, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)

volume info:

Volume Name: testvol
Type: Distribute
Volume ID: 1a149875-b248-4330-ae70-0238820d7bad
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.171.156.220:/gluster/testvol/brick1
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: on
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
auth.allow: x.x.x.x
nfs.disable: off
nfs.addr-namelookup: off
nfs.acl: off
nfs.rpc-auth-allow: x.x.x.x
nfs.trusted-sync: on

When this happens for an extended amount of time, the client is unable to keep the share mounted, and eventually the client's application locks up. This only happens with one volume on this system; all others are able to access the share at the time of these events.
This client is more active than others, but the system is not under a heavy load (always about 75% CPU idle, 50% free RAM, disk IO rarely rises above 20%). Network connectivity has been ruled out by my network team as well. I gave them a new share with a single brick to rule out a lot of other possibilities.

Version-Release number of selected component (if applicable):

Gluster 3.7.11-2
Running on CentOS 7.1.1503
Client is RHEL 6u6

How reproducible:

Unpredictable on my side, but always predictable with this host (just a matter of time).

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

NFS session should stay established.

Additional info:
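
For triage, the nfs.log excerpt above can be summarized by NFSv3 procedure and by repeated XID. The following is a minimal sketch, not part of Gluster: the function name `summarize_failed_replies` and the regular expression are assumptions written against the log format shown in this report, and the procedure names come from the NFSv3 protocol numbering (1 is GETATTR, 7 is WRITE). An XID that appears more than once may indicate the client retransmitting the same request, though that is only a hint.

```python
import re
import sys
from collections import Counter

# NFSv3 procedure numbers (RFC 1813); only the first eight are listed here.
NFS3_PROCS = {
    0: "NULL", 1: "GETATTR", 2: "SETATTR", 3: "LOOKUP",
    4: "ACCESS", 5: "READLINK", 6: "READ", 7: "WRITE",
}

# Assumed pattern, derived from the rpcsvc_submit_generic lines in this report.
LINE_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] E "
    r"\[rpcsvc\.c:\d+:rpcsvc_submit_generic\] .*?"
    r"XID: (?P<xid>0x[0-9a-f]+), Program: (?P<prog>\w+), "
    r"ProgVers: (?P<vers>\d+), Proc: (?P<proc>\d+)\)"
)


def summarize_failed_replies(log_text):
    """Count failed reply submissions per procedure and list repeated XIDs."""
    by_proc = Counter()
    by_xid = Counter()
    for match in LINE_RE.finditer(log_text):
        proc = int(match.group("proc"))
        by_proc[NFS3_PROCS.get(proc, f"PROC_{proc}")] += 1
        by_xid[match.group("xid")] += 1
    repeated = sorted(xid for xid, count in by_xid.items() if count > 1)
    return by_proc, repeated


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_nfs_log.py <path to nfs.log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        procs, repeated_xids = summarize_failed_replies(handle.read())
    for name, count in procs.most_common():
        print(f"{name}: {count}")
    print("XIDs seen more than once:", ", ".join(repeated_xids) or "none")
```

Lines that do not match the pattern, such as the `nfs3svc_submit_reply` entry, are ignored rather than counted.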
This bug is getting closed because GlusterFS-3.7 has reached its end-of-life. Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS. If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.