Bug 809816 - [glusterfs-3.3.0qa33]: nfs client hung because of some rpc errors in nfs-server
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: nfs
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Vivek Agarwal
Depends On:
Blocks:
Reported: 2012-04-04 08:54 EDT by Raghavendra Bhat
Modified: 2016-02-17 19:02 EST

See Also:
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-07-24 13:14:30 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---


Attachments: None
Description Raghavendra Bhat 2012-04-04 08:54:32 EDT
Description of problem:
The volume is a 3-way replica. Six FUSE clients and six NFS clients were running different tests against it. One of the NFS clients hung, and the NFS server log shows RPC errors around that time:


[2012-04-03 14:04:48.389913] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-1: remote operation failed: Transport endpoint is not connected
[2012-04-03 14:04:48.389935] W [xdr-rpcclnt.c:88:rpc_request_to_xdr] 0-rpc: failed to encode call msg
[2012-04-03 14:04:48.389950] E [rpc-clnt.c:1268:rpc_clnt_record_build_record] 0-mirror-client-2: Failed to build record header
[2012-04-03 14:04:48.389964] W [rpc-clnt.c:1328:rpc_clnt_record] 0-mirror-client-2: cannot build rpc-record
[2012-04-03 14:04:48.389977] W [rpc-clnt.c:1467:rpc_clnt_submit] 0-mirror-client-2: cannot build rpc-record
[2012-04-03 14:04:48.389991] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-2: remote operation failed: Transport endpoint is not connected
[2012-04-03 14:04:48.498521] E [nlm4.c:1624:nlm4_unlock_resume] 0-nfs-NLM: fd_lookup_uint64() returned NULL
[2012-04-03 14:04:48.498563] E [nlm4.c:1631:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume
[2012-04-03 14:04:48.777825] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-0: remote operation failed: Transport endpoint is not connected
[2012-04-03 14:04:48.777897] W [xdr-rpcclnt.c:88:rpc_request_to_xdr] 0-rpc: failed to encode call msg
[2012-04-03 14:04:48.777921] E [rpc-clnt.c:1268:rpc_clnt_record_build_record] 0-mirror-client-1: Failed to build record header
[2012-04-03 14:04:48.777937] W [rpc-clnt.c:1328:rpc_clnt_record] 0-mirror-client-1: cannot build rpc-record
[2012-04-03 14:04:48.777951] W [rpc-clnt.c:1467:rpc_clnt_submit] 0-mirror-client-1: cannot build rpc-record
[2012-04-03 14:04:48.777975] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-1: remote operation failed: Transport endpoint is not connected
[2012-04-03 14:04:48.778011] W [xdr-rpcclnt.c:88:rpc_request_to_xdr] 0-rpc: failed to encode call msg
[2012-04-03 14:04:48.778027] E [rpc-clnt.c:1268:rpc_clnt_record_build_record] 0-mirror-client-2: Failed to build record header
[2012-04-03 14:04:48.778041] W [rpc-clnt.c:1328:rpc_clnt_record] 0-mirror-client-2: cannot build rpc-record
[2012-04-03 14:04:48.778055] W [rpc-clnt.c:1467:rpc_clnt_submit] 0-mirror-client-2: cannot build rpc-record
[2012-04-03 14:04:48.778069] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-2: remote operation failed: Transport endpoint is not connected
[2012-04-03 14:04:48.781060] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-0: remote operation failed: Transport endpoint is not connected
[2012-04-03 14:04:48.781120] W [xdr-rpcclnt.c:88:rpc_request_to_xdr] 0-rpc: failed to encode call msg
[2012-04-03 14:04:48.781140] E [rpc-clnt.c:1268:rpc_clnt_record_build_record] 0-mirror-client-1: Failed to build record header
[2012-04-03 14:04:48.781154] W [rpc-clnt.c:1328:rpc_clnt_record] 0-mirror-client-1: cannot build rpc-record
[2012-04-03 14:04:48.781168] W [rpc-clnt.c:1467:rpc_clnt_submit] 0-mirror-client-1: cannot build rpc-record
[2012-04-03 14:04:48.781182] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-1: remote operation failed: Transport endpoint is not connected
[2012-04-03 14:04:48.781204] W [xdr-rpcclnt.c:88:rpc_request_to_xdr] 0-rpc: failed to encode call msg
[2012-04-03 14:04:48.781219] E [rpc-clnt.c:1268:rpc_clnt_record_build_record] 0-mirror-client-2: Failed to build record header
[2012-04-03 14:04:48.781233] W [rpc-clnt.c:1328:rpc_clnt_record] 0-mirror-client-2: cannot build rpc-record
[2012-04-03 14:04:48.781246] W [rpc-clnt.c:1467:rpc_clnt_submit] 0-mirror-client-2: cannot build rpc-record
[2012-04-03 14:04:48.781259] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-2: remote operation failed: Transport endpoint is not connected
[2012-04-03 14:04:48.934015] E [nlm4.c:1624:nlm4_unlock_resume] 0-nfs-NLM: fd_lookup_uint64() returned NULL
[2012-04-03 14:04:48.934050] E [nlm4.c:1631:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume
[2012-04-03 14:15:40.783334] W [socket.c:1521:__socket_proto_state_machine] 0-NLM-client: reading from socket failed. Error (Transport endpoint is not connected), peer (10.16.156.18:55192)
[2012-04-03 14:15:40.783450] I [mem-pool.c:585:mem_pool_destroy] 0-nfs-server: size=2236 max=1 total=1862
[2012-04-03 14:15:40.783493] I [mem-pool.c:585:mem_pool_destroy] 0-nfs-server: size=124 max=1 total=1862

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
Actual results:
The NFS client hung.

Expected results:
The NFS client should not hang; the NFS server should reply to the NFS client.

Additional info:


gluster volume info
 
Volume Name: mirror
Type: Replicate
Volume ID: e6423147-ee12-453f-bcf6-2fb09a9087c5
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.16.156.9:/export/mirror
Brick2: 10.16.156.12:/export/mirror
Brick3: 10.16.156.15:/export/mirror
Options Reconfigured:
performance.flush-behind: off
performance.stat-prefetch: off
performance.client-io-threads: on
Comment 1 Krishna Srinivas 2012-04-04 09:16:35 EDT
[2012-04-03 14:04:48.389913] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-1: remote operation failed: Transport endpoint is not connected

The same is true of 0-mirror-client-0 and 0-mirror-client-2.

Can you check why they got disconnected? The disconnection is the reason the NFS client hung.

There is also an NLM-related log entry:
[2012-04-03 14:04:48.934015] E [nlm4.c:1624:nlm4_unlock_resume] 0-nfs-NLM: fd_lookup_uint64() returned NULL
There is a separate bug for this (802885); it does not cause the client to hang, though.
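
To help with the check requested above, here is a minimal Python sketch that counts "Transport endpoint is not connected" messages per log name (for example 0-mirror-client-1). It is not part of GlusterFS. The regular expression is only inferred from the log lines quoted in this report, it assumes one log entry per line, and the names `LOG_LINE`, `SAMPLE` and `disconnects_by_subvolume` are made up for illustration.

```python
import re
from collections import Counter

# Layout inferred from the excerpt in this report (an assumption, not a spec):
# [timestamp] LEVEL [file:line:function] log-name: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<name>[^:]+): (?P<msg>.*)$"
)

DISCONNECTED = "Transport endpoint is not connected"


def disconnects_by_subvolume(lines):
    """Count 'not connected' messages per log name; skip unparsable lines."""
    counts = Counter()
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if match and DISCONNECTED in match.group("msg"):
            counts[match.group("name")] += 1
    return counts


# A few lines copied from the description of this bug.
SAMPLE = [
    "[2012-04-03 14:04:48.389913] W [client3_1-fops.c:2173:client3_1_lk_cbk] "
    "0-mirror-client-1: remote operation failed: Transport endpoint is not connected",
    "[2012-04-03 14:04:48.389935] W [xdr-rpcclnt.c:88:rpc_request_to_xdr] "
    "0-rpc: failed to encode call msg",
    "[2012-04-03 14:04:48.389991] W [client3_1-fops.c:2173:client3_1_lk_cbk] "
    "0-mirror-client-2: remote operation failed: Transport endpoint is not connected",
    "[2012-04-03 14:04:48.777825] W [client3_1-fops.c:2173:client3_1_lk_cbk] "
    "0-mirror-client-0: remote operation failed: Transport endpoint is not connected",
]

if __name__ == "__main__":
    for name, count in sorted(disconnects_by_subvolume(SAMPLE).items()):
        print(name, count)
```

Running the same function over the full NFS server log, rather than `SAMPLE`, would show whether all three subvolumes report the disconnect and how often, which is a starting point for finding out why the connections dropped.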
Comment 2 Krishna Srinivas 2012-08-14 07:34:47 EDT
Raghavendra, I am moving this to ON_QA. You can close it if you think it is not a bug.
Comment 4 Amar Tumballi 2013-01-11 03:41:38 EST
rabhat@redhat.com has not seen the issue recently; moving it to ON_QA.