Bug 809816

Summary: [glusterfs-3.3.0qa33]: nfs client hung because of some rpc errors in nfs-server
Product: [Community] GlusterFS
Reporter: Raghavendra Bhat <rabhat>
Component: nfs
Assignee: Vivek Agarwal <vagarwal>
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Version: mainline
CC: amarts, gluster-bugs, sankarshan, vbellur
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Last Closed: 2013-07-24 17:14:30 UTC
Type: Bug

Description Raghavendra Bhat 2012-04-04 12:54:32 UTC
Description of problem:
A 3-way replicated volume, with 6 FUSE clients and 6 NFS clients running different tests. One of the NFS clients hung because of RPC errors in the NFS server.


[2012-04-03 14:04:48.389913] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-1: remote operation failed: Transport endpoint is not connected
[2012-04-03 14:04:48.389935] W [xdr-rpcclnt.c:88:rpc_request_to_xdr] 0-rpc: failed to encode call msg
[2012-04-03 14:04:48.389950] E [rpc-clnt.c:1268:rpc_clnt_record_build_record] 0-mirror-client-2: Failed to build record header
[2012-04-03 14:04:48.389964] W [rpc-clnt.c:1328:rpc_clnt_record] 0-mirror-client-2: cannot build rpc-record
[2012-04-03 14:04:48.389977] W [rpc-clnt.c:1467:rpc_clnt_submit] 0-mirror-client-2: cannot build rpc-record
[2012-04-03 14:04:48.389991] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-2: remote operation failed: Transport endpoint is not connected
[2012-04-03 14:04:48.498521] E [nlm4.c:1624:nlm4_unlock_resume] 0-nfs-NLM: fd_lookup_uint64() returned NULL
[2012-04-03 14:04:48.498563] E [nlm4.c:1631:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume
[2012-04-03 14:04:48.777825] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-0: remote operation failed: Transport endpoint is not connected
[2012-04-03 14:04:48.777897] W [xdr-rpcclnt.c:88:rpc_request_to_xdr] 0-rpc: failed to encode call msg
[2012-04-03 14:04:48.777921] E [rpc-clnt.c:1268:rpc_clnt_record_build_record] 0-mirror-client-1: Failed to build record header
[2012-04-03 14:04:48.777937] W [rpc-clnt.c:1328:rpc_clnt_record] 0-mirror-client-1: cannot build rpc-record
[2012-04-03 14:04:48.777951] W [rpc-clnt.c:1467:rpc_clnt_submit] 0-mirror-client-1: cannot build rpc-record
[2012-04-03 14:04:48.777975] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-1: remote operation failed: Transport endpoint is not connected
[2012-04-03 14:04:48.778011] W [xdr-rpcclnt.c:88:rpc_request_to_xdr] 0-rpc: failed to encode call msg
[2012-04-03 14:04:48.778027] E [rpc-clnt.c:1268:rpc_clnt_record_build_record] 0-mirror-client-2: Failed to build record header
[2012-04-03 14:04:48.778041] W [rpc-clnt.c:1328:rpc_clnt_record] 0-mirror-client-2: cannot build rpc-record
[2012-04-03 14:04:48.778055] W [rpc-clnt.c:1467:rpc_clnt_submit] 0-mirror-client-2: cannot build rpc-record
[2012-04-03 14:04:48.778069] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-2: remote operation failed: Transport endpoint is not connected
[2012-04-03 14:04:48.781060] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-0: remote operation failed: Transport endpoint is not connected
[2012-04-03 14:04:48.781120] W [xdr-rpcclnt.c:88:rpc_request_to_xdr] 0-rpc: failed to encode call msg
[2012-04-03 14:04:48.781140] E [rpc-clnt.c:1268:rpc_clnt_record_build_record] 0-mirror-client-1: Failed to build record header
[2012-04-03 14:04:48.781154] W [rpc-clnt.c:1328:rpc_clnt_record] 0-mirror-client-1: cannot build rpc-record
[2012-04-03 14:04:48.781168] W [rpc-clnt.c:1467:rpc_clnt_submit] 0-mirror-client-1: cannot build rpc-record
[2012-04-03 14:04:48.781182] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-1: remote operation failed: Transport endpoint is not connected
[2012-04-03 14:04:48.781204] W [xdr-rpcclnt.c:88:rpc_request_to_xdr] 0-rpc: failed to encode call msg
[2012-04-03 14:04:48.781219] E [rpc-clnt.c:1268:rpc_clnt_record_build_record] 0-mirror-client-2: Failed to build record header
[2012-04-03 14:04:48.781233] W [rpc-clnt.c:1328:rpc_clnt_record] 0-mirror-client-2: cannot build rpc-record
[2012-04-03 14:04:48.781246] W [rpc-clnt.c:1467:rpc_clnt_submit] 0-mirror-client-2: cannot build rpc-record
[2012-04-03 14:04:48.781259] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-2: remote operation failed: Transport endpoint is not connected
[2012-04-03 14:04:48.934015] E [nlm4.c:1624:nlm4_unlock_resume] 0-nfs-NLM: fd_lookup_uint64() returned NULL
[2012-04-03 14:04:48.934050] E [nlm4.c:1631:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume
[2012-04-03 14:15:40.783334] W [socket.c:1521:__socket_proto_state_machine] 0-NLM-client: reading from socket failed. Error (Transport endpoint is not connected), peer (10.16.156.18:55192)
[2012-04-03 14:15:40.783450] I [mem-pool.c:585:mem_pool_destroy] 0-nfs-server: size=2236 max=1 total=1862
[2012-04-03 14:15:40.783493] I [mem-pool.c:585:mem_pool_destroy] 0-nfs-server: size=124 max=1 total=1862

Actual results:

The NFS client hung.

Expected results:

The NFS client should not hang; the NFS server should reply to it.

Additional info:


gluster volume info
 
Volume Name: mirror
Type: Replicate
Volume ID: e6423147-ee12-453f-bcf6-2fb09a9087c5
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.16.156.9:/export/mirror
Brick2: 10.16.156.12:/export/mirror
Brick3: 10.16.156.15:/export/mirror
Options Reconfigured:
performance.flush-behind: off
performance.stat-prefetch: off
performance.client-io-threads: on

Comment 1 Krishna Srinivas 2012-04-04 13:16:35 UTC
[2012-04-03 14:04:48.389913] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-1: remote operation failed: Transport endpoint is not connected

The same is true for 0-mirror-client-0 and 0-mirror-client-2.

Can you check why the clients got disconnected? That is the reason the NFS clients hung.
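To check across the whole log whether every replica client reports the disconnect, one option is to tally the "Transport endpoint is not connected" messages per translator. The following is a minimal Python sketch, not a GlusterFS tool: the line layout it parses (`[timestamp] LEVEL [file:line:function] translator: message`) is an assumption inferred from the excerpt in the description, and the names `disconnected_clients` and `SAMPLE` are hypothetical.

```python
import re
from collections import Counter

# Assumed glusterfs log layout, inferred from the excerpt in this report:
# [date time] LEVEL [file.c:line:function] translator: message
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^\]]+)\]\s+(?P<xlator>\S+):\s+(?P<msg>.*)$"
)

DISCONNECT = "Transport endpoint is not connected"


def disconnected_clients(lines):
    """Count disconnect messages per translator (e.g. 0-mirror-client-1)."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m and DISCONNECT in m.group("msg"):
            counts[m.group("xlator")] += 1
    return counts


# A few lines copied from the log excerpt in the description.
SAMPLE = [
    "[2012-04-03 14:04:48.389913] W [client3_1-fops.c:2173:client3_1_lk_cbk] "
    "0-mirror-client-1: remote operation failed: Transport endpoint is not connected",
    "[2012-04-03 14:04:48.389935] W [xdr-rpcclnt.c:88:rpc_request_to_xdr] "
    "0-rpc: failed to encode call msg",
    "[2012-04-03 14:04:48.389991] W [client3_1-fops.c:2173:client3_1_lk_cbk] "
    "0-mirror-client-2: remote operation failed: Transport endpoint is not connected",
    "[2012-04-03 14:04:48.777825] W [client3_1-fops.c:2173:client3_1_lk_cbk] "
    "0-mirror-client-0: remote operation failed: Transport endpoint is not connected",
    "[2012-04-03 14:04:48.777975] W [client3_1-fops.c:2173:client3_1_lk_cbk] "
    "0-mirror-client-1: remote operation failed: Transport endpoint is not connected",
]

if __name__ == "__main__":
    # Prints one "translator count" pair per line, sorted by translator name.
    for xlator, n in sorted(disconnected_clients(SAMPLE).items()):
        print(xlator, n)
```

On the sample this reports 0-mirror-client-0 once, 0-mirror-client-1 twice and 0-mirror-client-2 once. If the full NFS server log shows all three `0-mirror-client-N` translators disconnected at the same time, that would suggest the server lost every brick of the replica at once, which fits the hang better than the NLM error does.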

There is also an NLM-related log message:
[2012-04-03 14:04:48.934015] E [nlm4.c:1624:nlm4_unlock_resume] 0-nfs-NLM: fd_lookup_uint64() returned NULL
There is a separate bug for this (bug 802885); it does not cause the client to hang, though.

Comment 2 Krishna Srinivas 2012-08-14 11:34:47 UTC
Raghavendra, moving this to ON_QA; you can close it if you think it is not a bug.

Comment 4 Amar Tumballi 2013-01-11 08:41:38 UTC
rabhat has not seen the issue recently; moving it to ON_QA.