Description of problem:
3 replica volume. 6 fuse and 6 nfs clients running different tests. One of the nfs clients hung because of some rpc errors in the nfs server.

[2012-04-03 14:04:48.389913] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-1: remote operation failed: Transport endpoint is not connected
[2012-04-03 14:04:48.389935] W [xdr-rpcclnt.c:88:rpc_request_to_xdr] 0-rpc: failed to encode call msg
[2012-04-03 14:04:48.389950] E [rpc-clnt.c:1268:rpc_clnt_record_build_record] 0-mirror-client-2: Failed to build record header
[2012-04-03 14:04:48.389964] W [rpc-clnt.c:1328:rpc_clnt_record] 0-mirror-client-2: cannot build rpc-record
[2012-04-03 14:04:48.389977] W [rpc-clnt.c:1467:rpc_clnt_submit] 0-mirror-client-2: cannot build rpc-record
[2012-04-03 14:04:48.389991] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-2: remote operation failed: Transport endpoint is not connected
[2012-04-03 14:04:48.498521] E [nlm4.c:1624:nlm4_unlock_resume] 0-nfs-NLM: fd_lookup_uint64() returned NULL
[2012-04-03 14:04:48.498563] E [nlm4.c:1631:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume
[2012-04-03 14:04:48.777825] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-0: remote operation failed: Transport endpoint is not connected
[2012-04-03 14:04:48.777897] W [xdr-rpcclnt.c:88:rpc_request_to_xdr] 0-rpc: failed to encode call msg
[2012-04-03 14:04:48.777921] E [rpc-clnt.c:1268:rpc_clnt_record_build_record] 0-mirror-client-1: Failed to build record header
[2012-04-03 14:04:48.777937] W [rpc-clnt.c:1328:rpc_clnt_record] 0-mirror-client-1: cannot build rpc-record
[2012-04-03 14:04:48.777951] W [rpc-clnt.c:1467:rpc_clnt_submit] 0-mirror-client-1: cannot build rpc-record
[2012-04-03 14:04:48.777975] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-1: remote operation failed: Transport endpoint is not connected
[2012-04-03 14:04:48.778011] W [xdr-rpcclnt.c:88:rpc_request_to_xdr] 0-rpc: failed to encode call msg
[2012-04-03 14:04:48.778027] E [rpc-clnt.c:1268:rpc_clnt_record_build_record] 0-mirror-client-2: Failed to build record header
[2012-04-03 14:04:48.778041] W [rpc-clnt.c:1328:rpc_clnt_record] 0-mirror-client-2: cannot build rpc-record
[2012-04-03 14:04:48.778055] W [rpc-clnt.c:1467:rpc_clnt_submit] 0-mirror-client-2: cannot build rpc-record
[2012-04-03 14:04:48.778069] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-2: remote operation failed: Transport endpoint is not connected
[2012-04-03 14:04:48.781060] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-0: remote operation failed: Transport endpoint is not connected
[2012-04-03 14:04:48.781120] W [xdr-rpcclnt.c:88:rpc_request_to_xdr] 0-rpc: failed to encode call msg
[2012-04-03 14:04:48.781140] E [rpc-clnt.c:1268:rpc_clnt_record_build_record] 0-mirror-client-1: Failed to build record header
[2012-04-03 14:04:48.781154] W [rpc-clnt.c:1328:rpc_clnt_record] 0-mirror-client-1: cannot build rpc-record
[2012-04-03 14:04:48.781168] W [rpc-clnt.c:1467:rpc_clnt_submit] 0-mirror-client-1: cannot build rpc-record
[2012-04-03 14:04:48.781182] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-1: remote operation failed: Transport endpoint is not connected
[2012-04-03 14:04:48.781204] W [xdr-rpcclnt.c:88:rpc_request_to_xdr] 0-rpc: failed to encode call msg
[2012-04-03 14:04:48.781219] E [rpc-clnt.c:1268:rpc_clnt_record_build_record] 0-mirror-client-2: Failed to build record header
[2012-04-03 14:04:48.781233] W [rpc-clnt.c:1328:rpc_clnt_record] 0-mirror-client-2: cannot build rpc-record
[2012-04-03 14:04:48.781246] W [rpc-clnt.c:1467:rpc_clnt_submit] 0-mirror-client-2: cannot build rpc-record
[2012-04-03 14:04:48.781259] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-2: remote operation failed: Transport endpoint is not connected
[2012-04-03 14:04:48.934015] E [nlm4.c:1624:nlm4_unlock_resume] 0-nfs-NLM: fd_lookup_uint64() returned NULL
[2012-04-03 14:04:48.934050] E [nlm4.c:1631:nlm4_unlock_resume] 0-nfs-NLM: unable to unlock_fd_resume
[2012-04-03 14:15:40.783334] W [socket.c:1521:__socket_proto_state_machine] 0-NLM-client: reading from socket failed. Error (Transport endpoint is not connected), peer (10.16.156.18:55192)
[2012-04-03 14:15:40.783450] I [mem-pool.c:585:mem_pool_destroy] 0-nfs-server: size=2236 max=1 total=1862
[2012-04-03 14:15:40.783493] I [mem-pool.c:585:mem_pool_destroy] 0-nfs-server: size=124 max=1 total=1862

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
nfs client hung

Expected results:
nfs client should not hang (nfs-server should reply to the nfs client)

Additional info:
gluster volume info

Volume Name: mirror
Type: Replicate
Volume ID: e6423147-ee12-453f-bcf6-2fb09a9087c5
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.16.156.9:/export/mirror
Brick2: 10.16.156.12:/export/mirror
Brick3: 10.16.156.15:/export/mirror
Options Reconfigured:
performance.flush-behind: off
performance.stat-prefetch: off
performance.client-io-threads: on

[2012-04-03 14:04:48.389913] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-mirror-client-1: remote operation failed: Transport endpoint is not connected

The same is the case with 0-mirror-client-0 and 0-mirror-client-2. Can you check why it got disconnected? That is the reason the nfs clients have hung.

There is also an NLM-related log:

[2012-04-03 14:04:48.934015] E [nlm4.c:1624:nlm4_unlock_resume] 0-nfs-NLM: fd_lookup_uint64() returned NULL

There is a separate bug for this (802885); it does not cause the client to hang, though.
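
To help answer the question of which subvolumes reported disconnects, the nfs server log can be tallied per log domain. The following is only a rough sketch, assuming every log line follows the `[timestamp] LEVEL [file:line:function] domain: message` layout seen in this report. The names `LOG_RE`, `parse_line` and `disconnects_by_domain` are made up for illustration and are not part of GlusterFS.

```python
import re
from collections import Counter

# Assumed layout, taken from the log lines quoted in this report:
# [timestamp] LEVEL [file:line:function] domain: message
LOG_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<domain>\S+): (?P<msg>.*)"
)

DISCONNECT_TEXT = "Transport endpoint is not connected"


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_RE.match(line.strip())
    return match.groupdict() if match else None


def disconnects_by_domain(lines):
    """Count lines mentioning a disconnected transport, keyed by log domain."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record and DISCONNECT_TEXT in record["msg"]:
            counts[record["domain"]] += 1
    return counts


# A few lines copied from the report above.
SAMPLE = [
    "[2012-04-03 14:04:48.389913] W [client3_1-fops.c:2173:client3_1_lk_cbk] "
    "0-mirror-client-1: remote operation failed: Transport endpoint is not connected",
    "[2012-04-03 14:04:48.389935] W [xdr-rpcclnt.c:88:rpc_request_to_xdr] "
    "0-rpc: failed to encode call msg",
    "[2012-04-03 14:04:48.389991] W [client3_1-fops.c:2173:client3_1_lk_cbk] "
    "0-mirror-client-2: remote operation failed: Transport endpoint is not connected",
    "[2012-04-03 14:04:48.498521] E [nlm4.c:1624:nlm4_unlock_resume] "
    "0-nfs-NLM: fd_lookup_uint64() returned NULL",
]

if __name__ == "__main__":
    for domain, count in sorted(disconnects_by_domain(SAMPLE).items()):
        print(domain, count)
```

Running the same function over the full nfs server log, and comparing the first timestamp per domain against the brick logs, would narrow down when each client connection dropped. It does not by itself explain why.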

Raghavendra, moving it to ON_QA. You can close this if you think it is not a bug.

rabhat hasn't seen the issue recently; moving it to ON_QA.