Created attachment 570214 [details] Brick Log Description of problem: Server failed to send reply message back to client when graph was changed. Due to this, client application hung waiting for response from server. Brick Log:- ---------- [2012-03-15 18:22:31.303795] I [glusterfsd-mgmt.c:64:mgmt_cbk_spec] 0-mgmt: Volume file changed [2012-03-15 18:22:31.702957] I [server.c:627:server_rpc_notify] 0-dstore1-server: disconnecting connectionfrom 192.168.2.35:1023 [2012-03-15 18:22:31.703002] I [server.c:636:server_rpc_notify] 0-dstore1-server: starting a grace timer for tcp.dstore1-server [2012-03-15 18:22:32.055264] W [socket.c:204:__socket_rwv] 0-tcp.dstore1-server: readv failed (Connection reset by peer) [2012-03-15 18:22:32.055332] W [socket.c:775:__socket_read_simple_msg] 0-tcp.dstore1-server: reading from socket failed. Error (Connection reset by peer), peer (192.168.2.37:1021) [2012-03-15 18:22:32.055401] I [server.c:627:server_rpc_notify] 0-dstore1-server: disconnecting connectionfrom 192.168.2.37:1021 [2012-03-15 18:22:32.055431] I [server.c:636:server_rpc_notify] 0-dstore1-server: starting a grace timer for tcp.dstore1-server [2012-03-15 18:22:32.055411] I [socket.c:2377:socket_submit_reply] 0-tcp.dstore1-server: not connected (priv->connected = 255) [2012-03-15 18:22:32.055510] E [rpcsvc.c:1084:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x222529x, Program: GlusterFS 3.3.0qa28, ProgVers: 330, Proc: 13) to rpc-transport (tcp.dstore1-server) [2012-03-15 18:22:32.167417] E [server.c:171:server_submit_reply] (-->/usr/local/lib/glusterfs/3.3.0qa28/xlator/features/marker.so(marker_writev_cbk+0x1ec) [0x7ff034592d67] (-->/usr/local/lib/glusterfs/3.3.0qa28/xlator/debug/io-stats.so(io_stats_writev_cbk+0x50d) [0x7ff03437880c] (-->/usr/local/lib/glusterfs/3.3.0qa28/xlator/protocol/server.so(server_writev_cbk+0x1eb) [0x7ff034152452]))) 0-: Reply submission failed [2012-03-15 18:22:32.170369] I [server-helpers.c:330:do_lock_table_cleanup] 0-dstore1-server: finodelk released on /f6 [2012-03-15 18:22:32.170429] I [server-helpers.c:330:do_lock_table_cleanup] 0-dstore1-server: finodelk released on /f6 [2012-03-15 18:22:32.170462] I [server-helpers.c:330:do_lock_table_cleanup] 0-dstore1-server: finodelk released on /f6 Client Log:- --------- [2012-03-15 18:22:31.441506] I [glusterfsd-mgmt.c:64:mgmt_cbk_spec] 0-mgmt: Volume file changed [2012-03-15 18:22:32.943763] I [io-cache.c:1559:check_cache_size_ok] 1-dstore1-quick-read: Max cache size is 1044848640 [2012-03-15 18:22:32.943874] I [io-cache.c:1559:check_cache_size_ok] 1-dstore1-io-cache: Max cache size is 1044848640 [2012-03-15 18:22:32.944045] I [quota.c:3021:quota_parse_limits] 1-dstore1-quota: no "limit-set" option provided [2012-03-15 18:23:52.538513] C [client-handshake.c:135:rpc_client_ping_timer_expired] 0-dstore1-client-0: server 192.168.2.35:24009 has not responded in the last 42 seconds, disconnecting. [2012-03-15 18:23:53.627872] C [client-handshake.c:135:rpc_client_ping_timer_expired] 0-dstore1-client-2: server 192.168.2.37:24009 has not responded in the last 42 seconds, disconnecting. [2012-03-15 18:23:53.628002] C [client-handshake.c:135:rpc_client_ping_timer_expired] 0-dstore1-client-1: server 192.168.2.36:24009 has not responded in the last 42 seconds, disconnecting. [2012-03-15 18:52:35.963776] E [rpc-clnt.c:217:call_bail] 0-dstore1-client-0: bailing out frame type(GlusterFS 3.1) op(FXATTROP(34)) xid = 0x159826x sent = 2012-03-15 18:22:32.921200. timeout = 1800 [2012-03-15 18:52:35.981890] W [client3_1-fops.c:1460:client3_1_fxattrop_cbk] 0-dstore1-client-0: remote operation failed: Transport endpoint is not connected [2012-03-15 18:52:35.982172] E [rpc-clnt.c:217:call_bail] 0-dstore1-client-0: bailing out frame type(GlusterFS 3.1) op(FXATTROP(34)) xid = 0x159825x sent = 2012-03-15 18:22:32.919461. timeout = 1800 [2012-03-15 18:52:35.982211] W [client3_1-fops.c:1460:client3_1_fxattrop_cbk] 0-dstore1-client-0: remote operation failed: Transport endpoint is not connected [2012-03-15 18:52:35.982266] E [rpc-clnt.c:217:call_bail] 0-dstore1-client-0: bailing out frame type(GlusterFS 3.1) op(FXATTROP(34)) xid = 0x159824x sent = 2012-03-15 18:22:32.918427. timeout = 1800 [2012-03-15 18:52:35.982295] W [client3_1-fops.c:1460:client3_1_fxattrop_cbk] 0-dstore1-client-0: remote operation failed: Transport endpoint is not connected [2012-03-15 18:52:35.982322] E [rpc-clnt.c:217:call_bail] 0-dstore1-client-0: bailing out frame type(GlusterFS 3.1) op(WRITE(13)) xid = 0x159823x sent = 2012-03-15 18:22:32.917974. timeout = 1800 [2012-03-15 18:52:35.982344] W [client3_1-fops.c:692:client3_1_writev_cbk] 0-dstore1-client-0: remote operation failed: Transport endpoint is not connected [2012-03-15 18:53:08.144865] W [client3_1-fops.c:1266:client3_1_finodelk_cbk] 0-dstore1-client-1: remote operation failed: Transport endpoint is not connected [2012-03-15 18:53:08.144900] W [rpc-clnt.c:1507:rpc_clnt_submit] 0-dstore1-client-2: failed to submit rpc-request (XID: 0x182849x Program: GlusterFS 3.1, ProgVers: 330, Proc: 30) to rpc-transport (dstore1-client-2) [2012-03-15 18:53:08.144941] W [client3_1-fops.c:1266:client3_1_finodelk_cbk] 0-dstore1-client-2: remote operation failed: Transport endpoint is not connected Version-Release number of selected component (if applicable): 3.3.0qa28 How reproducible: Steps to Reproduce: 1.create a replicate volume (1 X 3) 2.create fuse,nfs mounts from client 2.start dd, ping_pong, lock tests from the different mount points 3.enable quota (changes the graph) Actual results: dd and ping_pong hung waiting for the server to respond
I don't see this happening with native mount on master. But it is still happening on NFS mount. Shweta, can you confirm?
Yes, i will try re creating the bug and update it by end of day.
Tested this again. Now I'm getting no hangs for either nfs or fuse mount.
Shwetha, when you have time, can you confirm the behavior so we can take appropriate action on this
I will test on qa37 and update the bug.
I tested on qa39 and i am unable to recreate the bug.
Closing as it is not reproducible anymore.