Bug 803610 - when graph is changed writes on mount hangs waiting for server response
Summary: when graph is changed writes on mount hangs waiting for server response
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: GlusterFS
Classification: Community
Component: rpc
Version: mainline
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
Assignee: Kaushal
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-03-15 08:57 UTC by Shwetha Panduranga
Modified: 2015-12-01 16:45 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-05-04 07:13:05 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)
Brick Log (477.21 KB, text/x-log)
2012-03-15 08:57 UTC, Shwetha Panduranga
no flags Details

Description Shwetha Panduranga 2012-03-15 08:57:23 UTC
Created attachment 570214 [details]
Brick Log

Description of problem:
Server failed to send reply message back to client when graph was changed. Due to this, client application hung waiting for response from server. 

Brick Log:-
----------
[2012-03-15 18:22:31.303795] I [glusterfsd-mgmt.c:64:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2012-03-15 18:22:31.702957] I [server.c:627:server_rpc_notify] 0-dstore1-server: disconnecting connectionfrom 192.168.2.35:1023
[2012-03-15 18:22:31.703002] I [server.c:636:server_rpc_notify] 0-dstore1-server: starting a grace timer for tcp.dstore1-server
[2012-03-15 18:22:32.055264] W [socket.c:204:__socket_rwv] 0-tcp.dstore1-server: readv failed (Connection reset by peer)
[2012-03-15 18:22:32.055332] W [socket.c:775:__socket_read_simple_msg] 0-tcp.dstore1-server: reading from socket failed. Error (Connection reset by peer), peer (192.168.2.37:1021)
[2012-03-15 18:22:32.055401] I [server.c:627:server_rpc_notify] 0-dstore1-server: disconnecting connectionfrom 192.168.2.37:1021
[2012-03-15 18:22:32.055431] I [server.c:636:server_rpc_notify] 0-dstore1-server: starting a grace timer for tcp.dstore1-server
[2012-03-15 18:22:32.055411] I [socket.c:2377:socket_submit_reply] 0-tcp.dstore1-server: not connected (priv->connected = 255)
[2012-03-15 18:22:32.055510] E [rpcsvc.c:1084:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x222529x, Program: GlusterFS 3.3.0qa28, ProgVers: 330, Proc: 13) to rpc-transport (tcp.dstore1-server)
[2012-03-15 18:22:32.167417] E [server.c:171:server_submit_reply] (-->/usr/local/lib/glusterfs/3.3.0qa28/xlator/features/marker.so(marker_writev_cbk+0x1ec) [0x7ff034592d67] (-->/usr/local/lib/glusterfs/3.3.0qa28/xlator/debug/io-stats.so(io_stats_writev_cbk+0x50d) [0x7ff03437880c] (-->/usr/local/lib/glusterfs/3.3.0qa28/xlator/protocol/server.so(server_writev_cbk+0x1eb) [0x7ff034152452]))) 0-: Reply submission failed
[2012-03-15 18:22:32.170369] I [server-helpers.c:330:do_lock_table_cleanup] 0-dstore1-server: finodelk released on /f6
[2012-03-15 18:22:32.170429] I [server-helpers.c:330:do_lock_table_cleanup] 0-dstore1-server: finodelk released on /f6
[2012-03-15 18:22:32.170462] I [server-helpers.c:330:do_lock_table_cleanup] 0-dstore1-server: finodelk released on /f6

Client Log:-
---------
[2012-03-15 18:22:31.441506] I [glusterfsd-mgmt.c:64:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2012-03-15 18:22:32.943763] I [io-cache.c:1559:check_cache_size_ok] 1-dstore1-quick-read: Max cache size is 1044848640
[2012-03-15 18:22:32.943874] I [io-cache.c:1559:check_cache_size_ok] 1-dstore1-io-cache: Max cache size is 1044848640
[2012-03-15 18:22:32.944045] I [quota.c:3021:quota_parse_limits] 1-dstore1-quota: no "limit-set" option provided
[2012-03-15 18:23:52.538513] C [client-handshake.c:135:rpc_client_ping_timer_expired] 0-dstore1-client-0: server 192.168.2.35:24009 has not responded in the last 42 seconds, disconnecting.
[2012-03-15 18:23:53.627872] C [client-handshake.c:135:rpc_client_ping_timer_expired] 0-dstore1-client-2: server 192.168.2.37:24009 has not responded in the last 42 seconds, disconnecting.
[2012-03-15 18:23:53.628002] C [client-handshake.c:135:rpc_client_ping_timer_expired] 0-dstore1-client-1: server 192.168.2.36:24009 has not responded in the last 42 seconds, disconnecting.
[2012-03-15 18:52:35.963776] E [rpc-clnt.c:217:call_bail] 0-dstore1-client-0: bailing out frame type(GlusterFS 3.1) op(FXATTROP(34)) xid = 0x159826x sent = 2012-03-15 18:22:32.921200. timeout = 1800
[2012-03-15 18:52:35.981890] W [client3_1-fops.c:1460:client3_1_fxattrop_cbk] 0-dstore1-client-0: remote operation failed: Transport endpoint is not connected
[2012-03-15 18:52:35.982172] E [rpc-clnt.c:217:call_bail] 0-dstore1-client-0: bailing out frame type(GlusterFS 3.1) op(FXATTROP(34)) xid = 0x159825x sent = 2012-03-15 18:22:32.919461. timeout = 1800
[2012-03-15 18:52:35.982211] W [client3_1-fops.c:1460:client3_1_fxattrop_cbk] 0-dstore1-client-0: remote operation failed: Transport endpoint is not connected
[2012-03-15 18:52:35.982266] E [rpc-clnt.c:217:call_bail] 0-dstore1-client-0: bailing out frame type(GlusterFS 3.1) op(FXATTROP(34)) xid = 0x159824x sent = 2012-03-15 18:22:32.918427. timeout = 1800
[2012-03-15 18:52:35.982295] W [client3_1-fops.c:1460:client3_1_fxattrop_cbk] 0-dstore1-client-0: remote operation failed: Transport endpoint is not connected
[2012-03-15 18:52:35.982322] E [rpc-clnt.c:217:call_bail] 0-dstore1-client-0: bailing out frame type(GlusterFS 3.1) op(WRITE(13)) xid = 0x159823x sent = 2012-03-15 18:22:32.917974. timeout = 1800
[2012-03-15 18:52:35.982344] W [client3_1-fops.c:692:client3_1_writev_cbk] 0-dstore1-client-0: remote operation failed: Transport endpoint is not connected
[2012-03-15 18:53:08.144865] W [client3_1-fops.c:1266:client3_1_finodelk_cbk] 0-dstore1-client-1: remote operation failed: Transport endpoint is not connected
[2012-03-15 18:53:08.144900] W [rpc-clnt.c:1507:rpc_clnt_submit] 0-dstore1-client-2: failed to submit rpc-request (XID: 0x182849x Program: GlusterFS 3.1, ProgVers: 330, Proc: 30) to rpc-transport (dstore1-client-2)
[2012-03-15 18:53:08.144941] W [client3_1-fops.c:1266:client3_1_finodelk_cbk] 0-dstore1-client-2: remote operation failed: Transport endpoint is not connected


Version-Release number of selected component (if applicable):
3.3.0qa28

How reproducible:


Steps to Reproduce:
1.create a replicate volume (1 X 3)
2.create fuse,nfs mounts from client
2.start dd, ping_pong, lock tests from the different mount points
3.enable quota (changes the graph)
  
Actual results:
dd and ping_pong hung waiting for the server to respond

Comment 1 Kaushal 2012-04-03 11:17:43 UTC
I don't see this happening with native mount on master. But it is still happening on NFS mount. 
Shweta, can you confirm?

Comment 2 Shwetha Panduranga 2012-04-04 04:25:30 UTC
Yes, i will try re creating the bug and update it by end of day.

Comment 3 Kaushal 2012-04-12 11:14:44 UTC
Tested this again. Now I'm getting no hangs for either nfs or fuse mount.

Comment 4 Amar Tumballi 2012-04-23 16:28:47 UTC
Shwetha, when you have time, can you confirm the behavior so we can take appropriate action on this

Comment 5 Shwetha Panduranga 2012-04-24 01:51:13 UTC
I will test on qa37 and update the bug.

Comment 6 Shwetha Panduranga 2012-05-04 06:40:02 UTC
I tested on qa39 and i am unable to recreate the bug.

Comment 7 Kaushal 2012-05-04 07:13:05 UTC
Closing as it is not reproducible anymore.


Note You need to log in before you can comment on or make changes to this bug.