Bug 803610 - when graph is changed writes on mount hangs waiting for server response
when graph is changed writes on mount hangs waiting for server response
Status: CLOSED WORKSFORME
Product: GlusterFS
Classification: Community
Component: rpc (Show other bugs)
mainline
Unspecified Unspecified
high Severity high
: ---
: ---
Assigned To: Kaushal
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2012-03-15 04:57 EDT by Shwetha Panduranga
Modified: 2015-12-01 11:45 EST (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-05-04 03:13:05 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Brick Log (477.21 KB, text/x-log)
2012-03-15 04:57 EDT, Shwetha Panduranga
no flags Details

  None (edit)
Description Shwetha Panduranga 2012-03-15 04:57:23 EDT
Created attachment 570214 [details]
Brick Log

Description of problem:
Server failed to send reply message back to client when graph was changed. Due to this, client application hung waiting for response from server. 

Brick Log:-
----------
[2012-03-15 18:22:31.303795] I [glusterfsd-mgmt.c:64:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2012-03-15 18:22:31.702957] I [server.c:627:server_rpc_notify] 0-dstore1-server: disconnecting connectionfrom 192.168.2.35:1023
[2012-03-15 18:22:31.703002] I [server.c:636:server_rpc_notify] 0-dstore1-server: starting a grace timer for tcp.dstore1-server
[2012-03-15 18:22:32.055264] W [socket.c:204:__socket_rwv] 0-tcp.dstore1-server: readv failed (Connection reset by peer)
[2012-03-15 18:22:32.055332] W [socket.c:775:__socket_read_simple_msg] 0-tcp.dstore1-server: reading from socket failed. Error (Connection reset by peer), peer (192.168.2.37:1021)
[2012-03-15 18:22:32.055401] I [server.c:627:server_rpc_notify] 0-dstore1-server: disconnecting connectionfrom 192.168.2.37:1021
[2012-03-15 18:22:32.055431] I [server.c:636:server_rpc_notify] 0-dstore1-server: starting a grace timer for tcp.dstore1-server
[2012-03-15 18:22:32.055411] I [socket.c:2377:socket_submit_reply] 0-tcp.dstore1-server: not connected (priv->connected = 255)
[2012-03-15 18:22:32.055510] E [rpcsvc.c:1084:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x222529x, Program: GlusterFS 3.3.0qa28, ProgVers: 330, Proc: 13) to rpc-transport (tcp.dstore1-server)
[2012-03-15 18:22:32.167417] E [server.c:171:server_submit_reply] (-->/usr/local/lib/glusterfs/3.3.0qa28/xlator/features/marker.so(marker_writev_cbk+0x1ec) [0x7ff034592d67] (-->/usr/local/lib/glusterfs/3.3.0qa28/xlator/debug/io-stats.so(io_stats_writev_cbk+0x50d) [0x7ff03437880c] (-->/usr/local/lib/glusterfs/3.3.0qa28/xlator/protocol/server.so(server_writev_cbk+0x1eb) [0x7ff034152452]))) 0-: Reply submission failed
[2012-03-15 18:22:32.170369] I [server-helpers.c:330:do_lock_table_cleanup] 0-dstore1-server: finodelk released on /f6
[2012-03-15 18:22:32.170429] I [server-helpers.c:330:do_lock_table_cleanup] 0-dstore1-server: finodelk released on /f6
[2012-03-15 18:22:32.170462] I [server-helpers.c:330:do_lock_table_cleanup] 0-dstore1-server: finodelk released on /f6

Client Log:-
---------
[2012-03-15 18:22:31.441506] I [glusterfsd-mgmt.c:64:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2012-03-15 18:22:32.943763] I [io-cache.c:1559:check_cache_size_ok] 1-dstore1-quick-read: Max cache size is 1044848640
[2012-03-15 18:22:32.943874] I [io-cache.c:1559:check_cache_size_ok] 1-dstore1-io-cache: Max cache size is 1044848640
[2012-03-15 18:22:32.944045] I [quota.c:3021:quota_parse_limits] 1-dstore1-quota: no "limit-set" option provided
[2012-03-15 18:23:52.538513] C [client-handshake.c:135:rpc_client_ping_timer_expired] 0-dstore1-client-0: server 192.168.2.35:24009 has not responded in the last 42 seconds, disconnecting.
[2012-03-15 18:23:53.627872] C [client-handshake.c:135:rpc_client_ping_timer_expired] 0-dstore1-client-2: server 192.168.2.37:24009 has not responded in the last 42 seconds, disconnecting.
[2012-03-15 18:23:53.628002] C [client-handshake.c:135:rpc_client_ping_timer_expired] 0-dstore1-client-1: server 192.168.2.36:24009 has not responded in the last 42 seconds, disconnecting.
[2012-03-15 18:52:35.963776] E [rpc-clnt.c:217:call_bail] 0-dstore1-client-0: bailing out frame type(GlusterFS 3.1) op(FXATTROP(34)) xid = 0x159826x sent = 2012-03-15 18:22:32.921200. timeout = 1800
[2012-03-15 18:52:35.981890] W [client3_1-fops.c:1460:client3_1_fxattrop_cbk] 0-dstore1-client-0: remote operation failed: Transport endpoint is not connected
[2012-03-15 18:52:35.982172] E [rpc-clnt.c:217:call_bail] 0-dstore1-client-0: bailing out frame type(GlusterFS 3.1) op(FXATTROP(34)) xid = 0x159825x sent = 2012-03-15 18:22:32.919461. timeout = 1800
[2012-03-15 18:52:35.982211] W [client3_1-fops.c:1460:client3_1_fxattrop_cbk] 0-dstore1-client-0: remote operation failed: Transport endpoint is not connected
[2012-03-15 18:52:35.982266] E [rpc-clnt.c:217:call_bail] 0-dstore1-client-0: bailing out frame type(GlusterFS 3.1) op(FXATTROP(34)) xid = 0x159824x sent = 2012-03-15 18:22:32.918427. timeout = 1800
[2012-03-15 18:52:35.982295] W [client3_1-fops.c:1460:client3_1_fxattrop_cbk] 0-dstore1-client-0: remote operation failed: Transport endpoint is not connected
[2012-03-15 18:52:35.982322] E [rpc-clnt.c:217:call_bail] 0-dstore1-client-0: bailing out frame type(GlusterFS 3.1) op(WRITE(13)) xid = 0x159823x sent = 2012-03-15 18:22:32.917974. timeout = 1800
[2012-03-15 18:52:35.982344] W [client3_1-fops.c:692:client3_1_writev_cbk] 0-dstore1-client-0: remote operation failed: Transport endpoint is not connected
[2012-03-15 18:53:08.144865] W [client3_1-fops.c:1266:client3_1_finodelk_cbk] 0-dstore1-client-1: remote operation failed: Transport endpoint is not connected
[2012-03-15 18:53:08.144900] W [rpc-clnt.c:1507:rpc_clnt_submit] 0-dstore1-client-2: failed to submit rpc-request (XID: 0x182849x Program: GlusterFS 3.1, ProgVers: 330, Proc: 30) to rpc-transport (dstore1-client-2)
[2012-03-15 18:53:08.144941] W [client3_1-fops.c:1266:client3_1_finodelk_cbk] 0-dstore1-client-2: remote operation failed: Transport endpoint is not connected


Version-Release number of selected component (if applicable):
3.3.0qa28

How reproducible:


Steps to Reproduce:
1.create a replicate volume (1 X 3)
2.create fuse,nfs mounts from client
2.start dd, ping_pong, lock tests from the different mount points
3.enable quota (changes the graph)
  
Actual results:
dd and ping_pong hung waiting for the server to respond
Comment 1 Kaushal 2012-04-03 07:17:43 EDT
I don't see this happening with native mount on master. But it is still happening on NFS mount. 
Shweta, can you confirm?
Comment 2 Shwetha Panduranga 2012-04-04 00:25:30 EDT
Yes, i will try re creating the bug and update it by end of day.
Comment 3 Kaushal 2012-04-12 07:14:44 EDT
Tested this again. Now I'm getting no hangs for either nfs or fuse mount.
Comment 4 Amar Tumballi 2012-04-23 12:28:47 EDT
Shwetha, when you have time, can you confirm the behavior so we can take appropriate action on this
Comment 5 Shwetha Panduranga 2012-04-23 21:51:13 EDT
I will test on qa37 and update the bug.
Comment 6 Shwetha Panduranga 2012-05-04 02:40:02 EDT
I tested on qa39 and i am unable to recreate the bug.
Comment 7 Kaushal 2012-05-04 03:13:05 EDT
Closing as it is not reproducible anymore.

Note You need to log in before you can comment on or make changes to this bug.