Red Hat Bugzilla – Bug 823242
Add-brick to distributed-replicate volume makes directories invisible for some time
Last modified: 2013-12-08 20:32:05 EST
Description of problem:
After bricks are added to a distributed-replicate volume, directories on the mount point become invisible for some time. Remounting makes them visible again.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. create a 2x2 distributed-replicate volume
2. untar the kernel sources on the mount point, leaving the kernel bz2 file there
3. add a pair of bricks to the volume to make 3x2 dist-rep
4. run ls on the mount point a couple of times
Directories disappear, but files are still visible.
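The steps above can be sketched as a dry-run script that only builds and prints the command lines. This is a minimal sketch, not the reporter's exact procedure: the host name, brick paths, mount point, and tarball name are assumptions, and the volume name dis-rep is inferred from the client log prefixes.

```python
import shlex

# Hypothetical values: the report does not give the host name, brick paths,
# mount point, or tarball name. The volume name is inferred from log prefixes
# such as "0-dis-rep-client-5".
VOLUME = "dis-rep"
HOST = "server1"
MOUNT = "/mnt/dis-rep"
TARBALL = "linux.tar.bz2"


def brick(n):
    """Return a hypothetical brick spec such as server1:/home/bricks/dr1."""
    return f"{HOST}:/home/bricks/dr{n}"


def repro_commands():
    """Build the command lines for steps 1-4 without running anything."""
    cmds = [
        # Step 1: four bricks with replica 2 give a 2x2 volume.
        ["gluster", "volume", "create", VOLUME, "replica", "2"]
        + [brick(n) for n in range(1, 5)],
        ["gluster", "volume", "start", VOLUME],
        ["mount", "-t", "glusterfs", f"{HOST}:/{VOLUME}", MOUNT],
        # Step 2: extract the tarball, leaving the archive on the mount point.
        ["tar", "-xjf", f"{MOUNT}/{TARBALL}", "-C", MOUNT],
        # Step 3: one more replica pair makes the volume 3x2.
        ["gluster", "volume", "add-brick", VOLUME, brick(5), brick(6)],
    ]
    # Step 4: list the mount point a few times.
    cmds += [["ls", MOUNT] for _ in range(3)]
    return cmds


if __name__ == "__main__":
    for cmd in repro_commands():
        print(shlex.join(cmd))
```

Printing instead of executing keeps the sketch safe to run on a machine without GlusterFS; the printed lines can be adapted and run by hand on a test cluster.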
[2012-05-20 09:10:12.953076] I [client.c:2151:notify] 0-dis-rep-client-5: current graph is no longer active, destroying rpc_client
[2012-05-20 09:10:12.953095] I [client.c:2090:client_rpc_notify] 0-dis-rep-client-2: disconnected
[2012-05-20 09:10:12.953132] I [client.c:2090:client_rpc_notify] 0-dis-rep-client-3: disconnected
[2012-05-20 09:10:12.953147] E [afr-common.c:3665:afr_notify] 0-dis-rep-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2012-05-20 09:10:12.953168] I [client.c:2090:client_rpc_notify] 0-dis-rep-client-4: disconnected
[2012-05-20 09:10:12.953201] I [client.c:2090:client_rpc_notify] 0-dis-rep-client-5: disconnected
[2012-05-20 09:10:12.953218] E [afr-common.c:3665:afr_notify] 0-dis-rep-replicate-2: All subvolumes are down. Going offline until atleast one of them comes back up.
[2012-05-20 09:10:16.228641] I [client-handshake.c:1628:select_server_supported_programs] 1-dis-rep-client-7: Using Program GlusterFS 3.3.0qa42, Num (1298437), Version (330)
[2012-05-20 09:10:16.229118] I [client-handshake.c:1425:client_setvolume_cbk] 1-dis-rep-client-7: Connected to 10.16.157.66:24012, attached to remote volume '/home/bricks/dr8'.
[2012-05-20 09:10:16.229148] I [client-handshake.c:1437:client_setvolume_cbk] 1-dis-rep-client-7: Server and Client lk-version numbers are not same, reopening the fds
[2012-05-20 09:10:16.229451] I [client-handshake.c:453:client_set_lk_version_cbk] 1-dis-rep-client-7: Server lk version = 1
[2012-05-20 09:13:37.362370] C [client-handshake.c:126:rpc_client_ping_timer_expired] 1-dis-rep-client-7: server 10.16.157.66:24012 has not responded in the last 42 seconds, disconnecting.
[2012-05-20 09:13:47.814285] E [timer.c:104:gf_timer_call_cancel] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x2d3) [0x7f56902d1f31] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x211) [0x7f56902d1b94] (-->/usr/local/lib/glusterfs/3.3.0qa42/xlator/protocol/client.so(client_ping_cbk+0x290) [0x7f568bd496f3]))) 1-timer: invalid argument
[2012-05-20 09:13:47.814382] I [socket.c:2315:socket_submit_request] 1-dis-rep-client-7: not connected (priv->connected = 255)
[2012-05-20 09:13:47.814411] W [rpc-clnt.c:1498:rpc_clnt_submit] 1-dis-rep-client-7: failed to submit rpc-request (XID: 0x24x Program: GlusterFS 3.1, ProgVers: 330, Proc: 20) to rpc-transport (dis-rep-client-7)
[2012-05-20 09:13:47.814458] W [client3_1-fops.c:2546:client3_1_opendir_cbk] 1-dis-rep-client-7: remote operation failed: Transport endpoint is not connected. Path: / (00000000-0000-0000-0000-000000000001)
[2012-05-20 09:13:47.814478] I [client.c:2090:client_rpc_notify] 1-dis-rep-client-7: disconnected
[2012-05-20 09:16:51.580785] I [client-handshake.c:1628:select_server_supported_programs] 1-dis-rep-client-7: Using Program GlusterFS 3.3.0qa42, Num (1298437), Version (330)
[2012-05-20 09:16:51.581250] I [client-handshake.c:1425:client_setvolume_cbk] 1-dis-rep-client-7: Connected to 10.16.157.66:24012, att:
If the proper key is used for the handshake, the issue should be fixed (ref: http://review.gluster.com/3314).
Can you confirm the behavior with that patch in?
Shishir, I suspect this is mostly a case of the connection not being properly established after a graph switch. If so, mark it as a duplicate.
This bug seems to be related to 823404. After a vol file change, all connections to bricks seem to go down for an extended period.
Can you please try to reproduce the bug with the latest git repo?
This works fine with 3.4.0qa5. Please re-open if found otherwise.