Red Hat Bugzilla – Bug 854646
Add-brick to distributed-replicate volume makes directories invisible for some time
Last modified: 2013-12-18 19:08:41 EST
+++ This bug was initially created as a clone of Bug #823242 +++
Description of problem:
After bricks are added to a distributed-replicate volume, directories on the mount point become invisible for some time; remounting makes them visible again.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. create a 2x2 distributed-replicate volume
2. untar the kernel, keeping the kernel .bz2 tarball on the mount point
3. add a pair of bricks to the volume to make 3x2 dist-rep
4. run ls on the mount point a couple of times
Directories disappear, but files remain visible.
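The reproduction steps above can be scripted roughly as follows. This is a minimal dry-run sketch, not the reporter's procedure: the host names (server1, server2), the brick paths under /home/bricks, the mount point and the tarball name linux.tar.bz2 are all hypothetical placeholders. It only builds and prints the command lines; missing_entries() is a hypothetical helper that performs the step 4 check by listing the mount point.

```python
import os
import shlex

# Hypothetical setup: replace hosts, brick paths, mount point and tarball name with your own.
HOSTS = ["server1", "server2"]
VOLUME = "dis-rep"
MOUNT_POINT = "/mnt/dis-rep"
TARBALL = "linux.tar.bz2"  # hypothetical kernel tarball name


def brick(n: int) -> str:
    """Brick n (1-based); consecutive bricks alternate hosts so each replica pair spans two servers."""
    return f"{HOSTS[(n - 1) % len(HOSTS)]}:/home/bricks/dr{n}"


def reproduction_commands(volume: str = VOLUME, mount_point: str = MOUNT_POINT) -> list[list[str]]:
    """Build (but do not run) the command sequence for the reproduction steps."""
    return [
        # Step 1: 2x2 distributed-replicate volume (4 bricks, replica 2).
        ["gluster", "volume", "create", volume, "replica", "2",
         brick(1), brick(2), brick(3), brick(4)],
        ["gluster", "volume", "start", volume],
        ["mount", "-t", "glusterfs", f"{HOSTS[0]}:/{volume}", mount_point],
        # Step 2: untar the kernel on the mount point, leaving the tarball in place.
        ["tar", "-xjf", os.path.join(mount_point, TARBALL), "-C", mount_point],
        # Step 3: add one replica pair, turning 2x2 into 3x2.
        ["gluster", "volume", "add-brick", volume, brick(5), brick(6)],
    ]


def missing_entries(mount_point: str, expected: set[str]) -> set[str]:
    """Step 4: return the expected top-level entries that a directory listing does not show."""
    return expected - set(os.listdir(mount_point))


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for cmd in reproduction_commands():
        print(shlex.join(cmd))
```

Calling missing_entries() in a loop after the add-brick step, with the names of the extracted top-level directory and the tarball, would show the symptom as a non-empty result for directories while the tarball (a file) stays listed.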
[2012-05-20 09:10:12.953076] I [client.c:2151:notify] 0-dis-rep-client-5: current graph is no longer active, destroying rpc_client
[2012-05-20 09:10:12.953095] I [client.c:2090:client_rpc_notify] 0-dis-rep-client-2: disconnected
[2012-05-20 09:10:12.953132] I [client.c:2090:client_rpc_notify] 0-dis-rep-client-3: disconnected
[2012-05-20 09:10:12.953147] E [afr-common.c:3665:afr_notify] 0-dis-rep-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2012-05-20 09:10:12.953168] I [client.c:2090:client_rpc_notify] 0-dis-rep-client-4: disconnected
[2012-05-20 09:10:12.953201] I [client.c:2090:client_rpc_notify] 0-dis-rep-client-5: disconnected
[2012-05-20 09:10:12.953218] E [afr-common.c:3665:afr_notify] 0-dis-rep-replicate-2: All subvolumes are down. Going offline until atleast one of them comes back up.
[2012-05-20 09:10:16.228641] I [client-handshake.c:1628:select_server_supported_programs] 1-dis-rep-client-7: Using Program GlusterFS 3.3.0qa42, Num (1298437), Version (330)
[2012-05-20 09:10:16.229118] I [client-handshake.c:1425:client_setvolume_cbk] 1-dis-rep-client-7: Connected to 10.16.157.66:24012, attached to remote volume '/home/bricks/dr8'.
[2012-05-20 09:10:16.229148] I [client-handshake.c:1437:client_setvolume_cbk] 1-dis-rep-client-7: Server and Client lk-version numbers are not same, reopening the fds
[2012-05-20 09:10:16.229451] I [client-handshake.c:453:client_set_lk_version_cbk] 1-dis-rep-client-7: Server lk version = 1
[2012-05-20 09:13:37.362370] C [client-handshake.c:126:rpc_client_ping_timer_expired] 1-dis-rep-client-7: server 10.16.157.66:24012 has not responded in the last 42 seconds, disconnecting.
[2012-05-20 09:13:47.814285] E [timer.c:104:gf_timer_call_cancel] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x2d3) [0x7f56902d1f31] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x211) [0x7f56902d1b94] (-->/usr/local/lib/glusterfs/3.3.0qa42/xlator/protocol/client.so(client_ping_cbk+0x290) [0x7f568bd496f3]))) 1-timer: invalid argument
[2012-05-20 09:13:47.814382] I [socket.c:2315:socket_submit_request] 1-dis-rep-client-7: not connected (priv->connected = 255)
[2012-05-20 09:13:47.814411] W [rpc-clnt.c:1498:rpc_clnt_submit] 1-dis-rep-client-7: failed to submit rpc-request (XID: 0x24x Program: GlusterFS 3.1, ProgVers: 330, Proc: 20) to rpc-transport (dis-rep-client-7)
[2012-05-20 09:13:47.814458] W [client3_1-fops.c:2546:client3_1_opendir_cbk] 1-dis-rep-client-7: remote operation failed: Transport endpoint is not connected. Path: / (00000000-0000-0000-0000-000000000001)
[2012-05-20 09:13:47.814478] I [client.c:2090:client_rpc_notify] 1-dis-rep-client-7: disconnected
[2012-05-20 09:16:51.580785] I [client-handshake.c:1628:select_server_supported_programs] 1-dis-rep-client-7: Using Program GlusterFS 3.3.0qa42, Num (1298437), Version (330)
[2012-05-20 09:16:51.581250] I [client-handshake.c:1425:client_setvolume_cbk] 1-dis-rep-client-7: Connected to 10.16.157.66:24012, att:
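To measure how long each client translator stayed down in a log like the one above, the "disconnected" and "Connected to" messages can be paired per translator name. The sketch below is a hypothetical helper, not a GlusterFS tool; its regular expression is an assumption fitted to the line format shown here. The translator name keeps its graph prefix (0- for the old graph, 1- for the new one), so graph-0 disconnects that follow "current graph is no longer active" appear with no reconnect time. For 1-dis-rep-client-7, the gap between the 09:13:47 "disconnected" and the 09:16:51 "Connected to" message is just over three minutes.

```python
import re
from datetime import datetime

# Assumed format, e.g.:
# [2012-05-20 09:10:12.953095] I [client.c:2090:client_rpc_notify] 0-dis-rep-client-2: disconnected
LOG_RE = re.compile(
    r"^\[(?P<ts>[\d-]+ [\d:.]+)\] (?P<level>[A-Z]) "
    r"\[(?P<src>[^\]]+)\] (?P<xlator>\S+): (?P<msg>.*)$"
)


def outages(lines):
    """Return (xlator, disconnected_at, reconnected_at) tuples.

    reconnected_at is None when no later 'Connected to' message is seen.
    Lines that do not match the assumed format (e.g. backtraces) are skipped.
    """
    down = {}     # xlator name -> time of first 'disconnected' message
    result = []
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
        xl, msg = m["xlator"], m["msg"]
        if msg == "disconnected":
            down.setdefault(xl, ts)
        elif msg.startswith("Connected to") and xl in down:
            result.append((xl, down.pop(xl), ts))
    # Translators that never reconnected within the excerpt.
    result.extend((xl, t, None) for xl, t in down.items())
    return result
```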
--- Additional comment from firstname.lastname@example.org on 2012-05-21 08:55:15 EDT ---
If the proper key is used during the handshake, the issue should be fixed (ref: http://review.gluster.com/3314).
Can you confirm the behavior with that patch in?
--- Additional comment from email@example.com on 2012-05-22 07:11:07 EDT ---
Shishir, I suspect this is mostly the issue of the connection not being properly established after a graph switch. If so, mark it as a duplicate.
--- Additional comment from firstname.lastname@example.org on 2012-05-22 07:28:46 EDT ---
This bug seems to be related to bug 823404. After a volfile change, all connections to the bricks seem to go down for an extended period.
--- Additional comment from email@example.com on 2012-07-10 23:59:35 EDT ---
Can you please try to reproduce the bug with the latest git repo?
Verified on the build 188.8.131.52rhs.beta3-1.el6rhs.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.