Bug 823242

Summary: Add-brick to distributed-replicate volume makes directories invisible for some time
Product: [Community] GlusterFS Reporter: shylesh <shmohan>
Component: glusterd Assignee: shishir gowda <sgowda>
Status: CLOSED WORKSFORME QA Contact:
Severity: high Docs Contact:
Priority: medium    
Version: pre-release CC: gluster-bugs, nsathyan, vbellur
Target Milestone: --- Keywords: Triaged
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 854646 (view as bug list) Environment:
Last Closed: 2012-12-26 05:12:50 EST Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Bug Depends On:    
Bug Blocks: 854646    

Description shylesh 2012-05-20 05:22:48 EDT
Description of problem:
After bricks are added to a distributed-replicate volume, directories on the mount point become invisible for some time. Remounting makes them visible again.

Version-Release number of selected component (if applicable):

3.3.0qa42
How reproducible:


Steps to Reproduce:
1. Create a 2x2 distributed-replicate volume.
2. Untar the kernel on the mount point, leaving the kernel bz2 file on the mount point as well.
3. Add a pair of bricks to the volume to make it a 3x2 distributed-replicate volume.
4. Run ls on the mount point a couple of times.
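
The steps above can be sketched as a command list. This is only a dry-run illustration: the server names, brick directory, mount point and tarball name are hypothetical placeholders rather than values from this report, and the script prints the commands instead of running them.

```python
import shlex


def repro_commands(volume, servers, brick_dir, mount_point, tarball):
    """Build the command lines for the reproduction steps (nothing is executed)."""

    def brick(i):
        # Alternate servers so each consecutive pair of bricks forms one replica set.
        return f"{servers[i % len(servers)]}:{brick_dir}/brick{i + 1}"

    initial = [brick(i) for i in range(4)]   # 2x2: two replica pairs
    added = [brick(i) for i in range(4, 6)]  # one more pair, giving 3x2
    return [
        ["gluster", "volume", "create", volume, "replica", "2", *initial],
        ["gluster", "volume", "start", volume],
        ["mount", "-t", "glusterfs", f"{servers[0]}:/{volume}", mount_point],
        ["cp", tarball, mount_point],                 # keep the bz2 file on the mount
        ["tar", "-xjf", tarball, "-C", mount_point],  # untar the kernel onto the mount
        ["gluster", "volume", "add-brick", volume, *added],
        ["ls", mount_point],                          # repeat a couple of times
    ]


if __name__ == "__main__":
    # All arguments below are hypothetical placeholders.
    for cmd in repro_commands("dis-rep", ["server1", "server2"],
                              "/home/bricks", "/mnt/dis-rep", "linux.tar.bz2"):
        print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run on a machine without GlusterFS; the printed lines can be reviewed and run by hand on a test cluster.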
  
Actual results:
Directories disappear, but files are still visible.

Expected results:


Additional info:
[2012-05-20 09:10:12.953076] I [client.c:2151:notify] 0-dis-rep-client-5: current graph is no longer active, destroying rpc_client 
[2012-05-20 09:10:12.953095] I [client.c:2090:client_rpc_notify] 0-dis-rep-client-2: disconnected
[2012-05-20 09:10:12.953132] I [client.c:2090:client_rpc_notify] 0-dis-rep-client-3: disconnected
[2012-05-20 09:10:12.953147] E [afr-common.c:3665:afr_notify] 0-dis-rep-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2012-05-20 09:10:12.953168] I [client.c:2090:client_rpc_notify] 0-dis-rep-client-4: disconnected
[2012-05-20 09:10:12.953201] I [client.c:2090:client_rpc_notify] 0-dis-rep-client-5: disconnected
[2012-05-20 09:10:12.953218] E [afr-common.c:3665:afr_notify] 0-dis-rep-replicate-2: All subvolumes are down. Going offline until atleast one of them comes back up.
[2012-05-20 09:10:16.228641] I [client-handshake.c:1628:select_server_supported_programs] 1-dis-rep-client-7: Using Program GlusterFS 3.3.0qa42, Num (1298437), Version (330)
[2012-05-20 09:10:16.229118] I [client-handshake.c:1425:client_setvolume_cbk] 1-dis-rep-client-7: Connected to 10.16.157.66:24012, attached to remote volume '/home/bricks/dr8'.
[2012-05-20 09:10:16.229148] I [client-handshake.c:1437:client_setvolume_cbk] 1-dis-rep-client-7: Server and Client lk-version numbers are not same, reopening the fds
[2012-05-20 09:10:16.229451] I [client-handshake.c:453:client_set_lk_version_cbk] 1-dis-rep-client-7: Server lk version = 1
[2012-05-20 09:13:37.362370] C [client-handshake.c:126:rpc_client_ping_timer_expired] 1-dis-rep-client-7: server 10.16.157.66:24012 has not responded in the last 42 seconds, disconnecting.
[2012-05-20 09:13:47.814285] E [timer.c:104:gf_timer_call_cancel] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x2d3) [0x7f56902d1f31] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x211) [0x7f56902d1b94] (-->/usr/local/lib/glusterfs/3.3.0qa42/xlator/protocol/client.so(client_ping_cbk+0x290) [0x7f568bd496f3]))) 1-timer: invalid argument
[2012-05-20 09:13:47.814382] I [socket.c:2315:socket_submit_request] 1-dis-rep-client-7: not connected (priv->connected = 255)
[2012-05-20 09:13:47.814411] W [rpc-clnt.c:1498:rpc_clnt_submit] 1-dis-rep-client-7: failed to submit rpc-request (XID: 0x24x Program: GlusterFS 3.1, ProgVers: 330, Proc: 20) to rpc-transport (dis-rep-client-7)
[2012-05-20 09:13:47.814458] W [client3_1-fops.c:2546:client3_1_opendir_cbk] 1-dis-rep-client-7: remote operation failed: Transport endpoint is not connected. Path: / (00000000-0000-0000-0000-000000000001)
[2012-05-20 09:13:47.814478] I [client.c:2090:client_rpc_notify] 1-dis-rep-client-7: disconnected
[2012-05-20 09:16:51.580785] I [client-handshake.c:1628:select_server_supported_programs] 1-dis-rep-client-7: Using Program GlusterFS 3.3.0qa42, Num (1298437), Version (330)
[2012-05-20 09:16:51.581250] I [client-handshake.c:1425:client_setvolume_cbk] 1-dis-rep-client-7: Connected to 10.16.157.66:24012, att:
Comment 1 Amar Tumballi 2012-05-21 08:55:15 EDT
If the proper key is used for the handshake, the issue should be fixed (ref: http://review.gluster.com/3314).

Can you confirm the behavior with that patch in?
Comment 2 Amar Tumballi 2012-05-22 07:11:07 EDT
Shishir, I suspect this is mostly a case of the connection not being properly re-established after a graph switch. If so, mark it as a duplicate.
Comment 3 shishir gowda 2012-05-22 07:28:46 EDT
This bug seems to be related to bug 823404. After a volfile change, all connections to the bricks seem to go down for an extended period.
Comment 4 shishir gowda 2012-07-10 23:59:35 EDT
Can you please try to reproduce the bug with the latest git repo?
Comment 5 shishir gowda 2012-12-26 05:12:50 EST
This works fine with 3.4.0qa5. Please re-open if found otherwise.