Bug 854646

Summary: Add-brick to distributed-replicate volume makes directories invisible for some time
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Vidya Sakar <vinaraya>
Component: glusterd Assignee: Amar Tumballi <amarts>
Status: CLOSED ERRATA QA Contact: shylesh <shmohan>
Severity: high Docs Contact:
Priority: medium    
Version: 2.0 CC: gluster-bugs, rfortier, rhs-bugs, sdharane, shmohan, vbellur, vraman
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 823242 Environment:
Last Closed: 2013-09-23 22:39:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 823242    
Bug Blocks:    

Description Vidya Sakar 2012-09-05 13:42:03 UTC
+++ This bug was initially created as a clone of Bug #823242 +++

Description of problem:
After bricks are added to a distributed-replicate volume, directories on the mount point become invisible for some time. Remounting makes them visible again.

Version-Release number of selected component (if applicable):
3.3.0qa42

How reproducible:


Steps to Reproduce:
1. Create a 2x2 distributed-replicate volume.
2. Untar the kernel source on the mount point, leaving the kernel bz2 file on the mount point as well.
3. Add a pair of bricks to the volume to make it a 3x2 distributed-replicate volume.
4. Run ls on the mount point a couple of times.
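
The steps above can be sketched as gluster CLI invocations. This is only a rough sketch that builds the command lines without running them. The volume name `dis-rep` is inferred from the client log prefix; the host names (`server1`, `server2`) and brick paths are hypothetical placeholders, not values from this report.

```python
from typing import List

VOLUME = "dis-rep"  # inferred from the "0-dis-rep-client-N" log prefix


def create_cmd(volume: str, bricks: List[str], replica: int = 2) -> List[str]:
    """Build `gluster volume create` for a distributed-replicate volume."""
    if len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return ["gluster", "volume", "create", volume, "replica", str(replica)] + bricks


def add_brick_cmd(volume: str, new_bricks: List[str], replica: int = 2) -> List[str]:
    """Build `gluster volume add-brick`; bricks are added in whole replica sets."""
    if len(new_bricks) % replica != 0:
        raise ValueError("new bricks must form complete replica sets")
    return ["gluster", "volume", "add-brick", volume] + new_bricks


# Hypothetical hosts and brick paths (assumptions for illustration only).
initial_bricks = [
    "server1:/bricks/b1", "server2:/bricks/b2",  # replica pair 1
    "server1:/bricks/b3", "server2:/bricks/b4",  # replica pair 2
]
extra_bricks = ["server1:/bricks/b5", "server2:/bricks/b6"]  # replica pair 3

commands = [
    create_cmd(VOLUME, initial_bricks),        # step 1: 2x2 volume
    ["gluster", "volume", "start", VOLUME],
    add_brick_cmd(VOLUME, extra_bricks),       # step 3: grow to 3x2
]

for cmd in commands:
    print(" ".join(cmd))
```

Steps 2 and 4 (untarring the kernel and running ls) happen on the client mount point between and after these commands.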
  
Actual results:
Directories disappear, but files remain visible.

Expected results:


Additional info:
ast one of them comes back up.
[2012-05-20 09:10:12.953076] I [client.c:2151:notify] 0-dis-rep-client-5: current graph is no longer active, destroying rpc_client 
[2012-05-20 09:10:12.953095] I [client.c:2090:client_rpc_notify] 0-dis-rep-client-2: disconnected
[2012-05-20 09:10:12.953132] I [client.c:2090:client_rpc_notify] 0-dis-rep-client-3: disconnected
[2012-05-20 09:10:12.953147] E [afr-common.c:3665:afr_notify] 0-dis-rep-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2012-05-20 09:10:12.953168] I [client.c:2090:client_rpc_notify] 0-dis-rep-client-4: disconnected
[2012-05-20 09:10:12.953201] I [client.c:2090:client_rpc_notify] 0-dis-rep-client-5: disconnected
[2012-05-20 09:10:12.953218] E [afr-common.c:3665:afr_notify] 0-dis-rep-replicate-2: All subvolumes are down. Going offline until atleast one of them comes back up.
[2012-05-20 09:10:16.228641] I [client-handshake.c:1628:select_server_supported_programs] 1-dis-rep-client-7: Using Program GlusterFS 3.3.0qa42, Num (1298437), Version (330)
[2012-05-20 09:10:16.229118] I [client-handshake.c:1425:client_setvolume_cbk] 1-dis-rep-client-7: Connected to 10.16.157.66:24012, attached to remote volume '/home/bricks/dr8'.
[2012-05-20 09:10:16.229148] I [client-handshake.c:1437:client_setvolume_cbk] 1-dis-rep-client-7: Server and Client lk-version numbers are not same, reopening the fds
[2012-05-20 09:10:16.229451] I [client-handshake.c:453:client_set_lk_version_cbk] 1-dis-rep-client-7: Server lk version = 1
[2012-05-20 09:13:37.362370] C [client-handshake.c:126:rpc_client_ping_timer_expired] 1-dis-rep-client-7: server 10.16.157.66:24012 has not responded in the last 42 seconds, disconnecting.
[2012-05-20 09:13:47.814285] E [timer.c:104:gf_timer_call_cancel] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x2d3) [0x7f56902d1f31] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x211) [0x7f56902d1b94] (-->/usr/local/lib/glusterfs/3.3.0qa42/xlator/protocol/client.so(client_ping_cbk+0x290) [0x7f568bd496f3]))) 1-timer: invalid argument
[2012-05-20 09:13:47.814382] I [socket.c:2315:socket_submit_request] 1-dis-rep-client-7: not connected (priv->connected = 255)
[2012-05-20 09:13:47.814411] W [rpc-clnt.c:1498:rpc_clnt_submit] 1-dis-rep-client-7: failed to submit rpc-request (XID: 0x24x Program: GlusterFS 3.1, ProgVers: 330, Proc: 20) to rpc-transport (dis-rep-client-7)
[2012-05-20 09:13:47.814458] W [client3_1-fops.c:2546:client3_1_opendir_cbk] 1-dis-rep-client-7: remote operation failed: Transport endpoint is not connected. Path: / (00000000-0000-0000-0000-000000000001)
[2012-05-20 09:13:47.814478] I [client.c:2090:client_rpc_notify] 1-dis-rep-client-7: disconnected
[2012-05-20 09:16:51.580785] I [client-handshake.c:1628:select_server_supported_programs] 1-dis-rep-client-7: Using Program GlusterFS 3.3.0qa42, Num (1298437), Version (330)
[2012-05-20 09:16:51.581250] I [client-handshake.c:1425:client_setvolume_cbk] 1-dis-rep-client-7: Connected to 10.16.157.66:24012, att:

--- Additional comment from amarts on 2012-05-21 08:55:15 EDT ---

If the proper key is used for the handshake, the issue should be fixed (ref: http://review.gluster.com/3314).

Can you confirm the behavior with that patch in?

--- Additional comment from amarts on 2012-05-22 07:11:07 EDT ---

Shishir, I suspect this is mostly an issue of the connection not being properly established after a graph switch. If so, mark it as a duplicate.

--- Additional comment from sgowda on 2012-05-22 07:28:46 EDT ---

This bug seems to be related to 823404. After a volfile change, all connections to the bricks seem to go down for an extended period.

--- Additional comment from sgowda on 2012-07-10 23:59:35 EDT ---

Can you please try to reproduce the bug with the latest git repo?

Comment 4 shylesh 2013-07-10 07:31:13 UTC
Verified on the build 3.4.0.12rhs.beta3-1.el6rhs.x86_64

Comment 5 Scott Haines 2013-09-23 22:39:10 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
