Bug 765386 (GLUSTER-3654)

Summary: stripe replicated : with one brick down ls hangs
Product: [Community] GlusterFS
Reporter: Lakshmipathi G <lakshmipathi>
Component: replicate
Assignee: Rajesh <rajesh>
Status: CLOSED WORKSFORME
QA Contact:
Severity: medium
Docs Contact:
Priority: medium
Version: 3.3-beta
CC: amarts, gluster-bugs, vagarwal, vijay
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-02-28 08:31:44 UTC
Type: ---
Regression: ---
Mount Type: ---

Description Lakshmipathi G 2011-09-28 06:33:49 UTC
ls continues to hang even after the brick host 10.1.12.171 comes back up and glusterd is started on it.

Comment 1 Lakshmipathi G 2011-09-28 06:36:09 UTC
client-log --
--------
[2011-09-27 20:56:36.628442] E [socket.c:1713:socket_connect_finish] 0-repstrp-client-3: connection to 10.1.12.171:24010 failed (Connection refused)
[2011-09-27 20:56:36.631409] E [socket.c:1713:socket_connect_finish] 0-repstrp-client-2: connection to 10.1.12.171:24009 failed (Connection refused)
[2011-09-27 21:09:42.163635] I [afr-common.c:3595:AFR_LOCAL_INIT] 0-repstrp-replicate-1: no subvolumes up
[2011-09-27 21:15:02.62279] I [client-handshake.c:1085:select_server_supported_programs] 0-repstrp-client-2: Using Program GlusterFS 3.3.0qa12, Num (1298437), Version (310)
[2011-09-27 21:15:02.62790] I [client-handshake.c:917:client_setvolume_cbk] 0-repstrp-client-2: Connected to 10.1.12.171:24009, attached to remote volume '/export/repstrp22'.
[2011-09-27 21:15:02.62841] I [afr-common.c:3455:afr_notify] 0-repstrp-replicate-1: Subvolume 'repstrp-client-2' came back up; going online.
[2011-09-27 21:15:02.200319] I [client-handshake.c:1085:select_server_supported_programs] 0-repstrp-client-3: Using Program GlusterFS 3.3.0qa12, Num (1298437), Version (310)
[2011-09-27 21:15:02.487983] I [client-handshake.c:917:client_setvolume_cbk] 0-repstrp-client-3: Connected to 10.1.12.171:24010, attached to remote volume '/export/repstrp_220'.
[2011-09-27 21:15:02.488027] I [afr-common.c:3459:afr_notify] 0-repstrp-replicate-1: subvol 1 came up, start crawl
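To put a number on the gap in the excerpt above, here is a rough sketch that measures the time between the first "Connection refused" entry and the "came back up" entry for repstrp-replicate-1. The helper names are made up for illustration. The fractional parts of the timestamps vary in digit count in this log (for example 62841 versus 488027), so the sketch ignores them instead of guessing how to pad them.

```python
from datetime import datetime

# Timestamps copied from the client log excerpt above.
REFUSED = "2011-09-27 20:56:36.628442"   # first "Connection refused"
BACK_UP = "2011-09-27 21:15:02.62841"    # "came back up; going online"


def parse_ts(stamp):
    """Parse a log timestamp at one-second resolution (fraction dropped)."""
    whole, _, _fraction = stamp.partition(".")
    return datetime.strptime(whole, "%Y-%m-%d %H:%M:%S")


def seconds_between(start, end):
    """Whole seconds elapsed between two log timestamps."""
    return int((parse_ts(end) - parse_ts(start)).total_seconds())


if __name__ == "__main__":
    print(seconds_between(REFUSED, BACK_UP))  # prints 1106
```

That is about 18 minutes with no brick reachable for replicate-1. The report is that ls still hangs after the 21:15:02 reconnect.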

Comment 2 Lakshmipathi G 2011-09-28 09:13:52 UTC
Created a stripe-replicated volume with 3.3qa10:

# gluster volume info
 
Volume Name: repstrp
Type: Striped-Replicate (RAID 01)
Status: Started
Number of Bricks: 1 x 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.1.11.140:/export/repstrp22
Brick2: 10.1.11.141:/export/repstrp22
Brick3: 10.1.12.171:/export/repstrp22
Brick4: 10.1.12.171:/export/repstrp_220

Mount it and reboot 10.1.12.171, which hosts bricks 3 and 4.
After that, ls on the mount point just hangs.
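A small sketch of why rebooting that one host matters for this layout. It assumes that with a replica count of 2, consecutive bricks in the volume info form one replica pair. The client log is consistent with that: repstrp-client-2 and repstrp-client-3 both point at 10.1.12.171 and both belong to repstrp-replicate-1. The function names are hypothetical and this is only a model of the layout, not GlusterFS code.

```python
# Brick list copied from the `gluster volume info` output above.
BRICKS = [
    "10.1.11.140:/export/repstrp22",
    "10.1.11.141:/export/repstrp22",
    "10.1.12.171:/export/repstrp22",
    "10.1.12.171:/export/repstrp_220",
]
REPLICA_COUNT = 2  # from "Number of Bricks: 1 x 2 x 2 = 4"


def replica_sets(bricks, replica_count):
    """Group consecutive bricks into replica sets (assumed grouping rule)."""
    return [bricks[i:i + replica_count]
            for i in range(0, len(bricks), replica_count)]


def sets_fully_down(bricks, replica_count, down_host):
    """Return indices of replica sets whose bricks all live on down_host."""
    result = []
    for index, members in enumerate(replica_sets(bricks, replica_count)):
        hosts = {brick.split(":", 1)[0] for brick in members}
        if hosts == {down_host}:
            result.append(index)
    return result


if __name__ == "__main__":
    for index in sets_fully_down(BRICKS, REPLICA_COUNT, "10.1.12.171"):
        print(f"replicate-{index} has no surviving brick")
```

Under that assumption, both copies held by replicate-1 sit on 10.1.12.171, so the reboot leaves one stripe subvolume with no brick at all instead of a degraded pair.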


client log-
----------------------------------------------------
[2011-09-27 19:20:50.140054] I [afr-common.c:3459:afr_notify] 0-repstrp-replicate-0: subvol 0 came up, start crawl
[2011-09-27 19:20:50.140081] I [afr-common.c:3554:afr_notify] 0-repstrp-replicate-0: All subvolumes came up, start crawl
[2011-09-27 19:20:50.145960] I [fuse-bridge.c:3340:fuse_graph_setup] 0-fuse: switched to graph 0
[2011-09-27 19:20:50.146076] I [fuse-bridge.c:2924:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.10
[2011-09-27 19:20:50.283341] I [afr-common.c:1757:afr_set_root_inode_on_first_lookup] 0-repstrp-replicate-0: added root inode
[2011-09-27 19:20:50.284584] I [afr-common.c:1757:afr_set_root_inode_on_first_lookup] 0-repstrp-replicate-1: added root inode
[2011-09-27 20:56:16.588027] C [client-handshake.c:121:rpc_client_ping_timer_expired] 0-repstrp-client-3: server 10.1.12.171:24010 has not responded in the last 42 seconds, disconnecting.
[2011-09-27 20:56:16.608258] C [client-handshake.c:121:rpc_client_ping_timer_expired] 0-repstrp-client-2: server 10.1.12.171:24009 has not responded in the last 42 seconds, disconnecting.
[2011-09-27 20:56:16.640979] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x2affa7d27ed9] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x2affa7d2767e] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x2affa7d275ee]))) 0-repstrp-client-3: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2011-09-27 20:55:33.42194
[2011-09-27 20:56:16.641017] W [client3_1-fops.c:2250:client3_1_lookup_cbk] 0-repstrp-client-3: remote operation failed: Transport endpoint is not connected. Path: /
[2011-09-27 20:56:16.641081] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x2affa7d27ed9] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x2affa7d2767e] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x2affa7d275ee]))) 0-repstrp-client-3: forced unwinding frame type(GlusterFS Handshake) op(PING(3)) called at 2011-09-27 20:55:34.524228
[2011-09-27 20:56:16.641766] W [client-handshake.c:265:client_ping_cbk] 0-repstrp-client-3: timer must have expired
[2011-09-27 20:56:16.641787] I [client.c:1885:client_rpc_notify] 0-repstrp-client-3: disconnected
[2011-09-27 20:56:16.641842] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x2affa7d27ed9] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x2affa7d2767e] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x2affa7d275ee]))) 0-repstrp-client-2: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2011-09-27 20:55:33.42173
[2011-09-27 20:56:16.641858] W [client3_1-fops.c:2250:client3_1_lookup_cbk] 0-repstrp-client-2: remote operation failed: Transport endpoint is not connected. Path: /
[2011-09-27 20:56:16.642105] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x2affa7d27ed9] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x2affa7d2767e] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x2affa7d275ee]))) 0-repstrp-client-2: forced unwinding frame type(GlusterFS Handshake) op(PING(3)) called at 2011-09-27 20:55:34.524237
[2011-09-27 20:56:16.642124] W [client-handshake.c:265:client_ping_cbk] 0-repstrp-client-2: timer must have expired
[2011-09-27 20:56:16.642135] I [client.c:1885:client_rpc_notify] 0-repstrp-client-2: disconnected
[2011-09-27 20:56:16.642144] E [afr-common.c:3484:afr_notify] 0-repstrp-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2011-09-27 20:56:16.676308] W [fuse-bridge.c:1570:fuse_create_cbk] 0-glusterfs-fuse: 14960: /i => -1 (Input/output error)
[2011-09-27 20:56:28.524599] W [fuse-bridge.c:1570:fuse_create_cbk] 0-glusterfs-fuse: 14964: /i3 => -1 (Input/output error)
[2011-09-27 20:56:30.191192] I [afr-common.c:3595:AFR_LOCAL_INIT] 0-repstrp-replicate-1: no subvolumes up
[2011-09-27 20:56:30.192439] I [afr-common.c:3595:AFR_LOCAL_INIT] 0-repstrp-replicate-1: no subvolumes up
[2011-09-27 20:56:30.192503] I [afr-common.c:3595:AFR_LOCAL_INIT] 0-repstrp-replicate-1: no subvolumes up
[2011-09-27 20:56:36.628442] E [socket.c:1713:socket_connect_finish] 0-repstrp-client-3: connection to 10.1.12.171:24010 failed (Connection refused)
[2011-09-27 20:56:36.631409] E [socket.c:1713:socket_connect_finish] 0-repstrp-client-2: connection to 10.1.12.171:24009 failed (Connection refused)
---------------------------------------------
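For anyone triaging a log like the one above, a minimal sketch that pulls out only the critical (C) and error (E) entries. The regular expression is inferred from the lines in this report (timestamp, level letter, then file:line:function in brackets) and is an assumption, not a documented format. The helper names are hypothetical.

```python
import re

# Pattern inferred from the log lines in this report.
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<rest>.*)$"
)

# Three lines copied verbatim from the client log above.
SAMPLE = [
    "[2011-09-27 20:56:16.588027] C "
    "[client-handshake.c:121:rpc_client_ping_timer_expired] "
    "0-repstrp-client-3: server 10.1.12.171:24010 has not responded "
    "in the last 42 seconds, disconnecting.",
    "[2011-09-27 20:56:16.642144] E [afr-common.c:3484:afr_notify] "
    "0-repstrp-replicate-1: All subvolumes are down. Going offline "
    "until atleast one of them comes back up.",
    "[2011-09-27 20:56:30.191192] I [afr-common.c:3595:AFR_LOCAL_INIT] "
    "0-repstrp-replicate-1: no subvolumes up",
]


def parse_line(text):
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_LINE.match(text)
    return match.groupdict() if match else None


def severe(lines):
    """Keep only parsed entries whose level is C (critical) or E (error)."""
    parsed = (parse_line(text) for text in lines)
    return [row for row in parsed if row and row["level"] in ("C", "E")]


if __name__ == "__main__":
    for row in severe(SAMPLE):
        print(row["ts"], row["level"], row["func"])
```

Run against the full log, this would reduce it to the ping timeout, the forced unwinds, the "All subvolumes are down" notice, and the refused reconnects.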

Comment 3 shishir gowda 2011-09-29 02:15:15 UTC
Looks like an AFR issue. Reassigning it to kp.

Comment 4 Amar Tumballi 2012-02-28 08:31:44 UTC
With the latest 3.3.0qa24 release, this behavior is no longer seen. Please re-open the bug if it is seen again.