It continues to hang even after the brick host 10.1.12.171 comes back and glusterd is started on it.
(In reply to comment #1)
> it continues to hang even after the brick-10.1.12.171 comes back & glusterd
> started on it.

client-log --
--------
[2011-09-27 20:56:36.628442] E [socket.c:1713:socket_connect_finish] 0-repstrp-client-3: connection to 10.1.12.171:24010 failed (Connection refused)
[2011-09-27 20:56:36.631409] E [socket.c:1713:socket_connect_finish] 0-repstrp-client-2: connection to 10.1.12.171:24009 failed (Connection refused)
[2011-09-27 21:09:42.163635] I [afr-common.c:3595:AFR_LOCAL_INIT] 0-repstrp-replicate-1: no subvolumes up
[2011-09-27 21:15:02.62279] I [client-handshake.c:1085:select_server_supported_programs] 0-repstrp-client-2: Using Program GlusterFS 3.3.0qa12, Num (1298437), Version (310)
[2011-09-27 21:15:02.62790] I [client-handshake.c:917:client_setvolume_cbk] 0-repstrp-client-2: Connected to 10.1.12.171:24009, attached to remote volume '/export/repstrp22'.
[2011-09-27 21:15:02.62841] I [afr-common.c:3455:afr_notify] 0-repstrp-replicate-1: Subvolume 'repstrp-client-2' came back up; going online.
[2011-09-27 21:15:02.200319] I [client-handshake.c:1085:select_server_supported_programs] 0-repstrp-client-3: Using Program GlusterFS 3.3.0qa12, Num (1298437), Version (310)
[2011-09-27 21:15:02.487983] I [client-handshake.c:917:client_setvolume_cbk] 0-repstrp-client-3: Connected to 10.1.12.171:24010, attached to remote volume '/export/repstrp_220'.
[2011-09-27 21:15:02.488027] I [afr-common.c:3459:afr_notify] 0-repstrp-replicate-1: subvol 1 came up, start crawl
created stripe-replicated volume with 3.3qa10 -

# gluster volume info

Volume Name: repstrp
Type: Striped-Replicate (RAID 01)
Status: Started
Number of Bricks: 1 x 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.1.11.140:/export/repstrp22
Brick2: 10.1.11.141:/export/repstrp22
Brick3: 10.1.12.171:/export/repstrp22
Brick4: 10.1.12.171:/export/repstrp_220

mount it & reboot brick (3,4) 10.1.12.171- now doing ls on mountpt just hangs.

client log-
----------------------------------------------------
[2011-09-27 19:20:50.140054] I [afr-common.c:3459:afr_notify] 0-repstrp-replicate-0: subvol 0 came up, start crawl
[2011-09-27 19:20:50.140081] I [afr-common.c:3554:afr_notify] 0-repstrp-replicate-0: All subvolumes came up, start crawl
[2011-09-27 19:20:50.145960] I [fuse-bridge.c:3340:fuse_graph_setup] 0-fuse: switched to graph 0
[2011-09-27 19:20:50.146076] I [fuse-bridge.c:2924:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.10
[2011-09-27 19:20:50.283341] I [afr-common.c:1757:afr_set_root_inode_on_first_lookup] 0-repstrp-replicate-0: added root inode
[2011-09-27 19:20:50.284584] I [afr-common.c:1757:afr_set_root_inode_on_first_lookup] 0-repstrp-replicate-1: added root inode
[2011-09-27 20:56:16.588027] C [client-handshake.c:121:rpc_client_ping_timer_expired] 0-repstrp-client-3: server 10.1.12.171:24010 has not responded in the last 42 seconds, disconnecting.
[2011-09-27 20:56:16.608258] C [client-handshake.c:121:rpc_client_ping_timer_expired] 0-repstrp-client-2: server 10.1.12.171:24009 has not responded in the last 42 seconds, disconnecting.
[2011-09-27 20:56:16.640979] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x2affa7d27ed9] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x2affa7d2767e] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x2affa7d275ee]))) 0-repstrp-client-3: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2011-09-27 20:55:33.42194
[2011-09-27 20:56:16.641017] W [client3_1-fops.c:2250:client3_1_lookup_cbk] 0-repstrp-client-3: remote operation failed: Transport endpoint is not connected. Path: /
[2011-09-27 20:56:16.641081] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x2affa7d27ed9] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x2affa7d2767e] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x2affa7d275ee]))) 0-repstrp-client-3: forced unwinding frame type(GlusterFS Handshake) op(PING(3)) called at 2011-09-27 20:55:34.524228
[2011-09-27 20:56:16.641766] W [client-handshake.c:265:client_ping_cbk] 0-repstrp-client-3: timer must have expired
[2011-09-27 20:56:16.641787] I [client.c:1885:client_rpc_notify] 0-repstrp-client-3: disconnected
[2011-09-27 20:56:16.641842] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x2affa7d27ed9] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x2affa7d2767e] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x2affa7d275ee]))) 0-repstrp-client-2: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2011-09-27 20:55:33.42173
[2011-09-27 20:56:16.641858] W [client3_1-fops.c:2250:client3_1_lookup_cbk] 0-repstrp-client-2: remote operation failed: Transport endpoint is not connected. Path: /
[2011-09-27 20:56:16.642105] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x2affa7d27ed9] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x2affa7d2767e] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x2affa7d275ee]))) 0-repstrp-client-2: forced unwinding frame type(GlusterFS Handshake) op(PING(3)) called at 2011-09-27 20:55:34.524237
[2011-09-27 20:56:16.642124] W [client-handshake.c:265:client_ping_cbk] 0-repstrp-client-2: timer must have expired
[2011-09-27 20:56:16.642135] I [client.c:1885:client_rpc_notify] 0-repstrp-client-2: disconnected
[2011-09-27 20:56:16.642144] E [afr-common.c:3484:afr_notify] 0-repstrp-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2011-09-27 20:56:16.676308] W [fuse-bridge.c:1570:fuse_create_cbk] 0-glusterfs-fuse: 14960: /i => -1 (Input/output error)
[2011-09-27 20:56:28.524599] W [fuse-bridge.c:1570:fuse_create_cbk] 0-glusterfs-fuse: 14964: /i3 => -1 (Input/output error)
[2011-09-27 20:56:30.191192] I [afr-common.c:3595:AFR_LOCAL_INIT] 0-repstrp-replicate-1: no subvolumes up
[2011-09-27 20:56:30.192439] I [afr-common.c:3595:AFR_LOCAL_INIT] 0-repstrp-replicate-1: no subvolumes up
[2011-09-27 20:56:30.192503] I [afr-common.c:3595:AFR_LOCAL_INIT] 0-repstrp-replicate-1: no subvolumes up
[2011-09-27 20:56:36.628442] E [socket.c:1713:socket_connect_finish] 0-repstrp-client-3: connection to 10.1.12.171:24010 failed (Connection refused)
[2011-09-27 20:56:36.631409] E [socket.c:1713:socket_connect_finish] 0-repstrp-client-2: connection to 10.1.12.171:24009 failed (Connection refused)
---------------------------------------------
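For anyone triaging a client log like this one, a rough Python sketch for measuring how long a replicate subvolume stayed offline might look like the following. It is only an illustration: the names parse_line, outage_windows and SAMPLE are made up for this example, the regular expression assumes the log layout seen in this report, and fractional seconds are dropped because their padding in the log is unclear. The two sample entries are copied from the client logs in this bug.

```python
import re
from datetime import datetime

# Assumed layout, based only on the lines in this report:
# [YYYY-MM-DD HH:MM:SS.frac] LEVEL [file:line:function] xlator: message
LOG_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<src>[^\]]+)\] "
    r"(?P<rest>.*)"
)


def parse_line(line):
    """Return (timestamp, level, xlator, message), or None if the line does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    # Whole seconds only; the fractional part is deliberately ignored.
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
    xlator, _, msg = m.group("rest").partition(": ")
    return ts, m.group("level"), xlator, msg


def outage_windows(lines):
    """Pair each 'All subvolumes are down' entry with the next 'came back up' entry."""
    down_since = {}
    windows = []
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        ts, _level, xlator, msg = rec
        if "All subvolumes are down" in msg:
            down_since.setdefault(xlator, ts)
        elif "came back up; going online" in msg and xlator in down_since:
            start = down_since.pop(xlator)
            windows.append((xlator, start, ts, (ts - start).total_seconds()))
    return windows


SAMPLE = [
    "[2011-09-27 20:56:16.642144] E [afr-common.c:3484:afr_notify] 0-repstrp-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.",
    "[2011-09-27 20:56:30.191192] I [afr-common.c:3595:AFR_LOCAL_INIT] 0-repstrp-replicate-1: no subvolumes up",
    "[2011-09-27 21:15:02.62841] I [afr-common.c:3455:afr_notify] 0-repstrp-replicate-1: Subvolume 'repstrp-client-2' came back up; going online.",
]

windows = outage_windows(SAMPLE)
for xlator, start, end, seconds in windows:
    print(f"{xlator}: offline from {start} to {end} ({seconds:.0f} s)")
```

On these entries the sketch reports replicate-1 as offline for 1126 seconds (20:56:16 to 21:15:02). That covers only the period in which the subvolume was marked down; it says nothing about why ls kept hanging after the bricks reconnected.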
Looks like an AFR issue. Reassigning it to kp.
With the latest 3.3.0qa24 release, this behavior is no longer seen. Please reopen the bug if it is seen again.