Bug 763834 (GLUSTER-2102) - [3.1.1qa5]: Replicate fails with ctdb with failover
Summary: [3.1.1qa5]: Replicate fails with ctdb with failover
Keywords:
Status: CLOSED WORKSFORME
Alias: GLUSTER-2102
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.1.1
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: shishir gowda
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-11-13 00:10 UTC by Harshavardhana
Modified: 2015-03-23 01:04 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Harshavardhana 2010-11-13 00:10:31 UTC
The setup is simple: a replicated volume with two backends, with GlusterFS serving CTDB.

IP failover is fine; the IP migrates smoothly.

But when I/O is in progress from a Windows CIFS client, the I/O hangs for 42 seconds, which seems unreasonable for a replicated volume.

This in turn makes Windows believe that the connection is lost, and the file copy in progress is aborted.


Relevant log messages are as below

----------------------

[2010-11-09 17:08:33.481173] I [client-handshake.c:829:client_setvolume_cbk] repl-client-0: Connected to 10.1.10.112:24010, attached to remote volume '/sdb'.
[2010-11-10 13:02:28.281725] E [socket.c:1657:socket_connect_finish] repl-client-0: connection to 10.1.10.112:24010 failed (No route to host)
[2010-11-10 13:04:51.671883] I [client-handshake.c:993:select_server_supported_programs] repl-client-0: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2010-11-10 13:04:51.675596] I [client-handshake.c:829:client_setvolume_cbk] repl-client-0: Connected to 10.1.10.112:24010, attached to remote volume '/sdb'.
[2010-11-10 13:04:51.675618] I [client-handshake.c:698:client_post_handshake] repl-client-0: 1 fds open - Delaying child_up until they are re-opened
[2010-11-12 11:42:36.160148] E [socket.c:1657:socket_connect_finish] repl-client-0: connection to 10.1.10.112:24010 failed (No route to host)
[2010-11-12 11:45:32.464700] I [client-handshake.c:993:select_server_supported_programs] repl-client-0: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2010-11-12 11:45:32.465001] I [client-handshake.c:829:client_setvolume_cbk] repl-client-0: Connected to 10.1.10.112:24010, attached to remote volume '/sdb'.
[2010-11-12 15:55:25.971214] I [afr-common.c:716:afr_lookup_done] repl-replicate-0: background  meta-data self-heal triggered. path: /lost+found
[2010-11-12 15:55:26.78397] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] repl-replicate-0: background  meta-data self-heal completed on /lost+found
[2010-11-12 15:59:07.328764] E [client-handshake.c:116:rpc_client_ping_timer_expired] repl-client-0: Server 10.1.10.112:24010 has not responded in the last 42 seconds, disconnecting.
[2010-11-12 15:59:07.423161] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x317f40f689] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x317f40ee2e] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x317f40ed9e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(WRITE(13)) called at 2010-11-12 15:56:47.243685
[2010-11-12 15:59:07.425476] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x317f40f689] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x317f40ee2e] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x317f40ed9e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(WRITE(13)) called at 2010-11-12 15:56:47.243801
[2010-11-12 15:59:07.425610] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x317f40f689] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x317f40ee2e] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x317f40ed9e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(WRITE(13)) called at 2010-11-12 15:56:47.243853
[2010-11-12 15:59:07.425781] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x317f40f689] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x317f40ee2e] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x317f40ed9e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(WRITE(13)) called at 2010-11-12 15:56:47.243869
[2010-11-12 15:59:07.425920] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x317f40f689] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x317f40ee2e] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x317f40ed9e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(WRITE(13)) called at 2010-11-12 15:56:47.243884
[2010-11-12 15:59:07.434647] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x317f40f689] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x317f40ee2e] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x317f40ed9e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(WRITE(13)) called at 2010-11-12 15:56:47.243899
[2010-11-12 15:59:07.434794] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x317f40f689] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x317f40ee2e] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x317f40ed9e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(WRITE(13)) called at 2010-11-12 15:56:47.243914
[2010-11-12 15:59:07.434924] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x317f40f689] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x317f40ee2e] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x317f40ed9e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2010-11-12 15:56:47.244556
[2010-11-12 15:59:07.435438] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x317f40f689] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x317f40ee2e] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x317f40ed9e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(FXATTROP(34)) called at 2010-11-12 15:56:47.244569
[2010-11-12 15:59:07.435566] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x317f40f689] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x317f40ee2e] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x317f40ed9e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(FINODELK(30)) called at 2010-11-12 15:56:47.244581
[2010-11-12 15:59:07.435603] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x317f40f689] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x317f40ee2e] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x317f40ed9e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2010-11-12 15:56:51.668307
[2010-11-12 15:59:07.435659] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x317f40f689] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x317f40ee2e] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x317f40ed9e]))) rpc-clnt: forced unwinding frame type(GlusterFS Handshake) op(PING(3)) called at 2010-11-12 15:57:01.103051
[2010-11-12 15:59:07.435693] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x317f40f689] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x317f40ee2e] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x317f40ed9e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2010-11-12 15:57:26.739047
[2010-11-12 15:59:07.435750] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x317f40f689] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x317f40ee2e] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x317f40ed9e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2010-11-12 15:58:01.676669
[2010-11-12 15:59:07.435791] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x317f40f689] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x317f40ee2e] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x317f40ed9e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(FINODELK(30)) called at 2010-11-12 15:58:25.668064
[2010-11-12 15:59:10.376323] E [socket.c:1657:socket_connect_finish] repl-client-0: connection to 10.1.10.112:24010 failed (No route to host)
------------------------------------------------
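
To see how long each request was stuck before the client gave up, the "forced unwinding frame" entries can be compared against their own "called at" timestamps. The following is only a minimal sketch for reading this log excerpt: the regular expression and the helper name `frame_pending` are illustrative assumptions, not part of GlusterFS, and the pattern is written against the line layout shown above only.

```python
import re
from datetime import datetime

# Captures the log timestamp, the unwound operation, and its "called at" time.
# The pattern is an assumption based on the line layout in this excerpt.
FRAME_RE = re.compile(
    r"^\[(?P<logged>[\d-]+ [\d:.]+)\] E \[rpc-clnt\.c:\d+:saved_frames_unwind\]"
    r".*forced unwinding frame type\((?P<ftype>[^)]*)\) "
    r"op\((?P<op>\w+)\(\d+\)\) called at (?P<called>[\d-]+ [\d:.]+)"
)


def parse_ts(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS.ffffff' timestamp as printed in the log."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


def frame_pending(line):
    """Return (op, seconds outstanding) for an unwind entry, else None."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    delta = parse_ts(m.group("logged")) - parse_ts(m.group("called"))
    return m.group("op"), delta.total_seconds()


# First WRITE frame from the excerpt above, joined onto one logical line.
SAMPLE = (
    "[2010-11-12 15:59:07.423161] E [rpc-clnt.c:338:saved_frames_unwind] "
    "(-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x317f40f689] "
    "(-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x317f40ee2e] "
    "(-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x317f40ed9e]))) "
    "rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(WRITE(13)) "
    "called at 2010-11-12 15:56:47.243685"
)

if __name__ == "__main__":
    print(frame_pending(SAMPLE))
```

For the first WRITE frame in the excerpt, the gap between "called at" (15:56:47) and the unwind (15:59:07) is roughly 140 seconds.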

The node was shut down to check that the IP migrates and that CIFS fails over; the CIFS failover did not happen.

Running a standalone dd, we could also see I/O block for 42 seconds.

I am using a native GlusterFS mount for CTDB.
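
The dd observation can also be reproduced with a small probe that times each synced write and reports the slowest one. This is a rough sketch, not the test that was actually run: the function name `longest_write_stall`, the block count, and the block size are all assumptions chosen for illustration, and the target path would need to be a file on the GlusterFS mount.

```python
import os
import time


def longest_write_stall(path, blocks=100, block_size=1 << 20):
    """Write `blocks` chunks to `path`, syncing each one.

    Returns the duration in seconds of the slowest single write+fsync.
    """
    chunk = b"\0" * block_size
    worst = 0.0
    with open(path, "wb") as f:
        for _ in range(blocks):
            start = time.monotonic()
            f.write(chunk)
            f.flush()
            # Force the data out so a stalled brick shows up as latency here.
            os.fsync(f.fileno())
            worst = max(worst, time.monotonic() - start)
    return worst
```

Calling it with a file on the mount while one node is shut down should return a value close to the length of the stall, provided the stall falls inside the write loop.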

Comment 1 Harshavardhana 2011-01-26 22:36:23 UTC
CTDB failover works; it was a configuration issue.

