Description of problem:
I created a replicated volume on the servers (GlusterFS version 3.6.1) and mounted it on two Linux clients running different versions of glusterfs-fuse (3.4.0.57rhs-1.el6_5 and 3.6.0.29-2.el6). The 3.4.0.57rhs-1.el6_5 client has no problem, but the 3.6.0.29-2.el6 client cannot mount the volume, and its mount attempt makes the 3.4.0.57rhs-1.el6_5 client disconnect from the server. Is this a bug in glusterfs-fuse-3.6.0.29-2.el6? Thanks.

Version-Release number of selected component (if applicable):
glusterfs-3.6.1 (2 servers) FreeBSD 9.2
glusterfs-fuse-3.4.0.57rhs-1.el6_5 (client) Linux CentOS 6.5
glusterfs-fuse-3.6.0.29-2.el6 (client) Linux CentOS 6.5

How reproducible:
1. Add peer with 2 FreeBSD nodes: OK
2. Create replicated volume: OK
3. Mount volume on the Linux client with glusterfs-fuse-3.4.0.57rhs-1.el6_5: OK
4. Mount volume on the Linux client with glusterfs-fuse-3.6.0.29-2.el6: ERROR (the glusterfs-fuse-3.4.0.57rhs-1.el6_5 client disconnects from the server)

Steps to Reproduce:
192.168.231.1 (host 1, FreeBSD 9.2)
192.168.231.2 (host 2, FreeBSD 9.2)
192.168.231.3 (Client 1, Linux 6.5) glusterfs-fuse-3.4.0.57rhs-1.el6_5
192.168.231.4 (Client 2, Linux 6.5) glusterfs-fuse-3.6.0.29-2.el6

(host 1)
# gluster peer probe 192.168.231.2
# gluster volume create rep-vol replica 2 192.168.231.1:/mnt/test12/brick02 192.168.231.2:/mnt/test12/brick02
# gluster volume start rep-vol

(Client 1)
# mount.glusterfs 192.168.231.1:/rep-vol /mnt/replicated (OK)

(Client 2)
# mount.glusterfs 192.168.231.2:/rep-vol /mnt/replicated (ERROR)

Actual results:
Client 2 (glusterfs-fuse-3.6.0.29-2.el6) can't mount the glusterfs volume.

Expected results:
Mount succeeds on both clients.

Additional info:
[2015-01-09 03:58:56.183047] E [socket.c:2169:socket_connect_finish] 0-rep-vol-client-0: connection to 192.168.231.1:49152 failed (Connection refused)
[2015-01-09 03:59:00.183443] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-rep-vol-client-0: changing port to 49152 (from 0)
[2015-01-09 03:59:00.189524] E [socket.c:2169:socket_connect_finish] 0-rep-vol-client-0: connection to 192.168.231.1:49152 failed (Connection refused)
[2015-01-09 03:59:04.187947] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-rep-vol-client-0: changing port to 49152 (from 0)
[2015-01-09 03:59:04.191556] E [socket.c:2169:socket_connect_finish] 0-rep-vol-client-0: connection to 192.168.231.1:49152 failed (Connection refused)

This might be related to bug 1181588, although that is mainly about rebalance.

Could you attach the full logs from the client mounts, and at least one log from the bricks?

Also, as a workaround, you should be able to use the current 3.6 RPMs from this repository:
- http://download.gluster.org/pub/gluster/glusterfs/3.6/LATEST/CentOS/

You may also want to try disabling performance.readdir-ahead on the volume:

# gluster volume set $VOLUME performance.readdir-ahead off

The older clients do not have this functionality, and it may prevent them from mounting. The logs should give some more hints about it too.

(In reply to Niels de Vos from comment #1)
> This might be related to bug 1181588, although that is mainly about
> rebalance.
>
> Could you attach the full logs from the client mounts, and at least one log
> from the bricks?
>
> Also, as workaround, you should be able to use the current 3.6 RPMs from
> this repository:
> - http://download.gluster.org/pub/gluster/glusterfs/3.6/LATEST/CentOS/

Client full logs

volume rep-vol-client-0
    type protocol/client
    option ping-timeout 42
    option remote-host 192.168.231.2
    option remote-subvolume /mnt/test12/brick01
    option transport-type socket
    option send-gids true
end-volume

volume rep-vol-client-1
    type protocol/client
    option ping-timeout 42
    option remote-host 192.168.231.1
    option remote-subvolume /mnt/test12/brick01
    option transport-type socket
    option send-gids true
end-volume

volume rep-vol-replicate-0
    type cluster/replicate
    subvolumes rep-vol-client-0 rep-vol-client-1
end-volume

volume rep-vol-dht
    type cluster/distribute
    subvolumes rep-vol-replicate-0
end-volume

volume rep-vol-write-behind
    type performance/write-behind
    subvolumes rep-vol-dht
end-volume

volume rep-vol-read-ahead
    type performance/read-ahead
    subvolumes rep-vol-write-behind
end-volume

volume rep-vol-io-cache
    type performance/io-cache
    subvolumes rep-vol-read-ahead
end-volume

volume rep-vol-quick-read
    type performance/quick-read
    subvolumes rep-vol-io-cache
end-volume

volume rep-vol-open-behind
    type performance/open-behind
    subvolumes rep-vol-quick-read
end-volume

volume rep-vol-md-cache
    type performance/md-cache
    subvolumes rep-vol-open-behind
end-volume

volume rep-vol
    type debug/io-stats
    option latency-measurement off
    option count-fop-hits off
    subvolumes rep-vol-md-cache
end-volume

volume meta-autoload
    type meta
    subvolumes rep-vol
end-volume

+------------------------------------------------------------------------------+
[2015-01-14 04:17:42.230293] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-rep-vol-client-1: changing port to 49152 (from 0)
[2015-01-14 04:17:42.230382] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-rep-vol-client-0: changing port to 49152 (from 0)
[2015-01-14 04:17:42.242279] I [client-handshake.c:1415:select_server_supported_programs] 0-rep-vol-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-01-14 04:17:42.242485] I [client-handshake.c:1415:select_server_supported_programs] 0-rep-vol-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-01-14 04:17:42.251988] I [client-handshake.c:1200:client_setvolume_cbk] 0-rep-vol-client-1: Connected to rep-vol-client-1, attached to remote volume '/mnt/test12/brick01'.
[2015-01-14 04:17:42.252033] I [client-handshake.c:1212:client_setvolume_cbk] 0-rep-vol-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2015-01-14 04:17:42.252204] I [MSGID: 108005] [afr-common.c:4245:afr_notify] 0-rep-vol-replicate-0: Subvolume 'rep-vol-client-1' came back up; going online.
[2015-01-14 04:17:42.252380] I [client-handshake.c:188:client_set_lk_version_cbk] 0-rep-vol-client-1: Server lk version = 1
[2015-01-14 04:17:42.254012] I [client-handshake.c:1200:client_setvolume_cbk] 0-rep-vol-client-0: Connected to rep-vol-client-0, attached to remote volume '/mnt/test12/brick01'.
[2015-01-14 04:17:42.254083] I [client-handshake.c:1212:client_setvolume_cbk] 0-rep-vol-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2015-01-14 04:17:42.260922] I [fuse-bridge.c:5042:fuse_graph_setup] 0-fuse: switched to graph 0
[2015-01-14 04:17:42.261065] I [client-handshake.c:188:client_set_lk_version_cbk] 0-rep-vol-client-0: Server lk version = 1
[2015-01-14 04:17:42.261265] I [fuse-bridge.c:3971:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel 7.14
[2015-01-14 04:28:55.344774] C [rpc-clnt-ping.c:109:rpc_clnt_ping_timer_expired] 0-rep-vol-client-0: server 192.168.231.2:49152 has not responded in the last 42 seconds, disconnecting.
[2015-01-14 04:28:55.345449] E [rpc-clnt.c:362:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15d) [0x7fc84c67ee6d] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91) [0x7fc84c67e8a1] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fc84c67e7ee]))) 0-rep-vol-client-0: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-01-14 04:17:42.261641 (xid=0x8)
[2015-01-14 04:28:55.345505] W [client-rpc-fops.c:2761:client3_3_lookup_cbk] 0-rep-vol-client-0: remote operation failed: Transport endpoint is not connected. Path: / (00000000-0000-0000-0000-000000000001)
[2015-01-14 04:28:55.345589] E [rpc-clnt.c:362:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15d) [0x7fc84c67ee6d] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91) [0x7fc84c67e8a1] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fc84c67e7ee]))) 0-rep-vol-client-0: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-01-14 04:28:13.335734 (xid=0x17)
[2015-01-14 04:28:55.345607] W [rpc-clnt-ping.c:154:rpc_clnt_ping_cbk] 0-rep-vol-client-0: socket disconnected
[2015-01-14 04:28:55.345648] I [client.c:2215:client_rpc_notify] 0-rep-vol-client-0: disconnected from rep-vol-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2015-01-14 04:29:05.353569] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-rep-vol-client-0: changing port to 49152 (from 0)
[2015-01-14 04:30:31.364619] W [socket.c:529:__socket_rwv] 0-rep-vol-client-0: readv on 192.168.231.2:49152 failed (Connection reset by peer)
[2015-01-14 04:30:31.364752] E [rpc-clnt.c:362:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15d) [0x7fc84c67ee6d] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91) [0x7fc84c67e8a1] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fc84c67e7ee]))) 0-rep-vol-client-0: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2015-01-14 04:29:05.359733 (xid=0x1a)
[2015-01-14 04:30:31.364772] W [client-handshake.c:1602:client_dump_version_cbk] 0-rep-vol-client-0: received RPC status error
[2015-01-14 04:30:31.364796] I [client.c:2215:client_rpc_notify] 0-rep-vol-client-0: disconnected from rep-vol-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2015-01-14 04:30:33.377415] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-rep-vol-client-0: changing port to 49152 (from 0)
[2015-01-14 04:30:33.383467] E [socket.c:2169:socket_connect_finish] 0-rep-vol-client-0: connection to 192.168.231.2:49152 failed (Connection refused)
[2015-01-14 04:30:37.384097] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-rep-vol-client-0: changing port to 49152 (from 0)
[2015-01-14 04:30:37.390130] E [socket.c:2169:socket_connect_finish] 0-rep-vol-client-0: connection to 192.168.231.2:49152 failed (Connection refused)
[2015-01-14 04:30:41.390781] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-rep-vol-client-0: changing port to 49152 (from 0)
[2015-01-14 04:30:41.396221] E [socket.c:2169:socket_connect_finish] 0-rep-vol-client-0: connection to 192.168.231.2:49152 failed (Connection refused)
[2015-01-14 04:31:43.467776] E [rpc-clnt.c:362:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15d) [0x7fc84c67ee6d] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91) [0x7fc84c67e8a1] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fc84c67e7ee]))) 0-rep-vol-client-1: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-01-14 04:17:42.261673 (xid=0x8)
[2015-01-14 04:31:43.467813] W [client-rpc-fops.c:2761:client3_3_lookup_cbk] 0-rep-vol-client-1: remote operation failed: Transport endpoint is not connected. Path: / (00000000-0000-0000-0000-000000000001)
[2015-01-14 04:31:43.473742] I [socket.c:3132:socket_submit_request] 0-rep-vol-client-1: not connected (priv->connected = 0)
[2015-01-14 04:31:43.473779] W [rpc-clnt.c:1562:rpc_clnt_submit] 0-rep-vol-client-1: failed to submit rpc-request (XID: 0x1c Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (rep-vol-client-1)
[2015-01-14 04:31:43.473812] W [client-rpc-fops.c:2761:client3_3_lookup_cbk] 0-rep-vol-client-1: remote operation failed: Transport endpoint is not connected. Path: / (00000000-0000-0000-0000-000000000001)
[2015-01-14 04:31:43.473952] E [rpc-clnt.c:362:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15d) [0x7fc84c67ee6d] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91) [0x7fc84c67e8a1] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fc84c67e7ee]))) 0-rep-vol-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-01-14 04:31:01.419996 (xid=0x1b)
[2015-01-14 04:31:43.473969] W [rpc-clnt-ping.c:154:rpc_clnt_ping_cbk] 0-rep-vol-client-1: socket disconnected
[2015-01-14 04:31:43.473996] I [client.c:2215:client_rpc_notify] 0-rep-vol-client-1: disconnected from rep-vol-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2015-01-14 04:31:43.474033] E [MSGID: 108006] [afr-common.c:4283:afr_notify] 0-rep-vol-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-01-14 04:31:43.474296] W [fuse-bridge.c:757:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected)
[2015-01-14 04:31:43.474608] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-rep-vol-client-1: changing port to 49152 (from 0)
[2015-01-14 04:31:43.487765] I [fuse-bridge.c:4883:fuse_thread_proc] 0-fuse: unmounting /mnt/replicated/
[2015-01-14 04:31:43.487955] W [glusterfsd.c:1182:cleanup_and_exit] (--> 0-: received signum (15), shutting down
[2015-01-14 04:31:43.488005] I [fuse-bridge.c:5561:fini] 0-fuse: Unmounting '/mnt/replicated/'.

**********************************************************************

bricks log

[2015-01-07 10:34:25.903591] E [posix.c:3747:posix_getxattr] 0-rep-vol-posix: getxattr failed on /mnt/test12/brick01/file0: key = ^\trusted.afr.rep-vol-client-0^Ltrusted.gfid^\t rusted.afr.rep-vol-client-1 (Attribute not found)
[2015-01-07 10:34:25.906693] E [posix.c:3747:posix_getxattr] 0-rep-vol-posix: getxattr failed on /mnt/test12/brick01/file0: key = ^\trusted.afr.rep-vol-client-0^Ltrusted.gfid^\t rusted.afr.rep-vol-client-1 (Attribute not found)
[2015-01-07 10:34:25.909627] E [posix.c:3747:posix_getxattr] 0-rep-vol-posix: getxattr failed on /mnt/test12/brick01/file0: key = ^\trusted.afr.rep-vol-client-0^Ltrusted.gfid^\t rusted.afr.rep-vol-client-1 (Attribute not found)
pending frames:
frame : type(0) op(27)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2015-01-09 03:55:31
dlfcn 1
..............................................

Thanks

(In reply to Niels de Vos from comment #2)
> You also may want to try disabling performance.readdir-ahead on the volume:
>
> # gluster volume set $VOLUME performance.readdir-ahead off
>
> The older clients do not have this functionality and mounting may get
> prevented because of it. The logs should give some more hints about it too.

I tried this as you advised, but I still get the same error.

Thanks

This seems to be an issue in the posix-xlator. For some reason the 3.4 client is able to crash the brick process.

When posix_getxattr() gets called with an xattr that is not treated specially, the posix-xlator retrieves a list of all xattrs the file has. It then goes through that list and fetches each xattr. This is where things seem to go wrong: the name of the xattr that gets fetched is not the name of an existing xattr, but a string with the concatenated list of all xattrs.

[posix.c:3747:posix_getxattr] 0-rep-vol-posix: getxattr failed on /mnt/test12/brick01/file0: key = ^\trusted.afr.rep-vol-client-0^Ltrusted.gfid^\t rusted.afr.rep-vol-client-1 (Attribute not found)

From xlators/storage/posix/src/posix.c:

        size = sys_llistxattr (real_path, list, size);
        if (size < 0) {
                op_ret = -1;
                op_errno = errno;
                goto out;
        }

        remaining_size = size;
        list_offset = 0;
        while (remaining_size > 0) {
                strcpy (keybuffer, list + list_offset);
                size = sys_lgetxattr (real_path, keybuffer, NULL, 0);
                if (size == -1) {
                        op_ret = -1;
                        op_errno = errno;
                        gf_log (this->name, GF_LOG_ERROR, "getxattr failed on "
                                "%s: key = %s (%s)", real_path, keybuffer,
                                strerror (op_errno));
                        break;
                }

The behavior of libglusterfs/src/syscall.c:sys_llistxattr() depends on the OS where the brick process is running, so I suspect that the problem is related to how FreeBSD returns the list of xattrs.

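To illustrate the suspicion above, here is a minimal sketch of the two list layouts. It assumes, per the FreeBSD extattr_list_link(2) manual page, that FreeBSD returns each name preceded by a single length byte and without a NUL terminator, while Linux llistxattr(2) returns NUL-terminated names back to back. That assumption matches the control characters in the logged key: ^\ is byte 28, the length of "trusted.afr.rep-vol-client-0", and ^L is byte 12, the length of "trusted.gfid". The helper names next_key_nul() and next_key_lenprefixed() are hypothetical and are not part of GlusterFS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helpers, not GlusterFS code. Each copies the next attribute
 * name from `list` into `key` as a NUL-terminated string and returns the
 * number of bytes consumed from `list`, or 0 if the entry is malformed or
 * does not fit in `key`.
 */

/* Linux-style layout: "name1\0name2\0..." */
size_t next_key_nul(const char *list, size_t remaining,
                    char *key, size_t keysz)
{
    assert(list != NULL && key != NULL);

    const char *end = memchr(list, '\0', remaining);
    if (end == NULL)
        return 0;                       /* no terminator inside the buffer */

    size_t len = (size_t)(end - list);
    if (len + 1 > keysz)
        return 0;                       /* name too long for key buffer */

    memcpy(key, list, len + 1);         /* copy name and its terminator */
    return len + 1;
}

/* FreeBSD-style layout (assumed): <len1>name1<len2>name2..., no NULs */
size_t next_key_lenprefixed(const char *list, size_t remaining,
                            char *key, size_t keysz)
{
    assert(list != NULL && key != NULL);

    if (remaining == 0)
        return 0;

    size_t len = (unsigned char)list[0];    /* one-byte length prefix */
    if (len + 1 > remaining || len + 1 > keysz)
        return 0;                           /* truncated entry or key too small */

    memcpy(key, list + 1, len);
    key[len] = '\0';                        /* add the missing terminator */
    return len + 1;
}
```

A loop like the one in posix_getxattr() would call one of these per iteration and advance list_offset by the return value. Running strcpy() on the length-prefixed layout instead copies the length bytes and every following name as one string, which is consistent with the key printed in the brick log.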
Please check out https://bugzilla.redhat.com/show_bug.cgi?id=1452961
*** This bug has been marked as a duplicate of bug 1452961 ***