Bug 1181500 - Brick process on FreeBSD crashes when mounting with a 3.4 Linux client
Summary: Brick process on FreeBSD crashes when mounting with a 3.4 Linux client
Keywords:
Status: CLOSED DUPLICATE of bug 1452961
Alias: None
Product: GlusterFS
Classification: Community
Component: posix
Version: mainline
Hardware: x86_64
OS: FreeBSD
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-01-13 10:23 UTC by Kajornsak
Modified: 2018-11-19 08:16 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-19 08:16:01 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Kajornsak 2015-01-13 10:23:47 UTC
Description of problem:
     I created a replicated volume on the servers (GlusterFS version 3.6.1) and mounted it on two Linux clients running different versions (glusterfs-fuse 3.4.0.57rhs-1.el6_5 and glusterfs-fuse-3.6.0.29-2.el6). The 3.4.0.57rhs-1.el6_5 client mounts without problems, but the 3.6.0.29-2.el6 client cannot mount, and its attempt makes the 3.4.0.57rhs-1.el6_5 client disconnect from the server. Is this a bug in glusterfs-fuse-3.6.0.29-2.el6? Thanks.

Version-Release number of selected component (if applicable):
     glusterfs-3.6.1                    (2 server) FreeBSD 9.2
     glusterfs-fuse-3.4.0.57rhs-1.el6_5 (client) Linux CentOS 6.5
     glusterfs-fuse-3.6.0.29-2.el6      (client) Linux CentOS 6.5

How reproducible:
1. Add peer with 2 FreeBSD nodes: OK
2. Create replicated volume: OK
3. Mount volume on the Linux client (glusterfs-fuse-3.4.0.57rhs-1.el6_5): OK
4. Mount volume on the Linux client (glusterfs-fuse-3.6.0.29-2.el6): ERROR
     (the glusterfs-fuse-3.4.0.57rhs-1.el6_5 client disconnects from the server)

Steps to Reproduce:
        192.168.231.1 (host 1 FreeBSD 9.2)
        192.168.231.2 (host 2 FreeBSD 9.2)
        192.168.231.3 (Client Linux 6.5) glusterfs-fuse-3.4.0.57rhs-1.el6_5
        192.168.231.4 (Client Linux 6.5) glusterfs-fuse-3.6.0.29-2.el6

(host 1)
        # gluster peer probe 192.168.231.2
        # gluster volume create rep-vol replica 2 192.168.231.1:/mnt/test12/brick02 192.168.231.2:/mnt/test12/brick02
        # gluster volume start rep-vol

(Client 1)
	# mount.glusterfs 192.168.231.1:/rep-vol /mnt/replicated (OK)

(Client 2)
	# mount.glusterfs 192.168.231.2:/rep-vol /mnt/replicated (ERROR)

Actual results:
Client 2 (glusterfs-fuse-3.6.0.29-2.el6) cannot mount the GlusterFS volume.

Expected results:
The mount succeeds on both clients.

Additional info:
[2015-01-09 03:58:56.183047] E [socket.c:2169:socket_connect_finish] 0-rep-vol-client-0: connection to 192.168.231.1:49152 failed (Connection refused)
[2015-01-09 03:59:00.183443] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-rep-vol-client-0: changing port to 49152 (from 0)
[2015-01-09 03:59:00.189524] E [socket.c:2169:socket_connect_finish] 0-rep-vol-client-0: connection to 192.168.231.1:49152 failed (Connection refused)
[2015-01-09 03:59:04.187947] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-rep-vol-client-0: changing port to 49152 (from 0)
[2015-01-09 03:59:04.191556] E [socket.c:2169:socket_connect_finish] 0-rep-vol-client-0: connection to 192.168.231.1:49152 failed (Connection refused)

Comment 1 Niels de Vos 2015-01-13 12:34:17 UTC
This might be related to bug 1181588, although that is mainly about rebalance.

Could you attach the full logs from the client mounts, and at least one log from the bricks?

Also, as a workaround, you should be able to use the current 3.6 RPMs from this repository:
 - http://download.gluster.org/pub/gluster/glusterfs/3.6/LATEST/CentOS/

Comment 2 Niels de Vos 2015-01-13 12:41:29 UTC
You also may want to try disabling performance.readdir-ahead on the volume:

    # gluster volume set $VOLUME performance.readdir-ahead off

The older clients do not have this functionality, and mounting may be prevented because of it. The logs should give some more hints about it too.

Comment 3 Kajornsak 2015-01-14 04:56:55 UTC
(In reply to Niels de Vos from comment #1)
> This might be related to bug 1181588, although that is mainly about
> rebalance.
> 
> Could you attach the full logs from the client mounts, and at least one log
> from the bricks?
> 
> Also, as a workaround, you should be able to use the current 3.6 RPMs from
> this repository:
>  - http://download.gluster.org/pub/gluster/glusterfs/3.6/LATEST/CentOS/

Client full logs

 1: volume rep-vol-client-0
  2:     type protocol/client
  3:     option ping-timeout 42
  4:     option remote-host 192.168.231.2
  5:     option remote-subvolume /mnt/test12/brick01
  6:     option transport-type socket
  7:     option send-gids true
  8: end-volume
  9:
 10: volume rep-vol-client-1
 11:     type protocol/client
 12:     option ping-timeout 42
 13:     option remote-host 192.168.231.1
 14:     option remote-subvolume /mnt/test12/brick01
 15:     option transport-type socket
 16:     option send-gids true
 17: end-volume
 18:
 19: volume rep-vol-replicate-0
 20:     type cluster/replicate
 21:     subvolumes rep-vol-client-0 rep-vol-client-1
 22: end-volume
 23:
 24: volume rep-vol-dht
 25:     type cluster/distribute
 26:     subvolumes rep-vol-replicate-0
 27: end-volume
 28:
 29: volume rep-vol-write-behind
 30:     type performance/write-behind
 31:     subvolumes rep-vol-dht
 32: end-volume
 33:
 34: volume rep-vol-read-ahead
 35:     type performance/read-ahead
 36:     subvolumes rep-vol-write-behind
 37: end-volume
 38:
 39: volume rep-vol-io-cache
 40:     type performance/io-cache
 41:     subvolumes rep-vol-read-ahead
 42: end-volume
 43:
 44: volume rep-vol-quick-read
 45:     type performance/quick-read
 46:     subvolumes rep-vol-io-cache
 47: end-volume
 48:
 49: volume rep-vol-open-behind
 50:     type performance/open-behind
 51:     subvolumes rep-vol-quick-read
 52: end-volume
 53:
 54: volume rep-vol-md-cache
 55:     type performance/md-cache
 56:     subvolumes rep-vol-open-behind
 57: end-volume
 58:
 59: volume rep-vol
 60:     type debug/io-stats
 61:     option latency-measurement off
 62:     option count-fop-hits off
 63:     subvolumes rep-vol-md-cache
 64: end-volume
 65:
 66: volume meta-autoload
 67:     type meta
 68:     subvolumes rep-vol
 69: end-volume

+------------------------------------------------------------------------------+
[2015-01-14 04:17:42.230293] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-rep-vol-client-1: changing port to 49152 (from 0)
[2015-01-14 04:17:42.230382] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-rep-vol-client-0: changing port to 49152 (from 0)
[2015-01-14 04:17:42.242279] I [client-handshake.c:1415:select_server_supported_programs] 0-rep-vol-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-01-14 04:17:42.242485] I [client-handshake.c:1415:select_server_supported_programs] 0-rep-vol-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-01-14 04:17:42.251988] I [client-handshake.c:1200:client_setvolume_cbk] 0-rep-vol-client-1: Connected to rep-vol-client-1, attached to remote volume '/mnt/test12/brick01'.
[2015-01-14 04:17:42.252033] I [client-handshake.c:1212:client_setvolume_cbk] 0-rep-vol-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2015-01-14 04:17:42.252204] I [MSGID: 108005] [afr-common.c:4245:afr_notify] 0-rep-vol-replicate-0: Subvolume 'rep-vol-client-1' came back up; going online.
[2015-01-14 04:17:42.252380] I [client-handshake.c:188:client_set_lk_version_cbk] 0-rep-vol-client-1: Server lk version = 1
[2015-01-14 04:17:42.254012] I [client-handshake.c:1200:client_setvolume_cbk] 0-rep-vol-client-0: Connected to rep-vol-client-0, attached to remote volume '/mnt/test12/brick01'.
[2015-01-14 04:17:42.254083] I [client-handshake.c:1212:client_setvolume_cbk] 0-rep-vol-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2015-01-14 04:17:42.260922] I [fuse-bridge.c:5042:fuse_graph_setup] 0-fuse: switched to graph 0
[2015-01-14 04:17:42.261065] I [client-handshake.c:188:client_set_lk_version_cbk] 0-rep-vol-client-0: Server lk version = 1
[2015-01-14 04:17:42.261265] I [fuse-bridge.c:3971:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel 7.14
[2015-01-14 04:28:55.344774] C [rpc-clnt-ping.c:109:rpc_clnt_ping_timer_expired] 0-rep-vol-client-0: server 192.168.231.2:49152 has not responded in the last 42 seconds, disconnecting.
[2015-01-14 04:28:55.345449] E [rpc-clnt.c:362:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15d) [0x7fc84c67ee6d] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91) [0x7fc84c67e8a1] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fc84c67e7ee]))) 0-rep-vol-client-0: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-01-14 04:17:42.261641 (xid=0x8)
[2015-01-14 04:28:55.345505] W [client-rpc-fops.c:2761:client3_3_lookup_cbk] 0-rep-vol-client-0: remote operation failed: Transport endpoint is not connected. Path: / (00000000-0000-0000-0000-000000000001)
[2015-01-14 04:28:55.345589] E [rpc-clnt.c:362:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15d) [0x7fc84c67ee6d] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91) [0x7fc84c67e8a1] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fc84c67e7ee]))) 0-rep-vol-client-0: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-01-14 04:28:13.335734 (xid=0x17)
[2015-01-14 04:28:55.345607] W [rpc-clnt-ping.c:154:rpc_clnt_ping_cbk] 0-rep-vol-client-0: socket disconnected
[2015-01-14 04:28:55.345648] I [client.c:2215:client_rpc_notify] 0-rep-vol-client-0: disconnected from rep-vol-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2015-01-14 04:29:05.353569] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-rep-vol-client-0: changing port to 49152 (from 0)
[2015-01-14 04:30:31.364619] W [socket.c:529:__socket_rwv] 0-rep-vol-client-0: readv on 192.168.231.2:49152 failed (Connection reset by peer)
[2015-01-14 04:30:31.364752] E [rpc-clnt.c:362:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15d) [0x7fc84c67ee6d] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91) [0x7fc84c67e8a1] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fc84c67e7ee]))) 0-rep-vol-client-0: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2015-01-14 04:29:05.359733 (xid=0x1a)
[2015-01-14 04:30:31.364772] W [client-handshake.c:1602:client_dump_version_cbk] 0-rep-vol-client-0: received RPC status error
[2015-01-14 04:30:31.364796] I [client.c:2215:client_rpc_notify] 0-rep-vol-client-0: disconnected from rep-vol-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2015-01-14 04:30:33.377415] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-rep-vol-client-0: changing port to 49152 (from 0)
[2015-01-14 04:30:33.383467] E [socket.c:2169:socket_connect_finish] 0-rep-vol-client-0: connection to 192.168.231.2:49152 failed (Connection refused)
[2015-01-14 04:30:37.384097] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-rep-vol-client-0: changing port to 49152 (from 0)
[2015-01-14 04:30:37.390130] E [socket.c:2169:socket_connect_finish] 0-rep-vol-client-0: connection to 192.168.231.2:49152 failed (Connection refused)
[2015-01-14 04:30:41.390781] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-rep-vol-client-0: changing port to 49152 (from 0)
[2015-01-14 04:30:41.396221] E [socket.c:2169:socket_connect_finish] 0-rep-vol-client-0: connection to 192.168.231.2:49152 failed (Connection refused)
[2015-01-14 04:31:43.467776] E [rpc-clnt.c:362:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15d) [0x7fc84c67ee6d] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91) [0x7fc84c67e8a1] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fc84c67e7ee]))) 0-rep-vol-client-1: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-01-14 04:17:42.261673 (xid=0x8)
[2015-01-14 04:31:43.467813] W [client-rpc-fops.c:2761:client3_3_lookup_cbk] 0-rep-vol-client-1: remote operation failed: Transport endpoint is not connected. Path: / (00000000-0000-0000-0000-000000000001)
[2015-01-14 04:31:43.473742] I [socket.c:3132:socket_submit_request] 0-rep-vol-client-1: not connected (priv->connected = 0)
[2015-01-14 04:31:43.473779] W [rpc-clnt.c:1562:rpc_clnt_submit] 0-rep-vol-client-1: failed to submit rpc-request (XID: 0x1c Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (rep-vol-client-1)
[2015-01-14 04:31:43.473812] W [client-rpc-fops.c:2761:client3_3_lookup_cbk] 0-rep-vol-client-1: remote operation failed: Transport endpoint is not connected. Path: / (00000000-0000-0000-0000-000000000001)
[2015-01-14 04:31:43.473952] E [rpc-clnt.c:362:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15d) [0x7fc84c67ee6d] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91) [0x7fc84c67e8a1] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fc84c67e7ee]))) 0-rep-vol-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-01-14 04:31:01.419996 (xid=0x1b)
[2015-01-14 04:31:43.473969] W [rpc-clnt-ping.c:154:rpc_clnt_ping_cbk] 0-rep-vol-client-1: socket disconnected
[2015-01-14 04:31:43.473996] I [client.c:2215:client_rpc_notify] 0-rep-vol-client-1: disconnected from rep-vol-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2015-01-14 04:31:43.474033] E [MSGID: 108006] [afr-common.c:4283:afr_notify] 0-rep-vol-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-01-14 04:31:43.474296] W [fuse-bridge.c:757:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected)
[2015-01-14 04:31:43.474608] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-rep-vol-client-1: changing port to 49152 (from 0)
[2015-01-14 04:31:43.487765] I [fuse-bridge.c:4883:fuse_thread_proc] 0-fuse: unmounting /mnt/replicated/
[2015-01-14 04:31:43.487955] W [glusterfsd.c:1182:cleanup_and_exit] (--> 0-: received signum (15), shutting down
[2015-01-14 04:31:43.488005] I [fuse-bridge.c:5561:fini] 0-fuse: Unmounting '/mnt/replicated/'.

**********************************************************************

bricks log

[2015-01-07 10:34:25.903591] E [posix.c:3747:posix_getxattr] 0-rep-vol-posix: getxattr failed on /mnt/test12/brick01/file0: key = ^\trusted.afr.rep-vol-client-0^Ltrusted.gfid^\trusted.afr.rep-vol-client-1 (Attribute not found)
[2015-01-07 10:34:25.906693] E [posix.c:3747:posix_getxattr] 0-rep-vol-posix: getxattr failed on /mnt/test12/brick01/file0: key = ^\trusted.afr.rep-vol-client-0^Ltrusted.gfid^\trusted.afr.rep-vol-client-1 (Attribute not found)
[2015-01-07 10:34:25.909627] E [posix.c:3747:posix_getxattr] 0-rep-vol-posix: getxattr failed on /mnt/test12/brick01/file0: key = ^\trusted.afr.rep-vol-client-0^Ltrusted.gfid^\trusted.afr.rep-vol-client-1 (Attribute not found)
pending frames:
frame : type(0) op(27)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2015-01-09 03:55:31
dlfcn 1

..............................................
Thanks

Comment 4 Kajornsak 2015-01-14 05:01:50 UTC
(In reply to Niels de Vos from comment #2)
> You also may want to try disabling performance.readdir-ahead on the volume:
> 
>     # gluster volume set $VOLUME performance.readdir-ahead off
> 
> The older clients do not have this functionality, and mounting may be
> prevented because of it. The logs should give some more hints about it too.

I tried this as you advised, but I still get the same error. Thanks

Comment 5 Niels de Vos 2015-01-14 09:29:41 UTC
This seems to be an issue in the posix-xlator. For some reason the 3.4 client is able to crash the brick process.

When posix_getxattr() gets called with an xattr that is not treated specially, the posix-xlator retrieves a list of all xattrs the file has. It then goes through that list and fetches each xattr. This is where things seem to go wrong: the name of the xattr that gets fetched is not the name of an existing xattr, but a string containing the concatenated list of all xattrs.

[posix.c:3747:posix_getxattr] 0-rep-vol-posix: getxattr failed on /mnt/test12/brick01/file0: key = ^\trusted.afr.rep-vol-client-0^Ltrusted.gfid^\trusted.afr.rep-vol-client-1 (Attribute not found)

From xlators/storage/posix/src/posix.c:

        size = sys_llistxattr (real_path, list, size);
        if (size < 0) {
                op_ret = -1;
                op_errno = errno;
                goto out;
        }

        remaining_size = size;
        list_offset = 0;
        while (remaining_size > 0) {
                strcpy (keybuffer, list + list_offset);
                size = sys_lgetxattr (real_path, keybuffer, NULL, 0);
                if (size == -1) {
                        op_ret = -1;
                        op_errno = errno;
                        gf_log (this->name, GF_LOG_ERROR, "getxattr failed on "
                                "%s: key = %s (%s)", real_path, keybuffer,
                                strerror (op_errno));
                        break;
                }


libglusterfs/src/syscall.c:sys_llistxattr() depends on the OS where the brick process is running. So I suspect that the problem is related to how FreeBSD returns the list of xattrs.
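
The key in the log above fits that suspicion. FreeBSD's extattr_list_file() returns each name as one length byte followed by the name, with no NUL terminator, whereas the loop above expects the NUL-separated list that Linux llistxattr() returns. In the logged key, ^\ is byte 0x1c (28, the length of "trusted.afr.rep-vol-client-0") and ^L is byte 0x0c (12, the length of "trusted.gfid"). A minimal sketch of converting such a buffer in place, so that a strcpy()-based walk would see separate names, could look like the following. Note that lenprefixed_to_nul() is a hypothetical helper written for illustration, not a function from GlusterFS, and treating a zero-length name as malformed is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of GlusterFS): convert a buffer of
 * length-prefixed names, [len][name][len][name]..., into NUL-separated
 * names, name\0name\0..., in place. Each entry keeps its size: the
 * length byte is dropped and a NUL is appended instead.
 *
 * Returns 0 on success, or -1 if an entry is empty or runs past the
 * end of the buffer. On failure the buffer may be partially converted.
 */
int lenprefixed_to_nul(char *buf, size_t size)
{
    size_t off = 0;

    assert(buf != NULL);

    while (off < size) {
        /* The first byte of each entry is the length of the name. */
        size_t len = (unsigned char)buf[off];

        /* Reject empty names and names that overrun the buffer. */
        if (len == 0 || len > size - off - 1)
            return -1;

        /* Shift the name left over its length byte, then terminate it. */
        memmove(buf + off, buf + off + 1, len);
        buf[off + len] = '\0';

        off += len + 1;
    }
    return 0;
}
```

With a converted buffer, a walk that advances by strlen(name) + 1 per entry would pass "trusted.afr.rep-vol-client-0" and then "trusted.gfid" to the getxattr call instead of the whole concatenated string.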

Comment 6 Iblis Lin 2017-07-13 02:45:38 UTC
Please check out https://bugzilla.redhat.com/show_bug.cgi?id=1452961

Comment 7 Vijay Bellur 2018-11-19 08:16:01 UTC

*** This bug has been marked as a duplicate of bug 1452961 ***

