Description of problem:
If a single brick of a distributed-replicate volume goes down, the geo-rep session becomes faulty. The geo-rep session should go faulty only if the whole sub-volume is down.

Version-Release number of selected component (if applicable): 3.3.0.5rhs-38.el6rhs

How reproducible: Consistently

Steps to Reproduce:
1. Start a geo-rep session between master (dist-repl) and slave (any)
2. Kill any one brick process
3. Check the geo-rep status

Actual results:
Geo-rep status goes to faulty.

Expected results:
With only one brick down, geo-rep shouldn't go faulty.

Additional info:
Log snippet from the geo-rep log file:
####################################################################
[2012-11-16 12:55:44.462631] I [master:683:crawl] _GMaster: primary master with volume id b79a88bf-53cc-4852-8f00-f81c66ac0e43 ...
[2012-11-16 12:55:44.463169] D [master:698:crawl] _GMaster: entering .
[2012-11-16 12:55:44.464071] E [syncdutils:184:log_raise_exception] <top>: glusterfs session went down [ENOTCONN]
[2012-11-16 12:55:44.464179] E [syncdutils:190:log_raise_exception] <top>: FULL EXCEPTION TRACE:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 120, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 400, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 874, in service_loop
    gmaster_builder()(self, args[0]).crawl_loop()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 540, in crawl_loop
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 700, in crawl
    xtl = self.xtime(path)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 376, in xtime
    return self.xtime_low(rsc.server, path, **opts)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 110, in xtime_low
    xt = server.xtime(path, self.uuid)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 270, in ff
    return f(*a)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 365, in xtime
    return struct.unpack('!II', Xattr.lgetxattr(path, '.'.join([cls.GX_NSPACE, uuid, 'xtime']), 8))
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 43, in lgetxattr
    return cls._query_xattr( path, siz, 'lgetxattr', attr)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 35, in _query_xattr
    cls.raise_oserr()
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 25, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 107] Transport endpoint is not connected
[2012-11-16 12:55:44.465147] I [syncdutils:148:finalize] <top>: exiting.
[2012-11-16 12:55:45.423644] D [monitor(monitor):100:monitor] Monitor: worker died in startup phase
#####################################################################
Log snippet from the geo-rep gluster log file:
####################################################################
[2012-11-16 12:57:08.495768] D [client-handshake.c:184:client_start_ping] 0-master-client-1: returning as transport is already disconnected OR there are no frames (1 || 1)
[2012-11-16 12:57:08.495788] D [client.c:2043:client_rpc_notify] 0-master-client-2: got RPC_CLNT_CONNECT
[2012-11-16 12:57:08.495819] D [client-handshake.c:184:client_start_ping] 0-master-client-2: returning as transport is already disconnected OR there are no frames (1 || 1)
[2012-11-16 12:57:08.495839] D [client.c:2043:client_rpc_notify] 0-master-client-3: got RPC_CLNT_CONNECT
[2012-11-16 12:57:08.495871] D [client-handshake.c:184:client_start_ping] 0-master-client-3: returning as transport is already disconnected OR there are no frames (1 || 1)
[2012-11-16 12:57:08.495903] D [client-handshake.c:1670:server_has_portmap] 0-master-client-0: detected portmapper on server
[2012-11-16 12:57:08.495937] D [client-handshake.c:184:client_start_ping] 0-master-client-0: returning as transport is already disconnected OR there are no frames (1 || 1)
[2012-11-16 12:57:08.495960] D [client-handshake.c:1670:server_has_portmap] 0-master-client-1: detected portmapper on server
[2012-11-16 12:57:08.495986] D [client-handshake.c:184:client_start_ping] 0-master-client-1: returning as transport is already disconnected OR there are no frames (1 || 1)
[2012-11-16 12:57:08.496007] D [client-handshake.c:1670:server_has_portmap] 0-master-client-2: detected portmapper on server
[2012-11-16 12:57:08.496031] D [client-handshake.c:184:client_start_ping] 0-master-client-2: returning as transport is already disconnected OR there are no frames (1 || 1)
[2012-11-16 12:57:08.496052] D [client-handshake.c:1670:server_has_portmap] 0-master-client-3: detected portmapper on server
[2012-11-16 12:57:08.496076] D [client-handshake.c:184:client_start_ping] 0-master-client-3: returning as transport is already disconnected OR there are no frames (1 || 1)
[2012-11-16 12:57:08.496100] I [rpc-clnt.c:1659:rpc_clnt_reconfig] 0-master-client-0: changing port to 24009 (from 0)
[2012-11-16 12:57:08.496131] I [rpc-clnt.c:1659:rpc_clnt_reconfig] 0-master-client-1: changing port to 24010 (from 0)
[2012-11-16 12:57:08.496161] E [client-handshake.c:1717:client_query_portmap_cbk] 0-master-client-2: failed to get the port number for remote subvolume
[2012-11-16 12:57:08.496190] I [rpc-clnt.c:1659:rpc_clnt_reconfig] 0-master-client-3: changing port to 24012 (from 0)
[2012-11-16 12:57:08.496216] D [socket.c:184:__socket_rwv] 0-master-client-0: EOF from peer 10.70.34.56:24007
[2012-11-16 12:57:08.496234] D [socket.c:1512:__socket_proto_state_machine] 0-master-client-0: reading from socket failed. Error (Transport endpoint is not connected), peer (10.70.34.56:24007)
[2012-11-16 12:57:08.496244] D [socket.c:1798:socket_event_handler] 0-transport: disconnecting now
[2012-11-16 12:57:08.496260] D [client.c:2108:client_rpc_notify] 0-master-client-0: disconnected (skipped notify)
[2012-11-16 12:57:08.496272] D [socket.c:184:__socket_rwv] 0-master-client-1: EOF from peer 10.70.34.56:24007
[2012-11-16 12:57:08.496280] D [socket.c:1512:__socket_proto_state_machine] 0-master-client-1: reading from socket failed. Error (Transport endpoint is not connected), peer (10.70.34.56:24007)
[2012-11-16 12:57:08.496293] D [socket.c:1798:socket_event_handler] 0-transport: disconnecting now
[2012-11-16 12:57:08.496306] D [client.c:2108:client_rpc_notify] 0-master-client-1: disconnected (skipped notify)
[2012-11-16 12:57:08.496317] D [socket.c:184:__socket_rwv] 0-master-client-2: EOF from peer 10.70.34.56:24007
[2012-11-16 12:57:08.496326] D [socket.c:1512:__socket_proto_state_machine] 0-master-client-2: reading from socket failed. Error (Transport endpoint is not connected), peer (10.70.34.56:24007)
[2012-11-16 12:57:08.496334] D [socket.c:1798:socket_event_handler] 0-transport: disconnecting now
[2012-11-16 12:57:08.496345] I [client.c:2090:client_rpc_notify] 0-master-client-2: disconnected
[2012-11-16 12:57:08.496357] D [socket.c:184:__socket_rwv] 0-master-client-3: EOF from peer 10.70.34.56:24007
[2012-11-16 12:57:08.496365] D [socket.c:1512:__socket_proto_state_machine] 0-master-client-3: reading from socket failed. Error (Transport endpoint is not connected), peer (10.70.34.56:24007)
[2012-11-16 12:57:08.496373] D [socket.c:1798:socket_event_handler] 0-transport: disconnecting now
[2012-11-16 12:57:08.496384] D [client.c:2108:client_rpc_notify] 0-master-client-3: disconnected (skipped notify)
[2012-11-16 12:57:12.480479] D [name.c:149:client_fill_address_family] 0-master-client-0: address-family not specified, guessing it to be inet/inet6
[2012-11-16 12:57:12.482834] D [common-utils.c:151:gf_resolve_ip6] 0-resolver: returning ip-10.70.34.56 (port-24007) for hostname: 10.70.34.56 and port: 24007
[2012-11-16 12:57:12.482931] D [name.c:149:client_fill_address_family] 0-master-client-1: address-family not specified, guessing it to be inet/inet6
[2012-11-16 12:57:12.483022] D [client.c:2043:client_rpc_notify] 0-master-client-0: got RPC_CLNT_CONNECT
[2012-11-16 12:57:12.483132] D [client-handshake.c:184:client_start_ping] 0-master-client-0: returning as transport is already disconnected OR there are no frames (1 || 1)
[2012-11-16 12:57:12.483287] I [client-handshake.c:1636:select_server_supported_programs] 0-master-client-0: Using Program GlusterFS 3.3.0.5rhs, Num (1298437), Version (330)
[2012-11-16 12:57:12.483382] D [client-handshake.c:184:client_start_ping] 0-master-client-0: returning as transport is already disconnected OR there are no frames (1 || 1)
[2012-11-16 12:57:12.483677] D [client-handshake.c:1407:client_setvolume_cbk] 0-master-client-0: clnt-lk-version = 1, server-lk-version = 0
[2012-11-16 12:57:12.483712] I [client-handshake.c:1433:client_setvolume_cbk] 0-master-client-0: Connected to 10.70.34.56:24009, attached to remote volume '/exportdir/brick1'.
[2012-11-16 12:57:12.483735] I [client-handshake.c:1445:client_setvolume_cbk] 0-master-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2012-11-16 12:57:12.483751] D [client-handshake.c:1295:client_post_handshake] 0-master-client-0: No fds to open - notifying all parents child up
[2012-11-16 12:57:12.483767] D [client-handshake.c:489:client_set_lk_version] 0-master-client-0: Sending SET_LK_VERSION
[2012-11-16 12:57:12.483812] I [afr-common.c:3628:afr_notify] 0-master-replicate-0: Subvolume 'master-client-0' came back up; going online.
[2012-11-16 12:57:12.483967] I [client-handshake.c:453:client_set_lk_version_cbk] 0-master-client-0: Server lk version = 1
[2012-11-16 12:57:12.485182] D [common-utils.c:151:gf_resolve_ip6] 0-resolver: returning ip-10.70.34.56 (port-24007) for hostname: 10.70.34.56 and port: 24007
[2012-11-16 12:57:12.485264] D [name.c:149:client_fill_address_family] 0-master-client-2: address-family not specified, guessing it to be inet/inet6
[2012-11-16 12:57:12.485348] D [client.c:2043:client_rpc_notify] 0-master-client-1: got RPC_CLNT_CONNECT
[2012-11-16 12:57:12.485434] D [client-handshake.c:184:client_start_ping] 0-master-client-1: returning as transport is already disconnected OR there are no frames (1 || 1)
[2012-11-16 12:57:12.485588] I [client-handshake.c:1636:select_server_supported_programs] 0-master-client-1: Using Program GlusterFS 3.3.0.5rhs, Num (1298437), Version (330)
[2012-11-16 12:57:12.485684] D [client-handshake.c:184:client_start_ping] 0-master-client-1: returning as transport is already disconnected OR there are no frames (1 || 1)
[2012-11-16 12:57:12.485974] D [client-handshake.c:1407:client_setvolume_cbk] 0-master-client-1: clnt-lk-version = 1, server-lk-version = 0
[2012-11-16 12:57:12.486009] I [client-handshake.c:1433:client_setvolume_cbk] 0-master-client-1: Connected to 10.70.34.56:24010, attached to remote volume '/exportdir/brick2'.
[2012-11-16 12:57:12.486032] I [client-handshake.c:1445:client_setvolume_cbk] 0-master-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2012-11-16 12:57:12.486048] D [client-handshake.c:1295:client_post_handshake] 0-master-client-1: No fds to open - notifying all parents child up
[2012-11-16 12:57:12.486068] D [client-handshake.c:489:client_set_lk_version] 0-master-client-1: Sending SET_LK_VERSION
[2012-11-16 12:57:12.486211] I [client-handshake.c:453:client_set_lk_version_cbk] 0-master-client-1: Server lk version = 1
[2012-11-16 12:57:12.486398] D [dht-diskusage.c:80:dht_du_info_cbk] 0-master-dht: on subvolume 'master-replicate-0': avail_percent is: 99.00 and avail_space is: 409665339392 and avail_inodes is: 99.00
[2012-11-16 12:57:12.487536] D [common-utils.c:151:gf_resolve_ip6] 0-resolver: returning ip-10.70.34.56 (port-24007) for hostname: 10.70.34.56 and port: 24007
[2012-11-16 12:57:12.487621] D [name.c:149:client_fill_address_family] 0-master-client-3: address-family not specified, guessing it to be inet/inet6
[2012-11-16 12:57:12.487704] D [client.c:2043:client_rpc_notify] 0-master-client-2: got RPC_CLNT_CONNECT
[2012-11-16 12:57:12.487789] D [client-handshake.c:184:client_start_ping] 0-master-client-2: returning as transport is already disconnected OR there are no frames (1 || 1)
[2012-11-16 12:57:12.487938] D [client-handshake.c:1670:server_has_portmap] 0-master-client-2: detected portmapper on server
[2012-11-16 12:57:12.488011] D [client-handshake.c:184:client_start_ping] 0-master-client-2: returning as transport is already disconnected OR there are no frames (1 || 1)
[2012-11-16 12:57:12.488184] D [client-handshake.c:1717:client_query_portmap_cbk] 0-master-client-2: failed to get the port number for remote subvolume
[2012-11-16 12:57:12.488255] D [socket.c:184:__socket_rwv] 0-master-client-2: EOF from peer 10.70.34.56:24007
[2012-11-16 12:57:12.488283] D [socket.c:1512:__socket_proto_state_machine] 0-master-client-2: reading from socket failed. Error (Transport endpoint is not connected), peer (10.70.34.56:24007)
[2012-11-16 12:57:12.488300] D [socket.c:1798:socket_event_handler] 0-transport: disconnecting now
[2012-11-16 12:57:12.488335] I [client.c:2090:client_rpc_notify] 0-master-client-2: disconnected
[2012-11-16 12:57:12.489878] D [common-utils.c:151:gf_resolve_ip6] 0-resolver: returning ip-10.70.34.56 (port-24007) for hostname: 10.70.34.56 and port: 24007
[2012-11-16 12:57:12.490003] D [client.c:2043:client_rpc_notify] 0-master-client-3: got RPC_CLNT_CONNECT
[2012-11-16 12:57:12.490092] D [client-handshake.c:184:client_start_ping] 0-master-client-3: returning as transport is already disconnected OR there are no frames (1 || 1)
[2012-11-16 12:57:12.490217] I [client-handshake.c:1636:select_server_supported_programs] 0-master-client-3: Using Program GlusterFS 3.3.0.5rhs, Num (1298437), Version (330)
[2012-11-16 12:57:12.490294] D [client-handshake.c:184:client_start_ping] 0-master-client-3: returning as transport is already disconnected OR there are no frames (1 || 1)
[2012-11-16 12:57:12.490571] D [client-handshake.c:1407:client_setvolume_cbk] 0-master-client-3: clnt-lk-version = 1, server-lk-version = 0
[2012-11-16 12:57:12.490601] I [client-handshake.c:1433:client_setvolume_cbk] 0-master-client-3: Connected to 10.70.34.56:24012, attached to remote volume '/exportdir/brick4'.
[2012-11-16 12:57:12.490620] I [client-handshake.c:1445:client_setvolume_cbk] 0-master-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2012-11-16 12:57:12.490645] D [client-handshake.c:1295:client_post_handshake] 0-master-client-3: No fds to open - notifying all parents child up
[2012-11-16 12:57:12.490656] D [client-handshake.c:489:client_set_lk_version] 0-master-client-3: Sending SET_LK_VERSION
[2012-11-16 12:57:12.490699] I [afr-common.c:3628:afr_notify] 0-master-replicate-1: Subvolume 'master-client-3' came back up; going online.
[2012-11-16 12:57:12.490738] D [fuse-bridge.c:4240:notify] 0-fuse: got event 5 on graph 0
[2012-11-16 12:57:12.493491] I [fuse-bridge.c:4222:fuse_graph_setup] 0-fuse: switched to graph 0
[2012-11-16 12:57:12.493603] I [client-handshake.c:453:client_set_lk_version_cbk] 0-master-client-3: Server lk version = 1
[2012-11-16 12:57:12.493655] D [dht-diskusage.c:80:dht_du_info_cbk] 0-master-dht: on subvolume 'master-replicate-1': avail_percent is: 99.00 and avail_space is: 409665339392 and avail_inodes is: 99.00
[2012-11-16 12:57:12.493665] D [fuse-bridge.c:3917:fuse_get_mount_status] 0-fuse: mount status is 0
[2012-11-16 12:57:12.493783] I [fuse-bridge.c:3405:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13
[2012-11-16 12:57:12.494319] I [afr-common.c:1965:afr_set_root_inode_on_first_lookup] 0-master-replicate-0: added root inode
[2012-11-16 12:57:12.494392] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-master-replicate-0: pending_matrix: [ 0 0 ]
[2012-11-16 12:57:12.494405] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-master-replicate-0: pending_matrix: [ 0 0 ]
[2012-11-16 12:57:12.494414] D [afr-self-heal-common.c:829:afr_mark_sources] 0-master-replicate-0: Number of sources: 0
[2012-11-16 12:57:12.494422] D [afr-self-heal-data.c:861:afr_lookup_select_read_child_by_txn_type] 0-master-replicate-0: returning read_child: 1
[2012-11-16 12:57:12.494430] D [afr-common.c:1294:afr_lookup_select_read_child] 0-master-replicate-0: Source selected as 1 for /
[2012-11-16 12:57:12.494442] D [afr-common.c:1097:afr_lookup_build_response_params] 0-master-replicate-0: Building lookup response from 1
[2012-11-16 12:57:12.494967] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-master-replicate-0: pending_matrix: [ 0 0 ]
[2012-11-16 12:57:12.494999] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-master-replicate-0: pending_matrix: [ 0 0 ]
[2012-11-16 12:57:12.495022] D [afr-self-heal-common.c:829:afr_mark_sources] 0-master-replicate-0: Number of sources: 0
[2012-11-16 12:57:12.495040] D [afr-self-heal-data.c:861:afr_lookup_select_read_child_by_txn_type] 0-master-replicate-0: returning read_child: 1
[2012-11-16 12:57:12.495049] D [afr-common.c:1294:afr_lookup_select_read_child] 0-master-replicate-0: Source selected as 1 for /
[2012-11-16 12:57:12.495057] D [afr-common.c:1097:afr_lookup_build_response_params] 0-master-replicate-0: Building lookup response from 1
[2012-11-16 12:57:12.495091] I [afr-common.c:1965:afr_set_root_inode_on_first_lookup] 0-master-replicate-1: added root inode
[2012-11-16 12:57:12.495106] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-master-replicate-1: pending_matrix: [ 0 0 ]
[2012-11-16 12:57:12.495115] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-master-replicate-1: pending_matrix: [ 0 0 ]
[2012-11-16 12:57:12.495123] D [afr-self-heal-common.c:829:afr_mark_sources] 0-master-replicate-1: Number of sources: 0
[2012-11-16 12:57:12.495131] D [afr-self-heal-data.c:861:afr_lookup_select_read_child_by_txn_type] 0-master-replicate-1: returning read_child: 1
[2012-11-16 12:57:12.495139] D [afr-common.c:1294:afr_lookup_select_read_child] 0-master-replicate-1: Source selected as 1 for /
[2012-11-16 12:57:12.495148] D [afr-common.c:1097:afr_lookup_build_response_params] 0-master-replicate-1: Building lookup response from 1
[2012-11-16 12:57:12.495156] D [afr-common.c:1636:afr_lookup_perform_self_heal] 0-master-replicate-1: Only 1 child up - do not attempt to detect self heal
[2012-11-16 12:57:14.494848] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-master-replicate-1: pending_matrix: [ 0 0 ]
[2012-11-16 12:57:14.494881] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-master-replicate-1: pending_matrix: [ 0 0 ]
[2012-11-16 12:57:14.494891] D [afr-self-heal-common.c:829:afr_mark_sources] 0-master-replicate-1: Number of sources: 0
[2012-11-16 12:57:14.494900] D [afr-self-heal-data.c:861:afr_lookup_select_read_child_by_txn_type] 0-master-replicate-1: returning read_child: 1
[2012-11-16 12:57:14.494908] D [afr-common.c:1294:afr_lookup_select_read_child] 0-master-replicate-1: Source selected as 1 for /
[2012-11-16 12:57:14.494918] D [afr-common.c:1097:afr_lookup_build_response_params] 0-master-replicate-1: Building lookup response from 1
[2012-11-16 12:57:14.494927] D [afr-common.c:1636:afr_lookup_perform_self_heal] 0-master-replicate-1: Only 1 child up - do not attempt to detect self heal
[2012-11-16 12:57:14.494977] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-master-replicate-0: pending_matrix: [ 0 0 ]
[2012-11-16 12:57:14.494989] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-master-replicate-0: pending_matrix: [ 0 0 ]
[2012-11-16 12:57:14.494998] D [afr-self-heal-common.c:829:afr_mark_sources] 0-master-replicate-0: Number of sources: 0
[2012-11-16 12:57:14.495006] D [afr-self-heal-data.c:861:afr_lookup_select_read_child_by_txn_type] 0-master-replicate-0: returning read_child: 1
[2012-11-16 12:57:14.495013] D [afr-common.c:1294:afr_lookup_select_read_child] 0-master-replicate-0: Source selected as 1 for /
[2012-11-16 12:57:14.495022] D [afr-common.c:1097:afr_lookup_build_response_params] 0-master-replicate-0: Building lookup response from 1
[2012-11-16 12:57:14.525784] D [afr-common.c:704:afr_get_call_child] 0-master-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-11-16 12:57:14.525840] D [afr-common.c:704:afr_get_call_child] 0-master-replicate-1: Returning 0, call_child: 1, last_index: -1
[2012-11-16 12:57:14.526549] D [afr-common.c:704:afr_get_call_child] 0-master-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-11-16 12:57:14.526594] D [afr-common.c:704:afr_get_call_child] 0-master-replicate-1: Returning 0, call_child: 1, last_index: -1
[2012-11-16 12:57:14.527123] D [fuse-helpers.c:484:fuse_flip_xattr_ns] 0-glusterfs-fuse: PID: -1, checking xattr(s): volume-mark*, *xtime
[2012-11-16 12:57:14.529547] D [fuse-helpers.c:484:fuse_flip_xattr_ns] 0-glusterfs-fuse: PID: -1, checking xattr(s): volume-mark*, *xtime
[2012-11-16 12:57:14.530003] W [fuse-bridge.c:2841:fuse_xattr_cbk] 0-glusterfs-fuse: 8: GETXATTR(trusted.glusterfs.b79a88bf-53cc-4852-8f00-f81c66ac0e43.xtime) / => -1 (Transport endpoint is not connected)
[2012-11-16 12:57:14.537909] D [fuse-bridge.c:4028:fuse_thread_proc] 0-glusterfs-fuse: terminating upon getting ENODEV when reading /dev/fuse
[2012-11-16 12:57:14.537965] I [fuse-bridge.c:4122:fuse_thread_proc] 0-fuse: unmounting /tmp/gsyncd-aux-mount-kEaq3u
[2012-11-16 12:57:14.538293] W [glusterfsd.c:831:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x390d4e5ccd] (-->/lib64/libpthread.so.0() [0x390dc077f1] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xdd) [0x405cfd]))) 0-: received signum (15), shutting down
[2012-11-16 12:57:14.538317] D [glusterfsd-mgmt.c:2157:glusterfs_mgmt_pmap_signout] 0-fsd-mgmt: portmapper signout arguments not given
[2012-11-16 12:57:14.538328] I [fuse-bridge.c:4672:fini] 0-fuse: Unmounting '/tmp/gsyncd-aux-mount-kEaq3u'.
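For context on the failure path above: the worker dies because the xtime getxattr on the aux mount fails with ENOTCONN, which gsyncd's exception handler reports as "glusterfs session went down" before the worker exits. A rough sketch of that classification (hypothetical helper name `classify_failure`, not the actual syncdutils code):

```python
import errno
import os

def classify_failure(exc):
    """Roughly mimic how gsyncd's log_raise_exception treats errors:
    an OSError with ENOTCONN from the aux mount is reported as
    'glusterfs session went down', the worker exits, and the monitor
    marks the session faulty on respawn."""
    if isinstance(exc, OSError) and exc.errno == errno.ENOTCONN:
        return "glusterfs session went down [ENOTCONN]"
    return "FAIL: %s" % exc

# The traceback above boils down to lgetxattr() raising this:
err = OSError(errno.ENOTCONN, os.strerror(errno.ENOTCONN))
print(classify_failure(err))  # -> glusterfs session went down [ENOTCONN]
```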
Should this not be a feature bug? In the geo-rep context, how can we differentiate between the case where data is lost and the case where only redundancy is lost due to the brick going down? If we can safely cut back on the over-cautiousness, that's good, but if we can't, this does not seem to me to be a problem. Is there any spec or actual feature request that the current behavior fails to comply with?
(In reply to comment #2)
> Should it not be a feature-bug? In geo-rep context, how can we differentiate
> between the cases when data and when only redundancy is lost due to the
> brick down?

Is the failure due to xtime aggregation?

> If we can cut back safely on overly cautiousness, that's good, but if we
> can't, that does not seem to me to be a problem. Is there any spec or actual
> feature request that the current situation does not comply to?

In general, this violates the high availability that gluster provides.
(In reply to comment #3)
> Is the failure due to xtime aggregation?

I think assert-no-child-down.

> In general, this violates the high availability that gluster provides.

OK, the reason I said this should be an enhancement bug is that I don't see an easy way to fix it, and the behavior is in accordance with what we aimed for in the current implementation. Do you have any idea how to attack this?
(In reply to comment #4)
> (In reply to comment #3)
> > Is the failure due to xtime aggregation?
>
> I think assert-no-child-down.

assert-no-child-down should take effect only after all children of the distribute node are down. We will need to investigate why assert-no-child-down kicked in when only one of the bricks of a volume with replica count 2 went down.
Sorry for spreading confusion. I pointed to assert-no-child-down because the errno is ENOTCONN, which usually means the gluster client has terminated, and brick-down + client termination had an assert-no-child-down smell. A superficial chain of thought...

Indeed, looking into the gluster log (which I missed), it seems to be aggregation. The aggregation logic should then be refined. Maybe it's easy? :)
> Indeed, looking into the gluster log (which I missed), it seems to be
> aggregation. The aggregation logic should be then refined. Maybe it's easy?
> :)

Yeah, maybe an additional flag in local to determine the least number of children on which this operation needs to succeed?
(In reply to comment #7)
> > Indeed, looking into the gluster log (which I missed), it seems to be
> > aggregation. The aggregation logic should be then refined. Maybe it's easy?
> > :)
>
> Yeah, maybe an additional flag in local to determine the least number of
> children on which this operation needs to succeed?

How can you narrow it down to a numeric measure? It's the topology that matters AFAIK...

How exactly does DHT manage assert-no-child-down, i.e., under what circumstances does it trigger the assertion? Maybe we could use the same logic for aggregation.
(In reply to comment #8)
> (In reply to comment #7)
> > > Indeed, looking into the gluster log (which I missed), it seems to be
> > > aggregation. The aggregation logic should be then refined. Maybe it's easy?
> > > :)
> >
> > Yeah, maybe an additional flag in local to determine the least number of
> > children on which this operation needs to succeed?
>
> How can you narrow it down to a numeric measure? It's the topology that
> matters AFAIK...

Since you intend to keep the aggregation logic generic, you can use a numeric measure to determine the topology. For afr, you need at least one reply to succeed. For dht and stripe, you need replies from all STACK_WINDs to succeed.

> How exactly does DHT manage assert-child-no-down, ie. on what circumstances
> does it trigger the assertion? Maybe we could use the same logic for
> aggregation.

DHT manages assert-no-child-down by listening to CHILD_DOWN notifications.
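The per-translator rule suggested above (afr: at least one child reply must succeed; dht/stripe: all must succeed) can be sketched generically. This is only an illustrative sketch of the proposed check, not actual xlator code:

```python
def aggregate_ok(child_results, min_success):
    """child_results: one True/False per child reply (per STACK_WIND).
    min_success: least number of children that must succeed for the
    aggregated op (e.g. the xtime getxattr) to succeed."""
    return sum(1 for ok in child_results if ok) >= min_success

# afr with replica 2: one brick down, the surviving reply is enough.
assert aggregate_ok([True, False], min_success=1)

# dht over two replica sets: every subvolume must reply, but each
# replica set only needs one of its own children up.
replica_sets = [[True, False], [True, True]]
afr_results = [aggregate_ok(rs, min_success=1) for rs in replica_sets]
assert aggregate_ok(afr_results, min_success=len(afr_results))
```

With this rule, the scenario in this bug (one brick of a 2x2 volume down) would still let the xtime aggregation succeed, while losing a whole replica set would correctly fail it.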
Looks like the client getting terminated is not due to DHT getting CHILD_DOWN when one brick is taken down. Did a small experiment with Vijaykumar: on a 2x2 distributed-replicate volume, killed a brick and ran getfattr for trusted.glusterfs.<volume-id>.xtime from a mount point (with client-pid = -1), which gave the following (pasted from IRC):

16:15 <vijaykumar> [root@rhs01 client-1]# getfattr -e hex -n trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime f*
16:15 <vijaykumar> f0: trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime: Transport endpoint is not connected
16:15 <vijaykumar> f1: trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime: Transport endpoint is not connected
16:15 <vijaykumar> f2: trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime: Transport endpoint is not connected
16:15 <vijaykumar> f3: trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime: Transport endpoint is not connected
16:15 <vijaykumar> # file: f4
16:15 <vijaykumar> trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime=0x512c920800004bf3
16:15 <vijaykumar> f5: trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime: Transport endpoint is not connected
16:15 <vijaykumar> # file: f6
16:15 <vijaykumar> trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime=0x512c92080000543a
16:15 <vijaykumar> # file: f7
16:15 <vijaykumar> trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime=0x512c920800005942
16:15 <vijaykumar> # file: f8
16:15 <vijaykumar> trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime=0x512c920800005d74
16:15 <vijaykumar> # file: f9
16:15 <vijaykumar> trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime=0x512c920800006248

For files that hash to the subvolume where a brick was down, getfattr returns "Transport endpoint is not connected" (which can be seen in the client logs as per comment #1). An xtime should still be returned to the client in that case.
Further, as per comment #1 there is a termination of the client process (but this did not happen in our test).
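For anyone checking the hex values from the getfattr paste above: the xtime xattr is an 8-byte, network-byte-order (seconds, nanoseconds) pair, which gsyncd's resource.py unpacks with struct.unpack('!II', ...) as shown in the traceback in the description. A plain-Python decoding sketch (not gsyncd code):

```python
import struct

def decode_xtime(hex_value):
    """Decode a trusted.glusterfs.<volume-id>.xtime value: 8 bytes,
    big-endian, (seconds, nanoseconds) -- the same '!II' format string
    gsyncd's resource.py uses."""
    sec, nsec = struct.unpack('!II', bytes.fromhex(hex_value))
    return sec, nsec

# One of the values returned for f4 above:
print(decode_xtime("512c920800004bf3"))  # -> (1361875464, 19443)
```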
*** This bug has been marked as a duplicate of bug 959069 ***
With the newer geo-replication implementation, this is taken care of.
With the newer geo-rep in place, this scenario is now obsolete. In the current implementation, if a single brick goes down:
* If the setup is replicate - the other node takes over.
* If the setup is distribute - that particular gsync session goes faulty.
This is the expected behavior.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html