Bug 877293
Summary: A single brick down of a dist-rep volume results in geo-rep session "faulty"

| Field | Value |
|---|---|
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Component | geo-replication |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Version | 2.0 |
| Hardware | x86_64 |
| OS | Linux |
| Reporter | Vijaykumar Koppad <vkoppad> |
| Assignee | Csaba Henk <csaba> |
| QA Contact | Vijaykumar Koppad <vkoppad> |
| CC | aavati, amarts, bbandari, csaba, pkarampu, rhs-bugs, shaines, surs, vbellur, vshankar |
| Keywords | Reopened |
| Fixed In Version | glusterfs-3.4.0.14rhs-1 |
| Doc Type | Bug Fix |
|  | 959069 (view as bug list) |
| Last Closed | 2013-09-23 22:29:52 UTC |
| Type | Bug |
| Bug Blocks | 850514, 959069 |
Description

Vijaykumar Koppad 2012-11-16 07:29:42 UTC

Comment 2

Should it not be a feature-bug? In the geo-rep context, how can we differentiate between the cases when data is lost and when only redundancy is lost due to the brick going down? If we can safely cut back on the overcaution, that's good, but if we can't, that does not seem to me to be a problem. Is there any spec or actual feature request that the current situation does not comply with?

Comment 3

(In reply to comment #2)
> Should it not be a feature-bug? In the geo-rep context, how can we
> differentiate between the cases when data is lost and when only redundancy
> is lost due to the brick going down?

Is the failure due to xtime aggregation?

> If we can safely cut back on the overcaution, that's good, but if we can't,
> that does not seem to me to be a problem. Is there any spec or actual
> feature request that the current situation does not comply with?

In general, this violates the high availability that gluster provides.

Comment 4

(In reply to comment #3)
> Is the failure due to xtime aggregation?

I think assert-no-child-down.

> In general, this violates the high availability that gluster provides.

OK. The reason I said this should be an enhancement bug is that I don't see an easy way to fix it, and the behavior is in accordance with what we aimed for in the current implementation. Do you have any idea how to attack this?

Comment 5

(In reply to comment #4)
> > Is the failure due to xtime aggregation?
>
> I think assert-no-child-down.

assert-no-child-down should take effect only after all children of the distribute node are down. We will need to investigate why assert-no-child-down kicked in when only one of the bricks of a volume with replica count 2 went down.

Comment 6

Sorry for spreading confusion. I pointed at assert-no-child-down because the errno is ENOTCONN, which usually means the gluster client terminated, and brick-down plus client termination had an assert-no-child-down smell. A superficial chain of thought... Indeed, looking into the gluster log (which I missed), it seems to be aggregation.
The aggregation logic should then be refined. Maybe it's easy? :)
Comment 7

(In reply to comment #6)
> Indeed, looking into the gluster log (which I missed), it seems to be
> aggregation. The aggregation logic should then be refined. Maybe it's easy?
> :)

Yeah, maybe an additional flag in local to determine the least number of children on which this operation needs to succeed?
Comment 8

(In reply to comment #7)
> Yeah, maybe an additional flag in local to determine the least number of
> children on which this operation needs to succeed?

How can you narrow it down to a numeric measure? It's the topology that matters, AFAIK...

How exactly does DHT manage assert-no-child-down, i.e. under what circumstances does it trigger the assertion? Maybe we could use the same logic for aggregation.

Comment 9

(In reply to comment #8)
> How can you narrow it down to a numeric measure? It's the topology that
> matters, AFAIK...

Since you intend to keep the aggregation logic generic, you can use a numeric measure to capture the topology: for afr, you need at least one reply to succeed; for dht and stripe, you need replies from all STACK_WINDs to succeed.

> How exactly does DHT manage assert-no-child-down, i.e. under what
> circumstances does it trigger the assertion? Maybe we could use the same
> logic for aggregation.

DHT manages assert-no-child-down by listening for the CHILD_DOWN notification. It looks like the client termination is not due to dht getting CHILD_DOWN when one brick is taken down.
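The per-translator thresholds described above (afr: at least one reply; dht/stripe: all replies) can be sketched as a single numeric parameter. This is a hypothetical simplification for illustration only, not GlusterFS code; the names `aggregate_xtime` and `min_success` are invented:

```python
# Hypothetical sketch of xtime aggregation with a per-translator
# "minimum successes" threshold -- not actual GlusterFS code.
ENOTCONN = "Transport endpoint is not connected"

def aggregate_xtime(replies, min_success):
    """Aggregate per-child xtime replies.

    replies: a list where each entry is either an int xtime or
    ENOTCONN for a child that is down.
    min_success: the least number of children that must reply for the
    aggregate to be valid (afr: 1; dht/stripe: all children).
    """
    ok = [r for r in replies if r != ENOTCONN]
    if len(ok) < min_success:
        return ENOTCONN   # too few replies: propagate the failure upward
    return max(ok)        # take the newest xtime among the replies

# Replicate pair with one brick down: a single reply suffices.
assert aggregate_xtime([ENOTCONN, 0x512C9208], min_success=1) == 0x512C9208

# Distribute over two subvolumes: all replies are required, so one
# missing subvolume fails the whole aggregate.
assert aggregate_xtime([0x512C9208, ENOTCONN], min_success=2) == ENOTCONN
```

The point of the sketch is that the topology question collapses into a number per translator: replicate tolerates missing children as long as one answers, while distribute and stripe spread data across children and therefore need every answer.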
Did a small experiment with Vijaykumar: on a 2x2 distributed-replicate volume, we killed a brick and ran getfattr for trusted.glusterfs.<volume-id>.xtime from a mount point (with client-pid = -1). The result (pasted from IRC):

```
[root@rhs01 client-1]# getfattr -e hex -n trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime f*
f0: trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime: Transport endpoint is not connected
f1: trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime: Transport endpoint is not connected
f2: trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime: Transport endpoint is not connected
f3: trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime: Transport endpoint is not connected
# file: f4
trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime=0x512c920800004bf3
f5: trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime: Transport endpoint is not connected
# file: f6
trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime=0x512c92080000543a
# file: f7
trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime=0x512c920800005942
# file: f8
trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime=0x512c920800005d74
# file: f9
trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime=0x512c920800006248
```

For files which hash to the subvolume where a brick was down, getfattr returns "Transport endpoint is not connected" (which can be seen in the client logs as per comment #1). There should at least be an xtime given back to the client. Further, comment #1 reports a termination of the client process, but that did not happen in our test.
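The mixed output above (some files fail, others return an xtime) is what DHT's placement predicts: each file name hashes to exactly one distribute subvolume, so only names landing on the affected subvolume return ENOTCONN. A toy model of that routing, using Python's `zlib.crc32` purely as a stand-in for DHT's real hash (the subvolume names and the placeholder xtime are invented for illustration):

```python
import zlib

# Toy model of DHT-style name hashing: each file lives on exactly one
# distribute subvolume. With one subvolume affected, only the files
# whose names hash to it fail; the rest still answer.
SUBVOLS = ["replica-0", "replica-1"]   # 2x2 dist-rep: two replica pairs
DOWN = {"replica-0"}                   # hypothetical affected pair

def getfattr_xtime(name):
    # crc32 is NOT the hash DHT uses; it just illustrates that routing
    # is a pure function of the file name.
    subvol = SUBVOLS[zlib.crc32(name.encode()) % len(SUBVOLS)]
    if subvol in DOWN:
        return "Transport endpoint is not connected"
    return "0x512c9208..."  # placeholder xtime value

results = {f"f{i}": getfattr_xtime(f"f{i}") for i in range(10)}
# Names routed to the affected pair return ENOTCONN; all others
# return an xtime, regardless of the brick that was killed.
```

The model also shows why the behavior is a bug rather than expected: the affected subvolume was a replica pair with one surviving brick, so the lookup should have been answerable from the healthy replica instead of failing the whole subvolume.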
*** This bug has been marked as a duplicate of bug 959069 ***

With the newer geo-replication implementation, this is taken care of, and this scenario is now obsolete. With the newer geo-rep in place, if a single brick goes down:

* If the setup is replicate, the other node takes over.
* If the setup is distribute, that particular gsync session goes faulty.

This is the expected behavior.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html