Bug 764261 (GLUSTER-2529)

Summary: Starting Gsync causes ENOTCONN to glusterfs client
Product: [Community] GlusterFS
Reporter: Lakshmipathi G <lakshmipathi>
Component: geo-replication
Assignee: Csaba Henk <csaba>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: medium
Version: mainline
CC: csaba, gluster-bugs, kbudiger
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix

Description Lakshmipathi G 2011-03-15 14:13:57 UTC
Testing with 3.2.1qa2: created an AFR volume with 2 bricks and mounted it on brick1 itself.
# mount -t glusterfs 192.168.1.150:/321qa2 /data/laks/mnt

After starting gsyncd with:
gluster volume gsync start :321qa2 root.12.137:/export/dir3
df shows:
------------
# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1             99541724  16932296  77471444  18% /
tmpfs                  1037084         0   1037084   0% /dev/shm
10.1.10.199:/mnt/soho_storage/samba/shares/opt
                     974589952 242859552 731730400  25% /opt
df: `/data/laks/mnt': Transport endpoint is not connected
-----------------
Server log:

+------------------------------------------------------------------------------+
[2011-03-16 00:46:26.888968] I [server-handshake.c:535:server_setvolume] 321qa2-server: accepted client from 192.168.1.51:1019
[2011-03-16 00:46:28.304951] I [server-handshake.c:535:server_setvolume] 321qa2-server: accepted client from 192.168.1.150:1014
[2011-03-16 00:46:35.256329] I [server-handshake.c:535:server_setvolume] 321qa2-server: accepted client from 192.168.1.150:1009
[2011-03-16 00:51:22.890004] I [server.c:428:server_rpc_notify] 321qa2-server: disconnected connection from 192.168.1.150:1009
[2011-03-16 00:51:22.890053] I [server-helpers.c:756:server_connection_destroy] 321qa2-server: destroyed connection of RHEL5.5-20224-2011/03/16-00:46:31:207670-321qa2-client-1
[2011-03-16 00:51:36.744579] I [server-handshake.c:535:server_setvolume] 321qa2-server: accepted client from 192.168.1.150:1018
[2011-03-16 00:52:04.194216] I [server.c:428:server_rpc_notify] 321qa2-server: disconnected connection from 192.168.1.51:1019
[2011-03-16 00:52:04.194323] I [server-helpers.c:756:server_connection_destroy] 321qa2-server: destroyed connection of RHEL5.5-18357-2011/03/16-00:46:23:844709-321qa2-client-1
[2011-03-16 00:52:05.230262] I [rpc-clnt.c:696:rpc_clnt_handle_cbk] rpc-clnt: recieved rpc message (XID: 0x2a, Ver: 2, Program: 52743234, ProgVers: 1, Proc: 1) from rpc-transport (glusterfs)
[2011-03-16 00:52:05.230321] I [glusterfsd-mgmt.c:62:mgmt_cbk_spec] mgmt: Volume file changed
[2011-03-16 00:52:05.230476] I [glusterfsd.c:710:cleanup_and_exit] glusterfsd: shutting down
[2011-03-16 00:52:05.782920] E [socket.c:1830:socket_server_event_handler] socket.glusterfsd: Failed to set keep-alive: Operation not supported
[2011-03-16 00:52:06.867431] W [graph.c:274:gf_add_cmdline_options] 321qa2-server: adding option 'listen-port' for volume '321qa2-server' with value '24015'
[2011-03-16 00:52:06.868410] W [rpc-transport.c:444:validate_volume_options] tcp.321qa2-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
Given volfile:
+------------------------------------------------------------------------------+
  8:     subvolumes 321qa2-posix
  9: end-volume
 10: 
 11: volume 321qa2-locks
 12:     type features/locks
 13:     subvolumes 321qa2-access-control
 14: end-volume
 15: 
 16: volume 321qa2-io-threads
 17:     type performance/io-threads
 18:     subvolumes 321qa2-locks
 19: end-volume
 20: 
 21: volume 321qa2-marker
 22:     type features/marker
 23:     option volume-uuid bd162a47-2d33-4e23-aad1-1b2ce7d8ceb5
 24:     option timestamp-file /etc/glusterd/vols/321qa2/marker.tstamp
 25:     subvolumes 321qa2-io-threads
 26: end-volume
 27: 
 28: volume /data/export12
 29:     type debug/io-stats
 30:     subvolumes 321qa2-marker
 31: end-volume
 32: 
 33: volume 321qa2-server
 34:     type protocol/server
 35:     option transport-type tcp
 36:     option auth.addr./data/export12.allow *
 37:     subvolumes /data/export12
 38: end-volume

+------------------------------------------------------------------------------+
[2011-03-16 00:52:12.884879] I [server-handshake.c:535:server_setvolume] 321qa2-server: accepted client from 192.168.1.150:1015
[2011-03-16 00:52:13.298163] I [server-handshake.c:535:server_setvolume] 321qa2-server: accepted client from 192.168.1.51:1016
[2011-03-16 00:52:17.764392] I [server-handshake.c:535:server_setvolume] 321qa2-server: accepted client from 192.168.1.51:1012
---------

Client log:
=========
+------------------------------------------------------------------------------+
[2011-03-16 00:51:36.744458] I [client-handshake.c:1027:select_server_supported_programs] 321qa2-client-1: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-03-16 00:51:36.744658] I [client-handshake.c:863:client_setvolume_cbk] 321qa2-client-1: Connected to 192.168.1.150:24014, attached to remote volume '/data/export12'.
[2011-03-16 00:51:36.744674] I [afr-common.c:2552:afr_notify] 321qa2-replicate-0: Subvolume '321qa2-client-1' came back up; going online.
[2011-03-16 00:51:36.752378] I [client-handshake.c:1027:select_server_supported_programs] 321qa2-client-0: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-03-16 00:51:36.752519] I [fuse-bridge.c:2897:fuse_init] glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.10
[2011-03-16 00:51:36.753058] I [afr-common.c:819:afr_fresh_lookup_cbk] 321qa2-replicate-0: added root inode
[2011-03-16 00:51:36.753402] I [client-handshake.c:863:client_setvolume_cbk] 321qa2-client-0: Connected to 192.168.1.51:24014, attached to remote volume '/data/export11'.
[2011-03-16 00:52:03.640239] I [client.c:1601:client_rpc_notify] 321qa2-client-0: disconnected
[2011-03-16 00:52:05.230393] I [rpc-clnt.c:696:rpc_clnt_handle_cbk] rpc-clnt: recieved rpc message (XID: 0x2a, Ver: 2, Program: 52743234, ProgVers: 1, Proc: 1) from rpc-transport (glusterfs)
[2011-03-16 00:52:05.230417] I [glusterfsd-mgmt.c:62:mgmt_cbk_spec] mgmt: Volume file changed
[2011-03-16 00:52:05.231800] I [client.c:1601:client_rpc_notify] 321qa2-client-1: disconnected
[2011-03-16 00:52:05.231815] E [afr-common.c:2584:afr_notify] 321qa2-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2011-03-16 00:52:06.826842] I [glusterfsd-mgmt.c:636:mgmt_getspec_cbk] : No change in volfile, continuing
[2011-03-16 00:52:13.838338] E [socket.c:1677:socket_connect_finish] 321qa2-client-0: connection to 192.168.1.51:24014 failed (Connection refused)
[2011-03-16 00:52:13.838376] E [afr-common.c:2584:afr_notify] 321qa2-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2011-03-16 00:52:13.838407] E [afr-common.c:2584:afr_notify] 321qa2-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2011-03-16 00:52:16.846649] E [afr-common.c:2584:afr_notify] 321qa2-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2011-03-16 00:52:16.846700] E [afr-common.c:2584:afr_notify] 321qa2-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2011-03-16 00:52:17.717021] W [fuse-bridge.c:413:fuse_attr_cbk] glusterfs-fuse: 26: LOOKUP() / => -1 (Transport endpoint is not connected)
[2011-03-16 00:52:18.843879] E [socket.c:1677:socket_connect_finish] 321qa2-client-1: connection to 192.168.1.150:24014 failed (Connection refused)
[2011-03-16 00:52:18.843942] E [afr-common.c:2584:afr_notify] 321qa2-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2011-03-16 00:52:18.843979] E [afr-common.c:2584:afr_notify] 321qa2-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2011-03-16 00:52:18.852001] E [afr-common.c:2584:afr_notify] 321qa2-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2011-03-16 00:52:18.852029] E [afr-common.c:2584:afr_notify] 321qa2-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2011-03-16 00:52:18.938674] W [fuse-bridge.c:2277:fuse_statfs_cbk] glusterfs-fuse: 28: ERR => -1 (Transport endpoint is not connected)
[2011-03-16 00:52:19.855997] E [afr-common.c:2584:afr_notify] 321qa2-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2011-03-16 00:52:19.856057] E [afr-common.c:2584:afr_notify] 321qa2-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2011-03-16 00:52:21.861379] E [afr-common.c:2584:afr_notify] 321qa2-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2011-03-16 00:52:21.861431] E [afr-common.c:2584:afr_notify] 321qa2-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
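
For triaging logs like the two above, a minimal Python sketch (not part of the original report) could tally entries by severity and locate the first "Transport endpoint is not connected" message. The regular expression assumes the `[timestamp] LEVEL [file:line:function] message` layout seen in these logs; `parse_line` and `summarize` are hypothetical helper names, not GlusterFS tooling.

```python
import re
from collections import Counter

# Assumed layout, taken from the log lines above:
# [timestamp] LEVEL [file:line:function] message
LOG_LINE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<origin>[^\]]+)\] "
    r"(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict with ts, level, origin, msg, or None for non-log lines."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Count entries per level and return the timestamp of the first ENOTCONN message."""
    levels = Counter()
    first_enotconn = None
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip volfile dumps, separators, and other non-log lines
        levels[entry["level"]] += 1
        if first_enotconn is None and "Transport endpoint is not connected" in entry["msg"]:
            first_enotconn = entry["ts"]
    return levels, first_enotconn


# Three lines copied from the client log above.
sample = [
    "[2011-03-16 00:52:05.230417] I [glusterfsd-mgmt.c:62:mgmt_cbk_spec] mgmt: Volume file changed",
    "[2011-03-16 00:52:05.231815] E [afr-common.c:2584:afr_notify] 321qa2-replicate-0: "
    "All subvolumes are down. Going offline until atleast one of them comes back up.",
    "[2011-03-16 00:52:17.717021] W [fuse-bridge.c:413:fuse_attr_cbk] glusterfs-fuse: "
    "26: LOOKUP() / => -1 (Transport endpoint is not connected)",
]
levels, first_enotconn = summarize(sample)
print(dict(levels), first_enotconn)
```

Run against the full client log, this kind of summary would show the "Volume file changed" notification arriving shortly before both client subvolumes disconnect and the first ENOTCONN reaches FUSE.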

Comment 1 Lakshmipathi G 2011-03-17 05:41:34 UTC
Tested with the latest commit "b44b06a9d0adb50b426e0ee195a9867e01240ada" plus http://patches.gluster.com/patch/6493/ and
http://patches.gluster.com/patch/6494/

Starting gsync still unmounts the mount point.

Comment 2 Lakshmipathi G 2011-03-17 06:25:34 UTC
It happens with DHT too.

Comment 3 kaushik 2011-03-18 08:31:32 UTC
*** Bug 2543 has been marked as a duplicate of this bug. ***

Comment 4 Vijay Bellur 2011-03-22 04:52:21 UTC
PATCH: http://patches.gluster.com/patch/6542 in master (mgmt/glusterd: Glusterfsd not restarted on changes to marker option.)

Comment 5 Vijay Bellur 2011-03-22 08:20:20 UTC
PATCH: http://patches.gluster.com/patch/6549 in master (features/marker: Donot fail init when both gsync and quota are not enabled.)

Comment 6 Vijay Bellur 2011-03-26 10:45:33 UTC
PATCH: http://patches.gluster.com/patch/6577 in master (features/marker: Handle fop's gracefully when none of the feaures are enabled.)

Comment 7 Lakshmipathi G 2011-03-28 04:41:02 UTC
Tested with 3.2.0qa5: it's working. Will check again with new releases and close this bug.

Comment 8 Anand Avati 2011-04-15 04:31:15 UTC
PATCH: http://patches.gluster.com/patch/6884 in master (glusterd/volgen: partially revert 50ab0ad4)

Comment 9 Lakshmipathi G 2011-04-15 07:51:33 UTC
With 3.2.0qa12, glfs-client no longer gets killed when starting gsyncd.