When a client is mounted over rdma and the remote port isn't specified, it defaults to the rdma default port, 6997. The connection attempts to that port are then refused:

[2010-09-07 02:12:26.658688] E [rdma.c:4299:tcp_connect_finish] sep7-client-0: tcp connect to failed (Connection refused)
[2010-09-07 02:12:26.658708] D [rdma.c:4219:rdma_handshake_pollerr] rpc-transport/rdma: sep7-client-0: peer disconnected, cleaning up
[2010-09-07 02:12:26.658730] D [rpc-clnt.c:483:rpc_clnt_connection_cleanup] rpc-clnt: cleaning up state in transport object 0x4d07708
[2010-09-07 02:12:26.658747] E [afr-common.c:2643:afr_notify] sep7-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2010-09-07 02:12:26.661426] D [name.c:146:client_fill_address_family] sep7-client-2: address-family not specified, guessing it to be inet/inet6
[2010-09-07 02:12:26.661447] D [name.c:218:af_inet_client_get_remote_sockaddr] sep7-client-2: option remote-port missing in volume sep7-client-2. Defaulting to 6997
[2010-09-07 02:12:26.661506] E [rdma.c:4299:tcp_connect_finish] sep7-client-1: tcp connect to failed (Connection refused)
[2010-09-07 02:12:26.661524] D [rdma.c:4219:rdma_handshake_pollerr] rpc-transport/rdma: sep7-client-1: peer disconnected, cleaning up
[2010-09-07 02:12:26.661545] D [rpc-clnt.c:483:rpc_clnt_connection_cleanup] rpc-clnt: cleaning up state in transport object 0x4ce2888
[2010-09-07 02:12:26.661565] E [afr-common.c:2643:afr_notify] sep7-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2010-09-07 02:12:26.664253] D [name.c:146:client_fill_address_family] sep7-client-3: address-family not specified, guessing it to be inet/inet6
[2010-09-07 02:12:26.664255] E [rdma.c:4299:tcp_connect_finish] sep7-client-2: tcp connect to failed (Connection refused)
[2010-09-07 02:12:26.664284] D [name.c:218:af_inet_client_get_remote_sockaddr] sep7-client-3: option remote-port missing in volume sep7-client-3. Defaulting to 6997
[2010-09-07 02:12:26.664310] D [rdma.c:4219:rdma_handshake_pollerr] rpc-transport/rdma: sep7-client-2: peer disconnected, cleaning up
[2010-09-07 02:12:26.664360] D [rpc-clnt.c:483:rpc_clnt_connection_cleanup] rpc-clnt: cleaning up state in transport object 0x4cbda08
[2010-09-07 02:12:26.664377] E [afr-common.c:2643:afr_notify] sep7-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2010-09-07 02:12:26.667189] E [rdma.c:4299:tcp_connect_finish] sep7-client-3: tcp connect to failed (Connection refused)
[2010-09-07 02:12:26.667204] D [rdma.c:4219:rdma_handshake_pollerr] rpc-transport/rdma: sep7-client-3: peer disconnected, cleaning up
[2010-09-07 02:12:26.667225] D [rpc-clnt.c:483:rpc_clnt_connection_cleanup] rpc-clnt: cleaning up state in transport object 0x4c961d8
[2010-09-07 02:12:26.667242] E [afr-common.c:2643:afr_notify] sep7-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
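For anyone triaging a similar client log, here is a rough Python sketch that groups the two relevant symptoms per client: the "remote-port missing" default and the refused connect. It is not part of GlusterFS. The names summarize, LOG_RE and CLIENT_RE are made up for illustration, and the regular expressions only assume the line layout visible in the log above.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the log above:
# [timestamp] LEVEL [file:line:function] message
LOG_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+(?P<msg>.*)"
)
# Client translator names look like "sep7-client-2:" in this log.
CLIENT_RE = re.compile(r"(?P<name>[\w-]+-client-\d+):")


def summarize(log_text):
    """Return {client: set of events} for defaulted ports and refused connects."""
    events = defaultdict(set)
    for raw in log_text.splitlines():
        m = LOG_RE.match(raw.strip())
        if not m:
            continue  # not a log line in the assumed layout
        msg = m.group("msg")
        client = CLIENT_RE.search(msg)
        if not client:
            continue  # e.g. messages from the replicate translator
        name = client.group("name")
        if "option remote-port missing" in msg:
            events[name].add("defaulted-port")
        elif "Connection refused" in msg:
            events[name].add("refused")
    return dict(events)


SAMPLE = """\
[2010-09-07 02:12:26.661447] D [name.c:218:af_inet_client_get_remote_sockaddr] sep7-client-2: option remote-port missing in volume sep7-client-2. Defaulting to 6997
[2010-09-07 02:12:26.664255] E [rdma.c:4299:tcp_connect_finish] sep7-client-2: tcp connect to failed (Connection refused)
[2010-09-07 02:12:26.664377] E [afr-common.c:2643:afr_notify] sep7-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
"""

if __name__ == "__main__":
    for client, seen in sorted(summarize(SAMPLE).items()):
        print(client, sorted(seen))
```

A client that shows both events fell back to the default port and then found nothing accepting connections there, which is the situation reported here.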
Patch http://patches.gluster.com/patch/4643/, in combination with glusterd listening on the default ports of both rdma and socket, will fix this issue. An example configuration is given below:

volume management
    type mgmt/glusterd
    option working-directory /etc/glusterd
    option transport-type socket,rdma
    option transport.socket.listen-port 6969
    option transport.rdma.listen-port 6997
end-volume
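To double-check that a glusterd.vol really sets a listen port for every transport it enables, something like the following Python sketch could be used. It is a simplified reader written for this example, not the volfile parser GlusterFS itself uses. parse_volfile and listen_ports are hypothetical helper names, and the only syntax assumed is the volume / type / option / end-volume layout shown in the configuration above.

```python
def parse_volfile(text):
    """Parse 'volume NAME ... end-volume' blocks into
    {name: {"type": str or None, "options": {key: value}}}."""
    volumes = {}
    current = None
    for raw in text.splitlines():
        words = raw.split()
        if not words:
            continue
        if words[0] == "volume" and len(words) == 2:
            current = {"type": None, "options": {}}
            volumes[words[1]] = current
        elif words[0] == "end-volume":
            current = None
        elif current is not None and words[0] == "type" and len(words) == 2:
            current["type"] = words[1]
        elif current is not None and words[0] == "option" and len(words) >= 3:
            current["options"][words[1]] = " ".join(words[2:])
    return volumes


def listen_ports(volume):
    """Map each transport named in transport-type to its listen port,
    or None when no transport.<name>.listen-port option is set."""
    opts = volume["options"]
    transports = [t.strip() for t in opts.get("transport-type", "").split(",") if t.strip()]
    ports = {}
    for t in transports:
        value = opts.get("transport.%s.listen-port" % t)
        ports[t] = int(value) if value is not None else None
    return ports


VOLFILE = """\
volume management
    type mgmt/glusterd
    option working-directory /etc/glusterd
    option transport-type socket,rdma
    option transport.socket.listen-port 6969
    option transport.rdma.listen-port 6997
end-volume
"""

if __name__ == "__main__":
    mgmt = parse_volfile(VOLFILE)["management"]
    print(listen_ports(mgmt))
```

A None value for any transport would mean that transport is enabled without an explicit listen port in the file.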
PATCH: http://patches.gluster.com/patch/4643 in master (rpc-transport/rdma: honour port argument sent in rdma_connect.)
Raghu, do you need the below changes in glusterd.vol? I see that it's not committed in mainline.
(In reply to comment #3)
> Raghu, Do you need the below changes in glusterd.vol? I see that its not
> committed in mainline..

Yes!! will do it :)
PATCH: http://patches.gluster.com/patch/4647 in master (mgmt/glusterd: make glusterd to listen on default ports of both socket and rdma transports.)