Description of problem:
=======================
When many snapshots have been taken on a volume and another node is attached to the cluster, the probe takes a long time (and times out) because all the snapshots have to be copied to the new node. The user has to wait until all the snapshots are copied before gluster peer status shows the newly added node in the 'Peer in Cluster' state.

Also, as the cluster is scaled up, peer probe times out and returns with errno -1 because the frame times out, and the newly added node remains in the state 'Sent and Received peer request (Connected)'.

Version-Release number of selected component (if applicable):
============================================================
glusterfs 3.6.0.28

How reproducible:
================
always

Steps to Reproduce:
==================
12 node cluster
6x2 distributed-replicate volume

1. Fuse and NFS mount the volume and create some IO
2. Create ~170 snapshots
3. Attach another node to the cluster

[root@dhcp-8-29-222 ~]# time gluster peer probe 10.8.30.26

real 2m0.095s
user 0m0.080s
sys 0m0.030s

gluster peer status shows the state of the node as 'Sent and Received peer request' until all the snapshots are copied:

Hostname: 10.8.30.26
Uuid: 77d79c8d-c1b1-41a6-870b-3c51755cc285
State: Sent and Received peer request (Connected)

[root@dhcp-8-30-26 ~]# less /var/lib/glusterd/snaps/ | wc -l
153
[root@dhcp-8-30-26 ~]# less /var/lib/glusterd/snaps/ | wc -l
153
[root@dhcp-8-30-26 ~]# less /var/lib/glusterd/snaps/ | wc -l
172

After all the snapshots are copied, the newly added node shows its state as 'Peer in Cluster (Connected)'.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Also, as the cluster is scaled up with more nodes, peer probe times out and returns with errno -1 (even though all the snapshots are copied to the newly added node) because the frame times out. gluster peer status then shows the state of the node as 'Sent and Received peer request (Connected)' until the node is detached and attached again.

--------------------Part of .cmd_log_history --------------------
[2014-09-09 11:07:14.704437] : peer probe 10.8.30.29 : FAILED : Probe returned with unknown errno -1
[2014-09-09 11:09:20.730677] : peer probe 10.8.30.30 : FAILED : Probe returned with unknown errno -1
-------------------------------------------------------------------

Actual results:
==============
Attaching another node to a cluster that has many snapshots takes a long time, and as the cluster is scaled up the probe times out and returns with errno -1 because the frame times out.

Expected results:
================
As the cluster is scaled up, peer probe should not take long and should complete successfully without timing out.

Additional info:
The current Gluster architecture does not support implementing this feature. Therefore this feature request is deferred until Glusterd 2.0.
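
For anyone scripting node attachment in the meantime, here is a rough sketch of how the wait described in the report could be automated. It polls `gluster peer status` until the new node reports 'Peer in Cluster'. It only assumes the Hostname/Uuid/State stanza layout shown in the report. The helper names (`parse_peer_status`, `wait_for_peer`) and the timeout values are illustrative, not part of gluster. This covers only the slow-copy case; it does not help in the scaled-up case, where the node stays in 'Sent and Received peer request' until it is detached and attached again.

```python
import subprocess
import time


def fetch_peer_status():
    """Return the raw text printed by `gluster peer status`."""
    result = subprocess.run(
        ["gluster", "peer", "status"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_peer_status(text):
    """Parse Hostname/Uuid/State stanzas into a list of dicts.

    Assumes the stanza layout shown in the report; other lines
    (for example a peer count header) are ignored.
    """
    peers = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue
        key, value = key.strip().lower(), value.strip()
        if key == "hostname":
            current = {"hostname": value}
            peers.append(current)
        elif current is not None and key in ("uuid", "state"):
            current[key] = value
    return peers


def is_in_cluster(peer):
    """True once the peer has reached the 'Peer in Cluster' state."""
    return peer.get("state", "").startswith("Peer in Cluster")


def wait_for_peer(host, timeout=600.0, interval=5.0, fetch=fetch_peer_status):
    """Poll until `host` is 'Peer in Cluster'; return False on timeout.

    The timeout and interval defaults are arbitrary illustrative values.
    """
    deadline = time.monotonic() + timeout
    while True:
        for peer in parse_peer_status(fetch()):
            if peer["hostname"] == host and is_in_cluster(peer):
                return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Usage would be along the lines of running `gluster peer probe 10.8.30.26` and then calling `wait_for_peer("10.8.30.26")`. The `fetch` parameter exists only so the parser can be exercised with canned output on a machine without gluster installed.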