+++ This bug was initially created as a clone of Bug #1457731 +++

Description:
------------
Added bricks to a distributed-replicate volume and ran a rebalance. These are the rebalance ETAs at different intervals [T4 > T3 > T2 > T1]:

**At time T1**

```
[root@gqas014 ~]# gluster v rebalance butcher status
     Node  Rebalanced-files    size  scanned  failures  skipped       status  run time in h:m:s
---------  ----------------  ------  -------  --------  -------  -----------  -----------------
localhost             63949   9.8GB   295287         0        0  in progress            0:34:57
  server2             64644   9.9GB   300745         0        0  in progress            0:34:57
Estimated time left for rebalance to complete : 0:00:38
volume rebalance: butcher: success
```

**At time T2**

```
[root@server1 ~]# gluster v rebalance butcher status
     Node  Rebalanced-files    size  scanned  failures  skipped       status  run time in h:m:s
---------  ----------------  ------  -------  --------  -------  -----------  -----------------
localhost             64010   9.8GB   295597         0        0  in progress            0:34:58
  server2             64705   9.9GB   300918         0        0  in progress            0:34:58
Estimated time left for rebalance to complete : 0:01:09
```

**At time T3**

```
[root@server1 ~]# gluster v rebalance butcher status
     Node  Rebalanced-files    size  scanned  failures  skipped       status  run time in h:m:s
---------  ----------------  ------  -------  --------  -------  -----------  -----------------
localhost             68057  10.0GB   313569         0        0  in progress            0:36:46
  server2             68904  10.2GB   319823         0        0  in progress            0:36:46
Estimated time left for rebalance to complete : 0:00:09
volume rebalance: butcher: success

[root@server1 ~]# gluster v rebalance butcher status
     Node  Rebalanced-files    size  scanned  failures  skipped       status  run time in h:m:s
---------  ----------------  ------  -------  --------  -------  -----------  -----------------
localhost             68110  10.0GB   313882         0        0  in progress            0:36:48
  server2             68958  10.2GB   319948         0        0  in progress            0:36:48
Estimated time left for rebalance to complete : 0:01:10
volume rebalance: butcher: success
```

**At time T4** (when the rebalance finally completed)

```
[root@server1 ~]# gluster v rebalance butcher status
     Node  Rebalanced-files     size  scanned  failures  skipped     status  run time in h:m:s
---------  ----------------  -------  -------  --------  -------  ---------  -----------------
localhost             74885  104.4GB   345001         0        0  completed            1:12:32
  server2             74658   10.5GB   345747         0        0  completed            0:39:54
volume rebalance: butcher: success
```

At interval T1, the reported ETA for completion is 38 seconds. At T2 it has suddenly increased to slightly more than a minute, and the same thing happens at T3. Basically, the ETA keeps looping: it counts down from about 1:10 to 0, then starts again at 1:10. This continued for another half an hour, after which the rebalance finally completed (note the difference in the run time column across the intervals).

##NUM_FILES##

```
[root@gqac011 gluster-mount]# find . -mindepth 1 -type f | wc -l
352120
```

--- Additional comment from Nithya Balachandran on 2017-06-22 06:38:54 EDT ---

RCA:
The rebalance process calculates the file count once at the beginning and then uses that value throughout. If files are created during the rebalance, the number of files scanned may end up exceeding the initially estimated total. In that case, rebalance used to simply increment the total by 10K and continue; at the scan rate on the setup where this bug was filed, that works out to 1 min 10 s of ETA.

The rebalance process will now update the file count periodically. However, this need not make the estimates more accurate, as the newly added files may never be processed if their parent directories have already been processed.
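The looping ETA described in the RCA can be sketched numerically. The following is a minimal Python illustration of the pre-fix arithmetic, not the actual DHT rebalance code (which is C); the function name is hypothetical, and only the one-time startup count and the fixed 10K bump come from the comment above.

```python
# Toy model of the pre-fix ETA arithmetic (hypothetical names; the real
# logic lives in glusterfs cluster/dht, in C).

def eta_seconds(total_files, files_scanned, elapsed_seconds):
    """ETA derived from the average scan rate so far."""
    rate = files_scanned / elapsed_seconds          # files per second
    # Old behaviour: total_files was computed once at startup. If files
    # were created during the rebalance, files_scanned could overtake the
    # total, and the total was just bumped by 10K -- restarting the countdown.
    if files_scanned >= total_files:
        total_files = files_scanned + 10_000
    return (total_files - files_scanned) / rate
```

At T1, localhost had scanned 295287 files in 0:34:57 (2097 s), roughly 141 files/s; a 10K bump at that rate is about 71 s, which matches the ~1:10 value the ETA kept looping back to.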
REVIEW: https://review.gluster.org/17607 (cluster/dht: rebalance gets file count periodically) posted (#1) for review on master by N Balachandran (nbalacha)
COMMIT: https://review.gluster.org/17607 committed in master by Raghavendra G (rgowdapp)
------

commit d66fb14a952729caf51c8328448a548c4d198082
Author: N Balachandran <nbalacha>
Date:   Thu Jun 22 15:56:28 2017 +0530

    cluster/dht: rebalance gets file count periodically

    The rebalance used to get the file count in the beginning
    and not update it. This caused estimates to fail if the
    number changed during the rebalance. The rebalance now
    updates the file count periodically.

    Change-Id: I1667ee69e8a1d7d6bc6bc2f060fad7f989d19ed4
    BUG: 1464110
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/17607
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>
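The fix in the commit above can be sketched as a stateful estimator that refreshes its total instead of trusting the startup count forever. This is a hypothetical Python illustration; the class name, the refresh interval, and the callable-based design are all invented for clarity and do not reflect the real C implementation.

```python
import time

class RebalanceEstimator:
    """Toy model of the fixed behaviour: the total file count is
    refreshed periodically rather than computed once at startup."""

    def __init__(self, count_files, refresh_interval=30.0):
        self.count_files = count_files        # callable returning the count
        self.refresh_interval = refresh_interval
        self.total = count_files()            # initial count, as before
        self.last_refresh = time.monotonic()

    def eta(self, scanned, elapsed):
        now = time.monotonic()
        if now - self.last_refresh >= self.refresh_interval:
            self.total = self.count_files()   # the fix: re-count periodically
            self.last_refresh = now
        rate = scanned / elapsed              # files per second
        return max(self.total - scanned, 0) / rate
```

With a periodically refreshed total, files created mid-rebalance raise the total gradually instead of triggering abrupt 10K bumps, so the ETA no longer resets to the same value in a loop.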
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/