Description of problem:
==============================
I did a remove-brick operation to convert a 2x2 volume to 1x2 while I/O was going on from 3 different ganesha mounts. I noticed that at a later stage (maybe >80% complete), the message "The estimated time for rebalance to complete will be unavailable for the first 10 minutes." appears again. I think this happens once the estimated rebalance time has elapsed but the rebalance itself is not yet completed.

Last login: Tue Aug 8 19:32:38 2017 from 10.70.35.77

[root@server1 ~]# gluster v rebal nrep2 status
     Node   Rebalanced-files       size    scanned   failures   skipped        status   run time in h:m:s
---------   ----------------   --------   --------   --------   -------   -----------   -----------------
localhost               5145      7.4MB      10594          0         0   in progress             0:06:38
  server2               4142     21.7MB       8722          0         0   in progress             0:06:38
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success

[root@server1 ~]# gluster v rebal nrep2 status
     Node   Rebalanced-files       size    scanned   failures   skipped        status   run time in h:m:s
---------   ----------------   --------   --------   --------   -------   -----------   -----------------
localhost               5993     31.3MB      11970          0         0   in progress             0:08:38
  server2               5050     26.6MB      10415          0         0   in progress             0:08:38
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success

[root@server1 ~]# gluster v rebal nrep2 status
     Node   Rebalanced-files       size    scanned   failures   skipped        status   run time in h:m:s
---------   ----------------   --------   --------   --------   -------   -----------   -----------------
localhost               8059     62.0MB      16022          0         0   in progress             0:13:13
  server2               7208     76.2MB      14071          0         0   in progress             0:13:13
Estimated time left for rebalance to complete : 0:47:28
volume rebalance: nrep2: success

[root@server1 ~]# gluster v rebal nrep2 status
     Node   Rebalanced-files       size    scanned   failures   skipped        status   run time in h:m:s
---------   ----------------   --------   --------   --------   -------   -----------   -----------------
localhost              10699    110.9MB      21188          0         0   in progress             0:19:58
  server2               9949    119.4MB      16739          0         0   in progress             0:19:58
Estimated time left for rebalance to complete : 0:47:25
volume rebalance: nrep2: success

[root@server1 ~]# gluster v rebal nrep2 status
     Node   Rebalanced-files       size    scanned   failures   skipped        status   run time in h:m:s
---------   ----------------   --------   --------   --------   -------   -----------   -----------------
localhost              16839    151.7MB      28114          0         0   in progress             0:33:23
  server2              16754    184.3MB      27528          0         0   in progress             0:33:23
Estimated time left for rebalance to complete : 0:00:48
volume rebalance: nrep2: success

[root@server1 ~]# gluster v rebal nrep2 status
     Node   Rebalanced-files       size    scanned   failures   skipped        status   run time in h:m:s
---------   ----------------   --------   --------   --------   -------   -----------   -----------------
localhost              20687    192.2MB      32058          0         0   in progress             0:39:16
  server2              20965    189.6MB      32669          0         0   in progress             0:39:16
Estimated time left for rebalance to complete : 0:00:06
volume rebalance: nrep2: success

============== SEE FROM BELOW ==================

[root@server1 ~]# gluster v rebal nrep2 status
     Node   Rebalanced-files       size    scanned   failures   skipped        status   run time in h:m:s
---------   ----------------   --------   --------   --------   -------   -----------   -----------------
localhost              21521    192.8MB      33069          0         0   in progress             0:40:28
  server2              22456    189.6MB      35708          0         0   in progress             0:40:28
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success

[root@server1 ~]# gluster v rebal nrep2 status
     Node   Rebalanced-files       size    scanned   failures   skipped        status   run time in h:m:s
---------   ----------------   --------   --------   --------   -------   -----------   -----------------
localhost              21669    192.8MB      33372          0         0   in progress             0:40:36
  server2              22614    189.6MB      35708          0         0   in progress             0:40:36
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success

[root@server1 ~]# gluster v rebal nrep2 status
     Node   Rebalanced-files       size    scanned   failures   skipped        status   run time in h:m:s
---------   ----------------   --------   --------   --------   -------   -----------   -----------------
localhost              21718    192.8MB      33372          0         0   in progress             0:40:40
  server2              22667    189.6MB      36020          0         0   in progress             0:40:40
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success

[root@server1 ~]#
[root@server1 ~]# gluster v rebal nrep2 status
     Node   Rebalanced-files       size    scanned   failures   skipped        status   run time in h:m:s
---------   ----------------   --------   --------   --------   -------   -----------   -----------------
localhost              23842    194.1MB      37488          0         0   in progress             0:43:47
  server2              23440    285.5MB      39635          0         0     completed             0:43:29
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success

Version-Release number of selected component (if applicable):
[root@server1 ~]# rpm -qa | grep gluster
glusterfs-api-3.8.4-38.el7rhgs.x86_64
python-gluster-3.8.4-34.el7rhgs.noarch
glusterfs-server-3.8.4-38.el7rhgs.x86_64
gluster-nagios-addons-0.2.9-1.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.4-16.el7rhgs.x86_64
glusterfs-3.8.4-38.el7rhgs.x86_64
glusterfs-cli-3.8.4-38.el7rhgs.x86_64
glusterfs-rdma-3.8.4-38.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.2.x86_64
vdsm-gluster-4.17.33-1.2.el7rhgs.noarch
glusterfs-libs-3.8.4-38.el7rhgs.x86_64
glusterfs-fuse-3.8.4-38.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-38.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-38.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-38.el7rhgs.x86_64

Steps to Reproduce:
1. Had a 1x2 volume; did an add-brick to convert it to 2x2 and ran a rebalance (with some files skipped).
2. Ran a linux untar from one client and lookups from another client (which continued until the end); ran rename, move, chmod and chgrp from a third client, but only for some time, and those operations completed well before the rebalance reached this state.
3. Observed the rebalance ETA.

Actual results:
==========
The ETA again starts showing the initial 10-minute wait message.

--- Additional comment from Worker Ant on 2017-11-07 00:32:20 EST ---

COMMIT: https://review.gluster.org/18000 committed in master by -------------

cli: correct rebalance status elapsed check

Check that elapsed time has crossed 10 mins for at least one
rebalance process before displaying the estimates.

Change-Id: Ib357a6f0d0125a178e94ede1e31514fdc6ce3593
BUG: 1479528
Signed-off-by: N Balachandran <nbalacha>
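For illustration, here is a minimal, self-contained C sketch of the corrected check the commit message describes: print the estimate only once at least one rebalance process has been running for more than 10 minutes, and otherwise keep printing the "first 10 minutes" notice. The struct and function names (node_status, print_rebalance_estimate, REBAL_ESTIMATE_START_TIME) are hypothetical; this is not the actual CLI code, only a sketch of the described logic.

#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define REBAL_ESTIMATE_START_TIME 600   /* 10 minutes, in seconds */

/* Hypothetical per-node view of "gluster v rebal <vol> status" output. */
struct node_status {
        const char *node;
        uint64_t    elapsed;    /* run time of this rebalance process, in seconds */
        uint64_t    time_left;  /* estimated seconds remaining (0 if done/unknown) */
};

/* Decide between the estimate and the 10-minute notice based on the maximum
 * elapsed time across all rebalance processes, i.e. the estimate is shown as
 * soon as at least one process has crossed 10 minutes. */
static void
print_rebalance_estimate(const struct node_status *nodes, size_t count)
{
        uint64_t max_elapsed = 0;
        uint64_t max_time_left = 0;

        for (size_t i = 0; i < count; i++) {
                if (nodes[i].elapsed > max_elapsed)
                        max_elapsed = nodes[i].elapsed;
                if (nodes[i].time_left > max_time_left)
                        max_time_left = nodes[i].time_left;
        }

        if (max_elapsed <= REBAL_ESTIMATE_START_TIME) {
                printf("The estimated time for rebalance to complete will be "
                       "unavailable for the first 10 minutes.\n");
                return;
        }

        printf("Estimated time left for rebalance to complete : "
               "%" PRIu64 ":%02" PRIu64 ":%02" PRIu64 "\n",
               max_time_left / 3600, (max_time_left / 60) % 60,
               max_time_left % 60);
}

int
main(void)
{
        /* Both processes are well past 10 minutes, as in the status output
         * above, so the estimate is printed rather than the 10-minute notice. */
        struct node_status nodes[] = {
                { "localhost", 40 * 60 + 28, 48 },
                { "server2",   40 * 60 + 28,  0 },
        };

        print_rebalance_estimate(nodes, sizeof(nodes) / sizeof(nodes[0]));
        return 0;
}

With this check, the 10-minute notice can only appear while every rebalance process is still under the threshold; the reported bug is that the CLI fell back to that notice late in the rebalance even though both processes had long crossed it.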
REVIEW: https://review.gluster.org/18698 (cli: correct rebalance status elapsed check) posted (#2) for review on release-3.12 by N Balachandran
COMMIT: https://review.gluster.org/18698 committed in release-3.12 by -------------

cli: correct rebalance status elapsed check

Check that elapsed time has crossed 10 mins for at least one
rebalance process before displaying the estimates.

> BUG: 1479528
> Signed-off-by: N Balachandran <nbalacha>

(cherry picked from commit 56aef68530b3bab27730aa62e4fbc513d3dba65f)
Change-Id: Ib357a6f0d0125a178e94ede1e31514fdc6ce3593
BUG: 1511271
Signed-off-by: N Balachandran <nbalacha>
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.12.3, please open a new bug report.

glusterfs-3.12.3 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-devel/2017-November/053983.html
[2] https://www.gluster.org/pipermail/gluster-users/