Description of problem:
==============================
I ran a remove-brick operation to convert a 2x2 volume to 1x2 while I/O was in progress from 3 different Ganesha mounts.
At a later stage (perhaps more than 80% completed), the message "The estimated time for rebalance to complete will be unavailable for the first 10 minutes." appeared again.
I think this happens when the estimated rebalance time has elapsed but the rebalance itself has not yet completed.

Last login: Tue Aug 8 19:32:38 2017 from 10.70.35.77
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                             Node  Rebalanced-files         size      scanned     failures      skipped        status  run time in h:m:s
                        ---------       -----------  -----------  -----------  -----------  -----------  ------------     --------------
                        localhost              5145        7.4MB        10594            0            0   in progress            0:06:38
dhcp46-101.lab.eng.blr.redhat.com              4142       21.7MB         8722            0            0   in progress            0:06:38
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                             Node  Rebalanced-files         size      scanned     failures      skipped        status  run time in h:m:s
                        ---------       -----------  -----------  -----------  -----------  -----------  ------------     --------------
                        localhost              5993       31.3MB        11970            0            0   in progress            0:08:38
dhcp46-101.lab.eng.blr.redhat.com              5050       26.6MB        10415            0            0   in progress            0:08:38
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                             Node  Rebalanced-files         size      scanned     failures      skipped        status  run time in h:m:s
                        ---------       -----------  -----------  -----------  -----------  -----------  ------------     --------------
                        localhost              8059       62.0MB        16022            0            0   in progress            0:13:13
dhcp46-101.lab.eng.blr.redhat.com              7208       76.2MB        14071            0            0   in progress            0:13:13
Estimated time left for rebalance to complete : 0:47:28
volume rebalance: nrep2: success
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                             Node  Rebalanced-files         size      scanned     failures      skipped        status  run time in h:m:s
                        ---------       -----------  -----------  -----------  -----------  -----------  ------------     --------------
                        localhost             10699      110.9MB        21188            0            0   in progress            0:19:58
dhcp46-101.lab.eng.blr.redhat.com              9949      119.4MB        16739            0            0   in progress            0:19:58
Estimated time left for rebalance to complete : 0:47:25
volume rebalance: nrep2: success
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                             Node  Rebalanced-files         size      scanned     failures      skipped        status  run time in h:m:s
                        ---------       -----------  -----------  -----------  -----------  -----------  ------------     --------------
                        localhost             16839      151.7MB        28114            0            0   in progress            0:33:23
dhcp46-101.lab.eng.blr.redhat.com             16754      184.3MB        27528            0            0   in progress            0:33:23
Estimated time left for rebalance to complete : 0:00:48
volume rebalance: nrep2: success
[root@dhcp46-42 ~]#
[root@dhcp46-42 ~]#
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                             Node  Rebalanced-files         size      scanned     failures      skipped        status  run time in h:m:s
                        ---------       -----------  -----------  -----------  -----------  -----------  ------------     --------------
                        localhost             20687      192.2MB        32058            0            0   in progress            0:39:16
dhcp46-101.lab.eng.blr.redhat.com             20965      189.6MB        32669            0            0   in progress            0:39:16
Estimated time left for rebalance to complete : 0:00:06
volume rebalance: nrep2: success
[root@dhcp46-42 ~]#

============== SEE FROM BELOW ==================

[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                             Node  Rebalanced-files         size      scanned     failures      skipped        status  run time in h:m:s
                        ---------       -----------  -----------  -----------  -----------  -----------  ------------     --------------
                        localhost             21521      192.8MB        33069            0            0   in progress            0:40:28
dhcp46-101.lab.eng.blr.redhat.com             22456      189.6MB        35708            0            0   in progress            0:40:28
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                             Node  Rebalanced-files         size      scanned     failures      skipped        status  run time in h:m:s
                        ---------       -----------  -----------  -----------  -----------  -----------  ------------     --------------
                        localhost             21669      192.8MB        33372            0            0   in progress            0:40:36
dhcp46-101.lab.eng.blr.redhat.com             22614      189.6MB        35708            0            0   in progress            0:40:36
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                             Node  Rebalanced-files         size      scanned     failures      skipped        status  run time in h:m:s
                        ---------       -----------  -----------  -----------  -----------  -----------  ------------     --------------
                        localhost             21718      192.8MB        33372            0            0   in progress            0:40:40
dhcp46-101.lab.eng.blr.redhat.com             22667      189.6MB        36020            0            0   in progress            0:40:40
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success
[root@dhcp46-42 ~]#
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                             Node  Rebalanced-files         size      scanned     failures      skipped        status  run time in h:m:s
                        ---------       -----------  -----------  -----------  -----------  -----------  ------------     --------------
                        localhost             23842      194.1MB        37488            0            0   in progress            0:43:47
dhcp46-101.lab.eng.blr.redhat.com             23440      285.5MB        39635            0            0     completed            0:43:29
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success

Version-Release number of selected component (if applicable):
[root@dhcp46-42 ~]# rpm -qa|grep gluster
glusterfs-api-3.8.4-38.el7rhgs.x86_64
python-gluster-3.8.4-34.el7rhgs.noarch
glusterfs-server-3.8.4-38.el7rhgs.x86_64
gluster-nagios-addons-0.2.9-1.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.4-16.el7rhgs.x86_64
glusterfs-3.8.4-38.el7rhgs.x86_64
glusterfs-cli-3.8.4-38.el7rhgs.x86_64
glusterfs-rdma-3.8.4-38.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.2.x86_64
vdsm-gluster-4.17.33-1.2.el7rhgs.noarch
glusterfs-libs-3.8.4-38.el7rhgs.x86_64
glusterfs-fuse-3.8.4-38.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-38.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-38.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-38.el7rhgs.x86_64

Steps to Reproduce:
1. Started with a 1x2 volume, ran add-brick to convert it to 2x2, and ran a rebalance (with some files skipped).
2. Ran a Linux untar from one client and lookups from another client (these continued until the end). Ran rename, move, chmod, and chgrp from a third client, but only for some time; these operations had completed well before the rebalance reached this state.
3. Observed the rebalance ETA.

Actual results:
==========
The ETA output again shows the initial 10-minute wait message.
Rebalance at the end:
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                             Node  Rebalanced-files         size      scanned     failures      skipped        status  run time in h:m:s
                        ---------       -----------  -----------  -----------  -----------  -----------  ------------     --------------
                        localhost             23842      194.1MB        37488            0            0     completed            0:44:21
dhcp46-101.lab.eng.blr.redhat.com             23440      285.5MB        39635            0            0     completed            0:43:29
volume rebalance: nrep2: success
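The hypothesis in the description (the wait message comes back once the estimated time has run out while the rebalance is still in progress) can be checked against the numbers in the output with a small model. This is only a sketch of that hypothesis, not the glusterfs code: `eta_line`, `format_hms`, and the rule "zero time left falls back to the wait message" are assumptions made for illustration.

```python
# Hypothetical model of the reporter's hypothesis; NOT taken from glusterfs.
WAIT_MSG = ("The estimated time for rebalance to complete will be "
            "unavailable for the first 10 minutes.")


def format_hms(seconds: int) -> str:
    """Format seconds as h:mm:ss, like the status output."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def eta_line(elapsed_s: int, estimated_total_s: int) -> str:
    """Return the ETA line this model would print.

    Assumption: the remaining time is clamped at zero, and a zero value
    is treated the same as "no estimate yet", no matter how long the
    rebalance has already been running.
    """
    time_left_s = max(estimated_total_s - elapsed_s, 0)
    if time_left_s == 0:
        return WAIT_MSG
    return ("Estimated time left for rebalance to complete : "
            + format_hms(time_left_s))
```

With the values above, the last real estimate was 0:00:06 at a run time of 0:39:16, which implies a projected total of 0:39:22 (2362 seconds). `eta_line(2356, 2362)` gives the 0:00:06 line, and `eta_line(2428, 2362)` (run time 0:40:28) gives the wait message, which is consistent with what was observed. It does not show that the real code works this way.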
Is this reproducible?
Prasad, can you check this as part of your testing (see comment #3, i.e. whether this is reproducible)?
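When re-running the steps, the check for the stale message could be scripted instead of done by eye. A minimal sketch, assuming the text of `gluster v rebal nrep2 status` has already been captured into a string; the helper names and the 600-second threshold are made up for this example and are not part of any gluster tooling.

```python
import re

# Fragment of the wait message as it appears in the status output.
WAIT_MSG = "unavailable for the first 10 minutes"
# Run time in h:mm:ss at the end of a node row.
RUN_TIME_RE = re.compile(r"\b(\d+):(\d{2}):(\d{2})\s*$")


def max_run_time_seconds(status_output: str) -> int:
    """Return the longest per-node run time found in the output, in seconds."""
    longest = 0
    for line in status_output.splitlines():
        # Only node rows carry a status of "in progress" or "completed".
        if "in progress" not in line and "completed" not in line:
            continue
        match = RUN_TIME_RE.search(line)
        if match:
            hours, minutes, secs = (int(g) for g in match.groups())
            longest = max(longest, hours * 3600 + minutes * 60 + secs)
    return longest


def wait_message_is_stale(status_output: str, threshold_s: int = 600) -> bool:
    """True if the wait message is shown after the first 10 minutes."""
    return (WAIT_MSG in status_output
            and max_run_time_seconds(status_output) > threshold_s)
```

Feeding it the 0:06:38 output from the description returns False (the message is expected there), while the 0:40:28 output returns True.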
Verified this BZ on glusterfs version 3.12.2-30. Followed the same steps as in the description, and the rebalance ETA was displayed as expected. Moving this BZ to Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3827