Bug 1457731

Summary: [Scale] : Rebalance ETA (towards the end) may be inaccurate, even on a moderately large data set.
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Ambarish <asoman>
Component: distribute Assignee: Nithya Balachandran <nbalacha>
Status: CLOSED ERRATA QA Contact: Ambarish <asoman>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.3 CC: amukherj, bturner, rhinduja, rhs-bugs, storage-qa-internal, tdesala
Target Milestone: ---   
Target Release: RHGS 3.3.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.8.4-31 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1464110 (view as bug list) Environment:
Last Closed: 2017-09-21 04:45:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1417151, 1464110    

Description Ambarish 2017-06-01 08:09:17 UTC
Description:
------------
 
Added bricks to a distributed-replicate volume, then ran rebalance.
 
These are the rebalance ETAs at successive points in time:
 
[T4 > T3 > T2 > T1]
 
**At time T1**
 
 
[root@gqas014 ~]# gluster v rebalance butcher status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            63949         9.8GB        295287             0             0          in progress        0:34:57
      gqas015.sbu.lab.eng.bos.redhat.com            64644         9.9GB        300745             0             0          in progress        0:34:57
Estimated time left for rebalance to complete :        0:00:38
volume rebalance: butcher: success
 
 
**At time T2**
 
[root@gqas014 ~]# gluster v rebalance butcher status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            64010         9.8GB        295597             0             0          in progress        0:34:58
      gqas015.sbu.lab.eng.bos.redhat.com            64705         9.9GB        300918             0             0          in progress        0:34:58
Estimated time left for rebalance to complete :        0:01:09
 
 
**At time T3** (two consecutive status calls)
 
[root@gqas014 ~]# gluster v rebalance butcher status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            68057        10.0GB        313569             0             0          in progress        0:36:46
      gqas015.sbu.lab.eng.bos.redhat.com            68904        10.2GB        319823             0             0          in progress        0:36:46
Estimated time left for rebalance to complete :        0:00:09
volume rebalance: butcher: success
[root@gqas014 ~]# gluster v rebalance butcher status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            68110        10.0GB        313882             0             0          in progress        0:36:48
      gqas015.sbu.lab.eng.bos.redhat.com            68958        10.2GB        319948             0             0          in progress        0:36:48
Estimated time left for rebalance to complete :        0:01:10
volume rebalance: butcher: success
 
 
 
**At time T4** (when it finally completed)
 
[root@gqas014 ~]# gluster v rebalance butcher status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            74885       104.4GB        345001             0             0            completed        1:12:32
      gqas015.sbu.lab.eng.bos.redhat.com            74658        10.5GB        345747             0             0            completed        0:39:54
volume rebalance: butcher: success
[root@gqas014 ~]#
[root@gqas014 ~]#
 
 
 
So at T1, the reported ETA for completion is 38 seconds.
 
At T2, one second of run time later, it suddenly increased to slightly more than a minute (0:01:09).
 
The same thing happens at T3: the ETA jumps from 0:00:09 to 0:01:10 within two seconds of run time.
 
So, basically, the ETA keeps looping for a while: it starts at about 1:10, counts down to 0, and then starts at 1:10 again.
 
This continued for another half an hour, after which the rebalance finally completed (you can see the time difference in the run time column across the intervals).
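 
To make the inconsistency concrete, here is a minimal sketch that adds each reported ETA to the run time at which it was printed, giving the run time at which the rebalance was predicted to finish. It is not part of the original report and it is not the gluster ETA code; the helper names are hypothetical, and the sample values are copied by hand from the status outputs above.

```python
# Samples copied from the status outputs in this report:
# (label, run time, estimated time left), all as h:m:s strings.
SAMPLES = [
    ("T1", "0:34:57", "0:00:38"),
    ("T2", "0:34:58", "0:01:09"),
    ("T3 (first)", "0:36:46", "0:00:09"),
    ("T3 (second)", "0:36:48", "0:01:10"),
]


def to_seconds(hms):
    """Convert an 'h:m:s' string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(seconds):
    """Convert a number of seconds back to an 'h:mm:ss' string."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


def projected_finish(samples):
    """Run time (in seconds) at which each sample predicted completion."""
    return [(label, to_seconds(run) + to_seconds(eta))
            for label, run, eta in samples]


def eta_jumps(samples):
    """Labels of samples whose ETA is larger than the previous sample's."""
    jumps = []
    for (_, _, prev_eta), (label, _, eta) in zip(samples, samples[1:]):
        if to_seconds(eta) > to_seconds(prev_eta):
            jumps.append(label)
    return jumps


if __name__ == "__main__":
    for label, finish in projected_finish(SAMPLES):
        print(f"{label}: projected finish at run time {to_hms(finish)}")
    print("ETA increased at:", ", ".join(eta_jumps(SAMPLES)))
```

Every projected finish falls between 0:35:35 and 0:37:58 of run time, whereas the T4 output shows the rebalance completing at 0:39:54 on gqas015 and at 1:12:32 on localhost.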
 
 
##NUM_FILES##
[root@gqac011 gluster-mount]# find . -mindepth 1 -type f | wc -l
 
352120

Comment 6 Atin Mukherjee 2017-06-22 13:14:24 UTC
upstream patch : https://review.gluster.org/#/c/17607/

Comment 18 errata-xmlrpc 2017-09-21 04:45:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774

Comment 19 errata-xmlrpc 2017-09-21 04:58:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774