Bug 1457731

Summary:	[Scale] : Rebalance ETA (towards the end) may be inaccurate, even on a moderately large data set.
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	Ambarish <asoman>
Component:	distribute	Assignee:	Nithya Balachandran <nbalacha>
Status:	CLOSED ERRATA	QA Contact:	Ambarish <asoman>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	rhgs-3.3	CC:	amukherj, bturner, rhinduja, rhs-bugs, storage-qa-internal, tdesala
Target Milestone:	---
Target Release:	RHGS 3.3.0
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	glusterfs-3.8.4-31	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:
Clones:	1464110		Environment:
Last Closed:	2017-09-21 04:45:37 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1417151, 1464110

Description Ambarish 2017-06-01 08:09:17 UTC

Description:
------------
 
Added bricks to a distributed-replicate volume, then ran rebalance.
 
These are the rebalance ETAs at different intervals :
 
[T4 > T3 > T2 > T1]
 
**At time T1**
 
 
[root@gqas014 ~]# gluster v rebalance butcher status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            63949         9.8GB        295287             0             0          in progress        0:34:57
      gqas015.sbu.lab.eng.bos.redhat.com            64644         9.9GB        300745             0             0          in progress        0:34:57
Estimated time left for rebalance to complete :        0:00:38
volume rebalance: butcher: success
 
 
**At time T2**
 
[root@gqas014 ~]# gluster v rebalance butcher status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            64010         9.8GB        295597             0             0          in progress        0:34:58
      gqas015.sbu.lab.eng.bos.redhat.com            64705         9.9GB        300918             0             0          in progress        0:34:58
Estimated time left for rebalance to complete :        0:01:09
 
 
**At time T3**
 
[root@gqas014 ~]# gluster v rebalance butcher status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            68057        10.0GB        313569             0             0          in progress        0:36:46
      gqas015.sbu.lab.eng.bos.redhat.com            68904        10.2GB        319823             0             0          in progress        0:36:46
Estimated time left for rebalance to complete :        0:00:09
volume rebalance: butcher: success
[root@gqas014 ~]# gluster v rebalance butcher status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            68110        10.0GB        313882             0             0          in progress        0:36:48
      gqas015.sbu.lab.eng.bos.redhat.com            68958        10.2GB        319948             0             0          in progress        0:36:48
Estimated time left for rebalance to complete :        0:01:10
volume rebalance: butcher: success
 
 
 
**At time T4** (when it finally completed)
 
[root@gqas014 ~]# gluster v rebalance butcher status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            74885       104.4GB        345001             0             0            completed        1:12:32
      gqas015.sbu.lab.eng.bos.redhat.com            74658        10.5GB        345747             0             0            completed        0:39:54
volume rebalance: butcher: success
[root@gqas014 ~]#
[root@gqas014 ~]#
 
 
 
So at interval T1, the reported ETA for completion is 38 seconds.

At T2, one second of run time later, it suddenly increased to slightly more than a minute (1:09).

The same thing happens at T3: the ETA goes from 0:00:09 to 0:01:10 within two seconds of run time.

So basically the ETA keeps looping for a while: it starts at about 1:10, counts down to 0, and then starts at 1:10 again.

This continued for roughly another half an hour, after which the rebalance finally completed (see the difference in the run time column across the intervals).
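
To make the inconsistency easier to see, here is a minimal Python sketch that takes the (run time, ETA) pairs copied from the status outputs above and flags every sample where the ETA went up instead of down. The helper names (to_seconds, eta_jumps, SAMPLES) are made up for this illustration and are not part of gluster; it only does arithmetic on the numbers already shown in this report.

```python
# Illustrative only: hypothetical helpers, not gluster code.
# Each pair is (run time, reported ETA) copied from the status outputs above.
SAMPLES = [
    ("0:34:57", "0:00:38"),  # T1
    ("0:34:58", "0:01:09"),  # T2
    ("0:36:46", "0:00:09"),  # T3, first status call
    ("0:36:48", "0:01:10"),  # T3, second status call
]

# Run time of localhost when the rebalance actually completed (T4).
ACTUAL_FINISH = "1:12:32"


def to_seconds(hms):
    """Convert an h:m:s string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def eta_jumps(samples):
    """Return (run time, increase in seconds) for each sample whose ETA
    is larger than the ETA of the previous sample."""
    jumps = []
    previous_eta = None
    for run_time, eta in samples:
        current_eta = to_seconds(eta)
        if previous_eta is not None and current_eta > previous_eta:
            jumps.append((run_time, current_eta - previous_eta))
        previous_eta = current_eta
    return jumps


if __name__ == "__main__":
    for run_time, increase in eta_jumps(SAMPLES):
        print(f"run time {run_time}: ETA went up by {increase}s")

    # Compare the finish time implied by the last sample with the real one.
    last_run, last_eta = SAMPLES[-1]
    projected = to_seconds(last_run) + to_seconds(last_eta)
    error = to_seconds(ACTUAL_FINISH) - projected
    print(f"last ETA was off by {error}s")
```

With the numbers from this report, the last sample implies completion at about 0:37:58 of run time, while localhost actually finished at 1:12:32, which matches the "another half an hour" observed above.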
 
 
##NUM_FILES##
[root@gqac011 gluster-mount]# find . -mindepth 1 -type f | wc -l
 
352120

Comment 6 Atin Mukherjee 2017-06-22 13:14:24 UTC

upstream patch : https://review.gluster.org/#/c/17607/

Comment 18 errata-xmlrpc 2017-09-21 04:45:37 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774
