Bug 1467209 - [Scale] : Rebalance ETA shows the initial estimate to be ~140 days,finishes within 18 hours though.
Status: MODIFIED
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: x86_64 Linux
Priority: unspecified, Severity: high
Assigned To: Nithya Balachandran
Depends On: 1460936
Blocks: 1475192
Reported: 2017-07-03 03:32 EDT by Nithya Balachandran
Modified: 2017-07-26 04:01 EDT
Clone Of: 1460936
Clones: 1475192
Type: Bug
Comment 1 Nithya Balachandran 2017-07-03 03:33:15 EDT
--- Additional comment from Nithya Balachandran on 2017-06-14 02:15:27 EDT ---

The rebalance estimate feature works best when the files are of a uniform size.
This is not the case with this setup where the volume contains a mix of both large and small files.


From the logs, it looks like rebalance initially spent a lot of time migrating very large files:


[2017-06-12 13:14:26.923797] I [MSGID: 109028] [dht-rebalance.c:4669:gf_defrag_status_get] 0-glusterfs: Files migrated: 2, size: 21474836480, lookups: 514, failures: 0, skipped: 0
[2017-06-12 13:14:28.069317] I [dht-rebalance.c:4578:gf_defrag_status_get] 0-glusterfs: TIME: num_files_lookedup=514,elapsed time = 507.000000,rate_lookedup=1.013807
[2017-06-12 13:14:28.069357] I [dht-rebalance.c:4581:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete = 2929242 seconds
[2017-06-12 13:14:28.069369] I [dht-rebalance.c:4584:gf_defrag_status_get] 0-glusterfs: TIME: Seconds left = 2928735 seconds


So far only 2 files have been migrated, but the initially calculated file count is well over 200K files. Based on this, the estimated time is roughly 140 days.


As rebalance proceeds and starts processing the smaller files, the rate goes up and the estimated time goes down.

This starts at roughly:
[2017-06-12 14:41:47.655006] I [dht-rebalance.c:4578:gf_defrag_status_get] 0-glusterfs: TIME: num_files_lookedup=137397,elapsed time = 5746.000000,rate_lookedup=23.911765
[2017-06-12 14:41:47.655044] I [dht-rebalance.c:4581:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete = 124193 seconds
[2017-06-12 14:41:47.655058] I [dht-rebalance.c:4584:gf_defrag_status_get] 0-glusterfs: TIME: Seconds left = 118447 seconds


At this point the estimated time is roughly 1/20th of the originally calculated time (roughly 32 hours).


As the rebalance proceeds further:
[2017-06-13 03:23:00.853181] I [dht-rebalance.c:4578:gf_defrag_status_get] 0-glusterfs: TIME: num_files_lookedup=3557582,elapsed time = 51419.000000,rate_lookedup=69.188082
[2017-06-13 03:23:00.853216] I [dht-rebalance.c:4581:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete = 51563 seconds
[2017-06-13 03:23:00.853227] I [dht-rebalance.c:4584:gf_defrag_status_get] 0-glusterfs: TIME: Seconds left = 144 seconds


The estimated time is now 51563 s (roughly 14 hours).
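
The arithmetic behind the three log excerpts above can be checked with a short sketch. This is not the dht-rebalance.c code; the function and variable names are made up for illustration, and the only inputs are the values printed in the logs. It assumes the logged rate is files looked up divided by elapsed seconds, and that "Seconds left" is the estimated total minus the elapsed time, which is what the logged numbers suggest.

```python
def file_count_estimate(files_looked_up, elapsed_s, est_total_s):
    """Recompute the values logged by the file-count-based estimate.

    Hypothetical helper for illustration only, not a GlusterFS function.
    Returns (lookup rate in files/s, seconds left, implied total file count).
    """
    rate = files_looked_up / elapsed_s
    seconds_left = est_total_s - elapsed_s
    # Total file count that the logged estimate implies at this lookup rate.
    implied_total_files = rate * est_total_s
    return rate, seconds_left, implied_total_files


# (num_files_lookedup, elapsed time, estimated total time) from the logs above.
samples = [
    (514, 507.0, 2929242),
    (137397, 5746.0, 124193),
    (3557582, 51419.0, 51563),
]

for looked_up, elapsed, est_total in samples:
    rate, left, implied = file_count_estimate(looked_up, elapsed, est_total)
    print(f"rate={rate:.6f} files/s  seconds_left={left:.0f}  "
          f"implied_total_files={implied:.0f}")
```

The estimate depends only on how many files have been looked up per second so far, so a few very large files at the start of the run push the rate down and the estimate up.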
Comment 2 Worker Ant 2017-07-03 03:47:31 EDT
REVIEW: https://review.gluster.org/17668 (cluster/dht: Use size to calculate estimates) posted (#1) for review on master by N Balachandran (nbalacha@redhat.com)
Comment 3 Worker Ant 2017-07-04 09:23:36 EDT
REVIEW: https://review.gluster.org/17668 (cluster/dht: Use size to calculate estimates) posted (#2) for review on master by N Balachandran (nbalacha@redhat.com)
Comment 4 Worker Ant 2017-07-06 13:57:54 EDT
REVIEW: https://review.gluster.org/17668 (cluster/dht: Use size to calculate estimates) posted (#3) for review on master by N Balachandran (nbalacha@redhat.com)
Comment 5 Worker Ant 2017-07-06 23:34:02 EDT
REVIEW: https://review.gluster.org/17668 (cluster/dht: Use size to calculate estimates) posted (#4) for review on master by N Balachandran (nbalacha@redhat.com)
Comment 6 Worker Ant 2017-07-07 00:24:24 EDT
REVIEW: https://review.gluster.org/17668 (cluster/dht: Use size to calculate estimates) posted (#5) for review on master by N Balachandran (nbalacha@redhat.com)
Comment 7 Worker Ant 2017-07-07 01:54:13 EDT
REVIEW: https://review.gluster.org/17668 (cluster/dht: Use size to calculate estimates) posted (#6) for review on master by Atin Mukherjee (amukherj@redhat.com)
Comment 8 Worker Ant 2017-07-09 08:06:25 EDT
REVIEW: https://review.gluster.org/17668 (cluster/dht: Use size to calculate estimates) posted (#7) for review on master by N Balachandran (nbalacha@redhat.com)
Comment 9 Worker Ant 2017-07-10 10:35:38 EDT
COMMIT: https://review.gluster.org/17668 committed in master by Raghavendra G (rgowdapp@redhat.com) 
------
commit 9156a743aa76c955d18c9bfcb7c1a38ba00da890
Author: N Balachandran <nbalacha@redhat.com>
Date:   Mon Jul 3 13:13:35 2017 +0530

    cluster/dht: Use size to calculate estimates
    
    The earlier approach of using the number of files
    to determine when the rebalance would complete did
    not work well when file sizes differed widely.
    
    The new approach now gets the total data size and
    uses that information to determine how long
    the rebalance is expected to take.
    
    Change-Id: I84e80a0893efab72ff06130e4596fa71c9c8c868
    BUG: 1467209
    Signed-off-by: N Balachandran <nbalacha@redhat.com>
    Reviewed-on: https://review.gluster.org/17668
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: MOHIT AGRAWAL <moagrawa@redhat.com>
    Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
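
For readers who want to see the idea in the commit message above, here is a minimal sketch of a size-based estimate. It is an assumption about the general approach (bytes processed per second, applied to the total data size), not the code from the patch; the function name, parameters, and the example numbers are hypothetical.

```python
def size_based_estimate(total_bytes, bytes_processed, elapsed_s):
    """Estimate rebalance time from data size instead of file count.

    Hypothetical helper for illustration only, not a GlusterFS function.
    Returns (estimated_total_s, seconds_left), or None when no rate can
    be computed yet.
    """
    if elapsed_s <= 0 or bytes_processed <= 0:
        return None
    rate = bytes_processed / elapsed_s   # bytes per second so far
    est_total_s = total_bytes / rate     # time to process everything
    seconds_left = max(est_total_s - elapsed_s, 0.0)
    return est_total_s, seconds_left


# Made-up example: 1000 bytes in total, 100 bytes processed in 10 seconds.
print(size_based_estimate(1000, 100, 10))
```

With this approach a large file counts for more progress than a small one, so a run that starts with a few very large files does not look slower than it is.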
Comment 10 Worker Ant 2017-07-25 05:08:23 EDT
REVIEW: https://review.gluster.org/17867 (cluster/dht: Update size processed for non-migrated files) posted (#1) for review on master by N Balachandran (nbalacha@redhat.com)
Comment 11 Worker Ant 2017-07-25 17:52:52 EDT
COMMIT: https://review.gluster.org/17867 committed in master by Jeff Darcy (jeff@pl.atyp.us) 
------
commit 24ab0ef44a1646223b59e33d0109d8424f8eddd0
Author: N Balachandran <nbalacha@redhat.com>
Date:   Tue Jul 25 14:28:00 2017 +0530

    cluster/dht: Update size processed for non-migrated files
    
    The size of non-migrated files was not added to the
    size_processed causing incorrect rebalance estimate
    calculations. This has been fixed.
    
    Change-Id: I9f338c44da22b856e9fdc6dc558f732ae9a22f15
    BUG: 1467209
    Signed-off-by: N Balachandran <nbalacha@redhat.com>
    Reviewed-on: https://review.gluster.org/17867
    Reviewed-by: Amar Tumballi <amarts@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
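
The effect described in the commit message above can be illustrated with a small sketch. It is not the patch itself: the class, its methods, and the `count_non_migrated` switch are hypothetical, and only the `size_processed` name comes from the commit message. It assumes the estimate is the remaining bytes divided by the bytes processed per second.

```python
class RebalanceProgress:
    """Toy size-based progress tracker (hypothetical, for illustration)."""

    def __init__(self, total_bytes):
        self.total_bytes = total_bytes
        self.size_processed = 0

    def file_done(self, size, migrated, count_non_migrated=True):
        # A file that did not need migration has still been processed.
        # Leaving its size out makes the rate look lower than it is.
        if migrated or count_non_migrated:
            self.size_processed += size

    def seconds_left(self, elapsed_s):
        if self.size_processed == 0 or elapsed_s <= 0:
            return None
        rate = self.size_processed / elapsed_s
        return (self.total_bytes - self.size_processed) / rate


def run(count_non_migrated):
    # Made-up workload: 1000 bytes in total; after 50 seconds, five
    # 100-byte files have been processed and only the first one was migrated.
    progress = RebalanceProgress(total_bytes=1000)
    for migrated in (True, False, False, False, False):
        progress.file_done(100, migrated, count_non_migrated)
    return progress.seconds_left(elapsed_s=50)


print("counting non-migrated files:", run(True))
print("ignoring non-migrated files:", run(False))
```

In this toy run, ignoring the non-migrated files gives a much larger estimate of the time left than counting them does, for exactly the same work done.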
