Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1460894

Summary: Rebalance estimate time sometimes shows negative values
Product: [Community] GlusterFS
Reporter: Nithya Balachandran <nbalacha>
Component: distribute
Assignee: Nithya Balachandran <nbalacha>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.11
CC: bugs, rhinduja, rhs-bugs, storage-qa-internal, tdesala
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.11.1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1457985
Environment:
Last Closed: 2017-06-28 18:32:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1454602, 1457985, 1460914, 1475399
Bug Blocks:

Description Nithya Balachandran 2017-06-13 05:02:42 UTC
+++ This bug was initially created as a clone of Bug #1457985 +++

+++ This bug was initially created as a clone of Bug #1454602 +++

Description of problem:
=======================
On a CIFS mount holding a dataset of empty directories plus directories with files, I started removing a few bricks. When I issued the remove-brick status command, the rebalance time estimate showed negative values.

I issued the status command 21 times during the remove-brick rebalance, and every time it showed negative values. On the 22nd attempt, the estimate turned positive (by that point, the rebalance had been running for almost 24 minutes).

[root@server1 samba]# gluster v remove-brick distrep server1:/bricks/brick6/b6 server2:/bricks/brick6/b6 server3:/bricks/brick6/b6 server4:/bricks/brick6/b6 status
                              Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                         ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                         localhost                2         9.5KB             6             0             0            completed        0:15:16
                server1.redhat.com                0        0Bytes             0             0             0          in progress        0:21:32
                server2.redhat.com                0        0Bytes             0             0             0          in progress        0:00:00
                server3.redhat.com                0        0Bytes             0             0             0          in progress        0:21:21
Estimated time left for rebalance to complete : 2023406814:-21:-32


Version-Release number of selected component (if applicable):
3.8.4-25.el7rhgs.x86_64

How reproducible:
=================
1/1

Steps to Reproduce:
===================
1) Create a distributed-replicate volume and start it.
2) CIFS-mount the volume on a client.
3) Create a dataset of empty directories plus directories with files.
4) Remove a few bricks.
5) Keep running the remove-brick status command and check the "Estimated time left for rebalance to complete" output.

Actual results:
===============
Rebalance estimate time sometimes shows negative values.

Expected results:
=================
Rebalance estimate time should not show negative values.


From distrep-rebalance.log in sosreport-sysreg-prod.negativevalues-20170523062944:  


[2017-05-23 05:54:01.319951] I [dht-rebalance.c:4425:gf_defrag_status_get] 0-glusterfs: TIME: num_files_lookedup=0,elapsed time = 51.000000,rate_lookedup=0.000000
[2017-05-23 05:54:01.320001] I [dht-rebalance.c:4428:gf_defrag_status_get] 0-glusterfs: TIME: Estimated total time to complete = 0 seconds
[2017-05-23 05:54:01.320012] I [dht-rebalance.c:4431:gf_defrag_status_get] 0-glusterfs: TIME: Seconds left = 18446744073709551565 seconds


This underflow skews the calculation, causing the strange estimate seen above.

Easily reproducible by running rebalance on a volume with only dirs (no files).

--- Additional comment from Worker Ant on 2017-06-01 13:00:56 EDT ---

REVIEW: https://review.gluster.org/17448 (cluster/dht: Include dirs in rebalance estimates) posted (#1) for review on master by N Balachandran (nbalacha)

--- Additional comment from Worker Ant on 2017-06-07 00:02:27 EDT ---

COMMIT: https://review.gluster.org/17448 committed in master by Raghavendra G (rgowdapp) 
------
commit c9860430a77f20ddfec532819542bb1d0187c06e
Author: N Balachandran <nbalacha>
Date:   Thu Jun 1 22:13:41 2017 +0530

    cluster/dht: Include dirs in rebalance estimates
    
    Empty directories were not being considered while
    calculating rebalance estimates leading to negative
    time-left values being displayed as part of the
    rebalance status.
    
    Change-Id: I48d41d702e72db30af10e6b87b628baa605afa98
    BUG: 1457985
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/17448
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Amar Tumballi <amarts>
    Reviewed-by: Raghavendra G <rgowdapp>

Comment 1 Worker Ant 2017-06-13 05:09:47 UTC
REVIEW: https://review.gluster.org/17527 (cluster/dht: Include dirs in rebalance estimates) posted (#1) for review on release-3.11 by N Balachandran (nbalacha)

Comment 2 Worker Ant 2017-06-13 14:19:01 UTC
COMMIT: https://review.gluster.org/17527 committed in release-3.11 by Shyamsundar Ranganathan (srangana) 
------
commit 1037b006ad818c862d7af9308ca02e5f83ebd02a
Author: N Balachandran <nbalacha>
Date:   Thu Jun 1 22:13:41 2017 +0530

    cluster/dht: Include dirs in rebalance estimates
    
    Empty directories were not being considered while
    calculating rebalance estimates leading to negative
    time-left values being displayed as part of the
    rebalance status.
    
    > BUG: 1457985
    > Signed-off-by: N Balachandran <nbalacha>
    > Reviewed-on: https://review.gluster.org/17448
    > Smoke: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Amar Tumballi <amarts>
    > Reviewed-by: Raghavendra G <rgowdapp>
    
    Change-Id: I48d41d702e72db30af10e6b87b628baa605afa98
    BUG: 1460894
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/17527
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>

Comment 3 Nithya Balachandran 2017-06-22 05:17:11 UTC
Moving this back to POST as there is another patch required.

Comment 4 Worker Ant 2017-06-22 05:31:52 UTC
REVIEW: https://review.gluster.org/17598 (cluster/dht: Additional checks for rebalance estimates) posted (#1) for review on release-3.11 by N Balachandran (nbalacha)

Comment 5 Worker Ant 2017-06-22 20:54:12 UTC
COMMIT: https://review.gluster.org/17598 committed in release-3.11 by Shyamsundar Ranganathan (srangana) 
------
commit 0aafbacc2b97ab010506d42b8c1f34f5c67bf258
Author: N Balachandran <nbalacha>
Date:   Mon Jun 19 11:50:28 2017 +0530

    cluster/dht: Additional checks for rebalance estimates
    
    The rebalance estimates calculation was not handling
    calculations correctly when no files had been processed,
    i.e., when rate_lookedup was 0.
    
    Now, the estimated time is set to 0 in such scenarios as
    there is no way for rebalance to figure out how long the
    process will take to complete without knowing the rate at
    which the files are being processed.
    
    > BUG: 1457985
    > Signed-off-by: N Balachandran <nbalacha>
    > Reviewed-on: https://review.gluster.org/17564
    > Smoke: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Amar Tumballi <amarts>
    > Reviewed-by: Raghavendra G <rgowdapp>
    
    Change-Id: I7b6378e297e1ba139852bcb2239adf2477336b5b
    BUG: 1460894
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/17598
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>
    CentOS-regression: Gluster Build System <jenkins.org>

Comment 6 Shyamsundar 2017-06-28 18:32:26 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.11.1, please open a new bug report.

glusterfs-3.11.1 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-June/000074.html
[2] https://www.gluster.org/pipermail/gluster-users/