Description of problem:
=======================
On a cifs mount with a data set of empty directories plus directories containing files, a remove-brick operation was started for a few bricks. When the remove-brick status command was issued, the rebalance time estimate showed negative values. The status command was issued about 21 times during the remove-brick rebalance, and every time it showed negative values. On the 22nd attempt the estimate finally showed positive values (by that point, the rebalance had been running for almost 24 minutes).

[root@dhcp47-127 samba]# gluster v remove-brick distrep 10.70.47.127:/bricks/brick6/b6 10.70.46.181:/bricks/brick6/b6 10.70.46.47:/bricks/brick6/b6 10.70.47.140:/bricks/brick6/b6 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                2         9.5KB             6             0             0            completed        0:15:16
       dhcp46-181.lab.eng.blr.redhat.com                0        0Bytes             0             0             0          in progress        0:21:32
        dhcp46-47.lab.eng.blr.redhat.com                0        0Bytes             0             0             0          in progress        0:00:00
       dhcp47-140.lab.eng.blr.redhat.com                0        0Bytes             0             0             0          in progress        0:21:21
Estimated time left for rebalance to complete : 2023406814:-21:-32

Version-Release number of selected component (if applicable):
3.8.4-25.el7rhgs.x86_64

How reproducible:
=================
1/1

Steps to Reproduce:
===================
1) Create a distributed-replicate volume and start it.
2) cifs mount the volume on a client.
3) Create a data set of empty directories plus directories with files.
4) Remove a few bricks.
5) Keep running the remove-brick status command and check the "Estimated time left for rebalance to complete" output.

Actual results:
===============
The rebalance time estimate sometimes shows negative values (see the arithmetic sketch after this report).

Expected results:
=================
The rebalance time estimate should never show negative values.
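The mixed-sign estimate above (a large hours field followed by negative minutes and seconds) is consistent with a negative "time left" value being split naively into h:m:s fields. Below is a minimal, self-contained sketch of that arithmetic, not the actual GlusterFS code; it assumes the remaining time is computed as estimated total runtime minus elapsed runtime, and the hypothetical values are chosen to match the 0:21:32 elapsed time in the output above. (The huge hours field in the real output additionally suggests the value passes through an unsigned type somewhere, which this sketch does not try to reproduce.)

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    /* Hypothetical numbers matching the report: the rebalance has been
     * running for 21 min 32 s (1292 s) on a node, but nothing has been
     * migrated yet, so the estimated total runtime is still 0. */
    int64_t elapsed   = 1292;
    int64_t estimated = 0;
    int64_t time_left = estimated - elapsed;    /* -1292 */

    /* Splitting a negative seconds value into h:m:s fields reproduces
     * the mixed-sign output seen in the status command. */
    printf("Estimated time left for rebalance to complete : "
           "%" PRId64 ":%" PRId64 ":%" PRId64 "\n",
           time_left / 3600, (time_left % 3600) / 60, time_left % 60);
    return 0;
}

Compiled with gcc, this prints "Estimated time left for rebalance to complete : 0:-21:-32", i.e. exactly the negative minutes and seconds fields reported above.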
upstream patch : https://review.gluster.org/17448
As updated in https://bugzilla.redhat.com/show_bug.cgi?id=1462181, I still see negative values for the rebalance ETA on 3.8.4-28, just before it fails:

[root@gqas013 glusterfs]# gluster v rebalance testvol status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             0             0          in progress        0:00:00
      gqas005.sbu.lab.eng.bos.redhat.com                0        0Bytes             0             0             0          in progress        0:00:00
      gqas006.sbu.lab.eng.bos.redhat.com                0        0Bytes             0             0             0          in progress        0:00:00
      gqas008.sbu.lab.eng.bos.redhat.com                0        0Bytes             0             0             0          in progress        0:00:00
volume rebalance: testvol: success

[root@gqas013 glusterfs]# gluster v rebalance testvol status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             0             0          in progress        0:00:03
      gqas005.sbu.lab.eng.bos.redhat.com                0        0Bytes             0             0             0          in progress        0:00:00
      gqas006.sbu.lab.eng.bos.redhat.com                0        0Bytes             0             0             0          in progress        0:00:00
      gqas008.sbu.lab.eng.bos.redhat.com                0        0Bytes             0             0             0          in progress        0:00:00
Estimated time left for rebalance to complete : 2023406815:00:-3
volume rebalance: testvol: success

[root@gqas013 glusterfs]# gluster v rebalance testvol status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             0             0          in progress        0:00:06
      gqas005.sbu.lab.eng.bos.redhat.com                0        0Bytes             0             0             0          in progress        0:00:01
      gqas006.sbu.lab.eng.bos.redhat.com                0        0Bytes             0             0             0          in progress        0:00:01
      gqas008.sbu.lab.eng.bos.redhat.com                0        0Bytes             0             0             0          in progress        0:00:01
Estimated time left for rebalance to complete : 2023406815:00:-1
volume rebalance: testvol: success

[root@gqas013 glusterfs]# gluster v rebalance testvol status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             0             0          in progress        0:00:08
      gqas005.sbu.lab.eng.bos.redhat.com                0        0Bytes             0             0             0          in progress        0:00:03
      gqas006.sbu.lab.eng.bos.redhat.com                0        0Bytes             0             0             0          in progress        0:00:03
      gqas008.sbu.lab.eng.bos.redhat.com                0        0Bytes             0             0             0          in progress        0:00:03
Estimated time left for rebalance to complete : 2023406815:00:-3
volume rebalance: testvol: success

[root@gqas013 glusterfs]# gluster v rebalance testvol status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             0             0          in progress        0:00:09
      gqas005.sbu.lab.eng.bos.redhat.com                0        0Bytes             0             0             0          in progress        0:00:04
      gqas006.sbu.lab.eng.bos.redhat.com                0        0Bytes             0             0             0          in progress        0:00:04
      gqas008.sbu.lab.eng.bos.redhat.com                0        0Bytes             0             0             0          in progress        0:00:04
Estimated time left for rebalance to complete : 2023406815:00:-4
volume rebalance: testvol: success

[root@gqas013 glusterfs]# gluster v rebalance testvol status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             0             0          in progress        0:00:10
      gqas005.sbu.lab.eng.bos.redhat.com                0        0Bytes             0             0             0          in progress        0:00:05
      gqas006.sbu.lab.eng.bos.redhat.com                0        0Bytes             0             0             0          in progress        0:00:05
      gqas008.sbu.lab.eng.bos.redhat.com                0        0Bytes             0             0             0          in progress        0:00:05
Estimated time left for rebalance to complete : 2023406815:00:-5
volume rebalance: testvol: success

rpm -qa|grep glus
glusterfs-3.8.4-28.el7rhgs.x86_64

I am moving this back to Dev for a relook.
upstream patch : https://review.gluster.org/#/c/17564/
With the fix, rebalance status will not show the estimate if the rebalance process cannot calculate it. This can happen when the rebalance process is unable to determine the rate at which files are being processed, for example just before a failure as in the test in comment#10.
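In other words, the new behaviour amounts to a guard around the estimate: if no usable processing rate is available yet, the "Estimated time left" line is simply not printed. The following is only an illustrative sketch of that idea, not the actual cli/glusterd code; print_estimate, files_processed, total_files and elapsed_seconds are hypothetical names introduced for the illustration.

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* Hypothetical helper: print the estimate only when a usable rate is
 * available, otherwise stay silent (the behaviour described above). */
static void print_estimate(uint64_t files_processed, uint64_t total_files,
                           uint64_t elapsed_seconds)
{
    if (files_processed == 0 || elapsed_seconds == 0 ||
        total_files <= files_processed) {
        /* No rate information yet (or nothing left to do): skip the
         * "Estimated time left" line entirely. */
        return;
    }

    double   rate      = (double)files_processed / elapsed_seconds;
    uint64_t time_left = (uint64_t)((total_files - files_processed) / rate);

    printf("Estimated time left for rebalance to complete : "
           "%" PRIu64 ":%02" PRIu64 ":%02" PRIu64 "\n",
           time_left / 3600, (time_left % 3600) / 60, time_left % 60);
}

int main(void)
{
    print_estimate(0, 1000, 5);      /* no files processed yet: prints nothing */
    print_estimate(250, 1000, 300);  /* 250 of 1000 files in 300 s: 0:15:00    */
    return 0;
}

The first call corresponds to the situation in comment#10 (no files processed before the failure) and prints nothing; the second prints a well-formed, non-negative estimate.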
upstream patch : https://review.gluster.org/17863
upstream 3.12 patch : https://review.gluster.org/17882
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/113576
Neither Prasad nor I could hit this in our testing on the latest downstream bits. I am moving this BZ to Verified, and will reopen it if I hit the issue again at a later time.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774