1479446 – Rebalance estimate (ETA) shows wrong details (the initial 10-minute wait message reappears) while rebalance is still in progress

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	distribute
Sub Component:
Version:	rhgs-3.3
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	RHGS 3.4.z Batch Update 2
Assignee:	Nithya Balachandran
QA Contact:	Prasad Desala
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1479528 1511271

Reported:	2017-08-08 14:50 UTC by Nag Pavan Chilakam
Modified:	2018-12-17 17:07 UTC
CC List:	10 users
Fixed In Version:	glusterfs-3.12.2-27
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1479528
Environment:
Last Closed:	2018-12-17 17:07:02 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2018:3827	0	None	None	None	2018-12-17 17:07:17 UTC

Description Nag Pavan Chilakam 2017-08-08 14:50:56 UTC

Description of problem:
==============================
I ran a remove-brick operation to convert the volume from 2x2 to 1x2 while I/O was in progress from 3 different Ganesha mounts.

At a later stage (possibly more than 80% complete), the message "The estimated time for rebalance to complete will be unavailable for the first 10 minutes." appeared again.

I think this happens when the estimated time has run out but the rebalance itself has not yet completed.
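
The suspicion above can be illustrated with a small sketch. This is not the glusterfs CLI code; every name here (`eta_line_suspected`, `eta_line_expected`, `INITIAL_WAIT_SECS`) is hypothetical, and the "suspected" branch only models the guess that an exhausted estimate is treated the same as "no estimate yet". Showing 0:00:00 in the "expected" variant is likewise just one possible choice.

```python
# Hypothetical model of the suspected behaviour; not taken from glusterfs sources.
INITIAL_WAIT_SECS = 10 * 60  # the 10-minute window quoted in the CLI message

WAIT_MSG = ("The estimated time for rebalance to complete will be "
            "unavailable for the first 10 minutes.")
ETA_PREFIX = "Estimated time left for rebalance to complete : "


def fmt_hms(secs):
    """Format a number of seconds as h:mm:ss, like the status output."""
    hours, rem = divmod(int(secs), 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def eta_line_suspected(time_left_secs):
    # Assumption: a zero (exhausted) estimate is handled like "not available
    # yet", so the startup message comes back late in the run.
    if time_left_secs <= 0:
        return WAIT_MSG
    return ETA_PREFIX + fmt_hms(time_left_secs)


def eta_line_expected(elapsed_secs, time_left_secs):
    # One possible expectation: the startup message depends only on how long
    # the rebalance has been running, never on the estimate itself.
    if elapsed_secs < INITIAL_WAIT_SECS:
        return WAIT_MSG
    return ETA_PREFIX + fmt_hms(max(time_left_secs, 0))
```

With the run times from the output below, `eta_line_suspected(0)` returns the 10-minute message even at 0:40:28, whereas `eta_line_expected(40 * 60 + 28, 0)` does not.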







Last login: Tue Aug  8 19:32:38 2017 from 10.70.35.77
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost             5145         7.4MB         10594             0             0          in progress        0:06:38
       dhcp46-101.lab.eng.blr.redhat.com             4142        21.7MB          8722             0             0          in progress        0:06:38
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost             5993        31.3MB         11970             0             0          in progress        0:08:38
       dhcp46-101.lab.eng.blr.redhat.com             5050        26.6MB         10415             0             0          in progress        0:08:38
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success

[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost             8059        62.0MB         16022             0             0          in progress        0:13:13
       dhcp46-101.lab.eng.blr.redhat.com             7208        76.2MB         14071             0             0          in progress        0:13:13
Estimated time left for rebalance to complete :        0:47:28
volume rebalance: nrep2: success
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            10699       110.9MB         21188             0             0          in progress        0:19:58
       dhcp46-101.lab.eng.blr.redhat.com             9949       119.4MB         16739             0             0          in progress        0:19:58
Estimated time left for rebalance to complete :        0:47:25
volume rebalance: nrep2: success

[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            16839       151.7MB         28114             0             0          in progress        0:33:23
       dhcp46-101.lab.eng.blr.redhat.com            16754       184.3MB         27528             0             0          in progress        0:33:23
Estimated time left for rebalance to complete :        0:00:48
volume rebalance: nrep2: success



[root@dhcp46-42 ~]# 
[root@dhcp46-42 ~]# 
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            20687       192.2MB         32058             0             0          in progress        0:39:16
       dhcp46-101.lab.eng.blr.redhat.com            20965       189.6MB         32669             0             0          in progress        0:39:16
Estimated time left for rebalance to complete :        0:00:06
volume rebalance: nrep2: success
[root@dhcp46-42 ~]# 

============== SEE FROM BELOW ==================

[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            21521       192.8MB         33069             0             0          in progress        0:40:28
       dhcp46-101.lab.eng.blr.redhat.com            22456       189.6MB         35708             0             0          in progress        0:40:28
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            21669       192.8MB         33372             0             0          in progress        0:40:36
       dhcp46-101.lab.eng.blr.redhat.com            22614       189.6MB         35708             0             0          in progress        0:40:36
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            21718       192.8MB         33372             0             0          in progress        0:40:40
       dhcp46-101.lab.eng.blr.redhat.com            22667       189.6MB         36020             0             0          in progress        0:40:40
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success
[root@dhcp46-42 ~]# 
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            23842       194.1MB         37488             0             0          in progress        0:43:47
       dhcp46-101.lab.eng.blr.redhat.com            23440       285.5MB         39635             0             0            completed        0:43:29
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: nrep2: success


Version-Release number of selected component (if applicable):
[root@dhcp46-42 ~]# rpm -qa|grep gluster
glusterfs-api-3.8.4-38.el7rhgs.x86_64
python-gluster-3.8.4-34.el7rhgs.noarch
glusterfs-server-3.8.4-38.el7rhgs.x86_64
gluster-nagios-addons-0.2.9-1.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.4-16.el7rhgs.x86_64
glusterfs-3.8.4-38.el7rhgs.x86_64
glusterfs-cli-3.8.4-38.el7rhgs.x86_64
glusterfs-rdma-3.8.4-38.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.2.x86_64
vdsm-gluster-4.17.33-1.2.el7rhgs.noarch
glusterfs-libs-3.8.4-38.el7rhgs.x86_64
glusterfs-fuse-3.8.4-38.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-38.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-38.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-38.el7rhgs.x86_64





Steps to Reproduce:
1. Had a 1x2 volume; ran add-brick to convert it to 2x2, and a rebalance was done (with some files skipped).
2. Ran a Linux untar from one client and lookups from another client (these continued until the end). Ran rename, move, chmod, and chgrp from another client, but only for some time; these operations completed well before the rebalance reached this state.
3. Observed the rebalance ETA.

Actual results:
==========
The ETA again shows the initial 10-minute wait message.
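
To check for this symptom automatically, the captured status output could be scanned with something like the sketch below. It assumes the output layout shown in this report (an "in progress" status followed by the h:m:s run time at the end of each node line); the function names are made up for illustration and are not part of any gluster tooling.

```python
import re

WAIT_MSG = ("The estimated time for rebalance to complete will be "
            "unavailable for the first 10 minutes.")

# Matches a node row that is still in progress and captures its run time.
RUN_TIME_RE = re.compile(r"in progress\s+(\d+):(\d{2}):(\d{2})\s*$")


def max_in_progress_runtime(status_output):
    """Return the longest run time in seconds among in-progress nodes, or None."""
    times = []
    for line in status_output.splitlines():
        match = RUN_TIME_RE.search(line)
        if match:
            hours, minutes, seconds = map(int, match.groups())
            times.append(hours * 3600 + minutes * 60 + seconds)
    return max(times) if times else None


def shows_stale_wait_message(status_output, wait_secs=600):
    """True if the 10-minute message appears although a node has run longer."""
    runtime = max_in_progress_runtime(status_output)
    return (runtime is not None
            and runtime >= wait_secs
            and WAIT_MSG in status_output)
```

Feeding it the text of one `gluster v rebal nrep2 status` call flags the outputs from 0:40:28 onwards in the description, and not the ones at 0:06:38 and 0:08:38, where the message is legitimate.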

Comment 2 Nag Pavan Chilakam 2017-08-08 14:52:04 UTC

Rebalance status at the end:
[root@dhcp46-42 ~]# gluster v rebal nrep2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            23842       194.1MB         37488             0             0            completed        0:44:21
       dhcp46-101.lab.eng.blr.redhat.com            23440       285.5MB         39635             0             0            completed        0:43:29
volume rebalance: nrep2: success

Comment 3 Nithya Balachandran 2017-08-08 17:29:11 UTC

Is this reproducible?

Comment 6 Nag Pavan Chilakam 2018-04-05 09:19:14 UTC

Prasad, can you check this as part of your testing (comment #3, i.e., whether this is reproducible)?

Comment 17 Prasad Desala 2018-12-05 12:44:33 UTC

Verified this BZ on glusterfs version 3.12.2-30. Followed the same steps as in the description; the rebalance ETA was displayed as expected.

Moving this BZ to Verified.

Comment 18 errata-xmlrpc 2018-12-17 17:07:02 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3827
