+++ This bug was initially created as a clone of Bug #1296796 +++

Description of problem:
=======================
Had a two-node cluster (node-1 and node-2) with a Distributed volume (1*2), mounted it as FUSE and started IO. While IO was in progress, started a remove-brick operation and restarted glusterd on the node hosting the brick being removed. After the glusterd restart no rebalance info is displayed: "Rebalanced-files", "size", "scanned" and the rest all show as zero.

Version-Release number of selected component (if applicable):
==============================================================
glusterfs-3.7.5-14

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Have a two-node cluster (node-1 and node-2)
2. Create a Distributed volume (1*2) using bricks from both nodes
3. Mount the volume as FUSE and start IO
4. While IO is in progress, start remove-brick of the node-2 brick
5. Check the remove-brick status          // it shows the rebalance info
6. Stop and start glusterd on node-2
7. Check the remove-brick status again on both nodes   // it no longer shows the rebalance info

Actual results:
===============
No rebalance info is displayed after the glusterd restart.

Expected results:
=================
Rebalance info should be shown even after a glusterd restart.

Console log:
============
[root@dhcp42-84 ~]# gluster volume status
Status of volume: Dis
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.84:/bricks/brick0/abc0       49272     0          Y       2916
Brick 10.70.42.84:/bricks/brick1/abc1       49273     0          Y       2935
Brick 10.70.43.35:/bricks/brick0/abc2       49155     0          Y       30032
NFS Server on localhost                     2049      0          Y       3804
NFS Server on 10.70.43.35                   2049      0          Y       30324

Task Status of Volume Dis
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 start
volume remove-brick start: success
ID: b2e6507e-838f-4cc4-9061-aa7ba84d9b30
[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35               102     411.8KB         275           0           0   in progress              4.00
[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35               140     978.1KB         340           0           0   in progress              6.00
[root@dhcp42-84 ~]#

Stop and Start GlusterD:
========================
[root@dhcp43-35 ~]# systemctl stop glusterd
[root@dhcp43-35 ~]#
[root@dhcp43-35 ~]# systemctl start glusterd

[root@dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35                 0      0Bytes           0           0           0   in progress              1.00
[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35                 0      0Bytes           0           0           0   in progress              2.00
[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35                 0      0Bytes           0           0           0   in progress              4.00
[root@dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35                 0      0Bytes           0           0           0     completed             13.00
[root@dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35                 0      0Bytes           0           0           0     completed             13.00
[root@dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35                 0      0Bytes           0           0           0     completed             13.00
[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume info

Volume Name: Dis-Rep
Type: Distributed-Replicate
Volume ID: 69667c02-408f-41a9-b83e-c1684e69ef03
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.42.84:/bricks/brick0/sbr00
Brick2: 10.70.42.84:/bricks/brick1/sbr11
Brick3: 10.70.43.35:/bricks/brick0/sbr22
Brick4: 10.70.43.35:/bricks/brick1/sbr33
Options Reconfigured:
performance.readdir-ahead: on
[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume status
Status of volume: Dis-Rep
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.84:/bricks/brick0/sbr00      49282     0          Y       3129
Brick 10.70.42.84:/bricks/brick1/sbr11      49283     0          Y       3148
Brick 10.70.43.35:/bricks/brick0/sbr22      49165     0          Y       7257
Brick 10.70.43.35:/bricks/brick1/sbr33      49166     0          Y       7276
NFS Server on localhost                     2049      0          Y       3170
Self-heal Daemon on localhost               N/A       N/A        Y       3175
NFS Server on 10.70.43.35                   2049      0          Y       7298
Self-heal Daemon on 10.70.43.35             N/A       N/A        Y       7303

Task Status of Volume Dis-Rep
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume
unrecognized command
[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2 10.70.43.35:/bricks/brick0/sbr22 10.70.43.35:/bricks/brick1/sbr33 start
volume remove-brick start: success
ID: 5ca18e2e-43c9-481f-ab5a-aae02240bb97
[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2 10.70.43.35:/bricks/brick0/sbr22 10.70.43.35:/bricks/brick1/sbr33 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35                50     335.1KB         200           0           0   in progress              4.00
[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2 10.70.43.35:/bricks/brick0/sbr22 10.70.43.35:/bricks/brick1/sbr33 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35               108     548.9KB         372           0           0   in progress              9.00
[root@dhcp42-84 ~]#

<<<<<<<<<Stop and Start Glusterd>>>>>>>>>>
[root@dhcp43-35 ~]# systemctl stop glusterd
[root@dhcp43-35 ~]#
[root@dhcp43-35 ~]# systemctl start glusterd
[root@dhcp43-35 ~]#
<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>

[root@dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2 10.70.43.35:/bricks/brick0/sbr22 10.70.43.35:/bricks/brick1/sbr33 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35                 0      0Bytes           0           0           0   in progress              0.00
[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2 10.70.43.35:/bricks/brick0/sbr22 10.70.43.35:/bricks/brick1/sbr33 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35                 0      0Bytes           0           0           0   in progress              0.00
[root@dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2 10.70.43.35:/bricks/brick0/sbr22 10.70.43.35:/bricks/brick1/sbr33 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35                 0      0Bytes           0           0           0   in progress              0.00
[root@dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2 10.70.43.35:/bricks/brick0/sbr22 10.70.43.35:/bricks/brick1/sbr33 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35                 0      0Bytes           0           0           0   in progress              0.00
[root@dhcp42-84 ~]#

Thanks

This bug exists for all volume types. The issue is that only the rebalance status is stored in the node_state.info file; on restarting glusterd it is retrieved and shown in the status output. The other values, such as the number of rebalanced files, scanned files, etc., are not stored in node_state.info and are therefore not available for display after glusterd restarts.
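To illustrate the gap described above, here is a minimal, self-contained C sketch (not the actual glusterd code) of what persisting the full set of rebalance counters to a node_state.info-style key/value file could look like. The struct fields, key names, file path and sample values (taken from the "140 files / 978.1KB" row in the log) are assumptions for illustration only.

/* Hypothetical, simplified illustration only -- not glusterd source.
 * Idea: persist every rebalance counter (not just the status) as
 * key=value pairs so the figures survive a glusterd restart. */
#include <stdio.h>
#include <stdint.h>

/* Assumed counter set, mirroring the columns of "remove-brick status". */
struct rebal_info {
    int      status;           /* e.g. 1 = in progress, 3 = completed (assumed codes) */
    uint64_t rebalanced_files;
    uint64_t size;             /* bytes migrated */
    uint64_t scanned_files;
    uint64_t failures;
    uint64_t skipped;
    double   run_time;         /* seconds */
};

/* Write every field, not only "status", to the state file. */
static int rebal_info_store(const char *path, const struct rebal_info *ri)
{
    FILE *fp = fopen(path, "w");
    if (!fp)
        return -1;

    fprintf(fp, "status=%d\n", ri->status);
    fprintf(fp, "rebalanced-files=%llu\n", (unsigned long long)ri->rebalanced_files);
    fprintf(fp, "size=%llu\n", (unsigned long long)ri->size);
    fprintf(fp, "scanned=%llu\n", (unsigned long long)ri->scanned_files);
    fprintf(fp, "failures=%llu\n", (unsigned long long)ri->failures);
    fprintf(fp, "skipped=%llu\n", (unsigned long long)ri->skipped);
    fprintf(fp, "run-time=%f\n", ri->run_time);

    return fclose(fp);
}

int main(void)
{
    /* Sample values roughly matching one status row from the log above. */
    struct rebal_info ri = { 1, 140, 1001574, 340, 0, 0, 6.0 };

    /* "./node_state.info" is a stand-in path for this sketch. */
    return rebal_info_store("./node_state.info", &ri) == 0 ? 0 : 1;
}

If only the "status" key were written (as the report describes), everything reconstructed after a restart would necessarily read back as zero, which matches the zeroed rows seen in the console log.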
REVIEW: http://review.gluster.org/14827 (glusterd: glusterd must store all rebalance related information) posted (#1) for review on master by Sakshi Bansal
REVIEW: http://review.gluster.org/14827 (glusterd: glusterd must store all rebalance related information) posted (#2) for review on master by Sakshi Bansal
COMMIT: http://review.gluster.org/14827 committed in master by Atin Mukherjee (amukherj)
------
commit 0cd287189e5e9f876022a8c6481195bdc63ce5f8
Author: Sakshi Bansal <sabansal>
Date:   Wed Jun 29 12:09:06 2016 +0530

    glusterd: glusterd must store all rebalance related information

    Change-Id: I8404b864a405411e3af2fbee46ca20330e656045
    BUG: 1351021
    Signed-off-by: Sakshi Bansal <sabansal>
    Reviewed-on: http://review.gluster.org/14827
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>
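For context, a matching sketch of the restore side: on restart the persisted key=value pairs would be read back so that the status output can report the real counters instead of zeros. Again this is only an assumed, simplified model of the behaviour the commit summary describes, not the actual glusterd implementation; the file path and key handling follow the hypothetical store sketch earlier in this report.

/* Hypothetical restore-side sketch (not glusterd source): read the
 * persisted key=value pairs back after a restart so "remove-brick status"
 * can show the stored counters instead of zeros. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Same stand-in path as the store sketch above. */
    FILE *fp = fopen("./node_state.info", "r");
    if (!fp) {
        perror("node_state.info");
        return 1;
    }

    char line[256];
    while (fgets(line, sizeof(line), fp)) {
        char *eq = strchr(line, '=');
        if (!eq)
            continue;
        *eq = '\0';
        /* key is in `line`, value starts at `eq + 1`; a real daemon would
         * fill its in-memory rebalance structure here. */
        printf("restored %s = %s", line, eq + 1);
    }

    fclose(fp);
    return 0;
}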
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.9.0, please open a new bug report.

glusterfs-3.9.0 has been announced on the Gluster mailinglists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2016-November/029281.html
[2] https://www.gluster.org/pipermail/gluster-users/