+++ This bug was initially created as a clone of Bug #1296796 +++

Description of problem:
=======================
Had a two-node cluster (node-1 and node-2) with a Distributed volume (1*2), mounted it as FUSE and started IO. While IO was in progress, started a remove-brick operation and restarted glusterd on the node hosting the brick being removed. After the glusterd restart no rebalance info is displayed: "Rebalanced-files", "size", "scanned" and the rest all show as zero.

Version-Release number of selected component (if applicable):
==============================================================
glusterfs-3.7.5-14

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Have a two-node cluster (node-1 and node-2)
2. Create a Distributed volume (1*2) using bricks from both nodes
3. Mount the volume as FUSE and start IO
4. While IO is in progress, start remove-brick of the node-2 brick
5. Check the remove-brick status          // it shows the rebalance info
6. Stop and start glusterd on node-2
7. Check the remove-brick status again on both nodes   // it no longer shows the rebalance info

Actual results:
===============
No rebalance info is displayed after the glusterd restart.

Expected results:
=================
Rebalance info should be shown even after a glusterd restart.

Console log:
============
[root@dhcp42-84 ~]# gluster volume status
Status of volume: Dis
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.84:/bricks/brick0/abc0       49272     0          Y       2916
Brick 10.70.42.84:/bricks/brick1/abc1       49273     0          Y       2935
Brick 10.70.43.35:/bricks/brick0/abc2       49155     0          Y       30032
NFS Server on localhost                     2049      0          Y       3804
NFS Server on 10.70.43.35                   2049      0          Y       30324

Task Status of Volume Dis
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 start
volume remove-brick start: success
ID: b2e6507e-838f-4cc4-9061-aa7ba84d9b30
[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35               102     411.8KB         275           0           0   in progress              4.00
[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35               140     978.1KB         340           0           0   in progress              6.00
[root@dhcp42-84 ~]#

Stop and Start GlusterD:
========================
[root@dhcp43-35 ~]# systemctl stop glusterd
[root@dhcp43-35 ~]#
[root@dhcp43-35 ~]# systemctl start glusterd

[root@dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35                 0      0Bytes           0           0           0   in progress              1.00
[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35                 0      0Bytes           0           0           0   in progress              2.00
[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35                 0      0Bytes           0           0           0   in progress              4.00
[root@dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35                 0      0Bytes           0           0           0     completed             13.00
[root@dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35                 0      0Bytes           0           0           0     completed             13.00
[root@dhcp42-84 ~]# gluster volume remove-brick Dis 10.70.43.35:/bricks/brick0/abc2 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35                 0      0Bytes           0           0           0     completed             13.00
[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume info

Volume Name: Dis-Rep
Type: Distributed-Replicate
Volume ID: 69667c02-408f-41a9-b83e-c1684e69ef03
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.42.84:/bricks/brick0/sbr00
Brick2: 10.70.42.84:/bricks/brick1/sbr11
Brick3: 10.70.43.35:/bricks/brick0/sbr22
Brick4: 10.70.43.35:/bricks/brick1/sbr33
Options Reconfigured:
performance.readdir-ahead: on
[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume status
Status of volume: Dis-Rep
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.84:/bricks/brick0/sbr00      49282     0          Y       3129
Brick 10.70.42.84:/bricks/brick1/sbr11      49283     0          Y       3148
Brick 10.70.43.35:/bricks/brick0/sbr22      49165     0          Y       7257
Brick 10.70.43.35:/bricks/brick1/sbr33      49166     0          Y       7276
NFS Server on localhost                     2049      0          Y       3170
Self-heal Daemon on localhost               N/A       N/A        Y       3175
NFS Server on 10.70.43.35                   2049      0          Y       7298
Self-heal Daemon on 10.70.43.35             N/A       N/A        Y       7303

Task Status of Volume Dis-Rep
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume
unrecognized command
[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2 10.70.43.35:/bricks/brick0/sbr22 10.70.43.35:/bricks/brick1/sbr33 start
volume remove-brick start: success
ID: 5ca18e2e-43c9-481f-ab5a-aae02240bb97
[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2 10.70.43.35:/bricks/brick0/sbr22 10.70.43.35:/bricks/brick1/sbr33 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35                50     335.1KB         200           0           0   in progress              4.00
[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2 10.70.43.35:/bricks/brick0/sbr22 10.70.43.35:/bricks/brick1/sbr33 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35               108     548.9KB         372           0           0   in progress              9.00
[root@dhcp42-84 ~]#

<<<<<<<<<Stop and Start Glusterd>>>>>>>>>>
[root@dhcp43-35 ~]# systemctl stop glusterd
[root@dhcp43-35 ~]#
[root@dhcp43-35 ~]# systemctl start glusterd
[root@dhcp43-35 ~]#
<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>

[root@dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2 10.70.43.35:/bricks/brick0/sbr22 10.70.43.35:/bricks/brick1/sbr33 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35                 0      0Bytes           0           0           0   in progress              0.00
[root@dhcp42-84 ~]#
[root@dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2 10.70.43.35:/bricks/brick0/sbr22 10.70.43.35:/bricks/brick1/sbr33 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35                 0      0Bytes           0           0           0   in progress              0.00
[root@dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2 10.70.43.35:/bricks/brick0/sbr22 10.70.43.35:/bricks/brick1/sbr33 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35                 0      0Bytes           0           0           0   in progress              0.00
[root@dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2 10.70.43.35:/bricks/brick0/sbr22 10.70.43.35:/bricks/brick1/sbr33 status
        Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ------------  ----------------
 10.70.43.35                 0      0Bytes           0           0           0   in progress              0.00
[root@dhcp42-84 ~]#

Thanks

This bug exists for all volume types. The issue is that only the rebalance status is stored in the node_state.info file; on restarting glusterd it is retrieved and shown in the status output. The other values, such as the number of rebalanced files, scanned files, etc., are not stored in node_state.info and are therefore not available for display after glusterd restarts.
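To illustrate the gap described above, here is a minimal, self-contained C sketch (not the actual glusterd code) of what persisting the full set of rebalance counters to a node_state.info-style key/value file could look like. The struct fields, key names, file path and sample values (taken from the "140 files / 978.1KB" row in the log) are assumptions for illustration only.

/* Hypothetical, simplified illustration only -- not glusterd source.
 * Idea: persist every rebalance counter (not just the status) as
 * key=value pairs so the figures survive a glusterd restart. */
#include <stdio.h>
#include <stdint.h>

/* Assumed counter set, mirroring the columns of "remove-brick status". */
struct rebal_info {
    int      status;           /* e.g. 1 = in progress, 3 = completed (assumed codes) */
    uint64_t rebalanced_files;
    uint64_t size;             /* bytes migrated */
    uint64_t scanned_files;
    uint64_t failures;
    uint64_t skipped;
    double   run_time;         /* seconds */
};

/* Write every field, not only "status", to the state file. */
static int rebal_info_store(const char *path, const struct rebal_info *ri)
{
    FILE *fp = fopen(path, "w");
    if (!fp)
        return -1;

    fprintf(fp, "status=%d\n", ri->status);
    fprintf(fp, "rebalanced-files=%llu\n", (unsigned long long)ri->rebalanced_files);
    fprintf(fp, "size=%llu\n", (unsigned long long)ri->size);
    fprintf(fp, "scanned=%llu\n", (unsigned long long)ri->scanned_files);
    fprintf(fp, "failures=%llu\n", (unsigned long long)ri->failures);
    fprintf(fp, "skipped=%llu\n", (unsigned long long)ri->skipped);
    fprintf(fp, "run-time=%f\n", ri->run_time);

    return fclose(fp);
}

int main(void)
{
    /* Sample values roughly matching one status row from the log above. */
    struct rebal_info ri = { 1, 140, 1001574, 340, 0, 0, 6.0 };

    /* "./node_state.info" is a stand-in path for this sketch. */
    return rebal_info_store("./node_state.info", &ri) == 0 ? 0 : 1;
}

If only the "status" key were written (as the report describes), everything reconstructed after a restart would necessarily read back as zero, which matches the zeroed rows seen in the console log.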
REVIEW: http://review.gluster.org/14827 (glusterd: glusterd must store all rebalance related information) posted (#1) for review on master by Sakshi Bansal
REVIEW: http://review.gluster.org/14827 (glusterd: glusterd must store all rebalance related information) posted (#2) for review on master by Sakshi Bansal
COMMIT: http://review.gluster.org/14827 committed in master by Atin Mukherjee (amukherj)
------
commit 0cd287189e5e9f876022a8c6481195bdc63ce5f8
Author: Sakshi Bansal <sabansal>
Date:   Wed Jun 29 12:09:06 2016 +0530

    glusterd: glusterd must store all rebalance related information

    Change-Id: I8404b864a405411e3af2fbee46ca20330e656045
    BUG: 1351021
    Signed-off-by: Sakshi Bansal <sabansal>
    Reviewed-on: http://review.gluster.org/14827
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>
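For context, a matching sketch of the restore side: on restart the persisted key=value pairs would be read back so that the status output can report the real counters instead of zeros. Again this is only an assumed, simplified model of the behaviour the commit summary describes, not the actual glusterd implementation; the file path and key handling follow the hypothetical store sketch earlier in this report.

/* Hypothetical restore-side sketch (not glusterd source): read the
 * persisted key=value pairs back after a restart so "remove-brick status"
 * can show the stored counters instead of zeros. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Same stand-in path as the store sketch above. */
    FILE *fp = fopen("./node_state.info", "r");
    if (!fp) {
        perror("node_state.info");
        return 1;
    }

    char line[256];
    while (fgets(line, sizeof(line), fp)) {
        char *eq = strchr(line, '=');
        if (!eq)
            continue;
        *eq = '\0';
        /* key is in `line`, value starts at `eq + 1`; a real daemon would
         * fill its in-memory rebalance structure here. */
        printf("restored %s = %s", line, eq + 1);
    }

    fclose(fp);
    return 0;
}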
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.9.0, please open a new bug report.

glusterfs-3.9.0 has been announced on the Gluster mailinglists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2016-November/029281.html
[2] https://www.gluster.org/pipermail/gluster-users/