+++ This bug was initially created as a clone of Bug #1303125 +++

Description of problem:
=======================
Have a two-node cluster with a Distributed-Replicate volume, mounted as FUSE, with enough data on it. Started removing a replica brick set, which triggered rebalance. While rebalance was in progress, restarted glusterd on the node from which data migration was happening. After that, tried to commit the remove-brick; the commit succeeded even though data migration had not completed.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.5-17

How reproducible:
=================
Every time

Steps to Reproduce:
===================
1. Have a two-node cluster with a Distributed-Replicate volume (2 x 2).
2. Mount the volume as FUSE and write enough data.
3. Start removing a replica brick set // this triggers data migration.
4. Using remove-brick status, identify the brick node from which data migration is happening.
5. Restart glusterd on the node identified in step 4 while rebalance is in progress.
6. Try to commit the remove-brick // the commit goes through without failing.
(A consolidated command sketch of these steps follows the console output below.)

Actual results:
===============
remove-brick commit succeeds even though rebalance has not completed.

Expected results:
=================
remove-brick commit should not be allowed while rebalance is in progress.

Additional info:

--- Additional comment from Byreddy on 2016-01-29 10:55:45 EST ---

[root@dhcp42-84 ~]# gluster volume status
Status of volume: Dis-Rep
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.84:/bricks/brick0/smp0       49157     0          Y       18500
Brick 10.70.43.6:/bricks/brick0/smp1        49162     0          Y       19368
Brick 10.70.42.84:/bricks/brick1/smp2       49158     0          Y       18519
Brick 10.70.43.6:/bricks/brick1/smp3        49163     0          Y       19387
NFS Server on localhost                     2049      0          Y       18541
Self-heal Daemon on localhost               N/A       N/A        Y       18546
NFS Server on 10.70.43.6                    2049      0          Y       19409
Self-heal Daemon on 10.70.43.6              N/A       N/A        Y       19414

Task Status of Volume Dis-Rep
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp42-84 ~]# gluster peer status
Number of Peers: 1

Hostname: 10.70.43.6
Uuid: 2f8a267c-7e7c-488f-98b9-f816062aae58
State: Peer in Cluster (Connected)

[root@dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2 10.70.42.84:/bricks/brick1/smp2 10.70.43.6:/bricks/brick1/smp3 start
volume remove-brick start: success
ID: fd0164f8-2cba-4b25-b881-bbeb7b323695

[root@dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2 10.70.42.84:/bricks/brick1/smp2 10.70.43.6:/bricks/brick1/smp3 status
        Node  Rebalanced-files        size   scanned  failures  skipped       status  run time in secs
   ---------       -----------  ----------  --------  --------  -------  -----------  ----------------
   localhost                59     351.4KB       417         0        0  in progress              7.00
  10.70.43.6                 0      0Bytes         0         0        0  in progress              7.00

[root@dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2 10.70.42.84:/bricks/brick1/smp2 10.70.43.6:/bricks/brick1/smp3 status
        Node  Rebalanced-files        size   scanned  failures  skipped       status  run time in secs
   ---------       -----------  ----------  --------  --------  -------  -----------  ----------------
   localhost                93     511.0KB       627         0        0  in progress             11.00
  10.70.43.6                 0      0Bytes         0         0        0  in progress             11.00

[root@dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2 10.70.42.84:/bricks/brick1/smp2 10.70.43.6:/bricks/brick1/smp3 status
        Node  Rebalanced-files        size   scanned  failures  skipped       status  run time in secs
   ---------       -----------  ----------  --------  --------  -------  -----------  ----------------
   localhost               113     569.2KB       710         0        0  in progress             13.00
  10.70.43.6                 0      0Bytes         0         0        0    completed             12.00

[root@dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2 10.70.42.84:/bricks/brick1/smp2 10.70.43.6:/bricks/brick1/smp3 commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: failed: use 'force' option as migration is in progress

[root@dhcp42-84 ~]# systemctl restart glusterd

[root@dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2 10.70.42.84:/bricks/brick1/smp2 10.70.43.6:/bricks/brick1/smp3 status
        Node  Rebalanced-files        size   scanned  failures  skipped       status  run time in secs
   ---------       -----------  ----------  --------  --------  -------  -----------  ----------------
   localhost                 0      0Bytes         0         0        0  in progress              0.00
  10.70.43.6                 0      0Bytes         0         0        0    completed             12.00

[root@dhcp42-84 ~]# gluster volume remove-brick Dis-Rep replica 2 10.70.42.84:/bricks/brick1/smp2 10.70.43.6:/bricks/brick1/smp3 commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: success
Check the removed bricks to ensure all files are migrated.
If files with data are found on the brick path, copy them via a gluster mount point before re-purposing the removed brick.

[root@dhcp42-84 ~]# gluster volume status
Status of volume: Dis-Rep
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.84:/bricks/brick0/smp0       49157     0          Y       18500
Brick 10.70.43.6:/bricks/brick0/smp1       49162     0          Y       19368
NFS Server on localhost                     2049      0          Y       19014
Self-heal Daemon on localhost               N/A       N/A        Y       19022
NFS Server on 10.70.43.6                    2049      0          Y       19582
Self-heal Daemon on 10.70.43.6              N/A       N/A        Y       19590

Task Status of Volume Dis-Rep
------------------------------------------------------------------------------
There are no active volume tasks
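
For quick reference, here is a consolidated sketch of the reproduction commands from the transcript above. Volume name, brick paths, and peer address are taken from this particular setup and will differ elsewhere; command outputs are omitted.

    # Start removing one replica brick set; this triggers data migration
    gluster volume remove-brick Dis-Rep replica 2 \
        10.70.42.84:/bricks/brick1/smp2 10.70.43.6:/bricks/brick1/smp3 start

    # Watch migration progress and note which node is actually moving data
    gluster volume remove-brick Dis-Rep replica 2 \
        10.70.42.84:/bricks/brick1/smp2 10.70.43.6:/bricks/brick1/smp3 status

    # On the node doing the migration, restart glusterd while the task is still "in progress"
    systemctl restart glusterd

    # Bug: this commit now succeeds even though migration has not completed
    gluster volume remove-brick Dis-Rep replica 2 \
        10.70.42.84:/bricks/brick1/smp2 10.70.43.6:/bricks/brick1/smp3 commit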
REVIEW: http://review.gluster.org/13323 (glusterd: set decommission_is_in_progress flag for inprogress remove-brick op on glusterd restart) posted (#1) for review on master by Atin Mukherjee (amukherj)
REVIEW: http://review.gluster.org/13323 (glusterd: set decommission_is_in_progress flag for inprogress remove-brick op on glusterd restart) posted (#2) for review on master by Atin Mukherjee (amukherj)
REVIEW: http://review.gluster.org/13323 (glusterd: set decommission_is_in_progress flag for inprogress remove-brick op on glusterd restart) posted (#3) for review on master by Atin Mukherjee (amukherj)
COMMIT: http://review.gluster.org/13323 committed in master by Atin Mukherjee (amukherj)
------
commit 3ca140f011faa9d92a4b3889607fefa33ae6de76
Author: Atin Mukherjee <amukherj>
Date:   Sat Jan 30 08:47:35 2016 +0530

    glusterd: set decommission_is_in_progress flag for inprogress remove-brick op on glusterd restart

    While a remove-brick is in progress, if glusterd is restarted, the decommission flag is not persisted in the store, so its value is not restored on restart. As a result glusterd does not block a remove-brick commit even though rebalance is still in progress.

    Change-Id: Ibbf12f3792d65ab1293fad1e368568be141a1cd6
    BUG: 1303269
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/13323
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Gaurav Kumar Garg <ggarg>
    Reviewed-by: mohammed rafi kc <rkavunga>
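
With this fix in place, restarting glusterd should restore the decommission state for the in-progress remove-brick operation, so a commit attempted while migration is still running is expected to be rejected again. A minimal verification sketch, reusing the volume and brick names from the report above (exact CLI messages may vary by version):

    # Restart glusterd on the node that is migrating data
    systemctl restart glusterd

    # Status should still report the remove-brick task as "in progress" on that node
    gluster volume remove-brick Dis-Rep replica 2 \
        10.70.42.84:/bricks/brick1/smp2 10.70.43.6:/bricks/brick1/smp3 status

    # Expected after the fix: commit is refused and asks for 'force', as it did before the restart
    gluster volume remove-brick Dis-Rep replica 2 \
        10.70.42.84:/bricks/brick1/smp2 10.70.43.6:/bricks/brick1/smp3 commit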
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user