+++ This bug was initially created as a clone of Bug #1258833 +++

Description of problem:
=====================
When attaching a tier, add a check to see whether any rebalance operations are pending. For example, I had a remove-brick operation completed, but the commit was not yet done. I was still able to attach a tier. This creates a deadlock: the tier daemon does not start by itself on attach-tier because the remove-brick is not committed, nor can I commit the remove-brick because it is now a tier volume. So, make sure a check is added before going ahead with attach-tier.

Version-Release number of selected component (if applicable):
=============================================================
[root@nag-manual-node1 glusterfs]# gluster --version
glusterfs 3.7.3 built on Aug 27 2015 01:23:05
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

[root@nag-manual-node1 glusterfs]# rpm -qa|grep gluster
glusterfs-libs-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-fuse-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-server-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-api-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-cli-3.7.3-0.82.git6c4096f.el6.x86_64
python-gluster-3.7.3-0.82.git6c4096f.el6.noarch
glusterfs-client-xlators-3.7.3-0.82.git6c4096f.el6.x86_64

How reproducible:
====================
Very easily

Steps to Reproduce:
===================
1. Create a distribute volume with, say, 4 bricks.
2. Issue a remove-brick and wait for it to complete.
3. Without committing the remove-brick, go ahead and attach a tier.
4. Now the tier daemon does not trigger because the commit is pending, nor can the remove-brick be committed because it is a tier volume.
Hence deadlock.

Expected results:
===================
Disallow attach-tier if any rebalance operations are pending.

CLI LOG:
=======
[root@nag-manual-node1 glusterfs]# gluster v create rebal 10.70.46.84:/rhs/brick1/rebal 10.70.46.36:/rhs/brick1/rebal 10.70.46.36:/rhs/brick2/rebal
volume create: rebal: success: please start the volume to access data
[root@nag-manual-node1 glusterfs]# gluster v start rebal
volume start: rebal: success
[root@nag-manual-node1 glusterfs]# gluster v info rebal

Volume Name: rebal
Type: Distribute
Volume ID: 3e272970-b319-4a35-a8cd-6845190761ee
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: 10.70.46.84:/rhs/brick1/rebal
Brick2: 10.70.46.36:/rhs/brick1/rebal
Brick3: 10.70.46.36:/rhs/brick2/rebal
Options Reconfigured:
performance.readdir-ahead: on

[root@nag-manual-node1 glusterfs]# gluster v remove-brick rebal 10.70.46.36:/rhs/brick2/rebal start
volume remove-brick start: success
ID: 464ee968-e3a4-41f0-89f7-6d6ec4ea1a62

[root@nag-manual-node1 glusterfs]# gluster v remove-brick rebal 10.70.46.36:/rhs/brick2/rebal status
        Node  Rebalanced-files        size     scanned    failures     skipped      status   run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ----------   ----------------
 10.70.46.36                 0      0Bytes           0           0           0   completed               0.00

[root@nag-manual-node1 glusterfs]# gluster v status rebal
Status of volume: rebal
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.84:/rhs/brick1/rebal         49187     0          Y       7849
Brick 10.70.46.36:/rhs/brick1/rebal         49186     0          Y       32414
Brick 10.70.46.36:/rhs/brick2/rebal         49187     0          Y       32432
NFS Server on localhost                     2049      0          Y       7972
NFS Server on 10.70.46.36                   2049      0          Y       32452

Task Status of Volume rebal
------------------------------------------------------------------------------
Task                 : Remove brick
ID                   : 464ee968-e3a4-41f0-89f7-6d6ec4ea1a62
Removed bricks:
10.70.46.36:/rhs/brick2/rebal
Status               : completed
[root@nag-manual-node1 glusterfs]# gluster v info rebal

Volume Name: rebal
Type: Distribute
Volume ID: 3e272970-b319-4a35-a8cd-6845190761ee
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: 10.70.46.84:/rhs/brick1/rebal
Brick2: 10.70.46.36:/rhs/brick1/rebal
Brick3: 10.70.46.36:/rhs/brick2/rebal
Options Reconfigured:
performance.readdir-ahead: on

[root@nag-manual-node1 glusterfs]# gluster v attach-tier rebal 10.70.46.84:/rhs/brick4/rebalhot 10.70.46.36:/rhs/brick4/rebalhot
Attach tier is recommended only for testing purposes in this release. Do you want to continue? (y/n) y
volume attach-tier: success
volume rebalance: rebal: failed: A remove-brick task on volume rebal is not yet committed. Either commit or stop the remove-brick task.
Failed to run tier start. Please execute tier start command explictly
Usage : gluster volume rebalance <volname> tier start

[root@nag-manual-node1 glusterfs]# gluster v info rebal

Volume Name: rebal
Type: Tier
Volume ID: 3e272970-b319-4a35-a8cd-6845190761ee
Status: Started
Number of Bricks: 5
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distribute
Number of Bricks: 2
Brick1: 10.70.46.36:/rhs/brick4/rebalhot
Brick2: 10.70.46.84:/rhs/brick4/rebalhot
Cold Tier:
Cold Tier Type : Distribute
Number of Bricks: 3
Brick3: 10.70.46.84:/rhs/brick1/rebal
Brick4: 10.70.46.36:/rhs/brick1/rebal
Brick5: 10.70.46.36:/rhs/brick2/rebal
Options Reconfigured:
performance.readdir-ahead: on

[root@nag-manual-node1 glusterfs]# gluster v status rebal
Status of volume: rebal
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick 10.70.46.36:/rhs/brick4/rebalhot      49188     0          Y       32571
Brick 10.70.46.84:/rhs/brick4/rebalhot      49188     0          Y       8027
Cold Bricks:
Brick 10.70.46.84:/rhs/brick1/rebal         49187     0          Y       7849
Brick 10.70.46.36:/rhs/brick1/rebal         49186     0          Y       32414
Brick 10.70.46.36:/rhs/brick2/rebal         49187     0          Y       32432
NFS Server on localhost                     2049      0          Y       8047
NFS Server on
10.70.46.36                                 2049      0          Y       32590

Task Status of Volume rebal
------------------------------------------------------------------------------
Task                 : Remove brick
ID                   : 464ee968-e3a4-41f0-89f7-6d6ec4ea1a62
Removed bricks:
10.70.46.36:/rhs/brick2/rebal
Status               : completed

[root@nag-manual-node1 glusterfs]# gluster v rebal rebal status
        Node  Rebalanced-files        size     scanned    failures     skipped      status   run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ----------   ----------------
 10.70.46.36                 0      0Bytes           0           0           0   completed               0.00
volume rebalance: rebal: success:

[root@nag-manual-node1 glusterfs]# gluster v rebal rebal tier status
        Node   Promoted files   Demoted files          Status
   ---------        ---------       ---------       ---------
   localhost                0               0     not started
 10.70.46.36                0               0       completed

[root@nag-manual-node1 glusterfs]# gluster v rebalance rebal tier start
volume rebalance: rebal: failed: A remove-brick task on volume rebal is not yet committed. Either commit or stop the remove-brick task.
[root@nag-manual-node1 glusterfs]# gluster v rebalance rebal tier status
[root@nag-manual-node1 glusterfs]# gluster v remove-brick rebal 10.70.46.36:/rhs/brick2/rebal commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: failed: Removing brick from a Tier volume is not allowed

--- Additional comment from nchilaka on 2015-09-01 08:07:20 EDT ---

Workaround:
==========
> Do a detach-tier commit forcefully.
> Do a remove-brick commit forcefully (though the remove-brick operation doesn't show up anymore in the vol status or rebalance status).
> Reattach the tier.

[root@nag-manual-node1 glusterfs]# gluster v detach-tier rebal commit
volume detach-tier commit: failed: Brick 10.70.46.84:/rhs/brick4/rebalhot is not decommissioned.
Use start or force option
[root@nag-manual-node1 glusterfs]# gluster v detach-tier rebal commit force
volume detach-tier commit force: success

[root@nag-manual-node1 glusterfs]# gluster v info rebal

Volume Name: rebal
Type: Distribute
Volume ID: 3e272970-b319-4a35-a8cd-6845190761ee
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: 10.70.46.84:/rhs/brick1/rebal
Brick2: 10.70.46.36:/rhs/brick1/rebal
Brick3: 10.70.46.36:/rhs/brick2/rebal
Options Reconfigured:
performance.readdir-ahead: on

[root@nag-manual-node1 glusterfs]# gluster v status rebal
Status of volume: rebal
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.84:/rhs/brick1/rebal         49187     0          Y       7849
Brick 10.70.46.36:/rhs/brick1/rebal         49186     0          Y       32414
Brick 10.70.46.36:/rhs/brick2/rebal         49187     0          Y       32432
NFS Server on localhost                     2049      0          Y       8455
NFS Server on 10.70.46.36                   2049      0          Y       402

Task Status of Volume rebal
------------------------------------------------------------------------------
There are no active volume tasks

[root@nag-manual-node1 glusterfs]# gluster v rebal rebal status
        Node  Rebalanced-files        size     scanned    failures     skipped      status   run time in secs
   ---------       -----------  ----------  ----------  ----------  ----------  ----------   ----------------
volume rebalance: rebal: success:

[root@nag-manual-node1 glusterfs]# gluster v remove-brick rebal 10.70.46.36:/rhs/brick2/rebal commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: success
Check the removed bricks to ensure all files are migrated.
If files with data are found on the brick path, copy them via a gluster mount point before re-purposing the removed brick.

--- Additional comment from Mohammed Rafi KC on 2015-09-10 04:50:15 EDT ---

Nag,

Thanks for catching this bug. Good work
REVIEW: http://review.gluster.org/12148 (Tier/glusterd: Do not allow attach-tier if remove-brick is not committed) posted (#1) for review on master by mohammed rafi kc (rkavunga)
REVIEW: http://review.gluster.org/12148 (Tier/glusterd: Do not allow attach-tier if remove-brick is not committed) posted (#2) for review on master by mohammed rafi kc (rkavunga)
REVIEW: http://review.gluster.org/12148 (Tier/glusterd: Do not allow attach-tier if remove-brick is not committed) posted (#3) for review on master by mohammed rafi kc (rkavunga)
COMMIT: http://review.gluster.org/12148 committed in master by Dan Lambright (dlambrig)
------
commit bc11be7864eb7f22ad6b529e95bac5a2833f5a01
Author: Mohammed Rafi KC <rkavunga>
Date:   Thu Sep 10 14:19:06 2015 +0530

    Tier/glusterd: Do not allow attach-tier if remove-brick is not committed

    When attaching a tier, if there is a pending remove-brick task, then
    attach-tier should not be allowed. Since we are not supporting
    add/remove-brick on a tiered volume, we won't be able to commit the
    pending remove-brick after attaching the tier.

    Change-Id: Ib434e2e6bc75f0908762f087ad1ca711e6b62818
    BUG: 1261819
    Signed-off-by: Mohammed Rafi KC <rkavunga>
    Reviewed-on: http://review.gluster.org/12148
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Dan Lambright <dlambrig>
    Tested-by: Dan Lambright <dlambrig>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user