Description of problem:
======================
While executing commands such as quota enable, attach-tier, or detach-tier on a cluster with at least one tiered volume, errors are observed and the volume configuration fails to be updated on the other nodes of the cluster. Some examples:

1) "volume remove-brick unknown: failed: Commit failed on localhost. Please check the log file for more details."

2) Sometimes when a command such as detach-tier or quota disable is issued on a multi-node cluster, the command takes effect only on the local node and the volume tables/graphs on the other nodes are not updated.

We have seen this issue occasionally even on non-tiered volumes, but only after tiering commands have been executed on that cluster. There appears to be a problem with management daemon (glusterd) communication between nodes.

In more detail, I issued a detach-tier command from one node's CLI, and the following is the output seen from each node's CLI.

(Local node, where all the commands have been executed so far:)

[root@rhs-client6 glusterd]# gluster v info disperse

Volume Name: disperse
Type: Disperse
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: yarrow:/yarrow_200G_7/disperse
Brick2: yarrow:/yarrow_200G_8/disperse
Brick3: rhs-client6:/brick15/disperse

(Other node:)

[root@yarrow glusterd]# gluster v info disperse

Volume Name: disperse
Type: Tier
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 2 x 2 = 5
Transport-type: tcp
Bricks:
Brick1: rhs-client6:/brick16/disperse
Brick2: yarrow:/yarrow_ssd_75G_2/disperse
Brick3: yarrow:/yarrow_200G_7/disperse
Brick4: yarrow:/yarrow_200G_8/disperse
Brick5: rhs-client6:/brick15/disperse

It can be clearly seen that the other node has not been updated.
Version-Release number of selected component (if applicable):
============================================================
[root@rhs-client6 glusterd]# gluster --version
glusterfs 3.7dev built on Apr 13 2015 07:14:27
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

[root@rhs-client6 glusterd]# rpm -qa | grep gluster
glusterfs-api-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-libs-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-fuse-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-cli-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-server-3.7dev-0.994.gitf522001.el6.x86_64

How reproducible:
================
Quite easily.

Steps to Reproduce:
==================
1. Install the latest nightly build.
2. Create a cluster with at least two nodes.
3. Create a tiered volume.
4. Enable and then disable quota; the issue appears. Sometimes even a detach-tier alone can reproduce the issue.
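The reproduction steps above can be sketched as a shell script. This is a minimal sketch, not a verified reproducer: the host names (node1, node2) and brick paths are placeholders rather than values from this report, and the CLI syntax assumed is the 3.7-era one seen in the logs in this bug.

```shell
# Sketch of the reproduction steps, wrapped in a function so it can be
# invoked on a prepared two-node cluster. Host names (node1, node2) and
# brick paths below are placeholders, not taken from this report.
reproduce_tier_sync_bug() {
    # Run from node1: bring node2 into the trusted pool.
    gluster peer probe node2 &&

    # Create and start a dispersed volume with the same 2 + 1 layout as
    # the volume in the report.
    gluster volume create disperse disperse 3 redundancy 1 \
        node1:/bricks/b1 node1:/bricks/b2 node2:/bricks/b3 &&
    gluster volume start disperse &&

    # Attach a replicated hot tier, as in the logged attach-tier command.
    gluster volume attach-tier disperse replica 2 \
        node1:/bricks/ssd1 node2:/bricks/ssd2 &&

    # Toggle quota, then detach the tier; these are the operations that
    # failed to commit on the peer nodes.
    gluster volume quota disperse enable &&
    gluster volume quota disperse disable &&
    gluster volume detach-tier disperse
}

# After running the function, compare 'gluster volume info disperse' on
# both nodes: on the affected builds the peer still reports Type: Tier
# with the stale brick list.
if command -v gluster >/dev/null 2>&1; then
    reproduce_tier_sync_bug
else
    echo "gluster CLI not installed; commands listed for reference only"
fi
```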
CLI executed logs:
==================
[root@yarrow glusterfs]# gluster v info disperse

Volume Name: disperse
Type: Disperse
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Created
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: yarrow:/yarrow_200G_7/disperse
Brick2: yarrow:/yarrow_200G_8/disperse
Brick3: rhs-client6:/brick15/disperse

[root@yarrow glusterfs]# gluster v start disperse
volume start: disperse: success

[root@yarrow glusterfs]# gluster v attach-tier disperse replica 2 yarrow:/yarrow_ssd_75G_2/disperse rhs-client6:/brick16/disperse
volume add-brick: failed: Commit failed on localhost. Please check the log file for more details.

[root@yarrow glusterfs]# gluster v info disperse

Volume Name: disperse
Type: Tier
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 2 x 2 = 5
Transport-type: tcp
Bricks:
Brick1: rhs-client6:/brick16/disperse
Brick2: yarrow:/yarrow_ssd_75G_2/disperse
Brick3: yarrow:/yarrow_200G_7/disperse
Brick4: yarrow:/yarrow_200G_8/disperse
Brick5: rhs-client6:/brick15/disperse

[root@yarrow glusterfs]# gluster v detach-tier disperse
volume remove-brick unknown: failed: Commit failed on localhost. Please check the log file for more details.
[root@yarrow glusterfs]# gluster v info disperse

Volume Name: disperse
Type: Distributed-Disperse
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: yarrow:/yarrow_200G_7/disperse
Brick2: yarrow:/yarrow_200G_8/disperse
Brick3: rhs-client6:/brick15/disperse

[root@yarrow glusterfs]# gluster v attach-tier disperse replica 2 yarrow:/yarrow_ssd_75G_2/disperse rhs-client6:/brick16/disperse
volume add-brick: failed: /yarrow_ssd_75G_2/disperse is already part of a volume

[root@yarrow glusterfs]# gluster v attach-tier disperse replica 2 yarrow:/yarrow_ssd_75G_2/disperse rhs-client6:/brick16/disperse force
volume add-brick: failed: /yarrow_ssd_75G_2/disperse is already part of a volume

[root@yarrow glusterfs]# gluster v info disperse

Volume Name: disperse
Type: Tier
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 2 x 2 = 5
Transport-type: tcp
Bricks:
Brick1: rhs-client6:/brick16/disperse
Brick2: yarrow:/yarrow_ssd_75G_2/disperse
Brick3: yarrow:/yarrow_200G_7/disperse
Brick4: yarrow:/yarrow_200G_8/disperse
Brick5: rhs-client6:/brick15/disperse

[root@yarrow glusterfs]# gluster v detach-tier disperse
volume remove-brick unknown: failed: Commit failed on localhost. Please check the log file for more details.

[root@yarrow glusterfs]# gluster v attach-tier disperse replica 2 yarrow:/yarrow_ssd_75G_2/disperse rhs-client6:/brick16/disperse force
volume add-brick: failed: /yarrow_ssd_75G_2/disperse is already part of a volume

[root@yarrow glusterfs]#
sosreports@rhsqe-repo:/home/repo/sosreports/1211264
We have submitted fix 10108, which has not yet been merged. The issues with detach-tier may no longer exist (I do not see them). Returning to QE to retest.
Dan, we are still seeing issues with glusterd communication between nodes on a tiered volume as of April 28th. Kindly move this to "ON_QA" only when the fix is available for testing.
REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#1) for review on master by mohammed rafi kc (rkavunga)
REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#2) for review on master by mohammed rafi kc (rkavunga)
REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#3) for review on master by mohammed rafi kc (rkavunga)
REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#4) for review on master by mohammed rafi kc (rkavunga)
REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#5) for review on master by mohammed rafi kc (rkavunga)
REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#6) for review on master by mohammed rafi kc (rkavunga)
REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#7) for review on master by mohammed rafi kc (rkavunga)
REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#8) for review on master by Vijay Bellur (vbellur)
REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#9) for review on master by mohammed rafi kc (rkavunga)
REVIEW: http://review.gluster.org/10768 (tiering: Correct errors in cli and glusterd) posted (#1) for review on master by mohammed rafi kc (rkavunga)
REVIEW: http://review.gluster.org/10768 (tiering: Correct errors in cli and glusterd) posted (#2) for review on master by mohammed rafi kc (rkavunga)
REVIEW: http://review.gluster.org/10768 (tiering: Correct errors in cli and glusterd) posted (#3) for review on master by Vijay Bellur (vbellur)
REVIEW: http://review.gluster.org/10768 (tiering: Correct errors in cli and glusterd) posted (#4) for review on master by mohammed rafi kc (rkavunga)
REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#10) for review on master by mohammed rafi kc (rkavunga)
REVIEW: http://review.gluster.org/10768 (tiering: Correct errors in cli and glusterd) posted (#5) for review on master by mohammed rafi kc (rkavunga)
COMMIT: http://review.gluster.org/10449 committed in master by Kaushal M (kaushal)
------
commit d133071e7ced1794e09ffe4ef8cb14cf5b9f7e75
Author: Mohammed Rafi KC <rkavunga>
Date: Wed Apr 29 12:00:40 2015 +0530

    glusterd/tiering: Exchange tier info during glusted handshake

    Change-Id: Ibc2f8eeb32d3e5dfd6945ca8a6d5f0f80a78ebac
    BUG: 1211264
    Signed-off-by: Mohammed Rafi KC <rkavunga>
    Reviewed-on: http://review.gluster.org/10449
    Reviewed-by: Kaushal M <kaushal>
    Tested-by: Gluster Build System <jenkins.com>
    Tested-by: NetBSD Build System
COMMIT: http://review.gluster.org/10768 committed in master by Kaushal M (kaushal)
------
commit f1fb71bbf677be40b7bab997221f832c7fa7527a
Author: Mohammed Rafi KC <rkavunga>
Date: Wed May 13 16:53:22 2015 +0530

    tiering: Correct errors in cli and glusterd

    Problem 1: volume info shows "Cold Bricks" instead of the tier type, e.g.:

        Volume Name: patchy2
        Type: Tier
        Volume ID: 28c25b8d-b8a1-45dc-b4b7-cbd0b344f58f
        Status: Started
        Number of Bricks: 3
        Transport-type: tcp
        Hot Tier :
        Hot Tier Type : Distribute
        Number of Bricks: 1
        Brick1: 10.70.1.35:/home/brick43
        Cold Bricks:
        Cold Tier Type : Distribute
        Number of Bricks: 2
        Brick2: 10.70.1.35:/home/brick19
        Brick3: 10.70.1.35:/home/brick16
        Options Reconfigured:

    Problem 2: detach-tier sends the enums of rebalance. detach-tier has
    its own enum to send with the detach-tier command; using that enum is
    more appropriate.

    Problem 3: hot_brick count is wrongly set while copying the dictionary
    for the response.

    Change-Id: Icc054a999a679456881bc70511470d32ff8a86e4
    BUG: 1211264
    Signed-off-by: Mohammed Rafi KC <rkavunga>
    Reviewed-on: http://review.gluster.org/10768
    Reviewed-by: Atin Mukherjee <amukherj>
    Reviewed-by: Kaushal M <kaushal>
    Tested-by: Gluster Build System <jenkins.com>
    Tested-by: NetBSD Build System
The fix for this BZ is already present in a GlusterFS release. A clone of this BZ exists that was fixed in a GlusterFS release and closed; hence this mainline BZ is being closed as well.
This bug is being closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user