+++ This bug was initially created as a clone of Bug #1225716 +++
+++ This bug was initially created as a clone of Bug #1201205 +++

Description of problem:
The remove-brick command reports success even when one of the bricks is down. However, gluster v status <vol> shows that the remove-brick process failed, and the rebalance log messages show the same.

Build found:
[root@rhsauto032 ~]# rpm -qa | grep gluster
gluster-nagios-common-0.1.4-1.el6rhs.noarch
glusterfs-3.6.0.50-1.el6rhs.x86_64
glusterfs-server-3.6.0.50-1.el6rhs.x86_64
gluster-nagios-addons-0.1.14-1.el6rhs.x86_64
samba-glusterfs-3.6.509-169.4.el6rhs.x86_64
glusterfs-libs-3.6.0.50-1.el6rhs.x86_64
glusterfs-api-3.6.0.50-1.el6rhs.x86_64
glusterfs-cli-3.6.0.50-1.el6rhs.x86_64
glusterfs-geo-replication-3.6.0.50-1.el6rhs.x86_64
vdsm-gluster-4.14.7.3-1.el6rhs.noarch
glusterfs-fuse-3.6.0.50-1.el6rhs.x86_64
glusterfs-rdma-3.6.0.50-1.el6rhs.x86_64
[root@rhsauto032 ~]#

[root@rhsauto032 ~]# glusterfs --version
glusterfs 3.6.0.50 built on Mar 6 2015 11:04:46
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
[root@rhsauto032 ~]#

Reproducible: consistently.

Steps:
1. Create a distribute volume with N bricks (N > 3).
2. Bring down one of the bricks.
3. Initiate remove-brick.

Expected result:
Remove-brick should not start.

Output from the test:

[root@rhsauto032 ~]# gluster v info dist

Volume Name: dist
Type: Distribute
Volume ID: 6725427c-e363-4695-a4ac-65ec65ab0997
Status: Started
Snap Volume: no
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d1
Brick2: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d2
Brick3: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d3
Brick4: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d4
Brick5: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d0_2
Options Reconfigured:
performance.readdir-ahead: on
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256

[root@rhsauto032 ~]# gluster v status dist
Status of volume: dist
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d1                                 49214     0          Y       14069
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d2                                 49215     0          Y       14078
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d3                                 49216     0          Y       14084
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d4                                 49217     0          Y       14093
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d0_2                               49219     0          Y       14102
NFS Server on localhost                     2049      0          Y       31543
NFS Server on rhsauto040.lab.eng.blr.redhat
.com                                        2049      0          Y       16743
NFS Server on rhsauto034.lab.eng.blr.redhat
.com                                        2049      0          Y       25003

Task Status of Volume dist
------------------------------------------------------------------------------
There are no active volume tasks

[root@rhsauto032 ~]# kill -9 14069
[root@rhsauto032 ~]# gluster v status dist
Status of volume: dist
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d1                                 N/A       N/A        N       14069
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d2                                 49215     0          Y       14078
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d3                                 49216     0          Y       14084
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d4                                 49217     0          Y       14093
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d0_2                               49219     0          Y       14102
NFS Server on localhost                     2049      0          Y       31543
NFS Server on rhsauto040.lab.eng.blr.redhat
.com                                        2049      0          Y       16743
NFS Server on rhsauto034.lab.eng.blr.redhat
.com                                        2049      0          Y       25003

Task Status of Volume dist
------------------------------------------------------------------------------
There are no active volume tasks

[root@rhsauto032 ~]# gluster v remove-brick dist rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d1 start
volume remove-brick start: success
ID: a3e82c82-c2ba-4c02-b09d-c3414246c0d4
[root@rhsauto032 ~]# gluster v status dist
Status of volume: dist
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d1                                 N/A       N/A        N       14069
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d2                                 49215     0          Y       14078
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d3                                 49216     0          Y       14084
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d4                                 49217     0          Y       14093
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d0_2                               49219     0          Y       14102
NFS Server on localhost                     2049      0          Y       31543
NFS Server on rhsauto040.lab.eng.blr.redhat
.com                                        2049      0          Y       20221
NFS Server on rhsauto034.lab.eng.blr.redhat
.com                                        2049      0          Y       28515

Task Status of Volume dist
------------------------------------------------------------------------------
Task                 : Remove brick
ID                   : a3e82c82-c2ba-4c02-b09d-c3414246c0d4
Removed bricks:
rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d1
Status               : failed

[root@rhsauto032 ~]#

Log messages:

[2015-03-11 02:00:28.209263] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-03-11 02:00:33.232184] I [graph.c:269:gf_add_cmdline_options] 0-dist-dht: adding option 'node-uuid' for volume 'dist-dht' with value '86741341-4584-4a10-ac2a-32cf9230c967'
[2015-03-11 02:00:33.232210] I [graph.c:269:gf_add_cmdline_options] 0-dist-dht: adding option 'rebalance-cmd' for volume 'dist-dht' with value '5'
[2015-03-11 02:00:33.232223] I [graph.c:269:gf_add_cmdline_options] 0-dist-dht: adding option 'readdir-optimize' for volume 'dist-dht' with value 'on'
[2015-03-11 02:00:33.232235] I [graph.c:269:gf_add_cmdline_options] 0-dist-dht: adding option 'assert-no-child-down' for volume 'dist-dht' with value 'yes'
[2015-03-11 02:00:33.232246] I [graph.c:269:gf_add_cmdline_options] 0-dist-dht: adding option 'lookup-unhashed' for volume 'dist-dht' with value 'yes'
[2015-03-11 02:00:33.232258] I [graph.c:269:gf_add_cmdline_options] 0-dist-dht: adding option 'use-readdirp' for volume 'dist-dht' with value 'yes'
[2015-03-11 02:00:33.233257] I [dht-shared.c:272:dht_parse_decommissioned_bricks] 0-dist-dht: decommissioning subvolume dist-client-1
[2015-03-11 02:00:33.233380] I [dht-shared.c:337:dht_init_regex] 0-dist-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2015-03-11 02:00:33.236568] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2015-03-11 02:00:33.239365] I [client.c:2350:notify] 0-dist-client-1: parent translators are ready, attempting connect on transport
[2015-03-11 02:00:33.244180] I [client.c:2350:notify] 0-dist-client-2: parent translators are ready, attempting connect on transport
[2015-03-11 02:00:33.244920] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-dist-client-1: changing port to 49214 (from 0)
[2015-03-11 02:00:33.251076] I [client.c:2350:notify] 0-dist-client-3: parent translators are ready, attempting connect on transport
[2015-03-11 02:00:33.254459] E [socket.c:2213:socket_connect_finish] 0-dist-client-1: connection to 10.70.37.7:49214 failed (Connection refused)
[2015-03-11 02:00:33.254506] W [dht-common.c:6044:dht_notify] 0-dist-dht: Received CHILD_DOWN. Exiting
[2015-03-11 02:00:33.254767] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-dist-client-2: changing port to 49215 (from 0)
[2015-03-11 02:00:33.258997] I [client.c:2350:notify] 0-dist-client-4: parent translators are ready, attempting connect on transport
[2015-03-11 02:00:33.262513] I [client-handshake.c:1412:select_server_supported_programs] 0-dist-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-03-11 02:00:33.262886] I [client-handshake.c:1200:client_setvolume_cbk] 0-dist-client-2: Connected to dist-client-2, attached to remote volume '/rhs/brick1/d2'.
[2015-03-11 02:00:33.262911] I [client-handshake.c:1210:client_setvolume_cbk] 0-dist-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2015-03-11 02:00:33.263263] I [client-handshake.c:187:client_set_lk_version_cbk] 0-dist-client-2: Server lk version = 1
[2015-03-11 02:00:33.263661] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-dist-client-3: changing port to 49216 (from 0)

--- Additional comment from Anand Avati on 2015-05-28 10:52:08 IST ---

REVIEW: http://review.gluster.org/10954 (dht: check if all bricks are started before performing remove-brick) posted (#1) for review on master by Sakshi Bansal (sabansal)

--- Additional comment from Anand Avati on 2015-06-08 18:33:50 IST ---

REVIEW: http://review.gluster.org/10954 (dht : check if all bricks are started before performing remove-brick) posted (#2) for review on master by Sakshi Bansal (sabansal)

--- Additional comment from Anand Avati on 2015-08-29 08:45:00 IST ---

REVIEW: http://review.gluster.org/10954 (glusterd : check if all bricks are started before performing remove-brick) posted (#3) for review on master by Sakshi Bansal (sabansal)

--- Additional comment from Anand Avati on 2015-09-01 11:10:42 IST ---

REVIEW: http://review.gluster.org/10954 (glusterd: check if all bricks are started before performing remove-brick) posted (#4) for review on master by Sakshi Bansal (sabansal)

--- Additional comment from Vijay Bellur on 2015-09-03 15:18:01 IST ---

REVIEW: http://review.gluster.org/10954 (glusterd : check if all bricks are started before performing remove-brick) posted (#5) for review on master by Sakshi Bansal (sabansal)

--- Additional comment from Vijay Bellur on 2016-01-07 13:21:14 IST ---

REVIEW: http://review.gluster.org/13191 (glusterd: remove-brick commit getting executed before migration has completed) posted (#1) for review on master by Sakshi Bansal

--- Additional comment from Vijay Bellur on 2016-01-07 15:56:12 IST ---

REVIEW: http://review.gluster.org/13191 (tests: remove-brick commit getting executed before migration has completed) posted (#2) for review on master by Sakshi Bansal

--- Additional comment from Vijay Bellur on 2016-01-12 17:04:29 IST ---

REVIEW: http://review.gluster.org/13191 (tests: remove-brick commit getting executed before migration has completed) posted (#3) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Vijay Bellur on 2016-01-28 11:51:00 IST ---

REVIEW: http://review.gluster.org/13191 (tests: remove-brick commit getting executed before migration has completed) posted (#4) for review on master by Sakshi Bansal

--- Additional comment from Vijay Bellur on 2016-02-02 16:10:31 IST ---

REVIEW: http://review.gluster.org/13191 (tests: remove-brick commit getting executed before migration has completed) posted (#5) for review on master by Sakshi Bansal

--- Additional comment from Vijay Bellur on 2016-02-24 18:45:52 IST ---

COMMIT: http://review.gluster.org/13191 committed in master by Raghavendra Talur (rtalur)
------
commit 6209e227f86025ff9591d78e69c4758b62271a04
Author: Sakshi Bansal <sabansal>
Date:   Thu Jan 7 13:09:58 2016 +0530

    tests: remove-brick commit getting executed before migration has completed

    Remove brick commit will fail when it is executed while rebalance is in
    progress. Hence added a rebalance timeout check before remove-brick commit
    to ensure that rebalance has completed.

    Change-Id: Ic12f97cbba417ce8cddb35ae973f2bc9bde0fc80
    BUG: 1225716
    Signed-off-by: Sakshi Bansal <sabansal>
    Reviewed-on: http://review.gluster.org/13191
    Reviewed-by: Gaurav Kumar Garg <ggarg>
    Smoke: Gluster Build System <jenkins.com>
    CentOS-regression: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Raghavendra Talur <rtalur>
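
For reference, the behaviour requested in the original report (do not start remove-brick while any brick of the volume is offline, the subject of review 10954) can be illustrated outside glusterd. The following is only a minimal sketch, not the patch itself: it parses text in the `gluster v status` layout pasted in this report, where a long brick name wraps onto a second line and the last four columns are TCP Port, RDMA Port, Online, and Pid. The function name `offline_bricks` and the host name `host1` are hypothetical, and other GlusterFS versions may print a different layout.

```python
from typing import List, Optional


def offline_bricks(status_text: str) -> List[str]:
    """Return the bricks whose Online column is 'N'.

    Assumes the table layout shown in this report: a row starts with the
    word 'Brick', the brick name may wrap onto the next line, and the row
    ends with four columns: TCP Port, RDMA Port, Online, Pid.
    """
    offline: List[str] = []
    pending: Optional[str] = None  # name collected so far; None outside a Brick row
    for line in status_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "Brick":
            pending = ""
            tokens = tokens[1:]
        if pending is None:
            continue  # header, separator, NFS rows, task status
        if len(tokens) >= 4 and tokens[-2] in ("Y", "N"):
            # This line carries the four trailing columns, so the row is complete.
            pending += "".join(tokens[:-4])
            if tokens[-2] == "N":
                offline.append(pending)
            pending = None
        else:
            # Only a fragment of a wrapped brick name so far.
            pending += "".join(tokens)
    return offline


# Hypothetical sample in the same wrapped layout as the output above.
SAMPLE = """\
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick host1:/rh
s/brick1/d1                                 N/A       N/A        N       14069
Brick host1:/rh
s/brick1/d2                                 49215     0          Y       14078
NFS Server on localhost                     2049      0          Y       31543
"""

if __name__ == "__main__":
    down = offline_bricks(SAMPLE)
    if down:
        print("refusing to start remove-brick; offline bricks:", down)
```

A caller would run this check before issuing `remove-brick ... start` and abort if the returned list is non-empty.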
REVIEW: http://review.gluster.org/13511 (tests: remove-brick commit getting executed before migration has completed) posted (#2) for review on release-3.7 by Raghavendra Talur (rtalur)
COMMIT: http://review.gluster.org/13511 committed in release-3.7 by Raghavendra Talur (rtalur)
------
commit 31770ace1baf603f486b7e65ce6f78eff7a20e15
Author: Sakshi Bansal <sabansal>
Date:   Thu Jan 7 13:09:58 2016 +0530

    tests: remove-brick commit getting executed before migration has completed

    Backport of http://review.gluster.org/13191

    Remove brick commit will fail when it is executed while rebalance is in
    progress. Hence added a rebalance timeout check before remove-brick commit
    to ensure that rebalance has completed.

    Change-Id: I5f388b88a68d19f8d2f52937afb771b95be6deaf
    BUG: 1311572
    Signed-off-by: Sakshi Bansal <sabansal>
    Reviewed-on: http://review.gluster.org/13511
    Tested-by: Raghavendra Talur <rtalur>
    Smoke: Gluster Build System <jenkins.com>
    CentOS-regression: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Raghavendra Talur <rtalur>
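
The idea in the commit message above (wait, with a timeout, until the migration reports completion before running remove-brick commit) can be sketched generically. This is not the code from the patch, which changes the project's own test scripts; it is a minimal illustration. The name `wait_for_status` and its parameters are hypothetical, and `get_status` stands in for whatever reads the current state, for example by parsing the output of the remove-brick status command.

```python
import time
from typing import Callable


def wait_for_status(
    get_status: Callable[[], str],
    expected: str = "completed",
    timeout: float = 300.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_status() until it returns `expected` or `timeout` seconds pass.

    Returns True if the expected status was seen, False on timeout.
    The status is always checked at least once, even with timeout=0.
    """
    deadline = clock() + timeout
    while True:
        if get_status() == expected:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Fake status source standing in for a real status query (assumption).
    states = iter(["in progress", "in progress", "completed"])
    if wait_for_status(lambda: next(states), timeout=10, interval=0.01):
        print("migration finished; safe to run remove-brick commit")
    else:
        print("timed out; do not commit")
```

Only when the function returns True would the caller go on to commit; on a timeout the commit is skipped, which avoids the failure described in the commit message.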
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.9, please open a new bug report.

glusterfs-3.7.9 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-March/025922.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user