Description of problem:
The remove-brick command reports success even when one of the bricks is down. However, "gluster v status <vol>" shows that the remove-brick process failed, and the rebalance log messages confirm the failure.

Build found:

[root@rhsauto032 ~]# rpm -qa | grep gluster
gluster-nagios-common-0.1.4-1.el6rhs.noarch
glusterfs-3.6.0.50-1.el6rhs.x86_64
glusterfs-server-3.6.0.50-1.el6rhs.x86_64
gluster-nagios-addons-0.1.14-1.el6rhs.x86_64
samba-glusterfs-3.6.509-169.4.el6rhs.x86_64
glusterfs-libs-3.6.0.50-1.el6rhs.x86_64
glusterfs-api-3.6.0.50-1.el6rhs.x86_64
glusterfs-cli-3.6.0.50-1.el6rhs.x86_64
glusterfs-geo-replication-3.6.0.50-1.el6rhs.x86_64
vdsm-gluster-4.14.7.3-1.el6rhs.noarch
glusterfs-fuse-3.6.0.50-1.el6rhs.x86_64
glusterfs-rdma-3.6.0.50-1.el6rhs.x86_64
[root@rhsauto032 ~]#
[root@rhsauto032 ~]# glusterfs --version
glusterfs 3.6.0.50 built on Mar 6 2015 11:04:46
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser General Public License, version 3 or any later version (LGPLv3 or later), or the GNU General Public License, version 2 (GPLv2), in all cases as published by the Free Software Foundation.
[root@rhsauto032 ~]#

Reproducible: consistently.

Steps to reproduce:
1. Create a distribute volume with N bricks (N > 3).
2. Bring down one of the bricks.
3. Initiate remove-brick.

Expected result:
remove-brick should not start.

Output from the test:

[root@rhsauto032 ~]# gluster v info dist

Volume Name: dist
Type: Distribute
Volume ID: 6725427c-e363-4695-a4ac-65ec65ab0997
Status: Started
Snap Volume: no
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d1
Brick2: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d2
Brick3: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d3
Brick4: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d4
Brick5: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d0_2
Options Reconfigured:
performance.readdir-ahead: on
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256

[root@rhsauto032 ~]# gluster v status dist
Status of volume: dist
Gluster process                                            TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d1     49214     0          Y       14069
Brick rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d2     49215     0          Y       14078
Brick rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d3     49216     0          Y       14084
Brick rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d4     49217     0          Y       14093
Brick rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d0_2   49219     0          Y       14102
NFS Server on localhost                                    2049      0          Y       31543
NFS Server on rhsauto040.lab.eng.blr.redhat.com            2049      0          Y       16743
NFS Server on rhsauto034.lab.eng.blr.redhat.com            2049      0          Y       25003

Task Status of Volume dist
------------------------------------------------------------------------------
There are no active volume tasks

[root@rhsauto032 ~]# kill -9 14069
[root@rhsauto032 ~]# gluster v status dist
Status of volume: dist
Gluster process                                            TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d1     N/A       N/A        N       14069
Brick rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d2     49215     0          Y       14078
Brick rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d3     49216     0          Y       14084
Brick rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d4     49217     0          Y       14093
Brick rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d0_2   49219     0          Y       14102
NFS Server on localhost                                    2049      0          Y       31543
NFS Server on rhsauto040.lab.eng.blr.redhat.com            2049      0          Y       16743
NFS Server on rhsauto034.lab.eng.blr.redhat.com            2049      0          Y       25003

Task Status of Volume dist
------------------------------------------------------------------------------
There are no active volume tasks

[root@rhsauto032 ~]# gluster v remove-brick dist rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d1 start
volume remove-brick start: success
ID: a3e82c82-c2ba-4c02-b09d-c3414246c0d4

[root@rhsauto032 ~]# gluster v status dist
Status of volume: dist
Gluster process                                            TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d1     N/A       N/A        N       14069
Brick rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d2     49215     0          Y       14078
Brick rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d3     49216     0          Y       14084
Brick rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d4     49217     0          Y       14093
Brick rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d0_2   49219     0          Y       14102
NFS Server on localhost                                    2049      0          Y       31543
NFS Server on rhsauto040.lab.eng.blr.redhat.com            2049      0          Y       20221
NFS Server on rhsauto034.lab.eng.blr.redhat.com            2049      0          Y       28515

Task Status of Volume dist
------------------------------------------------------------------------------
Task           : Remove brick
ID             : a3e82c82-c2ba-4c02-b09d-c3414246c0d4
Removed bricks:
rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d1
Status         : failed

[root@rhsauto032 ~]#

Log messages (rebalance log):

[2015-03-11 02:00:28.209263] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-03-11 02:00:33.232184] I [graph.c:269:gf_add_cmdline_options] 0-dist-dht: adding option 'node-uuid' for volume 'dist-dht' with value '86741341-4584-4a10-ac2a-32cf9230c967'
[2015-03-11 02:00:33.232210] I [graph.c:269:gf_add_cmdline_options] 0-dist-dht: adding option 'rebalance-cmd' for volume 'dist-dht' with value '5'
[2015-03-11 02:00:33.232223] I [graph.c:269:gf_add_cmdline_options] 0-dist-dht: adding option 'readdir-optimize' for volume 'dist-dht' with value 'on'
[2015-03-11 02:00:33.232235] I [graph.c:269:gf_add_cmdline_options] 0-dist-dht: adding option 'assert-no-child-down' for volume 'dist-dht' with value 'yes'
[2015-03-11 02:00:33.232246] I [graph.c:269:gf_add_cmdline_options] 0-dist-dht: adding option 'lookup-unhashed' for volume 'dist-dht' with value 'yes'
[2015-03-11 02:00:33.232258] I [graph.c:269:gf_add_cmdline_options] 0-dist-dht: adding option 'use-readdirp' for volume 'dist-dht' with value 'yes'
[2015-03-11 02:00:33.233257] I [dht-shared.c:272:dht_parse_decommissioned_bricks] 0-dist-dht: decommissioning subvolume dist-client-1
[2015-03-11 02:00:33.233380] I [dht-shared.c:337:dht_init_regex] 0-dist-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2015-03-11 02:00:33.236568] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2015-03-11 02:00:33.239365] I [client.c:2350:notify] 0-dist-client-1: parent translators are ready, attempting connect on transport
[2015-03-11 02:00:33.244180] I [client.c:2350:notify] 0-dist-client-2: parent translators are ready, attempting connect on transport
[2015-03-11 02:00:33.244920] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-dist-client-1: changing port to 49214 (from 0)
[2015-03-11 02:00:33.251076] I [client.c:2350:notify] 0-dist-client-3: parent translators are ready, attempting connect on transport
[2015-03-11 02:00:33.254459] E [socket.c:2213:socket_connect_finish] 0-dist-client-1: connection to 10.70.37.7:49214 failed (Connection refused)
[2015-03-11 02:00:33.254506] W [dht-common.c:6044:dht_notify] 0-dist-dht: Received CHILD_DOWN. Exiting
[2015-03-11 02:00:33.254767] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-dist-client-2: changing port to 49215 (from 0)
[2015-03-11 02:00:33.258997] I [client.c:2350:notify] 0-dist-client-4: parent translators are ready, attempting connect on transport
[2015-03-11 02:00:33.262513] I [client-handshake.c:1412:select_server_supported_programs] 0-dist-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-03-11 02:00:33.262886] I [client-handshake.c:1200:client_setvolume_cbk] 0-dist-client-2: Connected to dist-client-2, attached to remote volume '/rhs/brick1/d2'.
[2015-03-11 02:00:33.262911] I [client-handshake.c:1210:client_setvolume_cbk] 0-dist-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2015-03-11 02:00:33.263263] I [client-handshake.c:187:client_set_lk_version_cbk] 0-dist-client-2: Server lk version = 1
[2015-03-11 02:00:33.263661] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-dist-client-3: changing port to 49216 (from 0)
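The failure mode is visible in the status output above: brick d1 shows "N" in the Online column, yet "remove-brick start" still returns success. A minimal sketch of the pre-flight check this bug expects the CLI to perform, written here as a hypothetical Python illustration that parses the "gluster v status" table format shown above (the real check would live in glusterd, not in a wrapper script):

```python
# Hypothetical pre-flight check: refuse to start remove-brick if any
# brick of the volume is offline. Parses the "gluster v status" table
# layout shown in this bug report; the field positions are an assumption
# based on that sample output.

SAMPLE_STATUS = """\
Brick rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d1   N/A    N/A  N  14069
Brick rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d2   49215  0    Y  14078
Brick rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d3   49216  0    Y  14084
"""

def offline_bricks(status_text):
    """Return the host:/path of every brick whose Online column is 'N'."""
    offline = []
    for line in status_text.splitlines():
        if not line.startswith("Brick"):
            continue
        fields = line.split()
        # fields: ['Brick', '<host:/path>', tcp_port, rdma_port, online, pid]
        if fields[4] == "N":
            offline.append(fields[1])
    return offline

down = offline_bricks(SAMPLE_STATUS)
if down:
    # Expected behaviour per this bug: fail the command up front
    # instead of reporting "volume remove-brick start: success".
    print("volume remove-brick start: failed: brick(s) not online: "
          + ", ".join(down))
```

With the sample table above, this reports d1 as offline and would fail the command before the rebalance process is spawned, matching the expected result stated in the description.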
Hi Sakshi,

Could you please post here the link to the patch which fixed this issue?

Thanks
(In reply to Byreddy from comment #4)
> Hi Sakshi,
>
> Pls can you put here the patch link which fixed this issue.
>
> Thanks

Upstream patch: http://review.gluster.org/#/c/10954/
3.7 branch: http://review.gluster.org/#/c/13306/
Used the glusterfs-3.7.9-10 version to verify this bug.

Expected per this bug: if any brick of the volume is down, the remove-brick operation should fail with a proper error message.

However, the fix works only when I try to remove the offline brick itself. It does not work when I try to remove an online brick from a volume that also contains an offline brick.

Please check the console log output in the next comment for the above explanation. Moving this bug back to Assigned.
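The gap described above can be sketched as the difference between two validation rules. This is a hypothetical illustration of the behaviour reported in this comment, not the actual glusterd code: the fix appears to validate only the bricks named in the remove-brick command, while the bug expects validation of every brick in the volume.

```python
# Hypothetical model of the two validation rules. Brick names and
# online states are invented for illustration: d1 is the offline brick,
# d3 is the online brick the tester tried to remove.

bricks_online = {
    "d1": False,  # offline brick
    "d2": True,
    "d3": True,   # online brick named in the remove-brick command
}

def check_as_fixed(bricks_to_remove):
    # What the fix appears to do: reject only if a brick *being removed*
    # is offline.
    return all(bricks_online[b] for b in bricks_to_remove)

def check_as_expected(bricks_to_remove):
    # What this bug expects: reject if *any* brick of the volume is
    # offline, regardless of which brick is being removed.
    return all(bricks_online.values())

# Removing the online brick d3 while d1 is down:
# check_as_fixed(["d3"]) passes (the reported problem),
# check_as_expected(["d3"]) fails (the desired behaviour).
```

Under this model, removing d1 is rejected by both checks, which matches the observation that the fix works for the offline brick but not for an online brick in a volume with an offline brick.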