Bug 1201205 - Remove brick command execution displays success even when one of the bricks is down.
Summary: Remove brick command execution displays success even when one of the bricks...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Nithya Balachandran
QA Contact: Prasad Desala
URL:
Whiteboard:
Depends On:
Blocks: 1302528
 
Reported: 2015-03-12 10:21 UTC by Triveni Rao
Modified: 2018-03-14 06:58 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1225716 1302528
Environment:
Last Closed: 2018-03-14 06:58:24 UTC
Embargoed:



Description Triveni Rao 2015-03-12 10:21:59 UTC
Description of problem:

The remove-brick command reports success even when one of the bricks is down. However, gluster v status <vol> shows the remove-brick task as failed, and the rebalance log messages confirm the failure.

Build found:

[root@rhsauto032 ~]# rpm -qa | grep gluster
gluster-nagios-common-0.1.4-1.el6rhs.noarch
glusterfs-3.6.0.50-1.el6rhs.x86_64
glusterfs-server-3.6.0.50-1.el6rhs.x86_64
gluster-nagios-addons-0.1.14-1.el6rhs.x86_64
samba-glusterfs-3.6.509-169.4.el6rhs.x86_64
glusterfs-libs-3.6.0.50-1.el6rhs.x86_64
glusterfs-api-3.6.0.50-1.el6rhs.x86_64
glusterfs-cli-3.6.0.50-1.el6rhs.x86_64
glusterfs-geo-replication-3.6.0.50-1.el6rhs.x86_64
vdsm-gluster-4.14.7.3-1.el6rhs.noarch
glusterfs-fuse-3.6.0.50-1.el6rhs.x86_64
glusterfs-rdma-3.6.0.50-1.el6rhs.x86_64
[root@rhsauto032 ~]# 


[root@rhsauto032 ~]# glusterfs --version
glusterfs 3.6.0.50 built on Mar  6 2015 11:04:46
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
[root@rhsauto032 ~]# 

Reproducible: consistently.

Steps:

1. Create a distribute volume with N bricks (N > 3)

2. Bring down one of the bricks

3. Initiate remove-brick

Expected result:
The remove-brick operation should not start while a brick is down.
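
Until that check exists, an administrator can gate remove-brick on the brick Online flags themselves. A minimal workaround sketch, assuming the volume name dist and brick path from this report, and the wrapped two-line gluster v status layout shown below (the awk parse may need adjusting for other versions or layouts):

#!/bin/sh
# Workaround sketch: refuse to start remove-brick while any brick is offline.
# Assumes the wrapped two-line "gluster v status" layout shown in this report:
# the Online flag is the second-to-last field of the line that follows each
# "Brick ..." header line.
VOL=dist
BRICK=rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d1

down=$(gluster v status "$VOL" | awk '/^Brick /{getline; if ($(NF-1)=="N") n++} END{print n+0}')
if [ "$down" -eq 0 ]; then
    gluster v remove-brick "$VOL" "$BRICK" start
else
    echo "$down brick(s) offline; not starting remove-brick" >&2
    exit 1
fi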


Output from the test:

[root@rhsauto032 ~]# gluster v info dist
 
Volume Name: dist
Type: Distribute
Volume ID: 6725427c-e363-4695-a4ac-65ec65ab0997
Status: Started
Snap Volume: no
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d1
Brick2: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d2
Brick3: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d3
Brick4: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d4
Brick5: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d0_2
Options Reconfigured:
performance.readdir-ahead: on
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256
[root@rhsauto032 ~]# gluster v status dist
Status of volume: dist
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d1                                 49214     0          Y       14069
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d2                                 49215     0          Y       14078
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d3                                 49216     0          Y       14084
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d4                                 49217     0          Y       14093
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d0_2                               49219     0          Y       14102
NFS Server on localhost                     2049      0          Y       31543
NFS Server on rhsauto040.lab.eng.blr.redhat
.com                                        2049      0          Y       16743
NFS Server on rhsauto034.lab.eng.blr.redhat
.com                                        2049      0          Y       25003
 
Task Status of Volume dist
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@rhsauto032 ~]# kill -9 14069
[root@rhsauto032 ~]# gluster v status dist
Status of volume: dist
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d1                                 N/A       N/A        N       14069
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d2                                 49215     0          Y       14078
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d3                                 49216     0          Y       14084
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d4                                 49217     0          Y       14093
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d0_2                               49219     0          Y       14102
NFS Server on localhost                     2049      0          Y       31543
NFS Server on rhsauto040.lab.eng.blr.redhat
.com                                        2049      0          Y       16743
NFS Server on rhsauto034.lab.eng.blr.redhat
.com                                        2049      0          Y       25003
 
Task Status of Volume dist
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@rhsauto032 ~]# gluster v remove-brick dist rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d1 start
volume remove-brick start: success
ID: a3e82c82-c2ba-4c02-b09d-c3414246c0d4


[root@rhsauto032 ~]# gluster v status dist
Status of volume: dist
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d1                                 N/A       N/A        N       14069
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d2                                 49215     0          Y       14078
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d3                                 49216     0          Y       14084
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d4                                 49217     0          Y       14093
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d0_2                               49219     0          Y       14102
NFS Server on localhost                     2049      0          Y       31543
NFS Server on rhsauto040.lab.eng.blr.redhat
.com                                        2049      0          Y       20221
NFS Server on rhsauto034.lab.eng.blr.redhat
.com                                        2049      0          Y       28515
 
Task Status of Volume dist
------------------------------------------------------------------------------
Task                 : Remove brick        
ID                   : a3e82c82-c2ba-4c02-b09d-c3414246c0d4
Removed bricks:     
rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d1
Status               : failed              
 
[root@rhsauto032 ~]# 
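
Since "start" returns success regardless, a script cannot rely on its exit status and has to poll the task instead. A minimal sketch using the companion remove-brick ... status command; the grep patterns are assumptions based on the status strings shown above:

#!/bin/sh
# Poll the remove-brick task instead of trusting the "start" return code.
VOL=dist
BRICK=rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d1

gluster v remove-brick "$VOL" "$BRICK" start
while :; do
    out=$(gluster v remove-brick "$VOL" "$BRICK" status)
    echo "$out" | grep -qi failed    && { echo "remove-brick failed" >&2; exit 1; }
    echo "$out" | grep -qi completed && break
    sleep 5
done
echo "remove-brick finished; safe to commit"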


Log messages:

[2015-03-11 02:00:28.209263] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-03-11 02:00:33.232184] I [graph.c:269:gf_add_cmdline_options] 0-dist-dht: adding option 'node-uuid' for volume 'dist-dht' with value '86741341-4584-4a10-ac2a-32cf9230c967'
[2015-03-11 02:00:33.232210] I [graph.c:269:gf_add_cmdline_options] 0-dist-dht: adding option 'rebalance-cmd' for volume 'dist-dht' with value '5'
[2015-03-11 02:00:33.232223] I [graph.c:269:gf_add_cmdline_options] 0-dist-dht: adding option 'readdir-optimize' for volume 'dist-dht' with value 'on'
[2015-03-11 02:00:33.232235] I [graph.c:269:gf_add_cmdline_options] 0-dist-dht: adding option 'assert-no-child-down' for volume 'dist-dht' with value 'yes'
[2015-03-11 02:00:33.232246] I [graph.c:269:gf_add_cmdline_options] 0-dist-dht: adding option 'lookup-unhashed' for volume 'dist-dht' with value 'yes'
[2015-03-11 02:00:33.232258] I [graph.c:269:gf_add_cmdline_options] 0-dist-dht: adding option 'use-readdirp' for volume 'dist-dht' with value 'yes'
[2015-03-11 02:00:33.233257] I [dht-shared.c:272:dht_parse_decommissioned_bricks] 0-dist-dht: decommissioning subvolume dist-client-1
[2015-03-11 02:00:33.233380] I [dht-shared.c:337:dht_init_regex] 0-dist-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2015-03-11 02:00:33.236568] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2015-03-11 02:00:33.239365] I [client.c:2350:notify] 0-dist-client-1: parent translators are ready, attempting connect on transport
[2015-03-11 02:00:33.244180] I [client.c:2350:notify] 0-dist-client-2: parent translators are ready, attempting connect on transport
[2015-03-11 02:00:33.244920] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-dist-client-1: changing port to 49214 (from 0)
[2015-03-11 02:00:33.251076] I [client.c:2350:notify] 0-dist-client-3: parent translators are ready, attempting connect on transport
[2015-03-11 02:00:33.254459] E [socket.c:2213:socket_connect_finish] 0-dist-client-1: connection to 10.70.37.7:49214 failed (Connection refused)
[2015-03-11 02:00:33.254506] W [dht-common.c:6044:dht_notify] 0-dist-dht: Received CHILD_DOWN. Exiting
[2015-03-11 02:00:33.254767] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-dist-client-2: changing port to 49215 (from 0)
[2015-03-11 02:00:33.258997] I [client.c:2350:notify] 0-dist-client-4: parent translators are ready, attempting connect on transport
[2015-03-11 02:00:33.262513] I [client-handshake.c:1412:select_server_supported_programs] 0-dist-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-03-11 02:00:33.262886] I [client-handshake.c:1200:client_setvolume_cbk] 0-dist-client-2: Connected to dist-client-2, attached to remote volume '/rhs/brick1/d2'.
[2015-03-11 02:00:33.262911] I [client-handshake.c:1210:client_setvolume_cbk] 0-dist-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2015-03-11 02:00:33.263263] I [client-handshake.c:187:client_set_lk_version_cbk] 0-dist-client-2: Server lk version = 1
[2015-03-11 02:00:33.263661] I [rpc-clnt.c:1759:rpc_clnt_reconfig] 0-dist-client-3: changing port to 49216 (from 0)
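
The excerpt shows the failure chain: the rebalance process starts with the dead brick's subvolume (dist-client-1) marked for decommissioning, the connect to it is refused, and the process exits on CHILD_DOWN. A quick way to spot this, assuming the log sits at the usual /var/log/glusterfs/<volname>-rebalance.log location:

# Spot the rebalance abort caused by the dead brick (log path assumed to
# follow the usual <volname>-rebalance.log convention):
grep -E 'Connection refused|CHILD_DOWN' /var/log/glusterfs/dist-rebalance.log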

Comment 4 Byreddy 2016-07-20 08:49:50 UTC
Hi Sakshi,

Can you please post here the link to the patch that fixed this issue?

Thanks

Comment 5 Sakshi 2016-07-20 08:56:09 UTC
(In reply to Byreddy from comment #4)
> Hi Sakshi,
> 
> Can you please post here the link to the patch that fixed this issue?
> 
> Thanks

Upstream patch : http://review.gluster.org/#/c/10954/
3.7 branch : http://review.gluster.org/#/c/13306/

Comment 6 Byreddy 2016-07-22 06:00:53 UTC
Used the glusterfs-3.7.9-10 version to verify this bug.

The expectation from this bug is that if any brick of the volume is down, the remove-brick operation should fail with a proper error message. However, the fix works only when I try to remove the offline brick itself; it does not work when I try to remove an online brick from a volume that also has an offline brick (see the sketch below).
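
To make the two cases concrete, a sketch with hypothetical names (volume vol on server1, with brick b1 offline and brick b2 online):

# Case 1: remove the offline brick itself -- rejected by the fix (correct).
gluster v remove-brick vol server1:/bricks/b1 start   # fails with an error

# Case 2: remove an online brick while b1 is still offline -- still
# reports success (incorrect; this is why the bug is being reopened).
gluster v remove-brick vol server1:/bricks/b2 start   # succeeds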



Please check the console log output in the next comment for the above explanation; moving this bug back to the Assigned state.

