Bug 878872 - cannot replace brick in distributed replicated volume
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: 2.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: unspecified
Assigned To: krishnan parthasarathi
QA Contact: shylesh
Depends On: 875412 877522
Blocks:
 
Reported: 2012-11-21 07:56 EST by Vidya Sakar
Modified: 2015-11-03 18:05 EST
CC List: 8 users

See Also:
Fixed In Version: glusterfs-3.4.0qa6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 875412
Environment:
Last Closed: 2013-09-23 18:39:20 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments: None
Description Vidya Sakar 2012-11-21 07:56:58 EST
+++ This bug was initially created as a clone of Bug #875412 +++

Description of problem:
Unable to replace a brick in an online 4-node Gluster volume.
Four CentOS nodes serve a distributed-replicated volume.
Attempting to replace one node with an Ubuntu node using the replace-brick command.

Version-Release number of selected component (if applicable):
gluster 3.3.1 CentOS 6.3
gluster 3.3.1 Ubuntu Server 12.10

How reproducible:
occurs on all nodes

Steps to Reproduce:
1. gluster peer probe ubus01.node
2. gluster volume replace-brick gvol_1 ceno1.node:/exports/lv01 ubus01.node:/exports/lv01 start 
  
Actual results:
/exports/lv01 or a prefix of it is already part of a volume

Expected results:
data should be migrated from node ceno1.node to node ubus01.node

Additional info:
This is the first time the node is being added to the volume.
All bricks are ext4 with mount options noatime,user_xattrs.
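
The "or a prefix of it is already part of a volume" error comes from glusterd's staging check (glusterd_is_path_in_use, visible in the logs further down), which rejects a brick path when it, or one of its parent directories, appears to carry leftover GlusterFS volume metadata in extended attributes. A minimal inspection sketch, run on the new node before retrying replace-brick; the paths are illustrative and nothing in this report confirms the trusted.glusterfs.volume-id or trusted.gfid xattrs are actually present here:

# Walk from the intended brick path up to / and dump its extended attributes.
# Any trusted.glusterfs.volume-id or trusted.gfid entry makes glusterd treat
# the path (or a prefix of it) as already part of a volume.
for p in /exports/lv01 /exports /; do
    echo "== $p"
    sudo getfattr -d -m . -e hex "$p" 2>/dev/null
done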

--- Additional comment from krishnan parthasarathi on 2012-11-12 01:38:48 EST ---

Ricor,
Could you paste output of "gluster peer status" from all the nodes?
Please attach the glusterd log files from ubus01.node and ceno1.node.

--- Additional comment from  on 2012-11-13 09:44:44 EST ---

Okay.

I had to replace one of the CentOS nodes (treating it as a FAILED server, because glusterd refused to start up healthily after a restart).

That process completed rather smoothly, so there is now one working Ubuntu node in the volume.

*******************
gluster volume info
*******************
Volume Name: disrep-vol
Type: Distributed-Replicate
Volume ID: 2dfe4f23-8d10-4a88-85e2-97d3e72c13c4
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: ceno4.node:/exports/vol_01
Brick2: ubus02.node:/exports/vol_01
Brick3: ceno2.node:/exports/vol_01
Brick4: ceno1.node:/exports/vol_01
Options Reconfigured:
performance.cache-size: 64MB
auth.allow: 172.16.100.2,172.16.100.3,172.16.100.20,172.16.100.21,172.16.100.22,172.16.100.23,172.16.100.24,172.16.100.25
nfs.addr-namelookup: off
nfs.rpc-auth-allow: 172.16.100.2,172.16.100.3,172.16.100.20,172.16.100.21,172.16.100.22,172.16.100.23,172.16.100.24
nfs.disable: off
nfs.ports-insecure: on


***********************
gluster peer status
***********************
**ceno1.node
Number of Peers: 4

Hostname: 172.16.100.21
Uuid: c5c4c4fc-a22d-4f11-a42f-0d1f4ef7af70
State: Peer in Cluster (Connected)

Hostname: ceno2.node
Uuid: fa7005c3-a929-4110-b1be-ccd206000a67
State: Peer in Cluster (Connected)

Hostname: ceno4.node
Uuid: 6afbf15a-4294-44a0-b351-5c45b8142513
State: Peer in Cluster (Connected)

Hostname: ubus01.node
Uuid: 2547029f-15c1-4349-ba61-7bd9e226110d
State: Peer in Cluster (Connected)


**ceno4.node
Number of Peers: 4

Hostname: ceno1.node
Uuid: 85a362dd-79eb-48cf-80bd-f675617ad01e
State: Peer in Cluster (Connected)

Hostname: ceno2.node
Uuid: fa7005c3-a929-4110-b1be-ccd206000a67
State: Peer in Cluster (Connected)

Hostname: 172.16.100.21
Uuid: c5c4c4fc-a22d-4f11-a42f-0d1f4ef7af70
State: Peer in Cluster (Connected)

Hostname: ubus01.node
Uuid: 2547029f-15c1-4349-ba61-7bd9e226110d
State: Peer in Cluster (Connected)


**ubus02.node
Number of Peers: 4

Hostname: ceno1.node
Uuid: 85a362dd-79eb-48cf-80bd-f675617ad01e
State: Peer in Cluster (Connected)

Hostname: ceno4.node
Uuid: 6afbf15a-4294-44a0-b351-5c45b8142513
State: Peer in Cluster (Connected)

Hostname: ceno2.node
Uuid: fa7005c3-a929-4110-b1be-ccd206000a67
State: Peer in Cluster (Connected)

Hostname: ubus01.node
Uuid: 2547029f-15c1-4349-ba61-7bd9e226110d
State: Peer in Cluster (Connected)


**ubus01.node
Number of Peers: 4

Hostname: ceno2.node
Uuid: fa7005c3-a929-4110-b1be-ccd206000a67
State: Peer in Cluster (Connected)

Hostname: ceno1.node
Uuid: 85a362dd-79eb-48cf-80bd-f675617ad01e
State: Peer in Cluster (Connected)

Hostname: ceno4.node
Uuid: 6afbf15a-4294-44a0-b351-5c45b8142513
State: Peer in Cluster (Connected)

Hostname: 172.16.100.21
Uuid: c5c4c4fc-a22d-4f11-a42f-0d1f4ef7af70
State: Peer in Cluster (Connected)


**ceno2.node
Number of Peers: 4

Hostname: ceno1.node
Uuid: 85a362dd-79eb-48cf-80bd-f675617ad01e
State: Peer in Cluster (Connected)

Hostname: ceno4.node
Uuid: 6afbf15a-4294-44a0-b351-5c45b8142513
State: Peer in Cluster (Connected)

Hostname: 172.16.100.21
Uuid: c5c4c4fc-a22d-4f11-a42f-0d1f4ef7af70
State: Peer in Cluster (Connected)

Hostname: ubus01.node
Uuid: 2547029f-15c1-4349-ba61-7bd9e226110d
State: Peer in Cluster (Connected)


***********************
gluster volume replace-brick disrep-vol ceno1.node:/exports/vol_01 ubus01.node:/exports/vol_01 start
***********************

contents of /var/log/glusterfs/etc-glusterfs-glusterd.vol.log

******ubus01.node
[2012-11-12 16:21:31.013000] I [glusterd-handler.c:502:glusterd_handle_cluster_lock] 0-glusterd: Received LOCK from uuid: fa7005c3-a929-4110-b1be-ccd206000a67
[2012-11-12 16:21:31.013103] I [glusterd-utils.c:285:glusterd_lock] 0-glusterd: Cluster lock held by fa7005c3-a929-4110-b1be-ccd206000a67
[2012-11-12 16:21:31.013172] I [glusterd-handler.c:1322:glusterd_op_lock_send_resp] 0-glusterd: Responded, ret: 0
[2012-11-12 16:21:31.013956] I [glusterd-handler.c:547:glusterd_req_ctx_create] 0-glusterd: Received op from uuid: fa7005c3-a929-4110-b1be-ccd206000a67
[2012-11-12 16:21:31.014092] I [glusterd-utils.c:857:glusterd_volume_brickinfo_get_by_brick] 0-: brick: ceno1.node:/exports/vol_01
[2012-11-12 16:21:31.014168] I [glusterd-utils.c:814:glusterd_volume_brickinfo_get] 0-management: Found brick
[2012-11-12 16:21:31.014870] E [glusterd-utils.c:4490:glusterd_is_path_in_use] 0-management: /exports/vol_01 or a prefix of it is already part of a volume
[2012-11-12 16:21:31.014927] E [glusterd-op-sm.c:2716:glusterd_op_ac_stage_op] 0-: Validate failed: -1
[2012-11-12 16:21:31.015010] I [glusterd-handler.c:1423:glusterd_op_stage_send_resp] 0-glusterd: Responded to stage, ret: 0
[2012-11-12 16:21:31.015233] I [glusterd-handler.c:1366:glusterd_handle_cluster_unlock] 0-glusterd: Received UNLOCK from uuid: fa7005c3-a929-4110-b1be-ccd206000a67
[2012-11-12 16:21:31.015322] I [glusterd-handler.c:1342:glusterd_op_unlock_send_resp] 0-glusterd: Responded to unlock, ret: 0


******ceno1.node
[2012-11-12 16:21:32.217035] I [glusterd-handler.c:502:glusterd_handle_cluster_lock] 0-glusterd: Received LOCK from uuid: fa7005c3-a929-4110-b1be-ccd206000a67
[2012-11-12 16:21:32.217165] I [glusterd-utils.c:285:glusterd_lock] 0-glusterd: Cluster lock held by fa7005c3-a929-4110-b1be-ccd206000a67
[2012-11-12 16:21:32.217232] I [glusterd-handler.c:1322:glusterd_op_lock_send_resp] 0-glusterd: Responded, ret: 0
[2012-11-12 16:21:32.217973] I [glusterd-handler.c:547:glusterd_req_ctx_create] 0-glusterd: Received op from uuid: fa7005c3-a929-4110-b1be-ccd206000a67
[2012-11-12 16:21:32.218084] I [glusterd-utils.c:857:glusterd_volume_brickinfo_get_by_brick] 0-: brick: ceno1.node:/exports/vol_01
[2012-11-12 16:21:32.218478] I [glusterd-utils.c:814:glusterd_volume_brickinfo_get] 0-management: Found brick
[2012-11-12 16:21:32.218981] I [glusterd-handler.c:1423:glusterd_op_stage_send_resp] 0-glusterd: Responded to stage, ret: 0
[2012-11-12 16:21:32.219282] I [glusterd-handler.c:1366:glusterd_handle_cluster_unlock] 0-glusterd: Received UNLOCK from uuid: fa7005c3-a929-4110-b1be-ccd206000a67
[2012-11-12 16:21:32.219336] I [glusterd-handler.c:1342:glusterd_op_unlock_send_resp] 0-glusterd: Responded to unlock, ret: 0



****ubus01.node
sudo ls -la /exports/vol_01/
total 24
drwxr-xr-x 3 root root  4096 Dec 31  2006 .
drwxr-xr-x 3 root root  4096 Nov 12 16:05 ..
drwx------ 2 root root 16384 Dec 31  2006 lost+found
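
The listing shows an apparently empty brick directory, so if the staging check still rejects the path, the leftover metadata may sit on a parent directory (the "prefix" case in the error message). A commonly used cleanup sketch for that situation, not taken from this report and only safe on a path that genuinely belongs to no volume; the paths are the ones shown above and the xattr names are the standard GlusterFS ones:

# Strip stale GlusterFS volume metadata from the brick path and its parent,
# remove the internal .glusterfs directory if one was left behind, then
# restart glusterd so the staging check sees a clean path.
for p in /exports/vol_01 /exports; do
    sudo setfattr -x trusted.glusterfs.volume-id "$p" 2>/dev/null
    sudo setfattr -x trusted.gfid "$p" 2>/dev/null
done
sudo rm -rf /exports/vol_01/.glusterfs
sudo service glusterd restart   # service name may be glusterfs-server on Ubuntu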
Comment 2 Amar Tumballi 2013-01-21 05:17:38 EST
This bug is fixed in master by bug 877522 (flag pm-rhel: rhs-2.1.0+); the fix is not in the 2.0.z branch yet.
Comment 3 krishnan parthasarathi 2013-01-28 05:40:19 EST
I tried recreating the issue as follows:

1) Created a volume with 2 bricks on a single node.
2) Added another node to the cluster.
3) Performed replace-brick on the volume such that one of the bricks
   on the first peer was replaced with another brick with the same path
   on the second peer.
4) Replace-brick returned successfully.
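
A minimal CLI sketch of the recreation steps above, with hypothetical host and brick names (node1, node2, /bricks/b1, /bricks/b2 and testvol are placeholders, not taken from this report):

# 1) Two-brick volume on a single node
gluster volume create testvol node1:/bricks/b1 node1:/bricks/b2
gluster volume start testvol
# 2) Add a second node to the cluster
gluster peer probe node2
# 3) Replace one brick with a brick of the same path on the second peer
gluster volume replace-brick testvol node1:/bricks/b1 node2:/bricks/b1 start
# 4) Poll migration status and commit once it completes
gluster volume replace-brick testvol node1:/bricks/b1 node2:/bricks/b1 status
gluster volume replace-brick testvol node1:/bricks/b1 node2:/bricks/b1 commit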

I would recommend verifying this on the latest available build of rhs-2.0.z.
Comment 6 Scott Haines 2013-03-08 15:56:22 EST
Per 03/05 email exchange w/ PM, targeting for Arches.
Comment 7 Scott Haines 2013-04-11 12:36:25 EDT
Per 04-10-2013 Storage bug triage meeting, targeting for Big Bend.
Comment 8 shylesh 2013-07-11 02:30:24 EDT
Verified on 3.4.0.12rhs.beta3-1.el6rhs.x86_64
Comment 9 Scott Haines 2013-09-23 18:39:20 EDT
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
