Bug 878872

Summary: cannot replace brick in distributed replicated volume
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Vidya Sakar <vinaraya>
Component: glusterd
Assignee: krishnan parthasarathi <kparthas>
Status: CLOSED ERRATA
QA Contact: shylesh <shmohan>
Severity: unspecified
Docs Contact:
Priority: high
Version: 2.0
CC: amarts, gluster-bugs, nsathyan, rfortier, rhs-bugs, ricor.bz, shaines, vbellur
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.4.0qa6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 875412
Environment:
Last Closed: 2013-09-23 22:39:20 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 875412, 877522
Bug Blocks:

Description Vidya Sakar 2012-11-21 12:56:58 UTC
+++ This bug was initially created as a clone of Bug #875412 +++

Description of problem:
Unable to replace a brick in an online 4-node Gluster volume.
Four CentOS nodes serve a distributed replicated volume.
The attempt is to replace one node with an Ubuntu node using the replace-brick command.

Version-Release number of selected component (if applicable):
glusterfs 3.3.1 on CentOS 6.3
glusterfs 3.3.1 on Ubuntu Server 12.10

How reproducible:
Occurs on all nodes.

Steps to Reproduce:
1. gluster peer probe ubus01.node
2. gluster volume replace-brick gvol_1 ceno1.node:/exports/lv01 ubus01.node:/exports/lv01 start 
  
Actual results:
/exports/lv01 or a prefix of it is already part of a volume

Expected results:
Data should be migrated from ceno1.node to ubus01.node.
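
For reference (a sketch added here, not part of the original report): had the start phase succeeded, the usual GlusterFS 3.3-era sequence would be to monitor the migration and then finalize the replacement, reusing the volume and brick names from the steps above.

# monitor the data migration triggered by "start"
gluster volume replace-brick gvol_1 ceno1.node:/exports/lv01 ubus01.node:/exports/lv01 status

# once migration is complete, commit the replacement
gluster volume replace-brick gvol_1 ceno1.node:/exports/lv01 ubus01.node:/exports/lv01 commit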

Additional info:
This is the first time the node is being added to the volume.
All bricks are ext4 with mount options noatime,user_xattr.
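
The "... or a prefix of it is already part of a volume" message is raised when glusterd finds GlusterFS extended attributes (such as trusted.glusterfs.volume-id or trusted.gfid) on the brick directory or on one of its parent directories. A hedged diagnostic sketch, using the brick path from the steps above, would be:

# run as root; trusted.* xattrs are not visible otherwise
getfattr -d -m . -e hex /exports/lv01
getfattr -d -m . -e hex /exports

# if a stale trusted.glusterfs.volume-id or trusted.gfid attribute is found on a
# directory that is NOT a brick of any other volume, it can be cleared before retrying:
setfattr -x trusted.glusterfs.volume-id /exports/lv01
setfattr -x trusted.gfid /exports/lv01
rm -rf /exports/lv01/.glusterfs

In this report the new brick appears to be fresh, so this is a way to rule out stale brick metadata rather than the confirmed cause.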

--- Additional comment from krishnan parthasarathi on 2012-11-12 01:38:48 EST ---

Ricor,
Could you paste the output of "gluster peer status" from all the nodes?
Please attach the glusterd log files from ubus01.node and ceno1.node.

--- Additional comment from  on 2012-11-13 09:44:44 EST ---

Okay.

I had to replace one of the CentOS nodes (treating it as a FAILED server, because glusterd refused to start up healthily after a restart).

That process completed rather smoothly, so there is now one working Ubuntu node in the volume.

*******************
gluster volume info
*******************
Volume Name: disrep-vol
Type: Distributed-Replicate
Volume ID: 2dfe4f23-8d10-4a88-85e2-97d3e72c13c4
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: ceno4.node:/exports/vol_01
Brick2: ubus02.node:/exports/vol_01
Brick3: ceno2.node:/exports/vol_01
Brick4: ceno1.node:/exports/vol_01
Options Reconfigured:
performance.cache-size: 64MB
auth.allow: 172.16.100.2,172.16.100.3,172.16.100.20,172.16.100.21,172.16.100.22,172.16.100.23,172.16.100.24,172.16.100.25
nfs.addr-namelookup: off
nfs.rpc-auth-allow: 172.16.100.2,172.16.100.3,172.16.100.20,172.16.100.21,172.16.100.22,172.16.100.23,172.16.100.24
nfs.disable: off
nfs.ports-insecure: on


***********************
gluster peer status
***********************
**ceno1.node
Number of Peers: 4

Hostname: 172.16.100.21
Uuid: c5c4c4fc-a22d-4f11-a42f-0d1f4ef7af70
State: Peer in Cluster (Connected)

Hostname: ceno2.node
Uuid: fa7005c3-a929-4110-b1be-ccd206000a67
State: Peer in Cluster (Connected)

Hostname: ceno4.node
Uuid: 6afbf15a-4294-44a0-b351-5c45b8142513
State: Peer in Cluster (Connected)

Hostname: ubus01.node
Uuid: 2547029f-15c1-4349-ba61-7bd9e226110d
State: Peer in Cluster (Connected)


**ceno4.node
Number of Peers: 4

Hostname: ceno1.node
Uuid: 85a362dd-79eb-48cf-80bd-f675617ad01e
State: Peer in Cluster (Connected)

Hostname: ceno2.node
Uuid: fa7005c3-a929-4110-b1be-ccd206000a67
State: Peer in Cluster (Connected)

Hostname: 172.16.100.21
Uuid: c5c4c4fc-a22d-4f11-a42f-0d1f4ef7af70
State: Peer in Cluster (Connected)

Hostname: ubus01.node
Uuid: 2547029f-15c1-4349-ba61-7bd9e226110d
State: Peer in Cluster (Connected)


**ubus02.node
Number of Peers: 4

Hostname: ceno1.node
Uuid: 85a362dd-79eb-48cf-80bd-f675617ad01e
State: Peer in Cluster (Connected)

Hostname: ceno4.node
Uuid: 6afbf15a-4294-44a0-b351-5c45b8142513
State: Peer in Cluster (Connected)

Hostname: ceno2.node
Uuid: fa7005c3-a929-4110-b1be-ccd206000a67
State: Peer in Cluster (Connected)

Hostname: ubus01.node
Uuid: 2547029f-15c1-4349-ba61-7bd9e226110d
State: Peer in Cluster (Connected)


**ubus01.node
Number of Peers: 4

Hostname: ceno2.node
Uuid: fa7005c3-a929-4110-b1be-ccd206000a67
State: Peer in Cluster (Connected)

Hostname: ceno1.node
Uuid: 85a362dd-79eb-48cf-80bd-f675617ad01e
State: Peer in Cluster (Connected)

Hostname: ceno4.node
Uuid: 6afbf15a-4294-44a0-b351-5c45b8142513
State: Peer in Cluster (Connected)

Hostname: 172.16.100.21
Uuid: c5c4c4fc-a22d-4f11-a42f-0d1f4ef7af70
State: Peer in Cluster (Connected)


**ceno2.node
Number of Peers: 4

Hostname: ceno1.node
Uuid: 85a362dd-79eb-48cf-80bd-f675617ad01e
State: Peer in Cluster (Connected)

Hostname: ceno4.node
Uuid: 6afbf15a-4294-44a0-b351-5c45b8142513
State: Peer in Cluster (Connected)

Hostname: 172.16.100.21
Uuid: c5c4c4fc-a22d-4f11-a42f-0d1f4ef7af70
State: Peer in Cluster (Connected)

Hostname: ubus01.node
Uuid: 2547029f-15c1-4349-ba61-7bd9e226110d
State: Peer in Cluster (Connected)


***********************
gluster volume replace-brick disrep-vol ceno1.node:/exports/vol_01 ubus01.node:/exports/vol_01 start
***********************

Contents of /var/log/glusterfs/etc-glusterfs-glusterd.vol.log:

******ubus01.node
[2012-11-12 16:21:31.013000] I [glusterd-handler.c:502:glusterd_handle_cluster_lock] 0-glusterd: Received LOCK from uuid: fa7005c3-a929-4110-b1be-ccd206000a67
[2012-11-12 16:21:31.013103] I [glusterd-utils.c:285:glusterd_lock] 0-glusterd: Cluster lock held by fa7005c3-a929-4110-b1be-ccd206000a67
[2012-11-12 16:21:31.013172] I [glusterd-handler.c:1322:glusterd_op_lock_send_resp] 0-glusterd: Responded, ret: 0
[2012-11-12 16:21:31.013956] I [glusterd-handler.c:547:glusterd_req_ctx_create] 0-glusterd: Received op from uuid: fa7005c3-a929-4110-b1be-ccd206000a67
[2012-11-12 16:21:31.014092] I [glusterd-utils.c:857:glusterd_volume_brickinfo_get_by_brick] 0-: brick: ceno1.node:/exports/vol_01
[2012-11-12 16:21:31.014168] I [glusterd-utils.c:814:glusterd_volume_brickinfo_get] 0-management: Found brick
[2012-11-12 16:21:31.014870] E [glusterd-utils.c:4490:glusterd_is_path_in_use] 0-management: /exports/vol_01 or a prefix of it is already part of a volume
[2012-11-12 16:21:31.014927] E [glusterd-op-sm.c:2716:glusterd_op_ac_stage_op] 0-: Validate failed: -1
[2012-11-12 16:21:31.015010] I [glusterd-handler.c:1423:glusterd_op_stage_send_resp] 0-glusterd: Responded to stage, ret: 0
[2012-11-12 16:21:31.015233] I [glusterd-handler.c:1366:glusterd_handle_cluster_unlock] 0-glusterd: Received UNLOCK from uuid: fa7005c3-a929-4110-b1be-ccd206000a67
[2012-11-12 16:21:31.015322] I [glusterd-handler.c:1342:glusterd_op_unlock_send_resp] 0-glusterd: Responded to unlock, ret: 0


******ceno1.node
[2012-11-12 16:21:32.217035] I [glusterd-handler.c:502:glusterd_handle_cluster_lock] 0-glusterd: Received LOCK from uuid: fa7005c3-a929-4110-b1be-ccd206000a67
[2012-11-12 16:21:32.217165] I [glusterd-utils.c:285:glusterd_lock] 0-glusterd: Cluster lock held by fa7005c3-a929-4110-b1be-ccd206000a67
[2012-11-12 16:21:32.217232] I [glusterd-handler.c:1322:glusterd_op_lock_send_resp] 0-glusterd: Responded, ret: 0
[2012-11-12 16:21:32.217973] I [glusterd-handler.c:547:glusterd_req_ctx_create] 0-glusterd: Received op from uuid: fa7005c3-a929-4110-b1be-ccd206000a67
[2012-11-12 16:21:32.218084] I [glusterd-utils.c:857:glusterd_volume_brickinfo_get_by_brick] 0-: brick: ceno1.node:/exports/vol_01
[2012-11-12 16:21:32.218478] I [glusterd-utils.c:814:glusterd_volume_brickinfo_get] 0-management: Found brick
[2012-11-12 16:21:32.218981] I [glusterd-handler.c:1423:glusterd_op_stage_send_resp] 0-glusterd: Responded to stage, ret: 0
[2012-11-12 16:21:32.219282] I [glusterd-handler.c:1366:glusterd_handle_cluster_unlock] 0-glusterd: Received UNLOCK from uuid: fa7005c3-a929-4110-b1be-ccd206000a67
[2012-11-12 16:21:32.219336] I [glusterd-handler.c:1342:glusterd_op_unlock_send_resp] 0-glusterd: Responded to unlock, ret: 0



****ubus01.node
sudo ls -la /exports/vol_01/
total 24
drwxr-xr-x 3 root root  4096 Dec 31  2006 .
drwxr-xr-x 3 root root  4096 Nov 12 16:05 ..
drwx------ 2 root root 16384 Dec 31  2006 lost+found
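
Not in the original comment: since the listing above shows an apparently untouched brick directory, one hedged check (commands added here, not the reporter's) is to look for GlusterFS extended attributes on the brick directory and on its parent, because the staging error explicitly mentions "or a prefix of it":

sudo getfattr -d -m . -e hex /exports/vol_01
sudo getfattr -d -m . -e hex /exports

If neither shows trusted.glusterfs.volume-id or trusted.gfid, the stage failure on ubus01.node points toward the glusterd path-in-use validation itself (addressed via bug 877522, see comment 2) rather than toward stale brick metadata.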

Comment 2 Amar Tumballi 2013-01-21 10:17:38 UTC
This bug is fixed in master by bug 877522 (flag pm-rhel: rhs-2.1.0+); the fix is not in the 2.0.z branch yet.

Comment 3 krishnan parthasarathi 2013-01-28 10:40:19 UTC
I tried recreating the issue as follows (a command-level sketch follows the list):

1) Created a volume with 2 bricks on a single node.
2) Added another node to the cluster.
3) Performed replace-brick on the volume such that one of the bricks
   on the first peer was replaced with a brick of the same path
   on the second peer.
4) Replace-brick returned successfully.
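
The sketch below restates those steps as commands; host names and brick paths are placeholders chosen here, not taken from the bug.

# on node1: create and start a plain 2-brick volume
gluster volume create testvol node1:/bricks/b1 node1:/bricks/b2
gluster volume start testvol

# add a second node to the cluster
gluster peer probe node2

# replace a brick on node1 with a brick of the same path on node2
gluster volume replace-brick testvol node1:/bricks/b1 node2:/bricks/b1 start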

I would recommend verifying this on the latest available build of rhs-2.0z.

Comment 6 Scott Haines 2013-03-08 20:56:22 UTC
Per the 03/05 email exchange with PM, targeting for Arches.

Comment 7 Scott Haines 2013-04-11 16:36:25 UTC
Per 04-10-2013 Storage bug triage meeting, targeting for Big Bend.

Comment 8 shylesh 2013-07-11 06:30:24 UTC
Verified on 3.4.0.12rhs.beta3-1.el6rhs.x86_64

Comment 9 Scott Haines 2013-09-23 22:39:20 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
