Bug 878872

Summary: cannot replace brick in distributed replicated volume
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Vidya Sakar <vinaraya>
Component: glusterd
Assignee: krishnan parthasarathi <kparthas>
Status: CLOSED ERRATA
QA Contact: shylesh <shmohan>
Severity: unspecified
Docs Contact:
Priority: high
Version: 2.0
CC: amarts, gluster-bugs, nsathyan, rfortier, rhs-bugs, ricor.bz, shaines, vbellur
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.4.0qa6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 875412
Environment:
Last Closed: 2013-09-23 22:39:20 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 875412, 877522
Bug Blocks:

Description Vidya Sakar 2012-11-21 12:56:58 UTC
+++ This bug was initially created as a clone of Bug #875412 +++

Description of problem:
Unable to replace a brick in an online 4-node Gluster volume.
Four CentOS nodes serve a distributed replicated volume.
The attempt is to replace one node with an Ubuntu node using the replace-brick command.

Version-Release number of selected component (if applicable):
glusterfs 3.3.1 on CentOS 6.3
glusterfs 3.3.1 on Ubuntu Server 12.10

How reproducible:
Occurs on all nodes.

Steps to Reproduce:
1. gluster peer probe ubus01.node
2. gluster volume replace-brick gvol_1 ceno1.node:/exports/lv01 ubus01.node:/exports/lv01 start 
  
Actual results:
/exports/lv01 or a prefix of it is already part of a volume

Expected results:
Data should be migrated from ceno1.node to ubus01.node.
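
For reference (a sketch added here, not part of the original report): had the start phase succeeded, the usual GlusterFS 3.3-era sequence would be to monitor the migration and then finalize the replacement, reusing the volume and brick names from the steps above.

# monitor the data migration triggered by "start"
gluster volume replace-brick gvol_1 ceno1.node:/exports/lv01 ubus01.node:/exports/lv01 status

# once migration is complete, commit the replacement
gluster volume replace-brick gvol_1 ceno1.node:/exports/lv01 ubus01.node:/exports/lv01 commit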

Additional info:
This is the first time the node is being added to the volume.
All bricks are ext4 with mount options noatime,user_xattr.
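
The "... or a prefix of it is already part of a volume" message is raised when glusterd finds GlusterFS extended attributes (such as trusted.glusterfs.volume-id or trusted.gfid) on the brick directory or on one of its parent directories. A hedged diagnostic sketch, using the brick path from the steps above, would be:

# run as root; trusted.* xattrs are not visible otherwise
getfattr -d -m . -e hex /exports/lv01
getfattr -d -m . -e hex /exports

# if a stale trusted.glusterfs.volume-id or trusted.gfid attribute is found on a
# directory that is NOT a brick of any other volume, it can be cleared before retrying:
setfattr -x trusted.glusterfs.volume-id /exports/lv01
setfattr -x trusted.gfid /exports/lv01
rm -rf /exports/lv01/.glusterfs

In this report the new brick appears to be fresh, so this is a way to rule out stale brick metadata rather than the confirmed cause.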

--- Additional comment from krishnan parthasarathi on 2012-11-12 01:38:48 EST ---

Ricor,
Could you paste the output of "gluster peer status" from all the nodes?
Please attach the glusterd log files from ubus01.node and ceno1.node.

--- Additional comment from  on 2012-11-13 09:44:44 EST ---

Okay.

I had to replace one of the CentOS nodes (treating it as a FAILED server, because glusterd refused to start up healthily after a restart).

That process completed rather smoothly, so there is now one working Ubuntu node in the volume.

*******************
gluster volume info
*******************
Volume Name: disrep-vol
Type: Distributed-Replicate
Volume ID: 2dfe4f23-8d10-4a88-85e2-97d3e72c13c4
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: ceno4.node:/exports/vol_01
Brick2: ubus02.node:/exports/vol_01
Brick3: ceno2.node:/exports/vol_01
Brick4: ceno1.node:/exports/vol_01
Options Reconfigured:
performance.cache-size: 64MB
auth.allow: 172.16.100.2,172.16.100.3,172.16.100.20,172.16.100.21,172.16.100.22,172.16.100.23,172.16.100.24,172.16.100.25
nfs.addr-namelookup: off
nfs.rpc-auth-allow: 172.16.100.2,172.16.100.3,172.16.100.20,172.16.100.21,172.16.100.22,172.16.100.23,172.16.100.24
nfs.disable: off
nfs.ports-insecure: on


***********************
gluster peer status
***********************
**ceno1.node
Number of Peers: 4

Hostname: 172.16.100.21
Uuid: c5c4c4fc-a22d-4f11-a42f-0d1f4ef7af70
State: Peer in Cluster (Connected)

Hostname: ceno2.node
Uuid: fa7005c3-a929-4110-b1be-ccd206000a67
State: Peer in Cluster (Connected)

Hostname: ceno4.node
Uuid: 6afbf15a-4294-44a0-b351-5c45b8142513
State: Peer in Cluster (Connected)

Hostname: ubus01.node
Uuid: 2547029f-15c1-4349-ba61-7bd9e226110d
State: Peer in Cluster (Connected)


**ceno4.node
Number of Peers: 4

Hostname: ceno1.node
Uuid: 85a362dd-79eb-48cf-80bd-f675617ad01e
State: Peer in Cluster (Connected)

Hostname: ceno2.node
Uuid: fa7005c3-a929-4110-b1be-ccd206000a67
State: Peer in Cluster (Connected)

Hostname: 172.16.100.21
Uuid: c5c4c4fc-a22d-4f11-a42f-0d1f4ef7af70
State: Peer in Cluster (Connected)

Hostname: ubus01.node
Uuid: 2547029f-15c1-4349-ba61-7bd9e226110d
State: Peer in Cluster (Connected)


**ubus02.node
Number of Peers: 4

Hostname: ceno1.node
Uuid: 85a362dd-79eb-48cf-80bd-f675617ad01e
State: Peer in Cluster (Connected)

Hostname: ceno4.node
Uuid: 6afbf15a-4294-44a0-b351-5c45b8142513
State: Peer in Cluster (Connected)

Hostname: ceno2.node
Uuid: fa7005c3-a929-4110-b1be-ccd206000a67
State: Peer in Cluster (Connected)

Hostname: ubus01.node
Uuid: 2547029f-15c1-4349-ba61-7bd9e226110d
State: Peer in Cluster (Connected)


**ubus01.node
Number of Peers: 4

Hostname: ceno2.node
Uuid: fa7005c3-a929-4110-b1be-ccd206000a67
State: Peer in Cluster (Connected)

Hostname: ceno1.node
Uuid: 85a362dd-79eb-48cf-80bd-f675617ad01e
State: Peer in Cluster (Connected)

Hostname: ceno4.node
Uuid: 6afbf15a-4294-44a0-b351-5c45b8142513
State: Peer in Cluster (Connected)

Hostname: 172.16.100.21
Uuid: c5c4c4fc-a22d-4f11-a42f-0d1f4ef7af70
State: Peer in Cluster (Connected)


**ceno2.node
Number of Peers: 4

Hostname: ceno1.node
Uuid: 85a362dd-79eb-48cf-80bd-f675617ad01e
State: Peer in Cluster (Connected)

Hostname: ceno4.node
Uuid: 6afbf15a-4294-44a0-b351-5c45b8142513
State: Peer in Cluster (Connected)

Hostname: 172.16.100.21
Uuid: c5c4c4fc-a22d-4f11-a42f-0d1f4ef7af70
State: Peer in Cluster (Connected)

Hostname: ubus01.node
Uuid: 2547029f-15c1-4349-ba61-7bd9e226110d
State: Peer in Cluster (Connected)


***********************
gluster volume replace-brick disrep-vol ceno1.node:/exports/vol_01 ubus01.node:/exports/vol_01 start
***********************

Contents of /var/log/glusterfs/etc-glusterfs-glusterd.vol.log:

******ubus01.node
[2012-11-12 16:21:31.013000] I [glusterd-handler.c:502:glusterd_handle_cluster_lock] 0-glusterd: Received LOCK from uuid: fa7005c3-a929-4110-b1be-ccd206000a67
[2012-11-12 16:21:31.013103] I [glusterd-utils.c:285:glusterd_lock] 0-glusterd: Cluster lock held by fa7005c3-a929-4110-b1be-ccd206000a67
[2012-11-12 16:21:31.013172] I [glusterd-handler.c:1322:glusterd_op_lock_send_resp] 0-glusterd: Responded, ret: 0
[2012-11-12 16:21:31.013956] I [glusterd-handler.c:547:glusterd_req_ctx_create] 0-glusterd: Received op from uuid: fa7005c3-a929-4110-b1be-ccd206000a67
[2012-11-12 16:21:31.014092] I [glusterd-utils.c:857:glusterd_volume_brickinfo_get_by_brick] 0-: brick: ceno1.node:/exports/vol_01
[2012-11-12 16:21:31.014168] I [glusterd-utils.c:814:glusterd_volume_brickinfo_get] 0-management: Found brick
[2012-11-12 16:21:31.014870] E [glusterd-utils.c:4490:glusterd_is_path_in_use] 0-management: /exports/vol_01 or a prefix of it is already part of a volume
[2012-11-12 16:21:31.014927] E [glusterd-op-sm.c:2716:glusterd_op_ac_stage_op] 0-: Validate failed: -1
[2012-11-12 16:21:31.015010] I [glusterd-handler.c:1423:glusterd_op_stage_send_resp] 0-glusterd: Responded to stage, ret: 0
[2012-11-12 16:21:31.015233] I [glusterd-handler.c:1366:glusterd_handle_cluster_unlock] 0-glusterd: Received UNLOCK from uuid: fa7005c3-a929-4110-b1be-ccd206000a67
[2012-11-12 16:21:31.015322] I [glusterd-handler.c:1342:glusterd_op_unlock_send_resp] 0-glusterd: Responded to unlock, ret: 0


******ceno1.node
[2012-11-12 16:21:32.217035] I [glusterd-handler.c:502:glusterd_handle_cluster_lock] 0-glusterd: Received LOCK from uuid: fa7005c3-a929-4110-b1be-ccd206000a67
[2012-11-12 16:21:32.217165] I [glusterd-utils.c:285:glusterd_lock] 0-glusterd: Cluster lock held by fa7005c3-a929-4110-b1be-ccd206000a67
[2012-11-12 16:21:32.217232] I [glusterd-handler.c:1322:glusterd_op_lock_send_resp] 0-glusterd: Responded, ret: 0
[2012-11-12 16:21:32.217973] I [glusterd-handler.c:547:glusterd_req_ctx_create] 0-glusterd: Received op from uuid: fa7005c3-a929-4110-b1be-ccd206000a67
[2012-11-12 16:21:32.218084] I [glusterd-utils.c:857:glusterd_volume_brickinfo_get_by_brick] 0-: brick: ceno1.node:/exports/vol_01
[2012-11-12 16:21:32.218478] I [glusterd-utils.c:814:glusterd_volume_brickinfo_get] 0-management: Found brick
[2012-11-12 16:21:32.218981] I [glusterd-handler.c:1423:glusterd_op_stage_send_resp] 0-glusterd: Responded to stage, ret: 0
[2012-11-12 16:21:32.219282] I [glusterd-handler.c:1366:glusterd_handle_cluster_unlock] 0-glusterd: Received UNLOCK from uuid: fa7005c3-a929-4110-b1be-ccd206000a67
[2012-11-12 16:21:32.219336] I [glusterd-handler.c:1342:glusterd_op_unlock_send_resp] 0-glusterd: Responded to unlock, ret: 0



****ubus01.node
sudo ls -la /exports/vol_01/
total 24
drwxr-xr-x 3 root root  4096 Dec 31  2006 .
drwxr-xr-x 3 root root  4096 Nov 12 16:05 ..
drwx------ 2 root root 16384 Dec 31  2006 lost+found
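
Not in the original comment: since the listing above shows an apparently untouched brick directory, one hedged check (commands added here, not the reporter's) is to look for GlusterFS extended attributes on the brick directory and on its parent, because the staging error explicitly mentions "or a prefix of it":

sudo getfattr -d -m . -e hex /exports/vol_01
sudo getfattr -d -m . -e hex /exports

If neither shows trusted.glusterfs.volume-id or trusted.gfid, the stage failure on ubus01.node points toward the glusterd path-in-use validation itself (addressed via bug 877522, see comment 2) rather than toward stale brick metadata.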

Comment 2 Amar Tumballi 2013-01-21 10:17:38 UTC
This bug is fixed in master by bug 877522 (flag pm-rhel: rhs-2.1.0+); the fix is not in the 2.0.z branch yet.

Comment 3 krishnan parthasarathi 2013-01-28 10:40:19 UTC
I tried recreating the issue as follows (a command-level sketch follows the list):

1) Created a volume with 2 bricks on a single node.
2) Added another node to the cluster.
3) Performed replace-brick on the volume such that one of the bricks
   on the first peer was replaced with a brick of the same path
   on the second peer.
4) Replace-brick returned successfully.
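
The sketch below restates those steps as commands; host names and brick paths are placeholders chosen here, not taken from the bug.

# on node1: create and start a plain 2-brick volume
gluster volume create testvol node1:/bricks/b1 node1:/bricks/b2
gluster volume start testvol

# add a second node to the cluster
gluster peer probe node2

# replace a brick on node1 with a brick of the same path on node2
gluster volume replace-brick testvol node1:/bricks/b1 node2:/bricks/b1 start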

I would recommend verifying this on the latest available build of rhs-2.0z.

Comment 6 Scott Haines 2013-03-08 20:56:22 UTC
Per the 03/05 email exchange with PM, targeting for Arches.

Comment 7 Scott Haines 2013-04-11 16:36:25 UTC
Per 04-10-2013 Storage bug triage meeting, targeting for Big Bend.

Comment 8 shylesh 2013-07-11 06:30:24 UTC
Verified on 3.4.0.12rhs.beta3-1.el6rhs.x86_64

Comment 9 Scott Haines 2013-09-23 22:39:20 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
