Bug 1214912

Summary: Failure to recover disperse volume after add-brick failure
Product: [Community] GlusterFS
Reporter: vnosov <vnosov>
Component: glusterd
Assignee: bugs <bugs>
Status: CLOSED WONTFIX
Severity: medium
Priority: unspecified
Version: 3.6.2
CC: bugs, vnosov
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Last Closed: 2016-08-01 04:42:34 UTC
Type: Bug

Description vnosov 2015-04-23 20:21:36 UTC
Description of problem:

A non-existent brick path was passed to the "add-brick" command, and the command failed. When the command was repeated with the correct brick path, it failed again with a message that another brick from the command "is already part of a volume". All "remove-brick" calls failed as well. The only option left seems to be deleting the volume, which is not acceptable if the volume holds data.


Version-Release number of selected component (if applicable):
GlusterFS 3.6.2


How reproducible:


Steps to Reproduce:
1. Start with a disperse volume:

[root@SC92 log]# gluster volume info dv3

Volume Name: dv3
Type: Disperse
Volume ID: 9547a2c0-1136-4fc9-915f-47d016a30484
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.10.60.182:/exports/182-ts3/dv3
Brick2: 10.10.60.90:/exports/90-ts3/dv3
Brick3: 10.10.60.92:/exports/92-ts3/dv3
Options Reconfigured:
snap-activate-on-create: enable

2. Issue the "add-brick" command on node SC92, using an invalid brick path on node SC90:

[root@SC92 log]# gluster volume add-brick dv3 10.10.60.182:/exports/182-ts4/dv3 10.10.60.90:/exports/90-ts42/dv3 10.10.60.92:/exports/92-ts4/dv3
volume add-brick: failed: Staging failed on 10.10.60.90. Error: Failed to create brick directory for brick 10.10.60.90:/exports/90-ts42/dv3. Reason : No such file or directory



3. Issue the "add-brick" command again on node SC92, this time with a valid brick path on node SC90:

[root@SC92 log]# gluster volume add-brick dv3 10.10.60.182:/exports/182-ts4/dv3 10.10.60.90:/exports/90-ts4/dv3 10.10.60.92:/exports/92-ts4/dv3
volume add-brick: failed: /exports/92-ts4/dv3 is already part of a volume

4. Check that the volume itself is unchanged:

[root@SC92 log]# gluster volume info dv3

Volume Name: dv3
Type: Disperse
Volume ID: 9547a2c0-1136-4fc9-915f-47d016a30484
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.10.60.182:/exports/182-ts3/dv3
Brick2: 10.10.60.90:/exports/90-ts3/dv3
Brick3: 10.10.60.92:/exports/92-ts3/dv3
Options Reconfigured:
snap-activate-on-create: enable



Actual results:
The retried "add-brick" fails with "is already part of a volume" even though the expansion bricks were never added to any volume, and all "remove-brick" calls fail, so the failed expansion cannot be recovered.

Expected results:
A failed "add-brick" leaves the expansion bricks clean, so that the command can be retried once the underlying problem is corrected.

Additional info:
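
A plausible trace of the failure, sketched here rather than captured from this system: during "add-brick" staging, glusterd stamps each new brick root with a volume-id extended attribute, and when staging then fails on a remote node, the stamp already written locally is not rolled back. Inspecting the local expansion brick should show it; the hex value below is the dv3 volume ID from "volume info", reconstructed for illustration rather than copied from an actual capture:

# Illustrative check; the output below is reconstructed, not an actual capture.
[root@SC92 log]# getfattr -d -m . -e hex /exports/92-ts4/dv3
# file: exports/92-ts4/dv3
trusted.glusterfs.volume-id=0x9547a2c011364fc9915f47d016a30484

This would match the step 3 failure: glusterd treats any brick root carrying a volume-id attribute as already belonging to a volume.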

Comment 1 vnosov 2015-04-24 16:09:11 UTC
The problem is with the bricks that were used for the expansion. After the "add-brick" command fails, some extended attributes are left behind on the expansion bricks, and these attributes prevent the bricks from being used in a later "add-brick" command. The volume itself is OK.
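
For reference, the usual manual cleanup for this state is a sketch along the following lines, not something exercised in this report: on each node, strip the leftover attributes from that node's expansion brick before retrying "add-brick". This assumes the expansion bricks hold no data; the trusted.gfid attribute and the .glusterfs directory may not exist on a brick that never started, in which case those commands simply report an error and can be ignored.

# Hypothetical cleanup on node SC92; repeat with the matching path on each node.
[root@SC92 log]# setfattr -x trusted.glusterfs.volume-id /exports/92-ts4/dv3
[root@SC92 log]# setfattr -x trusted.gfid /exports/92-ts4/dv3
[root@SC92 log]# rm -rf /exports/92-ts4/dv3/.glusterfs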

Comment 2 Pranith Kumar K 2015-05-09 17:58:18 UTC
Assigning to glusterd based on comment-1

Comment 3 Atin Mukherjee 2016-08-01 04:42:34 UTC
This is not a security bug and will not be fixed in 3.6.x; see
http://www.gluster.org/pipermail/gluster-users/2016-July/027682.html

Comment 4 Atin Mukherjee 2016-08-01 04:43:54 UTC
If the issue persists in the latest releases, please feel free to clone this bug against them.