Bug 1126437

Summary: unsuccessful volume creation leaves bricks in an unusable state
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Gerald Sternagl <gsternagl>
Component: glusterd
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED WONTFIX
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: medium
Docs Contact:
Priority: unspecified
Version: rhgs-3.0
CC: amukherj, nlevinki, vagarwal, vbellur
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-09-22 09:26:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Gerald Sternagl 2014-08-04 12:51:06 UTC
Description of problem:
When a gluster volume is created from the CLI and the creation fails for any reason, the bricks that were already touched before the error occurred remain in that touched state even though the volume was never created. "Touched" means that extended attributes have been written to the bricks, and these bricks cannot be reused until those attributes are removed.

IMHO volume creation should be transactional: it either succeeds or it fails. When it fails, all changes to the bricks should be rolled back and no "dead" bricks should be left behind.

Version-Release number of selected component (if applicable):
3.0 and earlier

How reproducible:
Say we have two servers: rhs3node1 and rhs3node2
each one has a brick directory called /brick/vol1

Now we try to create a volume with a mis-spelled second server name:

# gluster vol create gv01 rhs3node1:/brick/vol1 rhs3node5:/brick/vol1

Obviously this fails because rhs3node5 doesn't exist. Now suppose you have 20 servers and a typo somewhere in the list. First, the error message doesn't tell you where the typo is, so you have to clear the extended attributes on every server where the creation step succeeded (see the sketch below). Since you don't know for sure where the error was, in the worst case you have to inspect each server individually.
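For reference, a possible manual cleanup, run on each server whose brick was touched (a sketch, assuming only the usual brick attributes were set by the failed create; setfattr reports an error for any attribute that is not actually present):

# setfattr -x trusted.glusterfs.volume-id /brick/vol1
# setfattr -x trusted.gfid /brick/vol1

If a .glusterfs directory was also created under the brick, it would have to be removed as well before the brick can be reused.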


Steps to Reproduce:
1. gluster peer probe rhs3node2
2. gluster vol create gv01 rhs3node1:/brick/vol1 rhs3node5:/brick/vol1
3. attr -l /brick/vol1

Actual results:
Attribute "glusterfs.volume-id" is set on /brick/vol1 although volume couldn't be created successful.

Expected results:
No extended attributes should remain set unless volume creation succeeds; on failure, all changes to the bricks should be rolled back.


Additional info:

Comment 2 Atin Mukherjee 2015-09-22 09:26:42 UTC
This behaviour is as per the design, and we are not planning to fix it in the near future. Please reopen if you have a real use case that hits this issue and is a blocker.