Bug 1182554

Summary: [SNAPSHOT]: In an n-way replica volume, snapshot should not be taken if even one brick is down.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Avra Sengupta <asengupt>
Component: snapshot
Assignee: rjoseph
Status: CLOSED ERRATA
QA Contact: senaik
Severity: unspecified
Docs Contact:
Priority: high
Version: rhgs-3.0
CC: annair, asrivast, rhs-bugs, senaik, storage-qa-internal
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.0.4
Hardware: Unspecified
OS: Unspecified
Whiteboard: SNAPSHOT
Fixed In Version: glusterfs-3.6.0.45-1
Doc Type: Known Issue
Doc Text:
Brick quorum support for snapshots is currently not available; therefore, snapshot create fails if even one brick is down. A snapshot can be taken only if all the bricks of the volume are up.
Story Points: ---
Clone Of:
: 1184344
Environment:
Last Closed: 2015-03-26 06:35:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1182947, 1184344, 1186189    

Description Avra Sengupta 2015-01-15 12:47:16 UTC
Description of problem:
In an n-way replica volume, snapshot create should fail if even one brick is down.


Actual results:
snapshot create checks for quorum and, if quorum is met, takes the snapshot even if a few bricks are down.


Expected results:
snapshot create should fail if even one brick is down.
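
The difference between the actual and the expected behaviour can be sketched in Python as below. This is only an illustration and not glusterd's code: both function names are hypothetical, and the strict-majority rule is an assumed stand-in for the real quorum calculation.

```python
from typing import List


def quorum_met(online: List[bool]) -> bool:
    """Assumed stand-in for a quorum check: strict majority of bricks online."""
    return 2 * sum(online) > len(online)


def all_bricks_up(online: List[bool]) -> bool:
    """Expected rule from this report: every brick must be online."""
    return len(online) > 0 and all(online)


# One 3-way replica set with a single brick down.
replica_set = [False, True, True]
print(quorum_met(replica_set))     # True: a quorum-only check would allow the snapshot
print(all_bricks_up(replica_set))  # False: the expected check rejects it
```

With one of three bricks down, the majority-style check still passes, which is the behaviour reported under "Actual results"; the all-bricks check is what "Expected results" asks for.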

Additional info:

Comment 1 Avra Sengupta 2015-01-29 10:49:30 UTC
Fixed with https://code.engineering.redhat.com/gerrit/40933

Comment 2 senaik 2015-02-20 12:22:46 UTC
Version :
=========
glusterfs 3.6.0.45 built on Feb 12 2015 22:58:40

Verified on 6x2 and 6x3 distributed-replicate volumes: snap create fails, with and without the force option, when a brick or node is down.

gluster v status 
Status of volume: vol_test
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick snapshot13.lab.eng.blr.redhat.com:/rhs/brick3/b3	N/A	N	2564
Brick snapshot14.lab.eng.blr.redhat.com:/rhs/brick3/b3	49152	Y	14458
Brick snapshot15.lab.eng.blr.redhat.com:/rhs/brick3/b3	49152	Y	15846
Brick snapshot16.lab.eng.blr.redhat.com:/rhs/brick3/b3	49152	Y	15546
Brick snapshot13.lab.eng.blr.redhat.com:/rhs/brick4/b4	49153	Y	26568
Brick snapshot14.lab.eng.blr.redhat.com:/rhs/brick4/b4	49153	Y	14138
Brick snapshot15.lab.eng.blr.redhat.com:/rhs/brick4/b4	49153	Y	15858
Brick snapshot16.lab.eng.blr.redhat.com:/rhs/brick4/b4	49153	Y	15558
Brick snapshot13.lab.eng.blr.redhat.com:/rhs/brick5/b5	49154	Y	26580
Brick snapshot14.lab.eng.blr.redhat.com:/rhs/brick5/b5	49154	Y	6372
Brick snapshot15.lab.eng.blr.redhat.com:/rhs/brick5/b5	49154	Y	15870
Brick snapshot16.lab.eng.blr.redhat.com:/rhs/brick5/b5	49154	Y	15570
NFS Server on localhost					2049	Y	2577
Self-heal Daemon on localhost				N/A	Y	2586
NFS Server on snapshot14.lab.eng.blr.redhat.com		2049	Y	14471
Self-heal Daemon on snapshot14.lab.eng.blr.redhat.com	N/A	Y	14480
NFS Server on snapshot16.lab.eng.blr.redhat.com		2049	Y	23746
Self-heal Daemon on snapshot16.lab.eng.blr.redhat.com	N/A	Y	23756
NFS Server on snapshot15.lab.eng.blr.redhat.com		2049	Y	24055
Self-heal Daemon on snapshot15.lab.eng.blr.redhat.com	N/A	Y	24064
 
Task Status of Volume vol_test
------------------------------------------------------------------------------
There are no active volume tasks


gluster snapshot create SN2 vol_test
snapshot create: failed: brick snapshot13.lab.eng.blr.redhat.com:/rhs/brick3/b3 is not started. Please start the stopped brick and then issue snapshot create command or use [force] option in snapshot create to override this behavior.
Snapshot command failed

[root@snapshot13 ~]# gluster snapshot create SN2 vol_test force
snapshot create: failed: quorum is not met
Snapshot command failed
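
To spot the stopped brick before attempting snapshot create, the status output above could be filtered with a small script such as the following. This is a minimal sketch, not part of the gluster CLI: the function name `offline_bricks` is hypothetical, and it assumes the row layout shown above, where the Online flag is the second-to-last column of each "Brick" row.

```python
from typing import List


def offline_bricks(status_output: str) -> List[str]:
    """Return the bricks whose Online column is not 'Y'."""
    down = []
    for line in status_output.splitlines():
        fields = line.split()
        # Assumed row layout: Brick <host:/path> <port> <online> <pid>
        if len(fields) >= 5 and fields[0] == "Brick":
            brick, online = fields[1], fields[-2]
            if online != "Y":
                down.append(brick)
    return down


# Three rows copied from the status output above.
sample = """\
Brick snapshot13.lab.eng.blr.redhat.com:/rhs/brick3/b3\tN/A\tN\t2564
Brick snapshot14.lab.eng.blr.redhat.com:/rhs/brick3/b3\t49152\tY\t14458
NFS Server on localhost\t2049\tY\t2577
"""
print(offline_bricks(sample))
```

For the sample rows this lists only snapshot13.lab.eng.blr.redhat.com:/rhs/brick3/b3, the brick named in the snapshot create error. Rows for the NFS server and self-heal daemon are ignored because they do not start with "Brick".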

Marking the bug as 'Verified'

Note:
If a brick on another node goes down, snap create fails with a "Pre-Validation" error instead of the error message shown above.

Also, when a node is down, snap create on nx2 fails with "quorum not met" and snap create on nx3 fails with "One or more bricks may be down". The error messages should be uniform across nx2 and nx3 volumes.
Both of these issues are tracked by bz 1085202.

Comment 4 errata-xmlrpc 2015-03-26 06:35:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0682.html