Bug 1112250

Summary: [SNAPSHOT]: On attaching a new node to the cluster while snapshot create was in progress, one of the snapshots failed with "glusterd quorum not met"
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: senaik
Component: snapshot
Assignee: Avra Sengupta <asengupt>
Status: CLOSED DEFERRED
QA Contact:
Severity: medium
Docs Contact:
Priority: medium
Version: rhgs-3.0
CC: asengupt, asriram, josferna, mlawrenc, rhs-bugs, storage-qa-internal, vagarwal
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: SNAPSHOT
Fixed In Version:
Doc Type: Known Issue
Doc Text:
Probing/detaching a new peer during any snapshot operation is not supported.
Story Points: ---
Clone Of:
: 1114403
Environment:
Last Closed: 2016-01-29 13:41:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1085278, 1114403, 1216951    

Description senaik 2014-06-23 12:24:10 UTC
Description of problem:
======================
On attaching a new node to the cluster while snapshot create was in progress, one of the snapshots failed with "glusterd quorum not met"

Version-Release number of selected component (if applicable):
===========================================================
 glusterfs 3.6.0.20 built on Jun 19 2014

How reproducible:
================
1/1


Steps to Reproduce:
===================
I got the following error message when attaching a new node to the cluster while snapshot create was in progress:

snapshot create: success: Snap snap4 created successfully
snapshot create: failed: glusterds are not in quorum
Snapshot command failed
snapshot create: success: Snap snap6 created successfully

All glusterds were up and running on the nodes, but we still get the message that glusterd quorum is not met.

----------------Part of log---------------------

name:snapshot15.lab.eng.blr.redhat.com
[2014-06-23 06:03:31.887252] I [glusterd-handler.c:2522:__glusterd_handle_friend_update] 0-: Received uuid: 7e97d0f0-8ae9-40eb-b822-952cc5a8dc46, hostname:10.70.44.54
[2014-06-23 06:03:32.166226] W [glusterd-utils.c:12909:glusterd_snap_quorum_check_for_create] 0-management: glusterds are not in quorum
[2014-06-23 06:03:32.166352] W [glusterd-utils.c:13058:glusterd_snap_quorum_check] 0-management: Quorum checkfailed during snapshot create command
[2014-06-23 06:03:32.166374] W [glusterd-mgmt.c:1846:glusterd_mgmt_v3_initiate_snap_phases] 0-management: quorum check failed
[2014-06-23 06:03:32.166416] W [glusterd-snapshot.c:7012:glusterd_snapshot_postvalidate] 0-management: Snapshot create post-validation failed
[2014-06-23 06:03:32.166433] W [glusterd-mgmt.c:248:gd_mgmt_v3_post_validate_fn] 0-management: postvalidate operation failed
[2014-06-23 06:03:32.166451] E [glusterd-mgmt.c:1335:glusterd_mgmt_v3_post_validate] 0-management: Post Validation failed for operation Snapshot on local node
[2014-06-23 06:03:32.166467] E [glusterd-mgmt.c:1944:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Post Validation Failed
[2014-06-23 06:03:33.972792] I [glusterd-handshake.c:1014:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30000

Actual results:
==============
Snapshot create fails with the "glusterd quorum not met" error message.


Expected results:
=================
Snapshot create should not fail with the "glusterd quorum not met" error message when glusterd is up and running on all nodes.



Additional info:

Comment 3 Joseph Elwin Fernandes 2014-06-30 02:31:27 UTC
1) Couldn't reproduce the issue by issuing snapshot create and peer probe simultaneously from the same host.
2) But was able to reproduce the issue by issuing snapshot create and peer probe simultaneously from different hosts.
3) Cause: during any snapshot operation, the glusterd quorum is checked against the node's total peer list. This is not necessary, as the glusterd quorum should be checked against the list of nodes that were chosen for this operation.
 In glusterd_mgmt_v3_initiate_snap_phases(), as preparation before the 3 phases (pre-validate, commit and post-validate), a transaction list is prepared in this->private->xaction_peers. The peers in this list participate in the operation throughout the 3 phases. During an operation, the glusterd quorum should be checked only for these peers, since the quorum check is with respect to the current operation.

4) Fix: During a snapshot operation, glusterd quorum will be checked only for the transaction peers list.
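
The reasoning in points 3 and 4 can be illustrated with a small model. This is only a sketch, not glusterd's code: the Peer type, the quorum_met() helper, the node names, the peer counts and the strict-majority rule are all assumptions chosen to make the effect visible. It shows how the same check can give different answers depending on whether it runs over the whole peer list or only over the peers picked for the transaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    name: str        # hypothetical node name
    connected: bool  # whether this peer is currently reachable


def quorum_met(peers):
    """Assumed rule: strictly more than half of the given peers are connected."""
    if not peers:
        return False
    up = sum(1 for p in peers if p.connected)
    return up * 2 > len(peers)


# Peers selected when the snapshot transaction started; all of them are up.
xaction_peers = [Peer("node-a", True), Peer("node-b", True)]

# Full peer list as seen later in the same transaction: probes issued from
# another host have added entries that are not connected yet (hypothetical).
all_peers = xaction_peers + [Peer("node-c", False), Peer("node-d", False)]

print(quorum_met(all_peers))      # check over the whole peer list
print(quorum_met(xaction_peers))  # check over the transaction list only
```

In this model the first check fails even though every peer taking part in the snapshot is up, while the second check passes.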

Comment 4 Joseph Elwin Fernandes 2014-06-30 02:51:51 UTC
Fix submitted upstream:

REVIEW: http://review.gluster.org/8200 (glusterd/snapshot: fixing glusterd quorum during snap operation) posted (#1) for review on master by Joseph Fernandes (josferna)

Comment 6 Avra Sengupta 2015-03-30 09:55:37 UTC
Not targeting for 3.1

Comment 8 monti lawrence 2015-07-22 15:31:56 UTC
Doc text is edited. Please sign off to be included in Known Issues.

Comment 9 Avra Sengupta 2015-07-27 07:14:05 UTC
Doc text looks good. Verified.

Comment 10 Avra Sengupta 2015-07-28 05:49:57 UTC
Not targeting for 3.1.1

Comment 11 Avra Sengupta 2015-08-12 05:44:26 UTC
This bug is not fixed by the submitted patch, and it requires design changes in glusterd. Hence moving this back to New.

Comment 13 Avra Sengupta 2016-01-29 13:41:11 UTC
The current Glusterd architecture does not support implementation of this feature. Therefore this feature request is deferred until Glusterd 2.0.