Description of problem:
======================
On attaching a new node to the cluster while snapshot create was in progress, one of the snapshots failed with "glusterd quorum not met".

Version-Release number of selected component (if applicable):
===========================================================
glusterfs 3.6.0.20 built on Jun 19 2014

How reproducible:
================
1/1

Steps to Reproduce:
===================
I got the following error message while attaching a new node to the cluster while snapshot create was in progress:

snapshot create: success: Snap snap4 created successfully
snapshot create: failed: glusterds are not in quorum
Snapshot command failed
snapshot create: success: Snap snap6 created successfully

All glusterds were up and running on the nodes, but we still got the message that glusterd quorum is not met.

----------------Part of log---------------------
name:snapshot15.lab.eng.blr.redhat.com
[2014-06-23 06:03:31.887252] I [glusterd-handler.c:2522:__glusterd_handle_friend_update] 0-: Received uuid: 7e97d0f0-8ae9-40eb-b822-952cc5a8dc46, host name:10.70.44.54
[2014-06-23 06:03:32.166226] W [glusterd-utils.c:12909:glusterd_snap_quorum_check_for_create] 0-management: glusterds are not in quorum
[2014-06-23 06:03:32.166352] W [glusterd-utils.c:13058:glusterd_snap_quorum_check] 0-management: Quorum checkfailed during snapshot create command
[2014-06-23 06:03:32.166374] W [glusterd-mgmt.c:1846:glusterd_mgmt_v3_initiate_snap_phases] 0-management: quorum check failed
[2014-06-23 06:03:32.166416] W [glusterd-snapshot.c:7012:glusterd_snapshot_postvalidate] 0-management: Snapshot create post-validation failed
[2014-06-23 06:03:32.166433] W [glusterd-mgmt.c:248:gd_mgmt_v3_post_validate_fn] 0-management: postvalidate operation failed
[2014-06-23 06:03:32.166451] E [glusterd-mgmt.c:1335:glusterd_mgmt_v3_post_validate] 0-management: Post Validation failed for operation Snapshot on local node
[2014-06-23 06:03:32.166467] E [glusterd-mgmt.c:1944:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Post Validation Failed
[2014-06-23 06:03:33.972792] I [glusterd-handshake.c:1014:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30000

Actual results:
==============
Snapshot create fails with the "glusterd quorum not met" error message.

Expected results:
=================
Snapshot create should not fail with the "glusterd quorum not met" error message when glusterd is up and running on all nodes.

Additional info:
sosreports : http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/snapshots/1104478/
1) Couldn't reproduce the issue by issuing snapshot create and peer probe from the same host simultaneously.
2) But was able to reproduce the issue by issuing snapshot create and peer probe from different hosts simultaneously.
3) Cause: during any snapshot operation, the glusterd quorum is checked against the node's total peer list. This is not necessary, as glusterd quorum should be checked only for the list of nodes that were chosen for this operation.
   In glusterd_mgmt_v3_initiate_snap_phases(), as a preparation before the 3 phases (pre-validate, commit and post-validate), a transaction list is prepared in this->private->xaction_peers. The peers in this list participate in the operation throughout the 3 phases. During an operation, the glusterd quorum should be checked only for these peers, as the quorum check is with respect to the current operation.
4) Fix: During a snapshot operation, glusterd quorum will be checked only for the transaction peers list.
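The difference described in point 3 can be sketched with a small toy model. This is not glusterd code: the `Peer` class, the `quorum_met` helper, the node names and the strict-majority rule are all assumptions made for illustration, and the real check in glusterd may count peers differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    """Hypothetical stand-in for a glusterd peer entry."""
    name: str
    connected: bool


def quorum_met(peers):
    """Assumed rule: a strict majority of nodes (local node included) is up."""
    total = len(peers) + 1                         # +1 for the local node
    active = sum(p.connected for p in peers) + 1   # local node counts as up
    return 2 * active > total


# Peers selected when the snapshot transaction started.
xaction_peers = [Peer("node2", True)]

# Full peer list at post-validation time: peers probed from another host in
# the meantime are listed but have not finished the handshake yet.
all_peers = xaction_peers + [Peer("node3", False), Peer("node4", False)]

print(quorum_met(all_peers))      # quorum checked over the whole peer list
print(quorum_met(xaction_peers))  # quorum checked over the transaction peers
```

In this made-up scenario the check over the full peer list fails while the check over the transaction peers passes, even though every node that takes part in the snapshot is up.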
Fix submitted upstream: REVIEW: http://review.gluster.org/8200 (glusterd/snapshot: fixing glusterd quorum during snap operation) posted (#1) for review on master by Joseph Fernandes (josferna)
Not targeting for 3.1
Doc text is edited. Please sign off to be included in Known Issues.
Doc text looks good. Verified.
Not targeting for 3.1.1
This bug is not fixed by the submitted patch; fixing it requires design changes in glusterd. Hence moving this back to New.
The current Glusterd architecture does not support implementation of this feature. Therefore this feature request is deferred until Glusterd 2.0.