Bug 1232686 - quorum calculation might go for toss for a concurrent peer probe command
Summary: quorum calculation might go for toss for a concurrent peer probe command
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Atin Mukherjee
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1235512
 
Reported: 2015-06-17 09:59 UTC by Atin Mukherjee
Modified: 2016-06-16 13:13 UTC
CC List: 2 users

Fixed In Version: glusterfs-3.8rc2
Clone Of:
Clones: 1235512
Environment:
Last Closed: 2016-06-16 13:13:39 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Atin Mukherjee 2015-06-17 09:59:08 UTC
Description of problem:

The current codebase skips the quorum calculation if any peer is in the process of joining the cluster (quorum_contrib == QUORUM_WAITING). This breaks the server-side quorum feature: glusterd would allow operations to proceed even when server-side quorum is not met.

Reproducing this is hard, since a peer has to be caught in the state where its quorum_contrib is QUORUM_WAITING. However, running the following test case may occasionally fail at the last volume stop command.

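#!/bin/bash

. $(dirname $0)/../../include.rc
. $(dirname $0)/../../volume.rc
. $(dirname $0)/../../cluster.rc

cleanup;

# Start four glusterd instances; only the first three are joined into the pool.
TEST launch_cluster 4;

TEST $CLI_1 peer probe $H2;
TEST $CLI_1 peer probe $H3;

# Wait until node 1 sees both probed peers.
EXPECT_WITHIN $PROBE_TIMEOUT 2 peer_count

TEST $CLI_1 volume create $V0 $H1:$B1/$V0 $H2:$B2/$V0
TEST $CLI_1 volume set $V0 cluster.server-quorum-type server
TEST $CLI_1 volume start $V0

# Take down two of the three glusterds; server-side quorum is now lost.
TEST kill_glusterd 2
TEST kill_glusterd 3

# With quorum lost, volume stop must be rejected (note the negation).
TEST ! $CLI_1 volume stop $V0;

cleanup;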
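The arithmetic behind the expected rejection, assuming (for illustration) that the default server-quorum requirement is a strict majority of the pool and that node 1 counts itself: H1, H2 and H3 form the pool, while the fourth node started by launch_cluster 4 is never probed.

```latex
\begin{aligned}
N &= 3 && \text{pool members } (H_1, H_2, H_3) \\
Q &= \lfloor N/2 \rfloor + 1 = 2 && \text{nodes required for quorum} \\
A &= 1 && \text{glusterd instances still running after killing 2 and 3} \\
A &< Q && \Rightarrow \text{quorum lost, so volume stop must be rejected}
\end{aligned}
```

If the calculation is skipped because a peer is still in QUORUM_WAITING, the check passes anyway and volume stop succeeds, which is the intermittent failure of the last test.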

Version-Release number of selected component (if applicable):
Mainline

How reproducible:
Rare

Steps to Reproduce:
1. Install GlusterFS from source.
2. Run the above test case in a loop; it may occasionally fail at the last test (volume stop).

Actual results:
The last test (TEST ! $CLI_1 volume stop $V0) may fail: volume stop succeeds even though server-side quorum has been lost.

Expected results:
All the test cases should pass every time.

Additional info:

Comment 1 Anand Avati 2015-06-17 10:00:45 UTC
REVIEW: http://review.gluster.org/11275 (glusterd: fix quorum calculation logic) posted (#1) for review on master by Atin Mukherjee (amukherj)

Comment 2 Anand Avati 2015-06-23 06:18:42 UTC
REVIEW: http://review.gluster.org/11275 (glusterd: fix quorum calculation logic) posted (#2) for review on master by Atin Mukherjee (amukherj)

Comment 3 Anand Avati 2015-06-24 06:54:06 UTC
REVIEW: http://review.gluster.org/11275 (glusterd: fix quorum calculation logic) posted (#3) for review on master by Atin Mukherjee (amukherj)

Comment 4 Anand Avati 2015-06-25 03:30:16 UTC
COMMIT: http://review.gluster.org/11275 committed in master by Atin Mukherjee (amukherj) 
------
commit 0be38bdb4007c1bcb51545057e6402f6e14922cd
Author: Atin Mukherjee <amukherj>
Date:   Wed Jun 17 14:20:14 2015 +0530

    glusterd: fix quorum calculation logic
    
    glusterd_get_quorum_cluster_counts () skips quorum calculation if it finds any
    of its peers in QUORUM_WAITING state. This means that if a peer probe has been
    triggered and a transaction is initiated at the same time, the transaction
    might pass the server quorum check when it should not.
    
    Change-Id: I44eda8905eab3349c9ebf2842e7131d4e758a528
    BUG: 1232686
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/11275
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Reviewed-by: Anand Nekkunti <anekkunt>
    Tested-by: NetBSD Build System <jenkins.org>

Comment 5 Nagaprasad Sathyanarayana 2015-10-25 14:44:05 UTC
The fix for this BZ is already present in a GlusterFS release. A clone of this BZ was fixed in a GlusterFS release and closed, hence closing this mainline BZ as well.

Comment 6 Niels de Vos 2016-06-16 13:13:39 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

