Bug 1552414

Summary: Take full lock on files in 3 way replication
Product: Red Hat Gluster Storage Reporter: Karthik U S <ksubrahm>
Component: replicate    Assignee: Karthik U S <ksubrahm>
Status: CLOSED ERRATA QA Contact: Vijay Avuthu <vavuthu>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rhgs-3.4    CC: ksubrahm, pkarampu, ravishankar, rhs-bugs, srmukher, storage-qa-internal
Target Milestone: ---   
Target Release: RHGS 3.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.12.2-6 Doc Type: Bug Fix
Doc Text:
In replica 3 volumes, there was a possibility of ending up in split-brain when multiple clients simultaneously wrote data to non-overlapping regions of the same file. With the new cluster.full-lock option, you can take a full file lock, which helps maintain data consistency and avoids split-brain. By default, the cluster.full-lock option is set to take a full file lock, and it can be reconfigured to take range locks if needed.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-09-04 06:44:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On: 1535438    
Bug Blocks: 1503137    

Description Karthik U S 2018-03-07 05:43:42 UTC
Description of problem:
In replica 3 volumes there is a possibility of ending up in split-brain when multiple clients write data to non-overlapping regions of the same file in parallel.

Version-Release number of selected component (if applicable):


How reproducible:
It is very rare to hit this case, but the following scenario needs to be imitated using gdb to test this.

Steps to Reproduce:
- Client C0 performs write W1, which fails on brick B0 and succeeds on the other two bricks.
- C1 performs write W2, which fails on B1 and succeeds on the other two bricks.
- C2 performs write W3, which fails on B2 and succeeds on the other two bricks.
- All three writes happen in parallel and fall on different ranges, so AFR takes granular (range) locks and performs the writes in parallel. Since each client initially saw all of its data-readables as good, none of them sees the file going into split-brain in the in_flight_split_brain check, so each performs the post-op, marking the pending xattrs. Now all the bricks are blamed by each other, and the file ends up in split-brain.


Actual results:
We will end up in split brain in replica 3 volumes.

Expected results:
We should not end up in split brain in replica 3 volumes.

Additional info:
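Per the fix (comments 2 and 3), writes take a full file lock by default via the new cluster.full-lock volume option. A sketch of checking and reconfiguring it, assuming a volume named testvol (the volume name is a placeholder):

```shell
# Check the current setting (on by default after the fix)
gluster volume get testvol cluster.full-lock
# Revert to the old range-lock behavior if needed
gluster volume set testvol cluster.full-lock off
```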

Comment 2 Karthik U S 2018-03-07 08:44:55 UTC
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/131966/

Comment 3 Karthik U S 2018-03-09 07:16:43 UTC
Upstream patch: https://review.gluster.org/#/c/19218/

Comment 8 Vijay Avuthu 2018-08-08 07:21:37 UTC
Update:
=======

Build Used: glusterfs-3.12.2-15.el7rhgs.x86_64

Discussed with Karthik; the scenarios covered are below.

Scenario 1:

1) Create a 1 * 3 volume and start it
2) Disable eager-lock
3) Check whether full-lock is on (it should be on by default)
4) Write a 1 GB file from the mount point
5) Take a state dump of the volume while the write is in progress
6) Verify the state dump:
       - there should be ONE active lock with start=0, len=0
       - the others should be blocked
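Scenario 1 can be sketched with the gluster CLI as follows; host names, brick paths, and the volume name are placeholders:

```shell
gluster volume create testvol replica 3 host1:/bricks/b1 host2:/bricks/b2 host3:/bricks/b3
gluster volume start testvol
gluster volume set testvol cluster.eager-lock off
gluster volume get testvol cluster.full-lock        # should report: on
mount -t glusterfs host1:/testvol /mnt/testvol
dd if=/dev/zero of=/mnt/testvol/f1 bs=1M count=1024 &
gluster volume statedump testvol                    # while the dd is still running
# Inspect the dump files under /var/run/gluster/ for one granted lock
# with start=0, len=0 and the remaining lock requests blocked.
```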

Scenario 2:

1) Create a 1 * 3 volume and start it
2) Disable eager-lock
3) Disable full-lock
4) Write a 1 GB file from the mount point
5) Take a state dump of the volume while the write is in progress
6) Verify the state dump:
       - there should be range locks
       - the locks should not be blocked

Scenario 3:

1) Create a 1 * 3 volume and start it
2) Enable eager-lock
3) Enable full-lock
4) Write 1 GB to the same file from 2 clients
5) Take a state dump of the volume while the writes are in progress
6) Verify the state dump:
       - there should be ONE active lock with start=0, len=0
       - the others should be blocked


Scenario 4:

1) Create a 1 * 3 volume and start it
2) Enable eager-lock
3) Disable full-lock
4) Overwrite the 1 GB file (written in the previous scenario) from 2 clients
5) Take a state dump of the volume while the writes are in progress
6) Verify the state dump:
       - there should be active range locks
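Scenario 4 differs from scenario 1 only in the option toggles and in writing from two clients; a sketch, where /mnt/c1 and /mnt/c2 are placeholder mount points on two different clients:

```shell
gluster volume set testvol cluster.eager-lock on
gluster volume set testvol cluster.full-lock off
dd if=/dev/zero of=/mnt/c1/f1 bs=1M count=1024 conv=notrunc &
dd if=/dev/zero of=/mnt/c2/f1 bs=1M count=1024 conv=notrunc &
gluster volume statedump testvol
# The dump should show multiple granted (active) locks with non-zero ranges.
```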


Moving status to Verified.

Comment 12 errata-xmlrpc 2018-09-04 06:44:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607