Bug 1536257

Summary: Take full lock on files in 3 way replication
Product: [Community] GlusterFS
Reporter: Pranith Kumar K <pkarampu>
Component: replicate
Assignee: bugs <bugs>
Status: CLOSED DUPLICATE
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.13
CC: bugs, dwojslaw, ksubrahm, pkarampu, ravishankar
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1535438
Environment:
Last Closed: 2018-01-31 05:52:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1535438
Bug Blocks:

Description Pranith Kumar K 2018-01-19 01:10:57 UTC
+++ This bug was initially created as a clone of Bug #1535438 +++

Description of problem:

Need a way to take a full lock on files in a replica 3 volume, which helps prevent the files from going into split-brain.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Worker Ant on 2018-01-17 07:13:35 EST ---

REVIEW: https://review.gluster.org/19218 (cluster/afr: Adding option to take full file lock) posted (#1) for review on master by Karthik U S

--- Additional comment from Worker Ant on 2018-01-18 19:15:53 EST ---

COMMIT: https://review.gluster.org/19218 committed in master by "Karthik U S" <ksubrahm> with a commit message: cluster/afr: Adding option to take full file lock

Problem:
In replica 3 volumes there is a possibility of ending up in a split-brain
scenario when multiple clients write data to the same file at
non-overlapping regions in parallel.

Scenario:
- Initially all the copies are good and all the clients get the value
  of data readables as all good.
- Client C0 performs write W1, which fails on brick B0 and succeeds on
  the other two bricks.
- C1 performs write W2, which fails on B1 and succeeds on the other two bricks.
- C2 performs write W3, which fails on B2 and succeeds on the other two bricks.
- All three writes above happen in parallel and fall on different ranges,
  so AFR takes granular locks and all the writes are performed in parallel.
  Since each client had its data readables as good, it does not see the
  file going into split-brain in the in_flight_split_brain check, and hence
  performs the post-op, marking the pending xattrs. Now all the bricks
  are blamed by each other, ending up in split-brain (see the xattr
  sketch below).
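
For illustration only, here is a rough sketch of how the pending AFR xattrs
could look once all three post-ops complete; the volume name "testvol", the
brick paths, the file name and the exact counter values are assumed, and only
the relevant AFR xattrs are shown:

# On brick B0 (W1 failed here, so B0 holds blame only for B1 and B2):
getfattr -d -m . -e hex /bricks/b0/file1
trusted.afr.testvol-client-1=0x000000010000000000000000
trusted.afr.testvol-client-2=0x000000010000000000000000

# On brick B1 (W2 failed here):
getfattr -d -m . -e hex /bricks/b1/file1
trusted.afr.testvol-client-0=0x000000010000000000000000
trusted.afr.testvol-client-2=0x000000010000000000000000

# On brick B2 (W3 failed here):
getfattr -d -m . -e hex /bricks/b2/file1
trusted.afr.testvol-client-0=0x000000010000000000000000
trusted.afr.testvol-client-1=0x000000010000000000000000

# Every brick blames the other two, so self-heal has no clean source: split-brain.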

Fix:
Have an option to take either a full lock or a range lock on files while
doing data transactions, to prevent the possibility of ending up in
split-brain. With this change, files are locked in full by default while
doing IO. To get the old range-lock behaviour, change the value of
"cluster.full-lock" to "no".
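
For reference, a minimal sketch of toggling the option on an existing volume;
the volume name "testvol" is assumed, while the option name and the "no" value
come from the description above:

# Check the effective value (full lock is the new default):
gluster volume get testvol cluster.full-lock

# Fall back to the old range-lock behaviour:
gluster volume set testvol cluster.full-lock no

# Switch back to full file locks:
gluster volume set testvol cluster.full-lock yes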

Change-Id: I7893fa33005328ed63daa2f7c35eeed7c5218962
BUG: 1535438
Signed-off-by: karthik-us <ksubrahm>

Comment 1 Karthik U S 2018-01-31 05:52:03 UTC
Closing this bug since it is fixed in 3.13 as part of BZ #1535438.

*** This bug has been marked as a duplicate of bug 1535438 ***