Bug 1535438

Summary:          Take full lock on files in 3 way replication
Product:          [Community] GlusterFS
Component:        replicate
Version:          3.13
Hardware:         Unspecified
OS:               Unspecified
Status:           CLOSED CURRENTRELEASE
Severity:         unspecified
Priority:         unspecified
Reporter:         Karthik U S <ksubrahm>
Assignee:         Karthik U S <ksubrahm>
CC:               bugs, pkarampu, ravishankar
Fixed In Version: glusterfs-4.0.0
Bug Blocks:       1536257, 1552414
Type:             Bug
Last Closed:      2018-01-23 21:37:57 UTC

Description Karthik U S 2018-01-17 11:57:46 UTC
Description of problem:

Need a way to take a full lock on files in a replica 3 volume, which helps prevent files from going into split-brain.


Comment 1 Worker Ant 2018-01-17 12:13:35 UTC
REVIEW: https://review.gluster.org/19218 (cluster/afr: Adding option to take full file lock) posted (#1) for review on master by Karthik U S

Comment 2 Worker Ant 2018-01-19 00:15:53 UTC
COMMIT: https://review.gluster.org/19218 committed in master by "Karthik U S" <ksubrahm> with a commit message- cluster/afr: Adding option to take full file lock

Problem:
In replica 3 volumes there is a possibility of ending up in a split-brain
scenario when multiple clients write data to non-overlapping regions of the
same file in parallel.

Scenario:
- Initially all the copies are good and every client sees all bricks as
  data-readable.
- Client C0 performs write W1, which fails on brick B0 and succeeds on the
  other two bricks.
- C1 performs write W2, which fails on B1 and succeeds on the other two bricks.
- C2 performs write W3, which fails on B2 and succeeds on the other two bricks.
- All three writes happen in parallel and fall on different ranges, so afr
  takes granular (range) locks and the writes proceed concurrently. Since each
  client still sees its data-readables as good, the in_flight_split_brain
  check does not flag the file as going into split-brain, and each client
  performs the post-op, marking the pending xattrs. Now all the bricks blame
  each other, and the file ends up in split-brain.

Fix:
Add an option to take either a full lock or a range lock on files during
data transactions, to prevent the possibility of ending up in split-brain.
With this change, files take a full lock during I/O by default. To restore
the old range-lock behaviour, change the value of "cluster.full-lock" to
"no". (A sketch of this lock-range selection follows the commit message
below.)

Change-Id: I7893fa33005328ed63daa2f7c35eeed7c5218962
BUG: 1535438
Signed-off-by: karthik-us <ksubrahm>
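
To make the fix concrete, here is a minimal sketch (in C) of the lock-range
selection it describes. This is an illustration, not the literal patch: the
struct and function names below are stand-ins, and only the option name
"cluster.full-lock" and its default come from the commit message. The
convention that a lock length of 0 means "until end of file" follows POSIX
byte-range locking, which GlusterFS's locking also uses.

#include <fcntl.h>      /* F_WRLCK, SEEK_SET */
#include <stdbool.h>
#include <sys/types.h>  /* off_t */

/* Simplified stand-in for the byte-range lock GlusterFS sends; the real
 * structure (gf_flock) lives in the GlusterFS headers and has more fields. */
struct lock_range {
        short l_type;
        short l_whence;
        off_t l_start;
        off_t l_len;    /* 0 means "until end of file" */
};

/* Sketch of the choice the fix introduces: with cluster.full-lock enabled
 * (the new default) the data transaction locks the whole file; with it
 * disabled, only the written byte range is locked, which allows parallel
 * non-overlapping writes from different clients (the old behaviour). */
static void
choose_data_lock_range(bool full_lock, off_t offset, off_t size,
                       struct lock_range *lock)
{
        lock->l_type   = F_WRLCK;
        lock->l_whence = SEEK_SET;

        if (full_lock) {
                lock->l_start = 0;
                lock->l_len   = 0;      /* whole file */
        } else {
                lock->l_start = offset;
                lock->l_len   = size;   /* just the write region */
        }
}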

Comment 3 Worker Ant 2018-01-19 01:20:12 UTC
REVIEW: https://review.gluster.org/19237 (cluster/afr: Adding option to take full file lock) posted (#1) for review on release-3.13 by Pranith Kumar Karampuri

Comment 4 Worker Ant 2018-01-19 14:25:19 UTC
COMMIT: https://review.gluster.org/19237 committed in release-3.13 by "Pranith Kumar Karampuri" <pkarampu> with a commit message- cluster/afr: Adding option to take full file lock

Problem:
In replica 3 volumes there is a possibility of ending up in a split-brain
scenario when multiple clients write data to non-overlapping regions of the
same file in parallel.

Scenario:
- Initially all the copies are good and every client sees all bricks as
  data-readable.
- Client C0 performs write W1, which fails on brick B0 and succeeds on the
  other two bricks.
- C1 performs write W2, which fails on B1 and succeeds on the other two bricks.
- C2 performs write W3, which fails on B2 and succeeds on the other two bricks.
- All three writes happen in parallel and fall on different ranges, so afr
  takes granular (range) locks and the writes proceed concurrently. Since each
  client still sees its data-readables as good, the in_flight_split_brain
  check does not flag the file as going into split-brain, and each client
  performs the post-op, marking the pending xattrs. Now all the bricks blame
  each other, and the file ends up in split-brain.

Fix:
Add an option to take either a full lock or a range lock on files during
data transactions, to prevent the possibility of ending up in split-brain.
With this change, files take a full lock during I/O by default. To restore
the old range-lock behaviour, change the value of "cluster.full-lock" to
"no". (A usage note follows the commit message below.)

Change-Id: I7893fa33005328ed63daa2f7c35eeed7c5218962
BUG: 1535438
Signed-off-by: karthik-us <ksubrahm>
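
As a usage note (assuming the option is exposed through the normal volume-set
interface, which the commit message implies), the old range-lock behaviour can
be restored per volume with the gluster CLI; <VOLNAME> is a placeholder:

    # check the current value (full lock is the default after this fix)
    gluster volume get <VOLNAME> cluster.full-lock

    # switch back to the old range-lock behaviour
    gluster volume set <VOLNAME> cluster.full-lock no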

Comment 5 Shyamsundar 2018-01-23 21:37:57 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.13.2, please open a new bug report.

glusterfs-3.13.2 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-January/000089.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 6 Karthik U S 2018-01-31 05:52:03 UTC
*** Bug 1536257 has been marked as a duplicate of this bug. ***

Comment 7 Shyamsundar 2018-03-15 11:25:40 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-4.0.0, please open a new bug report.

glusterfs-4.0.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-March/000092.html
[2] https://www.gluster.org/pipermail/gluster-users/