Bug 1552414

Summary: Take full lock on files in 3 way replication
Product: Red Hat Gluster Storage Reporter: Karthik U S <ksubrahm>
Component: replicate    Assignee: Karthik U S <ksubrahm>
Status: CLOSED ERRATA QA Contact: Vijay Avuthu <vavuthu>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rhgs-3.4    CC: ksubrahm, pkarampu, ravishankar, rhs-bugs, srmukher, storage-qa-internal
Target Milestone: ---   
Target Release: RHGS 3.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.12.2-6 Doc Type: Bug Fix
Doc Text:
In replica 3 volumes, there was a possibility of ending up in split-brain when multiple clients simultaneously wrote data to non-overlapping regions of the same file. With the new cluster.full-lock option, you can take a full file lock, which helps maintain data consistency and avoids split-brain. By default, the cluster.full-lock option is set to take a full file lock, and it can be reconfigured to take range locks if needed.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-09-04 06:44:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On: 1535438    
Bug Blocks: 1503137    

Description Karthik U S 2018-03-07 05:43:42 UTC
Description of problem:
In replica 3 volumes there is a possibility of ending up in split-brain when multiple clients write data to non-overlapping regions of the same file in parallel.

Version-Release number of selected component (if applicable):


How reproducible:
It is very rare to hit this case, but the following scenario needs to be imitated using gdb to test this.

Steps to Reproduce:
- Client C0 performs write W1, which fails on brick B0 and succeeds on the other two bricks.
- C1 performs write W2, which fails on B1 and succeeds on the other two bricks.
- C2 performs write W3, which fails on B2 and succeeds on the other two bricks.
- All three writes happen in parallel and fall on different ranges, so AFR takes granular (range) locks and performs the writes in parallel. Since each client initially saw all of its data-readables as good, none of them sees the file going into split-brain in the in_flight_split_brain check, so each performs the post-op, marking the pending xattrs. Now all the bricks are blamed by each other, and the file ends up in split-brain.


Actual results:
We will end up in split brain in replica 3 volumes.

Expected results:
We should not end up in split brain in replica 3 volumes.

Additional info:
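Per the fix (comments 2 and 3), writes take a full file lock by default via the new cluster.full-lock volume option. A sketch of checking and reconfiguring it, assuming a volume named testvol (the volume name is a placeholder):

```shell
# Check the current setting (on by default after the fix)
gluster volume get testvol cluster.full-lock
# Revert to the old range-lock behavior if needed
gluster volume set testvol cluster.full-lock off
```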

Comment 2 Karthik U S 2018-03-07 08:44:55 UTC
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/131966/

Comment 3 Karthik U S 2018-03-09 07:16:43 UTC
Upstream patch: https://review.gluster.org/#/c/19218/

Comment 8 Vijay Avuthu 2018-08-08 07:21:37 UTC
Update:
=======

Build Used: glusterfs-3.12.2-15.el7rhgs.x86_64

Discussed with Karthik; the scenarios covered are below.

Scenario 1:

1) Create a 1 * 3 volume and start it
2) Disable eager-lock
3) Check whether full-lock is on (it should be on by default)
4) Write a 1 GB file from the mount point
5) Take a state dump of the volume while the write is in progress
6) Verify the state dump:
       - there should be ONE active lock with start=0, len=0
       - the others should be blocked
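Scenario 1 can be sketched with the gluster CLI as follows; host names, brick paths, and the volume name are placeholders:

```shell
gluster volume create testvol replica 3 host1:/bricks/b1 host2:/bricks/b2 host3:/bricks/b3
gluster volume start testvol
gluster volume set testvol cluster.eager-lock off
gluster volume get testvol cluster.full-lock        # should report: on
mount -t glusterfs host1:/testvol /mnt/testvol
dd if=/dev/zero of=/mnt/testvol/f1 bs=1M count=1024 &
gluster volume statedump testvol                    # while the dd is still running
# Inspect the dump files under /var/run/gluster/ for one granted lock
# with start=0, len=0 and the remaining lock requests blocked.
```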

Scenario 2:

1) Create a 1 * 3 volume and start it
2) Disable eager-lock
3) Disable full-lock
4) Write a 1 GB file from the mount point
5) Take a state dump of the volume while the write is in progress
6) Verify the state dump:
       - there should be range locks
       - the locks should not be blocked

Scenario 3:

1) Create a 1 * 3 volume and start it
2) Enable eager-lock
3) Enable full-lock
4) Write 1 GB to the same file from 2 clients
5) Take a state dump of the volume while the writes are in progress
6) Verify the state dump:
       - there should be ONE active lock with start=0, len=0
       - the others should be blocked


Scenario 4:

1) Create a 1 * 3 volume and start it
2) Enable eager-lock
3) Disable full-lock
4) Overwrite the 1 GB file (written in the previous scenario) from 2 clients
5) Take a state dump of the volume while the writes are in progress
6) Verify the state dump:
       - there should be active range locks
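Scenario 4 differs from scenario 1 only in the option toggles and in writing from two clients; a sketch, where /mnt/c1 and /mnt/c2 are placeholder mount points on two different clients:

```shell
gluster volume set testvol cluster.eager-lock on
gluster volume set testvol cluster.full-lock off
dd if=/dev/zero of=/mnt/c1/f1 bs=1M count=1024 conv=notrunc &
dd if=/dev/zero of=/mnt/c2/f1 bs=1M count=1024 conv=notrunc &
gluster volume statedump testvol
# The dump should show multiple granted (active) locks with non-zero ranges.
```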


Moving status to Verified.

Comment 12 errata-xmlrpc 2018-09-04 06:44:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607