Bug 1552414 - Take full lock on files in 3 way replication
Summary: Take full lock on files in 3 way replication
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Karthik U S
QA Contact: Vijay Avuthu
URL:
Whiteboard:
Depends On: 1535438
Blocks: 1503137
 
Reported: 2018-03-07 05:43 UTC by Karthik U S
Modified: 2018-09-20 04:40 UTC
CC List: 6 users

Fixed In Version: glusterfs-3.12.2-6
Doc Type: Bug Fix
Doc Text:
In replica 3 volumes, there was a possibility of ending up in split-brain when multiple clients simultaneously wrote data to non-overlapping regions of the same file. With the new cluster.full-lock option, a full file lock is taken for each write, which maintains data consistency and avoids split-brain. By default, cluster.full-lock is set to take full file locks; it can be reconfigured to take range locks if needed.
Clone Of:
Environment:
Last Closed: 2018-09-04 06:44:11 UTC
Embargoed:




Links
Red Hat Product Errata RHSA-2018:2607 (last updated 2018-09-04 06:45:12 UTC)

Description Karthik U S 2018-03-07 05:43:42 UTC
Description of problem:
In replica 3 volumes there is a possibility of ending up in split-brain when multiple clients write data to non-overlapping regions of the same file in parallel.

Version-Release number of selected component (if applicable):


How reproducible:
It is very rare to hit this case; the following scenario needs to be imitated using gdb to test it.

Steps to Reproduce:
- Client C0 performs write W1, which fails on brick B0 and succeeds on the other two bricks.
- Client C1 performs write W2, which fails on B1 and succeeds on the other two bricks.
- Client C2 performs write W3, which fails on B2 and succeeds on the other two bricks.
- All three writes happen in parallel and fall on different ranges, so AFR takes granular (range) locks and the writes proceed concurrently. Since each client initially saw its data-readable bricks as good, the in_flight_split_brain check does not detect the file going into split-brain, so each client performs the post-op and marks the pending xattrs. Now all the bricks are blamed by each other, ending up in split-brain (the pending xattrs can be inspected as sketched below).
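One way to confirm the end state on the bricks is to read the AFR pending xattrs directly. A minimal sketch, assuming a volume named testvol with a brick at /bricks/brick0 and a file named file1 (all hypothetical names); getfattr, the trusted.afr.* xattrs, and the heal info split-brain subcommand are standard GlusterFS debugging surface:

    # On each server: non-zero trusted.afr.<volname>-client-N values
    # mean this brick is blaming brick N.
    getfattr -d -m . -e hex /bricks/brick0/file1

    # The file should also be listed by:
    gluster volume heal testvol info split-brain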


Actual results:
We end up in split-brain in replica 3 volumes.

Expected results:
We should not end up in split-brain in replica 3 volumes.

Additional info:

Comment 2 Karthik U S 2018-03-07 08:44:55 UTC
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/131966/

Comment 3 Karthik U S 2018-03-09 07:16:43 UTC
Upstream patch: https://review.gluster.org/#/c/19218/

Comment 8 Vijay Avuthu 2018-08-08 07:21:37 UTC
Update:
=======

Build Used: glusterfs-3.12.2-15.el7rhgs.x86_64

Discussed with Karthik; the scenarios covered are below.

Scenario 1:

1) create 1 * 3 volume and start
2) Disable eager lock
3) verify that full-lock is on (it should be on by default)
4) write 1GB of file from mount point
5) take state dump for the volume while write is in progress
6) verify the state dump 
       - there should be ONE active lock which contains start=0, len=0
       - others should be blocked
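A minimal command sketch for scenario 1, assuming a 1 x 3 volume named testvol mounted at /mnt/testvol and a file named file1 (hypothetical names); cluster.eager-lock, cluster.full-lock, and gluster volume get/set/statedump are standard gluster CLI, while the statedump location and lock-entry format can vary by version:

    # 1-2) After creating and starting the 1 x 3 replica volume, disable eager lock.
    gluster volume set testvol cluster.eager-lock off

    # 3) Confirm that full-lock is on (the default).
    gluster volume get testvol cluster.full-lock

    # 4) Write a 1 GB file from the mount point, in the background.
    dd if=/dev/zero of=/mnt/testvol/file1 bs=1M count=1024 &

    # 5) Take a statedump of the bricks while the write is in flight.
    gluster volume statedump testvol

    # 6) AFR data locks appear as inodelk entries in the brick dumps;
    #    a full file lock shows start=0, len=0 on one ACTIVE entry.
    grep inodelk /var/run/gluster/*.dump.*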

Scenario 2:

1) create 1 * 3 volume and start
2) Disable eager lock
3) disable full-lock 
4) write 1GB of file from mount point
5) take state dump for the volume while write is in progress
6) verify the state dump 
       - there should be range locks
       - locks are not blocked
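Scenario 2 differs from the sketch above only in step 3, where the volume is reconfigured to take range locks (same hypothetical volume name):

    # 3) Disable full file locks so AFR takes range locks again.
    gluster volume set testvol cluster.full-lock off

    # 6) In the statedump, the granted inodelk entries now carry the
    #    actual write ranges (non-zero start/len) and none are blocked.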

Scenario 3:

1) create 1 * 3 volume and start
2) Enable eager lock
3) enable full-lock      
4) write 1GB of the same file from 2 clients
5) take state dump for the volume while write is in progress
6) verify the state dump 
       - there should be ONE active lock which contains start=0, len=0 
       - others should be blocked
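For scenario 3 the writes come from two clients in parallel; a minimal sketch, assuming the volume is mounted at /mnt/testvol on both clients, with seek used so the two writers touch different regions of the same file (file name and offsets are illustrative):

    # On client 1: write the first 1 GB of the file.
    dd if=/dev/zero of=/mnt/testvol/file2 bs=1M count=1024 conv=notrunc &

    # On client 2, at the same time: write the next 1 GB of the same file.
    dd if=/dev/zero of=/mnt/testvol/file2 bs=1M count=1024 seek=1024 conv=notrunc &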


Scenario 4:

1) create 1 * 3 volume and start
2) Enable eager lock
3) disable full-lock 
4) overwrite 1GB of the same file (which was written in the previous scenario) from 2 clients
5) take state dump for the volume while write is in progress
6) verify the state dump 
       - there should be range locks which are active


Moving status to verified

Comment 12 errata-xmlrpc 2018-09-04 06:44:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607

