Bug 985380 - Change order of locks for data self-heal
Summary: Change order of locks for data self-heal
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Pranith Kumar K
QA Contact: spandura
URL:
Whiteboard:
Depends On: 967717
Blocks:
 
Reported: 2013-07-17 11:14 UTC by Pranith Kumar K
Modified: 2013-09-23 22:35 UTC
CC List: 6 users

Fixed In Version: glusterfs-3.4.0.12rhs.beta6-1
Doc Type: Bug Fix
Doc Text:
Clone Of: 967717
Environment:
Last Closed: 2013-09-23 22:35:54 UTC
Embargoed:



Description Pranith Kumar K 2013-07-17 11:14:49 UTC
+++ This bug was initially created as a clone of Bug #967717 +++

Description of problem:
    At the moment data-self-heal acquires locks in the following
    pattern. It takes a full-file lock, then gets the xattrs on the file
    on both replicas and decides sources/sinks based on the xattrs. It
    then acquires a lock on the 0-128k range and unlocks the full-file
    lock, syncs the 0-128k range from source to sink, acquires a lock on
    128k+1 till 256k, unlocks 0-128k, syncs the 128k+1 till 256k block,
    and so on. Finally it takes the full-file lock again, unlocks the
    last small-range lock, decrements the pending counts, and then
    unlocks the full-file lock.
         This pattern of locks is chosen so that no more than one
    self-heal can be in progress at a time. BUT if another self-heal
    tries to take a full-file lock while a self-heal is already in
    progress, it is put in the blocked queue, and further inodelks from
    writes by the application are also put in the blocked queue because
    of the way the locks xlator grants inodelks. So until the self-heal
    is complete, writes are blocked.
    
    Here is the code:
    xlators/features/locks/src/inodelk.c - line 225
    if (__blocked_lock_conflict (dom, lock) && !(__owner_has_lock (dom, lock))) {
             ret = -EAGAIN;
             if (can_block == 0)
                     goto out;
    
             gettimeofday (&lock->blkd_time, NULL);
             list_add_tail (&lock->blocked_locks, &dom->blocked_inodelks);
    }
    
    This leads to hangs in applications.
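
    To make the overlap easier to follow, here is a minimal standalone
    sketch of the sequence described above. This is not GlusterFS code:
    lock_range()/unlock_range() are hypothetical stubs that only print
    the order, and len == 0 stands for "full file" as in inodelk.

    #include <stdio.h>
    #include <stdint.h>

    #define CHUNK (128 * 1024)

    /* hypothetical stubs: print the lock order instead of taking locks */
    static void lock_range (uint64_t start, uint64_t len)
    {
            printf ("lock   start=%llu len=%llu%s\n",
                    (unsigned long long) start, (unsigned long long) len,
                    len == 0 ? " (full file)" : "");
    }

    static void unlock_range (uint64_t start, uint64_t len)
    {
            printf ("unlock start=%llu len=%llu%s\n",
                    (unsigned long long) start, (unsigned long long) len,
                    len == 0 ? " (full file)" : "");
    }

    int main (void)
    {
            uint64_t size = 4 * CHUNK;  /* example file size */
            uint64_t off  = 0;

            lock_range (0, 0);          /* full-file lock: read xattrs, pick sources/sinks */
            lock_range (0, CHUNK);      /* lock the first chunk ... */
            unlock_range (0, 0);        /* ... then drop the full-file lock */

            while (off + CHUNK < size) {
                    /* sync [off, off + CHUNK) from source to sink here */
                    lock_range (off + CHUNK, CHUNK); /* take the next chunk ... */
                    unlock_range (off, CHUNK);       /* ... before releasing the current one */
                    off += CHUNK;
            }

            lock_range (0, 0);          /* full-file lock: decrement pending counts */
            unlock_range (off, CHUNK);  /* release the last small-range lock */
            unlock_range (0, 0);
            return 0;
    }

    At no point is the file completely unlocked, which is what keeps a
    second self-heal (and, queued behind it, application writes) blocked
    for the whole duration.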


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Anand Avati on 2013-05-28 04:18:58 EDT ---

REVIEW: http://review.gluster.org/5099 (cluster/afr: Refactor inodelk to handle multiple domains) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2013-05-28 04:19:20 EDT ---

REVIEW: http://review.gluster.org/5100 (cluster/afr: Let two data-self-heals compete in new domain) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2013-05-28 04:32:54 EDT ---

REVIEW: http://review.gluster.org/5099 (cluster/afr: Refactor inodelk to handle multiple domains) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2013-05-28 04:33:15 EDT ---

REVIEW: http://review.gluster.org/5100 (cluster/afr: Let two data-self-heals compete in new domain) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2013-05-28 05:14:34 EDT ---

REVIEW: http://review.gluster.org/5099 (cluster/afr: Refactor inodelk to handle multiple domains) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2013-05-28 05:14:56 EDT ---

REVIEW: http://review.gluster.org/5100 (cluster/afr: Let two data-self-heals compete in new domain) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2013-05-31 03:41:01 EDT ---

REVIEW: http://review.gluster.org/5099 (cluster/afr: Refactor inodelk to handle multiple domains) posted (#4) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2013-05-31 03:41:23 EDT ---

REVIEW: http://review.gluster.org/5100 (cluster/afr: Let two data-self-heals compete in new domain) posted (#4) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2013-06-18 03:32:03 EDT ---

REVIEW: http://review.gluster.org/5099 (cluster/afr: Refactor inodelk to handle multiple domains) posted (#5) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2013-06-18 03:32:29 EDT ---

REVIEW: http://review.gluster.org/5100 (cluster/afr: Let two data-self-heals compete in new domain) posted (#5) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2013-06-25 22:53:40 EDT ---

REVIEW: http://review.gluster.org/5099 (cluster/afr: Refactor inodelk to handle multiple domains) posted (#6) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2013-06-25 22:54:04 EDT ---

REVIEW: http://review.gluster.org/5100 (cluster/afr: Let two data-self-heals compete in new domain) posted (#6) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2013-07-03 12:59:43 EDT ---

REVIEW: http://review.gluster.org/5099 (cluster/afr: Refactor inodelk to handle multiple domains) posted (#7) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2013-07-03 13:00:07 EDT ---

REVIEW: http://review.gluster.org/5100 (cluster/afr: Let two data-self-heals compete in new domain) posted (#7) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2013-07-03 20:49:32 EDT ---

REVIEW: http://review.gluster.org/5099 (cluster/afr: Refactor inodelk to handle multiple domains) posted (#8) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2013-07-03 20:49:56 EDT ---

REVIEW: http://review.gluster.org/5100 (cluster/afr: Let two data-self-heals compete in new domain) posted (#8) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2013-07-04 00:28:55 EDT ---

COMMIT: http://review.gluster.org/5099 committed in master by Vijay Bellur (vbellur) 
------
commit c2abf3a6e39c5a5832a165757483bc0ae23cdb63
Author: Pranith Kumar K <pkarampu>
Date:   Wed Jul 3 21:21:48 2013 +0530

    cluster/afr: Refactor inodelk to handle multiple domains
    
    - afr_local_copy should not be memduping locked nodes, that would
      mean that lock is taken in self-heal on those nodes even before
      it actually takes the lock. So removed memdup code. Even entry
      lock related copying (lockee info) is also not necessary for
      self-heal functionality, so removing that as well. Since it is
      not local_copy anymore changed its name.
    
    - My editor changed tabs to spaces.
    
    Change-Id: I8dfb92cb8338e9a967c06907a8e29a8404782d61
    BUG: 967717
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/5099
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

--- Additional comment from Anand Avati on 2013-07-04 00:36:06 EDT ---

COMMIT: http://review.gluster.org/5100 committed in master by Vijay Bellur (vbellur) 
------
commit 0aa4248cf967b158c0cf6ce33c300ad4b3ee3249
Author: Pranith Kumar K <pkarampu>
Date:   Wed Jul 3 21:28:25 2013 +0530

    cluster/afr: Let two data-self-heals compete in new domain
    
    Problem:
    At the moment data-self-heal acquires locks in the following
    pattern. It takes a full-file lock, then gets the xattrs on the file
    on both replicas and decides sources/sinks based on the xattrs. It
    then acquires a lock on the 0-128k range and unlocks the full-file
    lock, syncs the 0-128k range from source to sink, acquires a lock on
    128k+1 till 256k, unlocks 0-128k, syncs the 128k+1 till 256k block,
    and so on. Finally it takes the full-file lock again, unlocks the
    last small-range lock, decrements the pending counts, and then
    unlocks the full-file lock.

         This pattern of locks is chosen so that no more than one
    self-heal can be in progress at a time. BUT if another self-heal
    tries to take a full-file lock while a self-heal is already in
    progress, it is put in the blocked queue, and further inodelks from
    writes by the application are also put in the blocked queue because
    of the way the locks xlator grants inodelks. So until the self-heal
    is complete, writes are blocked.
    
    Here is the code:
    xlators/features/locks/src/inodelk.c - line 225
    if (__blocked_lock_conflict (dom, lock) && !(__owner_has_lock (dom, lock))) {
             ret = -EAGAIN;
             if (can_block == 0)
                     goto out;
    
             gettimeofday (&lock->blkd_time, NULL);
             list_add_tail (&lock->blocked_locks, &dom->blocked_inodelks);
    }
    
    This leads to hangs in applications.
    
    Fix:
    Since we still want to prevent two parallel self-heals, we let them
    compete in a separate "domain". Let's call the domain in which the
    locks were taken in the previous approach the "data-domain".
    
    In the new approach, when a self-heal is triggered it acquires a
    full-file lock in the new domain "self-heal-domain". After this it
    performs the data self-heal using the locks in the "data-domain" as
    before, and finally unlocks the full-file lock in "self-heal-domain".
    
    With this approach, the application's writevs don't have to wait in
    the pending queue when more than one self-heal is triggered.
    
    Change-Id: Id79aef3dfa888945977fb9758374ac41c320d0d5
    BUG: 967717
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/5100
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>
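
To see how the fix changes the ordering, here is a matching minimal sketch of the new approach. Again this is not GlusterFS code: lock_in()/unlock_in() are hypothetical stubs that only print the lock order per domain, and the data-domain locking is simplified to a plain per-chunk loop, whereas the actual change keeps the rolling chunk locks "as before".

    #include <stdio.h>
    #include <stdint.h>

    #define CHUNK (128 * 1024)

    /* hypothetical stubs: print the lock order per domain instead of taking locks */
    static void lock_in (const char *domain, uint64_t start, uint64_t len)
    {
            printf ("lock   %-16s start=%llu len=%llu%s\n", domain,
                    (unsigned long long) start, (unsigned long long) len,
                    len == 0 ? " (full file)" : "");
    }

    static void unlock_in (const char *domain, uint64_t start, uint64_t len)
    {
            printf ("unlock %-16s start=%llu len=%llu%s\n", domain,
                    (unsigned long long) start, (unsigned long long) len,
                    len == 0 ? " (full file)" : "");
    }

    int main (void)
    {
            uint64_t size = 4 * CHUNK;  /* example file size */
            uint64_t off;

            /* competing self-heals now queue up on this lock in their own domain ... */
            lock_in ("self-heal-domain", 0, 0);

            /* ... while the data domain only holds short-lived range locks, so
             * application writes are no longer stuck behind a blocked full-file lock */
            for (off = 0; off < size; off += CHUNK) {
                    lock_in ("data-domain", off, CHUNK);
                    /* sync [off, off + CHUNK) from source to sink here */
                    unlock_in ("data-domain", off, CHUNK);
            }

            /* decrement pending counts, then release the self-heal serialization lock */
            unlock_in ("self-heal-domain", 0, 0);
            return 0;
    }

This is the serialization that the verification in comment 4 below looks for in the statedump: one ACTIVE full-file lock in the "self-heal" domain with the other self-heal candidates BLOCKED behind it, while writes in the data domain keep completing.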

Comment 3 spandura 2013-08-22 09:11:39 UTC
Pranith, can you please provide the test case to verify this bug?

Comment 4 spandura 2013-09-06 14:18:21 UTC
Verified the fix on the build:
==============================
glusterfs 3.4.0.31rhs built on Sep  5 2013 08:23:16

Test Case:
===========
1. Create a replicate volume (1 x 2). Start the volume.

2. Create 4 fuse mounts. From all the mounts, start dd on a file: "dd if=/dev/urandom of=./test_file1 bs=1K count=20480000"

3. While dd is in progress, bring down a brick.

4. Bring the brick back online while dd is still in progress.

5. Take volume statedump.

6. Check the brick statedumps. 

Expected result:
===============
1. In the statedump, search for the "self-heal" keyword. In the self-heal domain, one of the mount processes will be holding an active lock and all the other processes will be in the blocked state.

++++++++++++++++++++++++++++++++++
Example:
++++++++++++++++++++++++++++++++++
[xlator.features.locks.vol_dis_1_rep_2-locks.inode]
path=/testdir_gluster/test_file1
mandatory=0
inodelk-count=3352
lock-dump.domain.domain=vol_dis_1_rep_2-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551615, owner=400a6058f47f0000, transport=0x1a42860, , granted at Fri Sep  6 11:26:59 2013

inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551615, owner=7ca9a2144e7f0000, transport=0x1a488c0, , blocked at Fri Sep  6 11:27:00 2013

inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551615, owner=1497d664317f0000, transport=0x1a42600, , blocked at Fri Sep  6 11:27:00 2013

inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551615, owner=b027b6b8f97f0000, transport=0x1a02190, , blocked at Fri Sep  6 11:27:00 2013

2. Writes on the file from the mounts shouldn't hang and should complete successfully.


3. Self-heal should be successful (check the mount and glustershd.log for self-heal completion).

Example:
==========
[2013-09-06 11:54:08.509665] I [afr-self-heal-common.c:2840:afr_log_self_heal_completion_status] 0-vol_dis_1_rep_2-replicate-0:  metadata self heal  is successfully completed, backgroung data self heal  is successfully completed,  from vol_dis_1_rep_2-client-0 with 6677982208 6677982208  sizes - Pending matrix:  [ [ 0 1239959 ] [ 93 93 ] ] on /testdir_gluster/test_file1


Actual result:
===============
As expected. 

Bug is fixed. Moving bug to verified state.

Comment 5 Scott Haines 2013-09-23 22:35:54 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

