Bug 967717

Summary:	Change order of locks for data self-heal
Product:	[Community] GlusterFS	Reporter:	Pranith Kumar K <pkarampu>
Component:	replicate	Assignee:	Pranith Kumar K <pkarampu>
Status:	CLOSED CURRENTRELEASE	QA Contact:
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	mainline	CC:	gluster-bugs
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	glusterfs-3.4.0	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:
Clones:	985380 (view as bug list)		Environment:
Last Closed:	2013-07-24 17:51:01 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	985380

Description Pranith Kumar K 2013-05-28 06:27:33 UTC

Description of problem:
    At the moment data-self-heal acquires locks in following
    pattern. It takes full file lock then gets xattrs on files on both
    replicas. Decides sources/sinks based on the xattrs. Now it acquires
    lock from 0-128k then unlocks the full file lock. Syncs 0-128k range
    from source to sink now acquires lock 128k+1 till 256k then unlocks
    0-128k, syncs 128k+1 till 256k block... so on finally it takes full file
    lock again then unlocks the final small range block.
    It decrements pending counts and then unlocks the full file lock.
         This pattern of locks is chosen to avoid more than 1 self-heal
    to be in progress. BUT if another self-heal tries to take a full
    file lock while a self-heal is already in progress it will be put in
    blocked queue, further inodelks from writes by the application will
    also be put in blocked queue because of the way locks xlator grants
    inodelks. So until the self-heal is complete writes are blocked.
    
    Here is the code:
    xlators/features/locks/src/inodelk.c - line 225
    if (__blocked_lock_conflict (dom, lock) && !(__owner_has_lock (dom, lock))) {
             ret = -EAGAIN;
             if (can_block == 0)
                     goto out;
    
             gettimeofday (&lock->blkd_time, NULL);
             list_add_tail (&lock->blocked_locks, &dom->blocked_inodelks);
    }
    
    This leads to hangs in applications.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Anand Avati 2013-05-28 08:18:58 UTC

REVIEW: http://review.gluster.org/5099 (cluster/afr: Refactor inodelk to handle multiple domains) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 2 Anand Avati 2013-05-28 08:19:20 UTC

REVIEW: http://review.gluster.org/5100 (cluster/afr: Let two data-self-heals compete in new domain) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 3 Anand Avati 2013-05-28 08:32:54 UTC

REVIEW: http://review.gluster.org/5099 (cluster/afr: Refactor inodelk to handle multiple domains) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 4 Anand Avati 2013-05-28 08:33:15 UTC

REVIEW: http://review.gluster.org/5100 (cluster/afr: Let two data-self-heals compete in new domain) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 5 Anand Avati 2013-05-28 09:14:34 UTC

REVIEW: http://review.gluster.org/5099 (cluster/afr: Refactor inodelk to handle multiple domains) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 6 Anand Avati 2013-05-28 09:14:56 UTC

REVIEW: http://review.gluster.org/5100 (cluster/afr: Let two data-self-heals compete in new domain) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 7 Anand Avati 2013-05-31 07:41:01 UTC

REVIEW: http://review.gluster.org/5099 (cluster/afr: Refactor inodelk to handle multiple domains) posted (#4) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 8 Anand Avati 2013-05-31 07:41:23 UTC

REVIEW: http://review.gluster.org/5100 (cluster/afr: Let two data-self-heals compete in new domain) posted (#4) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 9 Anand Avati 2013-06-18 07:32:03 UTC

REVIEW: http://review.gluster.org/5099 (cluster/afr: Refactor inodelk to handle multiple domains) posted (#5) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 10 Anand Avati 2013-06-18 07:32:29 UTC

REVIEW: http://review.gluster.org/5100 (cluster/afr: Let two data-self-heals compete in new domain) posted (#5) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 11 Anand Avati 2013-06-26 02:53:40 UTC

REVIEW: http://review.gluster.org/5099 (cluster/afr: Refactor inodelk to handle multiple domains) posted (#6) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 12 Anand Avati 2013-06-26 02:54:04 UTC

REVIEW: http://review.gluster.org/5100 (cluster/afr: Let two data-self-heals compete in new domain) posted (#6) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 13 Anand Avati 2013-07-03 16:59:43 UTC

REVIEW: http://review.gluster.org/5099 (cluster/afr: Refactor inodelk to handle multiple domains) posted (#7) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 14 Anand Avati 2013-07-03 17:00:07 UTC

REVIEW: http://review.gluster.org/5100 (cluster/afr: Let two data-self-heals compete in new domain) posted (#7) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 15 Anand Avati 2013-07-04 00:49:32 UTC

REVIEW: http://review.gluster.org/5099 (cluster/afr: Refactor inodelk to handle multiple domains) posted (#8) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 16 Anand Avati 2013-07-04 00:49:56 UTC

REVIEW: http://review.gluster.org/5100 (cluster/afr: Let two data-self-heals compete in new domain) posted (#8) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 17 Anand Avati 2013-07-04 04:28:55 UTC

COMMIT: http://review.gluster.org/5099 committed in master by Vijay Bellur (vbellur) 
------
commit c2abf3a6e39c5a5832a165757483bc0ae23cdb63
Author: Pranith Kumar K <pkarampu>
Date:   Wed Jul 3 21:21:48 2013 +0530

    cluster/afr: Refactor inodelk to handle multiple domains
    
    - afr_local_copy should not be memduping locked nodes, that would
      mean that lock is taken in self-heal on those nodes even before
      it actually takes the lock. So removed memdup code. Even entry
      lock related copying (lockee info) is also not necessary for
      self-heal functionality, so removing that as well. Since it is
      not local_copy anymore changed its name.
    
    - My editor changed tabs to spaces.
    
    Change-Id: I8dfb92cb8338e9a967c06907a8e29a8404782d61
    BUG: 967717
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/5099
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 18 Anand Avati 2013-07-04 04:36:06 UTC

COMMIT: http://review.gluster.org/5100 committed in master by Vijay Bellur (vbellur) 
------
commit 0aa4248cf967b158c0cf6ce33c300ad4b3ee3249
Author: Pranith Kumar K <pkarampu>
Date:   Wed Jul 3 21:28:25 2013 +0530

    cluster/afr: Let two data-self-heals compete in new domain
    
    Problem:
    At the moment data-self-heal acquires locks in following
    pattern. It takes full file lock then gets xattrs on files on both
    replicas. Decides sources/sinks based on the xattrs. Now it acquires
    lock from 0-128k then unlocks the full file lock. Syncs 0-128k range
    from source to sink now acquires lock 128k+1 till 256k then unlocks
    0-128k, syncs 128k+1 till 256k block... so on finally it takes full file
    lock again then unlocks the final small range block.
    It decrements pending counts and then unlocks the full file lock.
    
         This pattern of locks is chosen to avoid more than 1 self-heal
    to be in progress. BUT if another self-heal tries to take a full
    file lock while a self-heal is already in progress it will be put in
    blocked queue, further inodelks from writes by the application will
    also be put in blocked queue because of the way locks xlator grants
    inodelks. So until the self-heal is complete writes are blocked.
    
    Here is the code:
    xlators/features/locks/src/inodelk.c - line 225
    if (__blocked_lock_conflict (dom, lock) && !(__owner_has_lock (dom, lock))) {
             ret = -EAGAIN;
             if (can_block == 0)
                     goto out;
    
             gettimeofday (&lock->blkd_time, NULL);
             list_add_tail (&lock->blocked_locks, &dom->blocked_inodelks);
    }
    
    This leads to hangs in applications.
    
    Fix:
    Since we want to prevent two parallel self-heals. We let them compete
    in a separate "domain". Lets call the domain on which the locks have
    been taken on in previous approach as "data-domain".
    
    In the new approach When a self-heal is triggered,
    it acquires a full lock in the new domain "self-heal-domain".
        After this it performs data-self-heal using the locks in
        "data-domain" as before.
    unlock the full file lock in "self-heal-domain"
    
    With this approach, application's writevs don't have to wait
    in pending queue when more than 1 self-heal is triggered.
    
    Change-Id: Id79aef3dfa888945977fb9758374ac41c320d0d5
    BUG: 967717
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/5100
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>