Bug 763820 (GLUSTER-2088) - Add logging to speed recovery with AFR
Summary: Add logging to speed recovery with AFR
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-2088
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact: Raghavendra Bhat
URL:
Whiteboard:
Depends On:
Blocks: 817967
Reported: 2010-11-11 17:17 UTC by Jeff Darcy
Modified: 2013-07-24 17:42 UTC (History)

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:42:13 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: glusterfs-3.3.0qa43
Embargoed:


Attachments
New translator (6.79 KB, application/x-gzip)
2010-11-11 14:19 UTC, Jeff Darcy

Description Jeff Darcy 2010-11-11 14:19:21 UTC
Created attachment 377


Gzipped due to (apparent) attachment-size limit.

Comment 1 Jeff Darcy 2010-11-11 17:17:37 UTC
Many users have expressed frustration with a recovery process based on "find" or "ls -alR", which can take days on a multi-TB filesystem just to force self-heal on the few files that were actually out of sync when a failure occurred.  This new translator attempts to remedy that by tracking which files (on each server) have pending transactions and keeping that information in a readily queryable database.  The recovery process can therefore consist of:

(1) Query the database on each server to see which files might be out of sync.

(2) Combine the results, filtering out duplicates etc.

(3) Touch the (relatively) few files that are listed.
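
As a rough illustration of step (2), the sketch below sorts the combined per-server results and drops duplicates in place. The function name merge_unique_paths and the idea that each server's query yields plain path strings are assumptions made for the example; the attached translator may report its results differently.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int cmp_path(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Hypothetical helper for step (2): "paths" holds the concatenated
 * results from every server. Sort them, then compact the array so that
 * each distinct path appears once. Returns the number of unique paths
 * left at the front of the array.
 */
size_t merge_unique_paths(const char **paths, size_t count)
{
    size_t i, out = 0;

    if (count == 0)
        return 0;

    qsort(paths, count, sizeof(paths[0]), cmp_path);

    for (i = 1; i < count; i++) {
        /* Keep a path only if it differs from the last one kept. */
        if (strcmp(paths[out], paths[i]) != 0)
            paths[++out] = paths[i];
    }
    return out + 1;
}
```

Step (3) would then walk the first N entries of the array, where N is the returned count, and touch each path through a client mount.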

I've tested this by having multiple processes write random files within a 5000-file subdirectory, then using iptables to simulate a network partition that prevents each client from reaching one of the servers.  After the partition is resolved, I check the results using the above procedure against those obtained by examining the trusted.afr.* xattrs on the servers directly.  In several dozen tests, I have yet to see a difference.  The code also handles renames and unlinks which can affect the paths reported when new transactions occur (a problem that remains unaddressed in debug/io-stats BTW), but that part's more lightly tested.  I intend to continue with more intensive testing while the code is reviewed for possible integration into a future release.
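
For reference, the xattr side of that comparison can be sketched as follows. This assumes that a trusted.afr.* value is 12 bytes holding three big-endian 32-bit pending counters (data, metadata, entry); that layout is an assumption about AFR's changelog format and is not stated in this report. The function name afr_has_pending is hypothetical, and fetching the value from the brick (for example with getxattr) is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define AFR_COUNTERS 3 /* assumed order: data, metadata, entry */

/* Decode one big-endian 32-bit counter. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical check on a raw trusted.afr.* value.
 * Returns 1 if any pending counter is non-zero, 0 if all are zero,
 * and -1 if the buffer does not have the assumed 12-byte size.
 */
int afr_has_pending(const unsigned char *val, size_t len)
{
    size_t i;

    if (val == NULL || len != AFR_COUNTERS * 4)
        return -1;

    for (i = 0; i < AFR_COUNTERS; i++) {
        if (be32(val + i * 4) != 0)
            return 1;
    }
    return 0;
}
```

A file would count as possibly out of sync if this returns 1 for any of its trusted.afr.* xattrs on any server.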

Comment 2 Anand Avati 2012-02-20 16:52:54 UTC
CHANGE: http://review.gluster.com/2722 (features/index: Index translator implementation) merged in master by Vijay Bellur (vijay)

Comment 3 Anand Avati 2012-02-20 18:05:44 UTC
CHANGE: http://review.gluster.com/2774 (storage/posix: Add xattr for gfid2path) merged in master by Vijay Bellur (vijay)

Comment 4 Anand Avati 2012-02-21 05:12:17 UTC
CHANGE: http://review.gluster.com/443 (syncop: Multi-processor support in syncenv) merged in master by Anand Avati (avati)

Comment 5 Anand Avati 2012-02-21 05:23:30 UTC
CHANGE: http://review.gluster.com/2749 (cluster/afr: Self-heald, Index integration) merged in master by Vijay Bellur (vijay)

Comment 6 Anand Avati 2012-02-21 05:24:06 UTC
CHANGE: http://review.gluster.com/2775 (cluster/afr: Add commands to see self-heald ops) merged in master by Vijay Bellur (vijay)

Comment 7 Anand Avati 2012-02-21 10:37:04 UTC
CHANGE: http://review.gluster.com/2780 (features/index: Add release, releasedir cbks) merged in master by Vijay Bellur (vijay)

Comment 8 Anand Avati 2012-02-21 10:50:23 UTC
CHANGE: http://review.gluster.com/2782 (libglusterfs: Warn on missing _cbk calls in xlator) merged in master by Vijay Bellur (vijay)

Comment 9 Anand Avati 2012-02-21 10:50:55 UTC
CHANGE: http://review.gluster.com/2781 (features/index: Fix fd leak) merged in master by Vijay Bellur (vijay)

Comment 10 Anand Avati 2012-02-22 06:20:09 UTC
CHANGE: http://review.gluster.com/2786 (features/index: Set correct ret value in index_add) merged in master by Vijay Bellur (vijay)

Comment 11 Raghavendra Bhat 2012-05-24 09:58:46 UTC
Now that the index xlator is integrated with the self-heal daemon, information on which files have been healed, which files could not be healed (healing failed), and which files are in a split-brain state can be obtained via the gluster CLI:


gluster volume heal
Usage: volume heal <VOLNAME> [{full | info {healed | heal-failed | split-brain}}]

Comment 12 Vijay Bellur 2013-03-25 06:45:13 UTC
REVIEW: http://review.gluster.org/4717 (synctask: introduce synclocks for co-operative locking) on master posted (#2) for review by Anand Avati (avati)

Comment 13 Vijay Bellur 2013-04-02 23:02:55 UTC
COMMIT: http://review.gluster.org/4717 committed in master by Anand Avati (avati) 
------
commit 87300be91cb9e1cd98ac5cba8998524d95c98d12
Author: Anand Avati <avati>
Date:   Sat Mar 23 13:55:09 2013 -0700

    synctask: introduce synclocks for co-operative locking
    
    This patch introduces synclocks - co-operative locks for synctasks.
    Synctasks yield themselves when a lock cannot be acquired at the time
    of the lock call, and the unlocker will wake the yielded locker at
    the time of unlock.
    
    The implementation is safe in a multi-threaded syncenv framework.
    
    It is also safe for sharing the lock between non-synctasks. i.e, the
    same lock can be used for synchronization between a synctask and
    a regular thread. In such a situation, waiting synctasks will yield
    themselves while non-synctasks will sleep on a cond variable. The
    unlocker (which could be either a synctask or a regular thread) will
    wake up any type of lock waiter (synctask or regular).
    
    Usage:
    
        Declaration and Initialization
        ------------------------------
    
        synclock_t lock;
    
        ret = synclock_init (&lock);
        if (ret) {
            /* lock could not be allocated */
        }
    
        Locking and non-blocking lock attempt
        -------------------------------------
    
        ret = synclock_trylock (&lock);
        if (ret && (errno == EBUSY)) {
            /* lock is held by someone else */
            return;
        }
    
        synclock_lock (&lock);
        {
            /* critical section */
        }
        synclock_unlock (&lock);
    
    Change-Id: I081873edb536ddde69a20f4a7dc6558ebf19f5b2
    BUG: 763820
    Signed-off-by: Anand Avati <avati>
    Reviewed-on: http://review.gluster.org/4717
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra G <raghavendra>
    Reviewed-by: Jeff Darcy <jdarcy>
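
A minimal sketch of the regular-thread side of the behavior described in the commit message, written with plain pthreads. The demo_lock type and the demo_lock_* functions are hypothetical names chosen for this example; this is not the synclock implementation, and the synctask yield/wake path is omitted entirely. It only shows how a waiter can sleep on a cond variable until the unlocker wakes it, and how a non-blocking attempt can report EBUSY.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for synclock_t; regular-thread path only. */
struct demo_lock {
    pthread_mutex_t guard; /* protects "held" */
    pthread_cond_t  cond;  /* waiters sleep here */
    int             held;  /* 1 while someone owns the lock */
};

/* Returns 0 on success, or a pthread error number on failure. */
int demo_lock_init(struct demo_lock *l)
{
    int ret = pthread_mutex_init(&l->guard, NULL);
    if (ret)
        return ret;
    ret = pthread_cond_init(&l->cond, NULL);
    if (ret) {
        pthread_mutex_destroy(&l->guard);
        return ret;
    }
    l->held = 0;
    return 0;
}

/* Blocks (sleeping on the cond variable) until the lock is free. */
void demo_lock_lock(struct demo_lock *l)
{
    pthread_mutex_lock(&l->guard);
    while (l->held)
        pthread_cond_wait(&l->cond, &l->guard);
    l->held = 1;
    pthread_mutex_unlock(&l->guard);
}

/* Returns 0 if acquired, or -1 with errno set to EBUSY if already held. */
int demo_lock_trylock(struct demo_lock *l)
{
    int ret = 0;

    pthread_mutex_lock(&l->guard);
    if (l->held) {
        errno = EBUSY;
        ret = -1;
    } else {
        l->held = 1;
    }
    pthread_mutex_unlock(&l->guard);
    return ret;
}

/* Releases the lock and wakes one sleeping waiter, if any. */
void demo_lock_unlock(struct demo_lock *l)
{
    pthread_mutex_lock(&l->guard);
    l->held = 0;
    pthread_cond_signal(&l->cond);
    pthread_mutex_unlock(&l->guard);
}
```

Keeping the ownership flag separate from the guarding mutex means the guard is only held briefly. In the real synclock, that is presumably what allows a waiting synctask to yield instead of blocking its thread.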

Comment 14 Anand Avati 2013-04-15 10:34:30 UTC
REVIEW: http://review.gluster.org/4830 (synctask: introduce synclocks for co-operative locking) posted (#1) for review on release-3.4 by Krishnan Parthasarathi (kparthas)

Comment 15 Anand Avati 2013-04-16 16:36:00 UTC
REVIEW: http://review.gluster.org/4830 (synctask: introduce synclocks for co-operative locking) posted (#2) for review on release-3.4 by Krishnan Parthasarathi (kparthas)

Comment 16 Anand Avati 2013-04-17 06:08:59 UTC
REVIEW: http://review.gluster.org/4830 (synctask: introduce synclocks for co-operative locking) posted (#3) for review on release-3.4 by Krishnan Parthasarathi (kparthas)

Comment 17 Anand Avati 2013-04-17 08:53:50 UTC
COMMIT: http://review.gluster.org/4830 committed in release-3.4 by Vijay Bellur (vbellur) 
------
commit 563b608126e812482a25464df7c70079fb0ba2c0
Author: Krishnan Parthasarathi <kparthas>
Date:   Mon Apr 15 15:41:21 2013 +0530

    synctask: introduce synclocks for co-operative locking
    
    This patch introduces synclocks - co-operative locks for synctasks.
    Synctasks yield themselves when a lock cannot be acquired at the time
    of the lock call, and the unlocker will wake the yielded locker at
    the time of unlock.
    
    The implementation is safe in a multi-threaded syncenv framework.
    
    It is also safe for sharing the lock between non-synctasks. i.e, the
    same lock can be used for synchronization between a synctask and
    a regular thread. In such a situation, waiting synctasks will yield
    themselves while non-synctasks will sleep on a cond variable. The
    unlocker (which could be either a synctask or a regular thread) will
    wake up any type of lock waiter (synctask or regular).
    
    Usage:
    
        Declaration and Initialization
        ------------------------------
    
        synclock_t lock;
    
        ret = synclock_init (&lock);
        if (ret) {
            /* lock could not be allocated */
        }
    
        Locking and non-blocking lock attempt
        -------------------------------------
    
        ret = synclock_trylock (&lock);
        if (ret && (errno == EBUSY)) {
            /* lock is held by someone else */
            return;
        }
    
        synclock_lock (&lock);
        {
            /* critical section */
        }
        synclock_unlock (&lock);
    
    BUG: 763820
    Change-Id: I23066f7b66b41d3d9fb2311fdaca333e98dd7442
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Original-author: Anand Avati <avati>
    Reviewed-on: http://review.gluster.org/4830
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

