Bug 1428061 - Halo Replication feature for AFR translator
Summary: Halo Replication feature for AFR translator
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1448416
 
Reported: 2017-03-01 19:05 UTC by Vijay Bellur
Modified: 2017-10-09 13:13 UTC
CC List: 5 users

Fixed In Version: glusterfs-3.12.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1448416
Environment:
Last Closed: 2017-08-23 09:04:21 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Vijay Bellur 2017-03-01 19:05:20 UTC
Halo Replication feature for AFR translator

Summary:
Halo Geo-replication is a feature which allows Gluster or NFS clients to write locally to their region (as defined by a latency "halo" or threshold if you like), and have their writes asynchronously propagate from their origin to the rest of the cluster.  Clients can also write synchronously to the cluster simply by specifying a very large halo-latency (e.g. 10 seconds) that will include all bricks.

In other words, it allows clients to decide at mount time whether they want synchronous or asynchronous IO into a cluster, and the cluster can support both of these modes for any number of clients simultaneously.

There are a few new volume options due to this feature (an illustrative CLI sketch follows this list):
  halo-shd-latency:  The threshold below which self-heal daemons will
  consider children (bricks) connected.

  halo-nfsd-latency: The threshold below which NFS daemons will consider
  children (bricks) connected.

  halo-latency: The threshold below which all other clients will
  consider children (bricks) connected.

  halo-min-replicas: The minimum number of replicas which are to
  be enforced regardless of latency specified in the above 3 options.
  If the number of children falls below this threshold the next
  best (chosen by latency) shall be swapped in.
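
An illustrative sketch only: the option names below are taken from this summary, but the exact keys in a shipped release (e.g. whether they carry a "cluster." prefix) and the latency units (assumed here to be milliseconds) are assumptions.

  # Hypothetical example of tuning the halo thresholds on an existing volume.
  # Latency values are assumed to be in milliseconds.
  gluster volume set myvol halo-latency 5        # ordinary clients
  gluster volume set myvol halo-shd-latency 10   # self-heal daemons
  gluster volume set myvol halo-nfsd-latency 5   # NFS daemons
  gluster volume set myvol halo-min-replicas 2   # never drop below 2 replicas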

New FUSE mount options:
  halo-latency & halo-min-replicas: As descripted above.

This feature combined with multi-threaded SHD support (D1271745) results in some pretty cool geo-replication possibilities.

Operational Notes:
- Global consistency is guaranteed for synchronous clients; this is provided by the existing entry-locking mechanism.
- Asynchronous clients, on the other hand, are merely consistent within their region.  Writes & deletes will be protected via entry-locks as usual, preventing concurrent writes into files which are undergoing replication.  Read operations should never block.
- Writes are allowed from _any_ region and propagated from the origin to all other regions.  The takeaway is that care should be taken to ensure multiple writers do not write to the same files, since that results in a gfid split-brain which will require resolution via split-brain policies (majority, mtime & size).  The recommended method for preventing this is to use the nfs-auth feature to define which region of each share has RW permissions; tiers not in the origin region should have RO perms (a rough sketch follows this list).
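
A rough sketch of the recommended RW/RO split, assuming the nfs-auth feature consumes /etc/exports-style entries; the syntax, paths, and addresses here are assumptions, not taken from this report.

  # Hypothetical exports entries: only the origin region gets read-write access.
  /myvol 10.1.0.0/16(rw)   # origin region
  /myvol 10.2.0.0/16(ro)   # remote region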

TODO:
- Synchronous clients (including the SHD) should choose clients from their own region as preferred sources for reads.  Most of the plumbing is in place for this via the child_latency array.
- Better GFID split brain handling & better dirent type split brain handling (i.e. create a trash can and move the offending files into it).
- Tagging in addition to latency as a means of defining which children you wish to synchronously write to.

Test Plan:
- The usual suspects, clang, gcc w/ address sanitizer & valgrind
- Prove tests

Reviewers: jackl, dph, cjh, meyering

Reviewed By: meyering

Subscribers: ethanr

Differential Revision: https://phabricator.fb.com/D1272053

Tasks: 4117827

Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1
Signed-off-by: Kevin Vigor <kvigor>
Reviewed-on: http://review.gluster.org/16099
NetBSD-regression: NetBSD Build System <jenkins.org>
Smoke: Gluster Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Shreyas Siravara <sshreyas>

Comment 1 Worker Ant 2017-03-03 02:19:27 UTC
REVIEW: https://review.gluster.org/16177 (Halo Replication feature for AFR translator) posted (#5) for review on master by Vijay Bellur (vbellur)

Comment 2 Worker Ant 2017-03-13 18:57:55 UTC
REVIEW: https://review.gluster.org/16177 (Halo Replication feature for AFR translator) posted (#6) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 3 Worker Ant 2017-03-21 18:49:42 UTC
REVIEW: https://review.gluster.org/16177 (Halo Replication feature for AFR translator) posted (#7) for review on master by Kevin Vigor (kvigor)

Comment 4 Worker Ant 2017-04-25 10:39:31 UTC
REVIEW: https://review.gluster.org/16177 (Halo Replication feature for AFR translator) posted (#8) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 5 Worker Ant 2017-05-02 05:53:18 UTC
REVIEW: https://review.gluster.org/16177 (Halo Replication feature for AFR translator) posted (#9) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 6 Worker Ant 2017-05-02 10:23:57 UTC
COMMIT: https://review.gluster.org/16177 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 07cc8679cdf3b29680f4f105d0222da168d8bfc1
Author: Kevin Vigor <kvigor>
Date:   Tue Mar 21 08:23:25 2017 -0700

    Halo Replication feature for AFR translator
    
    Summary:
    Halo Geo-replication is a feature which allows Gluster or NFS clients to write
    locally to their region (as defined by a latency "halo" or threshold if you
    like), and have their writes asynchronously propagate from their origin to the
    rest of the cluster.  Clients can also write synchronously to the cluster
    simply by specifying a very large halo-latency (e.g. 10 seconds) that will
    include all bricks.
    
    In other words, it allows clients to decide at mount time whether they want
    synchronous or asynchronous IO into a cluster, and the cluster can support
    both of these modes for any number of clients simultaneously.
    
    There are a few new volume options due to this feature:
      halo-shd-latency:  The threshold below which self-heal daemons will
      consider children (bricks) connected.
    
      halo-nfsd-latency: The threshold below which NFS daemons will consider
      children (bricks) connected.
    
      halo-latency: The threshold below which all other clients will
      consider children (bricks) connected.
    
      halo-min-replicas: The minimum number of replicas which are to
      be enforced regardless of latency specified in the above 3 options.
      If the number of children falls below this threshold the next
      best (chosen by latency) shall be swapped in.
    
    New FUSE mount options:
      halo-latency & halo-min-replicas: As descripted above.
    
    This feature combined with multi-threaded SHD support (D1271745) results in
    some pretty cool geo-replication possibilities.
    
    Operational Notes:
    - Global consistency is guaranteed for synchronous clients; this is provided
      by the existing entry-locking mechanism.
    - Asynchronous clients, on the other hand, are merely consistent within their
      region.  Writes & deletes will be protected via entry-locks as usual,
      preventing concurrent writes into files which are undergoing replication.
      Read operations should never block.
    - Writes are allowed from _any_ region and propagated from the origin to all
      other regions.  The takeaway is that care should be taken to ensure
      multiple writers do not write to the same files, since that results in a
      gfid split-brain which will require resolution via split-brain policies
      (majority, mtime & size).  The recommended method for preventing this is to
      use the nfs-auth feature to define which region of each share has RW
      permissions; tiers not in the origin region should have RO perms.
    
    TODO:
    - Synchronous clients (including the SHD) should choose clients from their own
      region as preferred sources for reads.  Most of the plumbing is in place for
      this via the child_latency array.
    - Better GFID split brain handling & better dirent type split brain handling
      (i.e. create a trash can and move the offending files into it).
    - Tagging in addition to latency as a means of defining which children you wish
      to synchronously write to
    
    Test Plan:
    - The usual suspects, clang, gcc w/ address sanitizer & valgrind
    - Prove tests
    
    Reviewers: jackl, dph, cjh, meyering
    
    Reviewed By: meyering
    
    Subscribers: ethanr
    
    Differential Revision: https://phabricator.fb.com/D1272053
    
    Tasks: 4117827
    
    Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1
    BUG: 1428061
    Signed-off-by: Kevin Vigor <kvigor>
    Reviewed-on: http://review.gluster.org/16099
    Reviewed-on: https://review.gluster.org/16177
    Tested-by: Pranith Kumar Karampuri <pkarampu>
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>

Comment 7 Shyamsundar 2017-09-05 17:25:47 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/

