Bug 1399450 - Backport a few of the md-cache enhancements from master to 3.9
Summary: Backport a few of the md-cache enhancements from master to 3.9
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: md-cache
Version: 3.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Poornima G
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-29 05:45 UTC by Poornima G
Modified: 2017-03-08 10:23 UTC
CC: 1 user

Fixed In Version: glusterfs-3.9.1
Clone Of:
Environment:
Last Closed: 2017-03-08 10:20:15 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments: None

Description Poornima G 2016-11-29 05:45:58 UTC
Description of problem:
The md-cache improvements feature, implemented as part of BZ 1211863, has not yet been completely backported to 3.9. Backport the following patches as part of this BZ:
[9ab5b52] afr: Implement IPC fop
[0fd7d0e] tests: Fix one of the md-cache test cases
[359b72a] ec: Implement ipc fop
[8d8eded] md-cache, afr: Reduce the window of stale read



Comment 1 Worker Ant 2016-11-29 05:47:13 UTC
REVIEW: http://review.gluster.org/15956 (ec: Implement ipc fop) posted (#1) for review on release-3.9 by Poornima G (pgurusid)

Comment 2 Worker Ant 2016-11-29 05:54:36 UTC
REVIEW: http://review.gluster.org/15957 (afr: Implement IPC fop) posted (#1) for review on release-3.9 by Poornima G (pgurusid)

Comment 3 Worker Ant 2016-11-29 05:54:39 UTC
REVIEW: http://review.gluster.org/15958 (md-cache, afr: Reduce the window of stale read) posted (#1) for review on release-3.9 by Poornima G (pgurusid)

Comment 4 Worker Ant 2016-11-29 05:54:41 UTC
REVIEW: http://review.gluster.org/15959 (afr: Fix the EIO that can occur in afr_inode_refresh as a result of cache invalidation (upcall).) posted (#1) for review on release-3.9 by Poornima G (pgurusid)

Comment 5 Worker Ant 2016-11-29 06:14:46 UTC
REVIEW: http://review.gluster.org/15960 (tests: Fix one of the md-cache test cases) posted (#1) for review on release-3.9 by Poornima G (pgurusid)

Comment 6 Worker Ant 2016-11-30 16:03:06 UTC
COMMIT: http://review.gluster.org/15956 committed in release-3.9 by Pranith Kumar Karampuri (pkarampu) 
------
commit 84a0c892a5a3edb5e30750ab15916fdb858dd93c
Author: Poornima G <pgurusid>
Date:   Fri Sep 2 12:47:15 2016 +0530

    ec: Implement ipc fop
    
    Backport of http://review.gluster.org/#/c/15387/
    
    The ipc will be wound to all the bricks, but for it to be
    successful, the fop should succeed on a minimum number of bricks.
    
    >Reviewed-on: http://review.gluster.org/15387
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >Smoke: Gluster Build System <jenkins.org>
    >Reviewed-by: Ashish Pandey <aspandey>
    >CentOS-regression: Gluster Build System <jenkins.org>
    >Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    >(cherry picked from commit 359b72a57b7c92fc2a11236ac05f5d740db2f540)
    
    Change-Id: I3f8cb6a349e87bafd0773583def9d4e3765aa140
    BUG: 1399450
    Signed-off-by: Poornima G <pgurusid>
    Reviewed-on: http://review.gluster.org/15956
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
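
A minimal, self-contained sketch of the fan-out-with-threshold idea behind this ec change: the ipc is wound to every brick, and the fop succeeds only if at least a minimum number of bricks succeed. All names below (brick_ipc, NUM_BRICKS, MIN_SUCCESS) are illustrative stand-ins, not the actual ec xlator API.

    /* Simulate winding an ipc fop to all bricks of a disperse volume and
     * requiring a minimum success count, as the commit above describes. */
    #include <stdio.h>
    #include <errno.h>

    #define NUM_BRICKS  6   /* e.g. a 4+2 disperse volume (assumed) */
    #define MIN_SUCCESS 4   /* ec needs at least the data-fragment count */

    /* Stand-in for winding GF_IPC_TARGET_UPCALL to one brick:
     * returns 0 on success, -errno on failure. */
    static int brick_ipc(int brick)
    {
        return (brick == 5) ? -ENOTCONN : 0;  /* pretend brick 5 is down */
    }

    int main(void)
    {
        int ok = 0, last_err = 0;

        for (int b = 0; b < NUM_BRICKS; b++) {
            int ret = brick_ipc(b);
            if (ret == 0)
                ok++;
            else
                last_err = -ret;
        }

        if (ok >= MIN_SUCCESS)
            printf("ipc succeeded on %d/%d bricks\n", ok, NUM_BRICKS);
        else
            printf("ipc failed with errno %d\n", last_err);
        return 0;
    }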

Comment 7 Worker Ant 2016-12-02 05:45:07 UTC
COMMIT: http://review.gluster.org/15957 committed in release-3.9 by Pranith Kumar Karampuri (pkarampu) 
------
commit ccecf4f069961ca5c7c392e8702883e17adfe767
Author: Poornima G <pgurusid>
Date:   Mon Aug 22 12:30:43 2016 +0530

    afr: Implement IPC fop
    
    Backport of http://review.gluster.org/15378
    
    Currently ipc() is not implemented in afr. md-cache and upcall
    use ipc to register the list of xattrs; see [1] for more details.
    For the ipc op GF_IPC_TARGET_UPCALL, it has to be wound to all
    the replica subvolumes. ipc() fails when any of the subvolumes
    fails with an error other than ENOTCONN, or when all of the
    subvolumes are down.
    
    [1] http://review.gluster.org/#/c/15002/
    
    >Reviewed-on: http://review.gluster.org/15378
    >Tested-by: Pranith Kumar Karampuri <pkarampu>
    >Smoke: Gluster Build System <jenkins.org>
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >CentOS-regression: Gluster Build System <jenkins.org>
    >Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    >Signed-off-by: Poornima G <pgurusid>
    
    Change-Id: I0f651330eafda64e4d922043fe53bd0014536247
    BUG: 1399450
    Signed-off-by: Poornima G <pgurusid>
    Reviewed-on: http://review.gluster.org/15957
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
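
A similarly hedged sketch of the error policy this afr change describes: the ipc is wound to every replica child and fails if any child returns an error other than ENOTCONN, or if no child succeeded at all. subvol_ipc and REPLICA_COUNT are illustrative names, not the afr API.

    #include <stdio.h>
    #include <errno.h>

    #define REPLICA_COUNT 3  /* assumed replica-3 volume */

    /* Stand-in for winding the ipc to one replica child. */
    static int subvol_ipc(int child)
    {
        return (child == 2) ? -ENOTCONN : 0;  /* pretend child 2 is down */
    }

    int main(void)
    {
        int ok = 0, fatal_err = 0;

        for (int c = 0; c < REPLICA_COUNT; c++) {
            int ret = subvol_ipc(c);
            if (ret == 0)
                ok++;
            else if (ret != -ENOTCONN)
                fatal_err = -ret;  /* any non-ENOTCONN error fails the fop */
        }

        if (fatal_err)
            printf("ipc failed: errno %d\n", fatal_err);
        else if (ok == 0)
            printf("ipc failed: all subvolumes down\n");
        else
            printf("ipc succeeded on %d/%d children\n", ok, REPLICA_COUNT);
        return 0;
    }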

Comment 8 Worker Ant 2016-12-02 05:45:44 UTC
COMMIT: http://review.gluster.org/15958 committed in release-3.9 by Pranith Kumar Karampuri (pkarampu) 
------
commit b80e1c607b3d3aeaea2f929716b676918dc74cad
Author: Poornima G <pgurusid>
Date:   Sun Sep 4 08:27:47 2016 +0530

    md-cache, afr: Reduce the window of stale read
    
    Problem:
    Consider a replica setup, where one mount writes data to a
    file and the other mount reads the file. In afr, read operations
    are not transaction based: a brick (read subvolume) is chosen as
    part of lookup or other operations, and reads are always wound only
    to that read subvolume, even if a write from a different client
    failed on this brick. This stale read continues until there is
    a lookup or a write operation from the mount point. Currently, this
    is not a major issue, as a lookup is issued before every read and it
    will switch the read subvolume to a correct one. But with the plan of
    increasing the md-cache timeout to 600s, the stale read problem will be
    more pronounced, i.e. a stale read can continue for 600s (or more if
    cascaded with readdirp), as there will be no lookups.
    
    Solution:
    Afr doesn't have any built-in solution for stale reads (without affecting
    performance). The solution that came up was to use upcall. When a file
    on any brick is marked bad for the first time, upcall sends a notification
    to all the clients that had recently accessed the file. The solution has
    2 parts:
    - Identifying when a file is marked bad, on any of the bricks,
      for the first time
    - Client side actions on receiving the notifications
    
    Identifying when a file is marked bad on any of the bricks for the first time:
    -----------------------------------------------------------------------------
    The idea is to track xattrop in upcall. xattrop currently comes with 2 afr
    xattrs - the afr dirty bit and the afr pending xattrs.
       The dirty xattr is set to 1 before every write, and is unset if the write
    succeeds. In certain scenarios, the dirty xattr can be 0 and the file could
    still be a bad copy. Hence the dirty xattr is not tracked.
       The pending xattr is set on the good copy, indicating the other bricks
    that have a bad copy. It is still not as simple as notifying when any of the
    pending xattrs change: that could lead to a flood of notifications if the
    other brick is completely down or consistently failing. Hence it is important
    to notify only once, the first time a good copy is marked bad.
    
    Client side actions on receiving the pending xattr change notification:
    ------------------------------------------------------------------------
    md-cache will invalidate the cache of that file, so that a further lookup
    is passed down to afr and hence updates the read subvolume. Invalidating
    only in md-cache is not enough; consider the following order of operations:
    - pending xattr invalidation - invalidate md-cache
    - readdirp on the bad read subvolume - fill md-cache
    - lookup (served from md-cache)
    - read - wound to the old read subvol
    Hence, along with invalidating md-cache, it is very important to reset the
    read subvolume for that file in afr.
    
    Design Credit: Anuradha Talur, Ravishankar N
    
    Notes on how xattrop is interpreted:
    1. xattrop doesn't carry information saying whether it is a pre op
       or a post op.
    2. A pre xattrop will have a 0 value for all pending xattrs;
       the cbk of the pre xattrop carries the on-disk xattr value.
       Non-zero indicates healing is required.
    3. A post xattrop will have a non-zero value for any of the
       pending xattrs, if the fop failed on any of the bricks.
    
    >Reviewed-on: http://review.gluster.org/15398
    >Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    >Tested-by: Pranith Kumar Karampuri <pkarampu>
    >Smoke: Gluster Build System <jenkins.org>
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >CentOS-regression: Gluster Build System <jenkins.org>
    >Signed-off-by: Poornima G <pgurusid>
    
    Change-Id: I469cbc111714c433984fe1c922be2ef113c25804
    BUG: 1399450
    Signed-off-by: Poornima G <pgurusid>
    Reviewed-on: http://review.gluster.org/15958
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
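
To make the "notify only on the first good-to-bad transition" logic above concrete, here is a small, self-contained sketch. It is an illustration of the described behaviour, not the upcall xlator's code: a post xattrop carrying a non-zero pending value triggers a notification only when the previously stored value was zero.

    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    struct inode_state {
        uint32_t pending;  /* last-seen pending-xattr value for this file */
    };

    /* Returns true exactly when a good copy is marked bad for the first
     * time, i.e. on a zero to non-zero transition of the pending xattr. */
    static bool on_post_xattrop(struct inode_state *st, uint32_t new_pending)
    {
        bool first_bad = (st->pending == 0 && new_pending != 0);
        st->pending = new_pending;
        return first_bad;  /* caller would send the upcall notification */
    }

    int main(void)
    {
        struct inode_state st = { 0 };
        /* Simulated stream of post-xattrop pending values: the repeated
         * failures (1, 2, 2) produce only a single notification. */
        uint32_t updates[] = { 0, 1, 2, 2, 0, 3 };

        for (unsigned i = 0; i < sizeof(updates) / sizeof(updates[0]); i++)
            if (on_post_xattrop(&st, updates[i]))
                printf("update %u: first bad mark, notify clients\n", i);
        return 0;
    }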

Comment 9 Worker Ant 2016-12-13 09:40:26 UTC
COMMIT: http://review.gluster.org/15959 committed in release-3.9 by Pranith Kumar Karampuri (pkarampu) 
------
commit 18ac64b083bde041b9a3011224f067966721668a
Author: Poornima G <pgurusid>
Date:   Mon Nov 21 11:49:35 2016 +0530

    afr: Fix the EIO that can occur in afr_inode_refresh as a result
         of cache invalidation (upcall).
    
    Issue:
    ------
    When a cache invalidation is received as a result of a change to a
    pending xattr, the read_subvol is reset. Consider the below chain
    of execution:
    
    CHILD_DOWN
    ...
    afr_readv
    ...
    afr_inode_refresh
    ...
    afr_inode_read_subvol_reset <- as a result of pending xattr set by
                                   some other client GF_EVENT_UPCALL will
                                   be sent
    afr_refresh_done -> this results in an EIO, as the read subvol was
                        reset by the end of the afr_inode_refresh
    
    Solution:
    ---------
    When GF_EVENT_UPCALL is received, instead of resetting read_subvol,
    set a variable need_refresh in the inode_ctx; the next time someone
    starts a txn, need_refresh needs to be checked along with the
    event gen.
    
    >Reviewed-on: http://review.gluster.org/15892
    >Reviewed-by: Ravishankar N <ravishankar>
    >Smoke: Gluster Build System <jenkins.org>
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >CentOS-regression: Gluster Build System <jenkins.org>
    >Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    >Signed-off-by: Poornima G <pgurusid>
    
    Change-Id: Ifda21a7a8039b8874215e1afa4bdf20f7d991b58
    BUG: 1399450
    Signed-off-by: Poornima G <pgurusid>
    Reviewed-on: http://review.gluster.org/15959
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Ravishankar N <ravishankar>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
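
A hedged sketch of the fix described above: the upcall handler no longer resets the read subvolume directly (which can race with an in-flight afr_inode_refresh and surface EIO); it only sets a need_refresh flag in the inode context, and the next transaction checks that flag alongside the event generation and performs the refresh itself. Field and function names mirror the commit message, but the code is illustrative, not afr's implementation.

    #include <stdio.h>
    #include <stdbool.h>

    struct inode_ctx {
        int  read_subvol;   /* currently chosen read subvolume */
        int  event_gen;     /* bumped on child up/down events */
        bool need_refresh;  /* set on GF_EVENT_UPCALL, consumed by next txn */
    };

    static void on_upcall(struct inode_ctx *ctx)
    {
        ctx->need_refresh = true;  /* do NOT touch read_subvol here */
    }

    static void start_txn(struct inode_ctx *ctx, int cur_event_gen)
    {
        if (ctx->need_refresh || ctx->event_gen != cur_event_gen) {
            /* afr_inode_refresh would re-pick the read subvolume here */
            ctx->read_subvol = 0;  /* placeholder for the refreshed choice */
            ctx->event_gen = cur_event_gen;
            ctx->need_refresh = false;
            printf("txn: refreshed, read_subvol=%d\n", ctx->read_subvol);
        } else {
            printf("txn: no refresh, read_subvol=%d\n", ctx->read_subvol);
        }
    }

    int main(void)
    {
        struct inode_ctx ctx = { .read_subvol = 1, .event_gen = 7 };

        start_txn(&ctx, 7);  /* nothing pending: no refresh */
        on_upcall(&ctx);     /* a pending xattr changed on some brick */
        start_txn(&ctx, 7);  /* need_refresh forces the refresh now */
        return 0;
    }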

Comment 10 Worker Ant 2016-12-19 09:34:48 UTC
COMMIT: http://review.gluster.org/15960 committed in release-3.9 by Raghavendra Talur (rtalur) 
------
commit bcb4712ebf10a3f0b4d0c677b031d812153ecdcd
Author: Poornima G <pgurusid>
Date:   Wed Sep 7 15:47:14 2016 +0530

    tests: Fix one of the md-cache test cases
    
    Verify that unlink, rename, and other ops are reflected on both
    the current mount and other mounts.
    
    >Reviewed-on: http://review.gluster.org/15419
    >Smoke: Gluster Build System <jenkins.org>
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >CentOS-regression: Gluster Build System <jenkins.org>
    >Reviewed-by: Vijay Bellur <vbellur>
    >(cherry picked from commit 0fd7d0e1c78fdbedfcdb085445c4b0be3c1a97a9)
    
    Change-Id: I5a296cdd557194dcf487e65ee4a14bbeaf4be690
    BUG: 1399450
    Signed-off-by: Poornima G <pgurusid>
    Reviewed-on: http://review.gluster.org/15960
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Rajesh Joseph <rjoseph>
    Reviewed-by: Raghavendra Talur <rtalur>

Comment 11 Kaushal 2017-03-08 10:20:15 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.9.1, please open a new bug report.

glusterfs-3.9.1 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-January/029725.html
[2] https://www.gluster.org/pipermail/gluster-users/

