Bug 1369638

Summary: DHT stale layout issue will be seen often with md-cache prolonged cache of lookups
Product: [Community] GlusterFS
Reporter: Poornima G <pgurusid>
Component: md-cache
Assignee: Poornima G <pgurusid>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: medium
Docs Contact:
Priority: medium
Version: mainline
CC: bugs
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.9.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-03-27 18:19:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1211863

Description Poornima G 2016-08-24 03:57:10 UTC
Description of problem:
    dht_layout is built only as part of a lookup. The layout can be
    modified by the rebalance process. Since every IO fop is preceded
    by a lookup, stale layouts are rarely seen today. But with the
    enhancements for aggressive caching of stats in md-cache, fewer
    lookups will be sent, so the stale layout issue will show up more
    often.


Comment 1 Worker Ant 2016-08-24 04:03:34 UTC
REVIEW: http://review.gluster.org/15300 (dht, md-cache, upcall: Add invalidation of IATT when the layout changes) posted (#1) for review on master by Poornima G (pgurusid)

Comment 2 Worker Ant 2016-08-25 05:55:40 UTC
REVIEW: http://review.gluster.org/15300 (dht, md-cache, upcall: Add invalidation of IATT when the layout changes) posted (#2) for review on master by Poornima G (pgurusid)

Comment 3 Worker Ant 2016-08-25 06:47:10 UTC
REVIEW: http://review.gluster.org/15300 (dht, md-cache, upcall: Add invalidation of IATT when the layout changes) posted (#3) for review on master by Poornima G (pgurusid)

Comment 4 Worker Ant 2016-08-25 09:03:01 UTC
REVIEW: http://review.gluster.org/15300 (dht, md-cache, upcall: Add invalidation of IATT when the layout changes) posted (#4) for review on master by Poornima G (pgurusid)

Comment 5 Worker Ant 2016-08-25 10:48:29 UTC
REVIEW: http://review.gluster.org/15300 (dht, md-cache, upcall: Add invalidation of IATT when the layout changes) posted (#5) for review on master by Poornima G (pgurusid)

Comment 6 Worker Ant 2016-08-25 11:28:35 UTC
REVIEW: http://review.gluster.org/15300 (dht, md-cache, upcall: Add invalidation of IATT when the layout changes) posted (#6) for review on master by Poornima G (pgurusid)

Comment 7 Worker Ant 2016-08-29 08:54:30 UTC
REVIEW: http://review.gluster.org/15300 (dht, md-cache, upcall: Add invalidation of IATT when the layout changes) posted (#7) for review on master by Poornima G (pgurusid)

Comment 8 Worker Ant 2016-08-31 06:08:59 UTC
COMMIT: http://review.gluster.org/15300 committed in master by Raghavendra G (rgowdapp) 
------
commit 065a27948c4e0651f5bdac1703939adf34e5380e
Author: Poornima G <pgurusid>
Date:   Tue Aug 23 18:15:22 2016 +0530

    dht, md-cache, upcall: Add invalidation of IATT when the layout changes
    
    Issue:
    dht_layout is built only as part of a lookup. The layout can be
    modified by the rebalance process. Since every IO fop is preceded
    by a lookup, stale layouts are rarely seen today. But with the
    enhancements for aggressive caching of stats in md-cache, fewer
    lookups will be sent, so the stale layout issue will show up more
    often.
    
    Solution:
    Since stale layout is already an issue in dht, there is already
    a plan to fix it at the dht layer, but that fix is not currently
    planned for any release. Until it comes out, we can have a
    workaround in which upcall sends a notification to md-cache
    whenever a layout xattr is changed. On a layout change notification,
    the existing cache is invalidated and the next lookup fetches the
    latest layout.
    
    This is not a foolproof solution: there is still a window between the
    layout change and the next lookup (after invalidation of stat) during
    which the layout is stale. But until the final fix comes in, this
    reduces the stale layout window.
    
    Change-Id: Iacf871a38b35880c1fc0bc68fe7ce291265e71d4
    BUG: 1369638
    Signed-off-by: Poornima G <pgurusid>
    Reviewed-on: http://review.gluster.org/15300
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>
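
The workaround described in the commit message above can be illustrated with a toy model. This is only a sketch and is not GlusterFS code: the `Brick` and `Client` classes, their methods, and the integer "layout version" are all hypothetical stand-ins for the brick-side upcall, md-cache, and dht_layout. It assumes, as the commit message states, that the layout is refreshed only by a lookup and that a lookup is skipped while the stat is cached.

```python
# Toy model only. Every name here is hypothetical, not a GlusterFS API.

class Brick:
    """Server side: owns the authoritative layout and notifies clients."""

    def __init__(self):
        self.layout_version = 1
        self.subscribers = []

    def set_layout(self, version):
        # Stands in for rebalance rewriting the layout xattr.
        self.layout_version = version
        for callback in self.subscribers:
            callback()  # upcall-style notification to each client


class Client:
    """Client side: a stat cache (md-cache stand-in) above a layout (dht stand-in)."""

    def __init__(self, brick, invalidate_on_upcall):
        self.brick = brick
        self.cached_stat = None      # filled by lookup, served to later fops
        self.layout_version = None   # refreshed only by lookup
        self.lookups = 0
        if invalidate_on_upcall:
            brick.subscribers.append(self.invalidate)

    def invalidate(self):
        # Dropping the cached stat forces the next fop to do a lookup.
        self.cached_stat = None

    def lookup(self):
        self.lookups += 1
        self.layout_version = self.brick.layout_version
        self.cached_stat = {"size": 0}

    def io(self):
        # A fop needs the stat; a lookup is sent only on a cache miss.
        if self.cached_stat is None:
            self.lookup()
        return self.layout_version


def run(invalidate_on_upcall):
    brick = Brick()
    client = Client(brick, invalidate_on_upcall)
    client.io()            # first access: lookup, layout version 1
    brick.set_layout(2)    # layout changes behind the client's back
    return client.io()     # layout version used by the next fop
```

In this model `run(False)` returns the old version 1, because the cached stat suppresses the lookup, while `run(True)` returns 2, because the notification drops the cache and the next fop looks up again. Even with invalidation, the client's `layout_version` stays at 1 between `set_layout` and the next `io()` call, which corresponds to the remaining stale window the commit message mentions.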

Comment 9 Shyamsundar 2017-03-27 18:19:57 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.9.0, please open a new bug report.

glusterfs-3.9.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2016-November/029281.html
[2] https://www.gluster.org/pipermail/gluster-users/