Bug 1460899 - [Negative Lookup]: negative lookup feature doesn't seem to work on restart of volume
Summary: [Negative Lookup]: negative lookup feature doesn't seem to work on restart of volume
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: gluster-smb
Version: 3.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On: 1450080 1458539
Blocks:
 
Reported: 2017-06-13 05:13 UTC by Poornima G
Modified: 2018-06-20 18:26 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1458539
Environment:
Last Closed: 2018-06-20 18:26:48 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Poornima G 2017-06-13 05:13:39 UTC
+++ This bug was initially created as a clone of Bug #1458539 +++

+++ This bug was initially created as a clone of Bug #1450080 +++

Description of problem:
========================
It seems that if we restart a volume, the negative lookup caching feature stops working.
I did a lookup of a non-existent file with nl caching enabled.
Generally the first lookup takes about 7-13 LOOKUP FOPs, depending on the dht hashed/unhashed brick.
On a second lookup, I noticed that it generally took about 1 LOOKUP FOP on each brick.

I then restarted the volume and reran the above checks, and found that even on repetitive negative lookups the LOOKUP FOP count stays the same as without nl-cache enabled.

I checked the trusted volfile and didn't notice any discrepancies; the nl-cache translators were loaded.
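For reference, the presence of the nl-cache translator in the generated trusted volfile can be checked with something like the following; the volfile path and file name here are assumptions based on the default glusterd layout, adjust for the actual setup:

    # look for the nl-cache xlator in the trusted client volfile (path and name assumed)
    grep -A 3 "performance/nl-cache" /var/lib/glusterd/vols/negative/trusted-negative.tcp-fuse.vol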


Version-Release number of selected component (if applicable):
======
[root@dhcp35-192 negative]# rpm -qa|grep gluster
gluster-nagios-addons-0.2.8-1.el7rhgs.x86_64
python-gluster-3.8.4-24.el7rhgs.noarch
glusterfs-libs-3.8.4-24.el7rhgs.x86_64
glusterfs-fuse-3.8.4-24.el7rhgs.x86_64
glusterfs-api-devel-3.8.4-24.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-24.el7rhgs.x86_64
samba-vfs-glusterfs-4.6.3-0.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-3.8.4-24.el7rhgs.x86_64
glusterfs-api-3.8.4-24.el7rhgs.x86_64
glusterfs-cli-3.8.4-24.el7rhgs.x86_64
glusterfs-devel-3.8.4-24.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-24.el7rhgs.x86_64
glusterfs-rdma-3.8.4-24.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.1.el7rhgs.noarch
glusterfs-client-xlators-3.8.4-24.el7rhgs.x86_64
glusterfs-server-3.8.4-24.el7rhgs.x86_64
glusterfs-events-3.8.4-24.el7rhgs.x86_64
[root@dhcp35-192 negative]# rpm -qa|grep samba
samba-common-4.6.3-0.el7rhgs.noarch
samba-vfs-glusterfs-4.6.3-0.el7rhgs.x86_64
samba-common-tools-4.6.3-0.el7rhgs.x86_64
samba-common-libs-4.6.3-0.el7rhgs.x86_64
samba-4.6.3-0.el7rhgs.x86_64
samba-client-4.6.3-0.el7rhgs.x86_64
samba-libs-4.6.3-0.el7rhgs.x86_64
samba-client-libs-4.6.3-0.el7rhgs.x86_64
[root@dhcp35-192 negative]# 


How reproducible:
===============
consistent on my setup

Steps to Reproduce:
1. Had a 3-node cluster with samba set up using https://mojo.redhat.com/docs/DOC-1027985 (no ctdb setup)
2. Created a 2x2 volume as below ==> vol bricks are on nodes n2 and n3, while n1 was the samba server node
[root@dhcp35-192 ~]# gluster v info
 
Volume Name: negative
Type: Distributed-Replicate
Volume ID: 48c809a8-8a82-4806-aaf0-5ea0bbf423cb
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: :/rhs/brick1/negative
Brick2: :/rhs/brick1/negative
Brick3: :/rhs/brick2/negative
Brick4: :/rhs/brick2/negative
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
features.cache-invalidation-timeout: 3000
features.cache-invalidation: on
performance.nl-cache: on
nfs.disable: on
transport.address-family: inet
server.allow-insecure: on
storage.batch-fsync-delay-usec: 0

3. Volume profile is started
4. Mounted using cifs on a rhel client
5. Did some amount of testing wrt nl caching (for about half a day)
6. Created a directory dir4
7. Did a negative lookup by issuing ls x2;
   it says the file does not exist, as expected;
   the profiling showed about 7-13 fops
8. Again did an ls x2 to see nl cache working;
   fops reduced to about 1 per brick (see the command sequence sketched after these steps)
9. Restarted the volume
10. Repeated steps 7 and 8 again and again, but the lookup fop count is not reducing
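For reference, the FOP counts referred to in steps 7 and 8 were gathered roughly as below; the mount point /mnt/negative and the file name are placeholders for the actual CIFS mount and the non-existent file:

    # read the incremental profile once so the next read covers only the lookups below
    gluster volume profile negative info incremental > /dev/null
    ls /mnt/negative/dir4/nosuchfile                                 # first negative lookup
    gluster volume profile negative info incremental | grep LOOKUP   # roughly 7-13 LOOKUPs across the bricks
    ls /mnt/negative/dir4/nosuchfile                                 # repeat the same negative lookup
    gluster volume profile negative info incremental | grep LOOKUP   # with nl-cache working, about 1 LOOKUP per brick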



before restart


first negative lookup
[root@dhcp35-192 ~]# gluster v profile negative info incremental
Brick: :/rhs/brick2/negative
----------------------------------------
Interval 16 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      9.69      60.00 us      60.00 us      60.00 us              1    GETXATTR
     90.31     111.80 us      82.00 us     143.00 us              5      LOOKUP
 
    Duration: 4 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: :/rhs/brick1/negative
----------------------------------------
Interval 16 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
     12.04     124.00 us     124.00 us     124.00 us              1    GETXATTR
     87.96     113.25 us      63.00 us     252.00 us              8      LOOKUP
 
    Duration: 4 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: :/rhs/brick2/negative
----------------------------------------
Interval 16 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      9.87      70.00 us      70.00 us      70.00 us              1    GETXATTR
     90.13     127.80 us     100.00 us     152.00 us              5      LOOKUP
 
    Duration: 5 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: :/rhs/brick1/negative
----------------------------------------
Interval 16 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
     12.18     147.00 us     147.00 us     147.00 us              1    GETXATTR
     87.82     132.50 us      79.00 us     299.00 us              8      LOOKUP
 
    Duration: 5 seconds
   Data Read: 0 bytes
Data Written: 0 bytes



second negative lookup 
[root@dhcp35-192 ~]# gluster v profile negative info incremental
Brick: :/rhs/brick1/negative
----------------------------------------
Interval 17 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
    100.00     336.00 us     336.00 us     336.00 us              1      LOOKUP
 
    Duration: 6 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: :/rhs/brick2/negative
----------------------------------------
Interval 17 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
    100.00     338.00 us     338.00 us     338.00 us              1      LOOKUP
 
    Duration: 6 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: :/rhs/brick2/negative
----------------------------------------
Interval 17 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
    100.00     144.00 us     144.00 us     144.00 us              1      LOOKUP
 
    Duration: 6 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: :/rhs/brick1/negative
----------------------------------------
Interval 17 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
    100.00     263.00 us     263.00 us     263.00 us              1      LOOKUP
 
    Duration: 6 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
[root@dhcp35-192 ~]# 



#############################
on restart

first negative lookup
[root@dhcp35-192 ~]# gluster v profile negative info incremental
Brick: :/rhs/brick2/negative
----------------------------------------
Interval 13 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
    100.00     121.29 us      77.00 us     251.00 us              7      LOOKUP
 
    Duration: 8 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: :/rhs/brick1/negative
----------------------------------------
Interval 13 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
    100.00     116.08 us      53.00 us     446.00 us             13      LOOKUP
 
    Duration: 8 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: :/rhs/brick1/negative
----------------------------------------
Interval 13 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
    100.00     125.92 us      69.00 us     477.00 us             13      LOOKUP
 
    Duration: 8 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: :/rhs/brick2/negative
----------------------------------------
Interval 13 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
    100.00     113.86 us      74.00 us     249.00 us              7      LOOKUP
 
    Duration: 8 seconds
   Data Read: 0 bytes
Data Written: 0 bytes


second negative lookup
[root@dhcp35-192 ~]# gluster v profile negative info incremental
Brick: :/rhs/brick2/negative
----------------------------------------
Interval 13 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
    100.00     121.29 us      77.00 us     251.00 us              7      LOOKUP
 
    Duration: 8 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: :/rhs/brick1/negative
----------------------------------------
Interval 13 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
    100.00     116.08 us      53.00 us     446.00 us             13      LOOKUP
 
    Duration: 8 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: :/rhs/brick1/negative
----------------------------------------
Interval 13 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
    100.00     125.92 us      69.00 us     477.00 us             13      LOOKUP
 
    Duration: 8 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: :/rhs/brick2/negative
----------------------------------------
Interval 13 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
    100.00     113.86 us      74.00 us     249.00 us              7      LOOKUP
 
    Duration: 8 seconds
   Data Read: 0 bytes
Data Written: 0 bytes






Workaround:
========
After the restart, turn the nl-cache option off and then on again (commands sketched below).
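That is, something along the lines of the following (volume name taken from this setup):

    # toggle the option after the volume restart to get negative-lookup caching working again
    gluster volume set negative performance.nl-cache off
    gluster volume set negative performance.nl-cache on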

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-05-11 09:51:31 EDT ---

This bug is automatically being proposed for the current release of Red Hat Gluster Storage 3 under active development, by setting the release flag 'rhgs‑3.3.0' to '?'. 

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Poornima G on 2017-05-12 08:50:26 EDT ---

RCA:

The cache is stored in the directory's inode_ctx, and the cache time is set to the time the inode_ctx was created; it is not updated as new cache entries are added to the inode_ctx (it is cumbersome to have a timeout for each entry, and updating the global time would result in stale cache for older entries). Consider a sequence of operations as below:

At T1, lookup dir1/f1 (f1 doesn't exist)
       dir1 inode_ctx->cache_time = T1
            cache = "f1"
       lookup dir1/f2 (f2 doesn't exist)
       dir1 inode_ctx->cache_time remains unchanged
            cache = "f1", "f2"

CHILD_DOWN / all or some of the bricks go down at T2;
       thus anything cached before T2 is considered invalid but is not cleared from memory.

At T3, lookup dir1/f1
       dir1 inode_ctx->cache_time(T1) < T2, hence do not serve from cache.
       lookup is sent to the brick.

At T4, lookup dir1/f3
       dir1 inode_ctx->cache_time(T1) < T2, hence do not serve from cache.
       lookup is sent to the brick.

Thus the cache of dir1 is invalid until T1 + 10 min (the cache timeout), and for those 10 min no new cache is added for dir1.

Consequence:
After any brick down/up, the cache of all existing directories is invalid and cannot be refreshed for up to 10 min. Thus, after a brick down/up, nl-cache does not work for directories that already had a cache, for a maximum of 10 min.



Solution:
If the inode_ctx->cache_time < brick_down time, then delete the existing cache and start populating a fresh cache.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-05-15 05:48:25 EDT ---

This bug is automatically being provided 'pm_ack+' for the release flag 'rhgs‑3.3.0', the current release of Red Hat Gluster Storage 3 under active development, having been appropriately marked for the release, and having been provided ACK from Development and QE

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-05-15 06:03:05 EDT ---

Since this bug has been approved for the RHGS 3.3.0 release of Red Hat Gluster Storage 3, through release flag 'rhgs-3.3.0+', and through the Internal Whiteboard entry of '3.3.0', the Target Release is being automatically set to 'RHGS 3.3.0'

--- Additional comment from Worker Ant on 2017-06-04 01:36:22 EDT ---

REVIEW: https://review.gluster.org/17453 (nl-cache: Fix a possible crash and stale cache) posted (#1) for review on master by Poornima G (pgurusid)

--- Additional comment from Worker Ant on 2017-06-08 01:24:49 EDT ---

REVIEW: https://review.gluster.org/17453 (nl-cache: Fix a possible crash and stale cache) posted (#2) for review on master by Poornima G (pgurusid)

--- Additional comment from Worker Ant on 2017-06-09 01:33:27 EDT ---

REVIEW: https://review.gluster.org/17453 (nl-cache: Fix a possible crash and stale cache) posted (#3) for review on master by Poornima G (pgurusid)

--- Additional comment from Worker Ant on 2017-06-12 01:03:12 EDT ---

REVIEW: https://review.gluster.org/17453 (nl-cache: Fix a possible crash and stale cache) posted (#4) for review on master by Poornima G (pgurusid)

--- Additional comment from Worker Ant on 2017-06-12 01:14:44 EDT ---

REVIEW: https://review.gluster.org/17453 (nl-cache: Fix a possible crash and stale cache) posted (#5) for review on master by Poornima G (pgurusid)

--- Additional comment from Worker Ant on 2017-06-13 01:01:21 EDT ---

COMMIT: https://review.gluster.org/17453 committed in master by Atin Mukherjee (amukherj) 
------
commit 7674584fa53944a4e982e217798f31a3d1ef313b
Author: Poornima G <pgurusid>
Date:   Fri May 26 15:45:57 2017 +0530

    nl-cache: Fix a possible crash and stale cache
    
    Issue1:
    Consider the following sequence of operations:
       ...
       nlc_ctx = nlc_ctx_get (inode i1)
       ....... -> nlc_clear_cache (i1) gets called as a part of nlc_invalidate
                                       or any other callers
                  ...
                  GF_FREE (ii nlc_ctx)
       LOCK (nlc_ctx->lock);  ->  This will result in crash as the ctx
                                  got freed in nlc_clear_cache.
    
    Issue2:
       lookup on dir1/file1 result in ENOENT
       add cache to dir1 at time T1
       ....
       CHILD_DOWN at T2
       lookup on dir1/file2 result in ENOENT
       add cache to dir1, but the cache time is still T1
       lookup on dir1/file2 - should have been served from cache
                              but the cache time is T1 < T2, hence
                              cache is considered as invalid.
    So, after CHILD_DOWN the right thing would be to clear the cache
    and restart caching on that inode.
    
    Solution:
    Do not free nlc_ctx in nlc_clear_cache, but only in inode_forget()
    The fix for both issue1 and 2 is interleaved hence sending it as
    single patch.
    
    Change-Id: I83d8ed36c049a93567c6d7e63d045dc14ccbb397
    BUG: 1458539
    Signed-off-by: Poornima G <pgurusid>
    Reviewed-on: https://review.gluster.org/17453
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    CentOS-regression: Gluster Build System <jenkins.org>

Comment 1 Shyamsundar 2018-06-20 18:26:48 UTC
This bug is reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained.

As a result this bug is being closed.

If the bug persists on a maintained version of gluster or against the mainline gluster repository, request that it be reopened and the Version field be marked appropriately.

