+++ This bug was initially created as a clone of Bug #1458539 +++
+++ This bug was initially created as a clone of Bug #1450080 +++

Description of problem:
========================
It seems that if we restart a volume, the negative-lookup caching feature is lost.

I did a lookup of a non-existent file with nl caching enabled. Generally, the first lookup takes about 7-13 LOOKUP FOPs, depending on the dht hashed/unhashed bricks. On a second lookup, the count generally dropped to about 1 LOOKUP FOP on each brick.

I then did a volume restart, reran the above checks, and found that even on repetitive negative lookups the LOOKUP FOP counts are the same as without nl-cache enabled.

I checked the trusted volfile and did not note any discrepancies; the nl-cache translators were loaded.

Version-Release number of selected component (if applicable):
======
[root@dhcp35-192 negative]# rpm -qa|grep gluster
gluster-nagios-addons-0.2.8-1.el7rhgs.x86_64
python-gluster-3.8.4-24.el7rhgs.noarch
glusterfs-libs-3.8.4-24.el7rhgs.x86_64
glusterfs-fuse-3.8.4-24.el7rhgs.x86_64
glusterfs-api-devel-3.8.4-24.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-24.el7rhgs.x86_64
samba-vfs-glusterfs-4.6.3-0.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-3.8.4-24.el7rhgs.x86_64
glusterfs-api-3.8.4-24.el7rhgs.x86_64
glusterfs-cli-3.8.4-24.el7rhgs.x86_64
glusterfs-devel-3.8.4-24.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-24.el7rhgs.x86_64
glusterfs-rdma-3.8.4-24.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.1.el7rhgs.noarch
glusterfs-client-xlators-3.8.4-24.el7rhgs.x86_64
glusterfs-server-3.8.4-24.el7rhgs.x86_64
glusterfs-events-3.8.4-24.el7rhgs.x86_64
[root@dhcp35-192 negative]# rpm -qa|grep samba
samba-common-4.6.3-0.el7rhgs.noarch
samba-vfs-glusterfs-4.6.3-0.el7rhgs.x86_64
samba-common-tools-4.6.3-0.el7rhgs.x86_64
samba-common-libs-4.6.3-0.el7rhgs.x86_64
samba-4.6.3-0.el7rhgs.x86_64
samba-client-4.6.3-0.el7rhgs.x86_64
samba-libs-4.6.3-0.el7rhgs.x86_64
samba-client-libs-4.6.3-0.el7rhgs.x86_64
[root@dhcp35-192 negative]#

How reproducible:
===============
Consistent on my setup.

Steps to Reproduce:
1. Had a 3-node cluster with Samba set up using https://mojo.redhat.com/docs/DOC-1027985 (no CTDB setup).
2. Created a 2x2 volume as below ===> the vol bricks are part of the n2 and n3 nodes, while n1 was the Samba server node:

[root@dhcp35-192 ~]# gluster v info

Volume Name: negative
Type: Distributed-Replicate
Volume ID: 48c809a8-8a82-4806-aaf0-5ea0bbf423cb
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: :/rhs/brick1/negative
Brick2: :/rhs/brick1/negative
Brick3: :/rhs/brick2/negative
Brick4: :/rhs/brick2/negative
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
features.cache-invalidation-timeout: 3000
features.cache-invalidation: on
performance.nl-cache: on
nfs.disable: on
transport.address-family: inet
server.allow-insecure: on
storage.batch-fsync-delay-usec: 0

3. VOLUME PROFILE IS STARTED.
4. Mounted using CIFS on a RHEL client.
5. Did some amount of testing w.r.t. nl caching (for about half a day).
6. Created a directory dir4.
7. Did a negative lookup by issuing "ls x2"; it says the file does not exist, as expected. The profiling showed about 7-13 FOPs.
8. Again did an "ls x2" to see the nl cache working; FOPs reduced to about 1 per brick.
9. Restarted the volume.
10. Repeated steps 7 and 8 again and again, but the lookup FOP count is not reducing.

Before restart, first negative lookup:

[root@dhcp35-192 ~]# gluster v profile negative info incremental
Brick: :/rhs/brick2/negative
----------------------------------------
Interval 16 Stats:
   %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
   ---------   -----------   -----------   -----------   ------------        ----
        9.69      60.00 us      60.00 us      60.00 us              1    GETXATTR
       90.31     111.80 us      82.00 us     143.00 us              5      LOOKUP

    Duration: 4 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Brick: :/rhs/brick1/negative
----------------------------------------
Interval 16 Stats:
   %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
   ---------   -----------   -----------   -----------   ------------        ----
       12.04     124.00 us     124.00 us     124.00 us              1    GETXATTR
       87.96     113.25 us      63.00 us     252.00 us              8      LOOKUP

    Duration: 4 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Brick: :/rhs/brick2/negative
----------------------------------------
Interval 16 Stats:
   %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
   ---------   -----------   -----------   -----------   ------------        ----
        9.87      70.00 us      70.00 us      70.00 us              1    GETXATTR
       90.13     127.80 us     100.00 us     152.00 us              5      LOOKUP

    Duration: 5 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Brick: :/rhs/brick1/negative
----------------------------------------
Interval 16 Stats:
   %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
   ---------   -----------   -----------   -----------   ------------        ----
       12.18     147.00 us     147.00 us     147.00 us              1    GETXATTR
       87.82     132.50 us      79.00 us     299.00 us              8      LOOKUP

    Duration: 5 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Second negative lookup:

[root@dhcp35-192 ~]# gluster v profile negative info incremental
Brick: :/rhs/brick1/negative
----------------------------------------
Interval 17 Stats:
   %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
   ---------   -----------   -----------   -----------   ------------        ----
      100.00     336.00 us     336.00 us     336.00 us              1      LOOKUP

    Duration: 6 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Brick: :/rhs/brick2/negative
----------------------------------------
Interval 17 Stats:
   %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
   ---------   -----------   -----------   -----------   ------------        ----
      100.00     338.00 us     338.00 us     338.00 us              1      LOOKUP

    Duration: 6 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Brick: :/rhs/brick2/negative
----------------------------------------
Interval 17 Stats:
   %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
   ---------   -----------   -----------   -----------   ------------        ----
      100.00     144.00 us     144.00 us     144.00 us              1      LOOKUP

    Duration: 6 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Brick: :/rhs/brick1/negative
----------------------------------------
Interval 17 Stats:
   %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
   ---------   -----------   -----------   -----------   ------------        ----
      100.00     263.00 us     263.00 us     263.00 us              1      LOOKUP

    Duration: 6 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

[root@dhcp35-192 ~]#
#############################

On restart, first negative lookup:

[root@dhcp35-192 ~]# gluster v profile negative info incremental
Brick: :/rhs/brick2/negative
----------------------------------------
Interval 13 Stats:
   %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
   ---------   -----------   -----------   -----------   ------------        ----
      100.00     121.29 us      77.00 us     251.00 us              7      LOOKUP

    Duration: 8 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Brick: :/rhs/brick1/negative
----------------------------------------
Interval 13 Stats:
   %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
   ---------   -----------   -----------   -----------   ------------        ----
      100.00     116.08 us      53.00 us     446.00 us             13      LOOKUP

    Duration: 8 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Brick: :/rhs/brick1/negative
----------------------------------------
Interval 13 Stats:
   %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
   ---------   -----------   -----------   -----------   ------------        ----
      100.00     125.92 us      69.00 us     477.00 us             13      LOOKUP

    Duration: 8 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Brick: :/rhs/brick2/negative
----------------------------------------
Interval 13 Stats:
   %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
   ---------   -----------   -----------   -----------   ------------        ----
      100.00     113.86 us      74.00 us     249.00 us              7      LOOKUP

    Duration: 8 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Second negative lookup:

[root@dhcp35-192 ~]# gluster v profile negative info incremental
Brick: :/rhs/brick2/negative
----------------------------------------
Interval 13 Stats:
   %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
   ---------   -----------   -----------   -----------   ------------        ----
      100.00     121.29 us      77.00 us     251.00 us              7      LOOKUP

    Duration: 8 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Brick: :/rhs/brick1/negative
----------------------------------------
Interval 13 Stats:
   %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
   ---------   -----------   -----------   -----------   ------------        ----
      100.00     116.08 us      53.00 us     446.00 us             13      LOOKUP

    Duration: 8 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Brick: :/rhs/brick1/negative
----------------------------------------
Interval 13 Stats:
   %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
   ---------   -----------   -----------   -----------   ------------        ----
      100.00     125.92 us      69.00 us     477.00 us             13      LOOKUP

    Duration: 8 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Brick: :/rhs/brick2/negative
----------------------------------------
Interval 13 Stats:
   %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
   ---------   -----------   -----------   -----------   ------------        ----
      100.00     113.86 us      74.00 us     249.00 us              7      LOOKUP

    Duration: 8 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

Workaround:
========
Post restart, turn the nl-cache option (performance.nl-cache) off and then on again.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-05-11 09:51:31 EDT ---

This bug is automatically being proposed for the current release of Red Hat Gluster Storage 3 under active development, by setting the release flag 'rhgs-3.3.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Poornima G on 2017-05-12 08:50:26 EDT ---

RCA:
The cache is stored in the directory's inode_ctx, and the time of the cache is set to the time the inode_ctx was created; it is not updated as and when new cache entries are added to the inode_ctx (it is cumbersome to keep a timeout for each entry, and updating a global time would result in stale cache of older entries).
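To make the RCA concrete, here is a minimal C sketch of the scheme just described. The names (nlc_dir_ctx, nlc_cache_valid, last_child_down, CACHE_TIMEOUT) are hypothetical simplifications for illustration, not the actual nl-cache structures; the 10-minute timeout is taken from the RCA text below.

    #include <stdbool.h>
    #include <time.h>

    /* Hypothetical, simplified stand-in for the per-directory nl-cache
     * context; a single timestamp guards all cached negative entries. */
    struct nlc_dir_ctx {
        time_t cache_time;         /* set once, when the ctx is created */
        /* list of cached negative names ("f1", "f2", ...) omitted */
    };

    static time_t last_child_down; /* updated on every CHILD_DOWN event */

    #define CACHE_TIMEOUT 600      /* 10-minute cache timeout, per the RCA */

    /* The directory's entire cache is usable only if its one timestamp
     * is both fresh and newer than the last brick outage. */
    static bool
    nlc_cache_valid(const struct nlc_dir_ctx *ctx, time_t now)
    {
        if (now - ctx->cache_time > CACHE_TIMEOUT)
            return false;          /* aged out */
        if (ctx->cache_time < last_child_down)
            return false;          /* predates a brick down/up */
        return true;
    }

Because cache_time is set only once, entries added later inherit it, and after a CHILD_DOWN the validity check fails for the whole directory until the timeout lapses, as the following timeline shows.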
Consider a sequence of operations as below:

At T1:
    lookup dir1/f1 (f1 doesn't exist)
        dir1 inode_ctx->cache_time = T1
        cache = "f1"
    lookup dir1/f2 (f2 doesn't exist)
        dir1 inode_ctx->cache_time remains unchanged
        cache = "f1", "f2"

CHILD_DOWN (all/some of the bricks down) at T2; thus anything cached before T2 is considered invalid, but is not cleared from memory.

At T3:
    lookup dir1/f1
        dir1 inode_ctx->cache_time (T1) < T2, hence do not serve from
        the cache; the lookup is sent to the brick.

At T4:
    lookup dir1/f3
        dir1 inode_ctx->cache_time (T1) < T2, hence do not serve from
        the cache; the lookup is sent to the brick.

Thus the cache of dir1 is invalid until T1 + 10 min (the cache timeout), and for those 10 min no new cache is added for dir1 either.

Consequence:
After any brick down/up, the cache of all existing directories is invalid and cannot be corrected for up to 10 min. Thus, after a brick down/up, nl-cache does not work for directories that already had a cache, for a maximum of 10 min.

Solution:
If inode_ctx->cache_time < the brick-down time, then delete the existing cache and start populating a fresh cache.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-05-15 05:48:25 EDT ---

This bug is automatically being provided 'pm_ack+' for the release flag 'rhgs-3.3.0', the current release of Red Hat Gluster Storage 3 under active development, having been appropriately marked for the release, and having been provided ACK from Development and QE.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-05-15 06:03:05 EDT ---

Since this bug has been approved for the RHGS 3.3.0 release of Red Hat Gluster Storage 3, through release flag 'rhgs-3.3.0+', and through the Internal Whiteboard entry of '3.3.0', the Target Release is being automatically set to 'RHGS 3.3.0'.

--- Additional comment from Worker Ant on 2017-06-04 01:36:22 EDT ---

REVIEW: https://review.gluster.org/17453 (nl-cache: Fix a possible crash and stale cache) posted (#1) for review on master by Poornima G (pgurusid)

--- Additional comment from Worker Ant on 2017-06-08 01:24:49 EDT ---

REVIEW: https://review.gluster.org/17453 (nl-cache: Fix a possible crash and stale cache) posted (#2) for review on master by Poornima G (pgurusid)

--- Additional comment from Worker Ant on 2017-06-09 01:33:27 EDT ---

REVIEW: https://review.gluster.org/17453 (nl-cache: Fix a possible crash and stale cache) posted (#3) for review on master by Poornima G (pgurusid)

--- Additional comment from Worker Ant on 2017-06-12 01:03:12 EDT ---

REVIEW: https://review.gluster.org/17453 (nl-cache: Fix a possible crash and stale cache) posted (#4) for review on master by Poornima G (pgurusid)

--- Additional comment from Worker Ant on 2017-06-12 01:14:44 EDT ---

REVIEW: https://review.gluster.org/17453 (nl-cache: Fix a possible crash and stale cache) posted (#5) for review on master by Poornima G (pgurusid)

--- Additional comment from Worker Ant on 2017-06-13 01:01:21 EDT ---

COMMIT: https://review.gluster.org/17453 committed in master by Atin Mukherjee (amukherj)
------
commit 7674584fa53944a4e982e217798f31a3d1ef313b
Author: Poornima G <pgurusid>
Date:   Fri May 26 15:45:57 2017 +0530

    nl-cache: Fix a possible crash and stale cache

    Issue1:
    Consider the following sequence of operations:
    ...
    nlc_ctx = nlc_ctx_get (inode i1)
    .......    -> nlc_clear_cache (i1) gets called as a part of
                  nlc_invalidate or any other callers
                  ...
                  GF_FREE (i1 nlc_ctx)
    LOCK (nlc_ctx->lock);  -> This will result in a crash, as the ctx
                              got freed in nlc_clear_cache.
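For illustration, a minimal sketch of the use-after-free pattern that Issue1 describes. The types and function names here are hypothetical simplifications (standard pthreads and free() standing in for the translator's LOCK and GF_FREE), not the actual GlusterFS code:

    #include <pthread.h>
    #include <stdlib.h>
    #include <time.h>

    /* Hypothetical simplified ctx; the real one lives in the nl-cache
     * translator and is freed with GF_FREE. */
    struct nlc_ctx {
        pthread_mutex_t lock;
        time_t          cache_time;
        /* cached negative entries omitted */
    };

    /* Thread A: has already fetched ctx from the inode's context. */
    void
    serve_from_cache(struct nlc_ctx *ctx)
    {
        /* ... meanwhile thread B runs the pre-fix nlc_clear_cache()
         * and frees ctx while we still hold a raw pointer to it ... */
        pthread_mutex_lock(&ctx->lock);   /* use-after-free: crash */
        /* read or update cached entries */
        pthread_mutex_unlock(&ctx->lock);
    }

    /* Thread B: the pre-fix invalidation path freed the ctx outright. */
    void
    clear_cache_buggy(struct nlc_ctx *ctx)
    {
        pthread_mutex_destroy(&ctx->lock);
        free(ctx);  /* stand-in for GF_FREE(); thread A's pointer dangles */
    }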
    Issue2:
    lookup on dir1/file1 results in ENOENT; add cache to dir1 at time T1
    ....
    CHILD_DOWN at T2
    lookup on dir1/file2 results in ENOENT; add cache to dir1, but the
    cache time is still T1
    lookup on dir1/file2 - should have been served from the cache, but
    the cache time is T1 < T2, hence the cache is considered invalid.

    So, after CHILD_DOWN, the right thing would be to clear the cache
    and restart caching on that inode.

    Solution:
    Do not free nlc_ctx in nlc_clear_cache, but only in inode_forget().

    The fix for both issue1 and issue2 is interleaved, hence sending it
    as a single patch.

    Change-Id: I83d8ed36c049a93567c6d7e63d045dc14ccbb397
    BUG: 1458539
    Signed-off-by: Poornima G <pgurusid>
    Reviewed-on: https://review.gluster.org/17453
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    CentOS-regression: Gluster Build System <jenkins.org>
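Referring back to the Solution in the commit message (free the ctx only from the forget path), here is a sketch of the fix's shape, under the same hypothetical simplified types as in the sketches above; the authoritative change is the patch at https://review.gluster.org/17453.

    #include <pthread.h>
    #include <stdlib.h>
    #include <time.h>

    struct nlc_ctx {
        pthread_mutex_t lock;
        time_t          cache_time;
    };

    /* Post-fix invalidation: empty the cache and restart its clock,
     * but never free the ctx here. */
    void
    clear_cache_fixed(struct nlc_ctx *ctx, time_t now)
    {
        pthread_mutex_lock(&ctx->lock);
        /* drop all cached negative entries here ... */
        ctx->cache_time = now;          /* fresh caching resumes at once */
        pthread_mutex_unlock(&ctx->lock);
        /* no free(ctx): concurrent holders of the pointer stay safe */
    }

    /* The ctx is freed only when the inode itself is forgotten, at
     * which point no lookup can still be racing on it. */
    void
    on_inode_forget(struct nlc_ctx *ctx)
    {
        pthread_mutex_destroy(&ctx->lock);
        free(ctx);
    }

In this shape, deferring the free to the forget path addresses Issue1 (no dangling pointer for concurrent lookups), while resetting the cache's timestamp on invalidation addresses Issue2 (entries added after a CHILD_DOWN are no longer stamped with the stale pre-outage time).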
This bug is reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained.

As a result, this bug is being closed.

If the bug persists on a maintained version of Gluster, or against the mainline Gluster repository, request that it be reopened and that the Version field be marked appropriately.