+++ This bug was initially created as a clone of Bug #965400 +++

Description of problem:
Testing between 2.0 and 2.1 shows that the performance of reading large directories over NFS is very slow - especially when compared with the readdirplus improvements added to fuse/gluster in 2.1.

Version-Release number of selected component (if applicable):
2.1 is the real issue, but the problem also seems apparent in 2.0u4.

How reproducible:
I've attached my findings that illustrate the issue, including a breakdown of the test environment.

Steps to Reproduce:
1. Create a large directory (>=75k files).
2. Mount the volume and use ls -l (long listing).
3. Compare these times with native access.

Actual results:
Glusterfs and SMB access are significantly quicker to return directory listings than NFS.

Expected results:
Given the NFS protocol sits within the translator stack, I would have expected NFS access to be at least comparable to SMB. Directory listings are fundamental to a lot of organisations' workflows, so access times of 5 minutes are not really acceptable.

Additional info:

--- Additional comment from RHEL Product and Program Management on 2013-05-21 04:13:04 EDT ---

Since this issue was entered in bugzilla, the release flag has been set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from Vijay Bellur on 2013-05-21 06:58:15 EDT ---

Is there any performance difference if the nfs client is mounted with -o noacl?

--- Additional comment from Paul Cuzner on 2013-05-22 10:32:21 EDT ---

(In reply to Vijay Bellur from comment #2)
> Is there any performance difference if nfs client is mounted with -o noacl?

Yes, for the first run. (My test scenario is an initial run, followed by 3 successive runs to see if there is any caching effect.)

Mount with:
mount -t nfs -o proto=tcp,vers=3,noatime,nodiratime,noacl rhs21-1-1:/smallfiles /mnt/nfs

NFS test - time /bin/ls -l /mnt/nfs | wc -l
#1 0m33.336s
#2 1m36.452s
#3 1m6.302s
#4 1m7.647s

NB. The timings for runs 2, 3 and 4 are the same with or without noacl.

I also tried to force rdirplus in the mount options: drop caches, reboot the client, then mount with:
mount -t nfs -o vers=3,proto=tcp,noatime,nodiratime,rdirplus rhs21-1-1:/smallfiles /mnt/nfs

NFS test - time /bin/ls -l /mnt/nfs | wc -l
#1 2m41.269s
#2 1m34.010s
#3 1m8.909s
#4 1m6.816s

The same environment/test based on the SMB/glusterfs client yields:
SMB       - 43s, 23s, 24s, 24s
glusterfs - 36s, 18s, 18s, 19s

--- Additional comment from Ben England on 2013-05-23 15:24:33 EDT ---

In glusterfs 3.3.0.8 on RHEL6.4 I saw readdirplus actually running, using the wireshark Gluster plugin. I tested it using the native client and the smallfile benchmark, and with an NFS mountpoint from the same host (not a RHS server).

Workload:
# time ls -l /mnt/glnfs/bigdir2.d/file_srcdir/gprfc088/d00 > /tmp/foo
# time ls -l /mnt/glusterfs/bigdir2.d/file_srcdir/gprfc088/d00 > /tmp/foo

Results for a 30000-file directory:
NFS:    7 seconds for 30000 files (> 4000 files/sec), far improved.
native: 28 seconds for 30000 files (> 1000 files/sec), basically the same as RHS 2.0.

I don't understand why I'm not getting the performance boost for the native client, but with NFS it was definitely working. Perhaps the NFS translator is doing something more intelligent with it than the native FUSE client.

NFS protocol trace (from the NFS server end) at http://perf1.perf.lab.eng.bos.redhat.com/bengland/public/rhs/bigbend/s045-nfs-ls-l.tcpdump
You see READDIRP calls going from the NFS server to the brick servers, and readdirplus replies going from the NFS server to the NFS client.

Native client trace at: http://perf1.perf.lab.eng.bos.redhat.com/bengland/public/rhs/bigbend/native-ls-l-try2.tcpdump
The native client protocol trace is mostly LOOKUP calls in addition to the READDIRP calls. I don't understand why it does LOOKUP.

--- Additional comment from Rajesh on 2013-05-24 05:47:49 EDT ---

This is apparently caused by the ACL code which was recently introduced. Mounting with the noacl option solves this issue, the times being:

/mnt/m1/test# time ls > /n
real    0m14.790s
user    0m0.200s
sys     0m0.872s

which is comparable to that of the fuse and smb mounts. Looking into which change in ACL affects ls performance.

--- Additional comment from Rajesh on 2013-05-24 06:40:07 EDT ---

Interesting observations. On a fresh mount with acl, this is the second run with caches dropped:

/mnt/m1/test# time ls | wc -l
74999
real    0m17.669s
user    0m0.264s
sys     0m0.828s

~# gluster volume profile vol info nfs
NFS Server : localhost
----------------------
Cumulative Stats:
 %-latency   Avg-latency    Min-Latency     Max-Latency   No. of calls   Fop
 ---------   -----------    -----------     -----------   ------------   ----
      0.00       0.00 us        0.00 us         0.00 us              3   RELEASEDIR
      0.01     844.00 us      309.00 us      2420.00 us              5   ACCESS
      0.33   18334.67 us     1081.00 us     89234.00 us              6   LOOKUP
      0.77   28412.33 us      447.00 us    162240.00 us              9   STAT
      4.69  520457.33 us   137367.00 us   1255235.00 us              3   OPENDIR
     12.82     579.96 us      105.00 us     65877.00 us           7358   FSTAT
     81.38    3682.40 us      455.00 us    207464.00 us           7358   READDIRP

Duration: 118 seconds   Data Read: 0 bytes   Data Written: 0 bytes

Interval 0 Stats: (identical to the Cumulative Stats above; Duration: 118 seconds)

After the second run, without clearing the caches:

/mnt/m1/test# time ls | wc -l
74999
real    1m50.138s
user    0m0.644s
sys     0m5.292s

~# gluster volume profile vol info nfs
NFS Server : localhost
----------------------
Cumulative Stats:
 %-latency   Avg-latency    Min-Latency     Max-Latency   No. of calls   Fop
 ---------   -----------    -----------     -----------   ------------   ----
      0.00       0.00 us        0.00 us         0.00 us              3   RELEASEDIR
      0.00     707.86 us      278.00 us      2420.00 us              7   ACCESS
      0.20   23381.09 us      447.00 us    162240.00 us             11   STAT
      1.23  520457.33 us   137367.00 us   1255235.00 us              3   OPENDIR
      3.36     579.96 us      105.00 us     65877.00 us           7358   FSTAT
     21.32    3682.40 us      455.00 us    207464.00 us           7358   READDIRP
     73.89    1252.25 us      208.00 us    273030.00 us          75005   LOOKUP

Duration: 283 seconds   Data Read: 0 bytes   Data Written: 0 bytes

Interval 1 Stats:
 %-latency   Avg-latency    Min-Latency     Max-Latency   No. of calls   Fop
 ---------   -----------    -----------     -----------   ------------   ----
      0.00     367.50 us      278.00 us       457.00 us              2   ACCESS
      0.00     740.50 us      638.00 us       843.00 us              2   STAT
    100.00    1250.88 us      208.00 us    273030.00 us          74999   LOOKUP

Duration: 165 seconds   Data Read: 0 bytes   Data Written: 0 bytes

We see 75k LOOKUPs, which result in a run time of 110 seconds.
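For reference, the measurement loop behind the profiles above can be scripted roughly as follows. This is only a sketch: the volume name "vol", the mount point /mnt/m1 and the localhost server are taken from the transcripts, and profiling is assumed to have been enabled on the volume beforehand.

# enable per-FOP latency accounting on the volume
gluster volume profile vol start

# default mount (ACL enabled); add "noacl" to repeat the comparison runs
mount -t nfs -o vers=3,nolock,soft,intr localhost:/vol /mnt/m1
cd /mnt/m1/test

# run 1: cold client caches - the profile is dominated by READDIRP/FSTAT
echo 3 > /proc/sys/vm/drop_caches
time ls | wc -l
gluster volume profile vol info nfs

# run 2: warm client caches - with ACL on, the per-file LOOKUP storm shows up here
time ls | wc -l
gluster volume profile vol info nfs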
--- Additional comment from Rajesh on 2013-05-24 06:47:09 EDT ---

Observations without acl (chronology of commands is preserved):

~# mount -t nfs -o vers=3,nolock,soft,intr,noacl localhost:/vol /mnt/m1
~# cd /mnt/m1
/mnt/m1# cd test
/mnt/m1/test# time ls | wc -l
74999
real    0m13.040s
user    0m0.276s
sys     0m0.768s

~# gluster volume profile vol info nfs
NFS Server : localhost
----------------------
Cumulative Stats:
 %-latency   Avg-latency    Min-Latency     Max-Latency   No. of calls   Fop
 ---------   -----------    -----------     -----------   ------------   ----
      0.00       0.00 us        0.00 us         0.00 us              4   RELEASEDIR
      0.00     641.70 us      242.00 us      2420.00 us             10   ACCESS
      0.19   17306.07 us      353.00 us    162240.00 us             15   STAT
      1.13  390568.50 us      902.00 us   1255235.00 us              4   OPENDIR
      4.49     562.50 us      102.00 us     65877.00 us          11036   FSTAT
     26.20    3281.44 us      455.00 us    207464.00 us          11036   READDIRP
     67.98    1252.58 us      208.00 us    273030.00 us          75009   LOOKUP

Duration: 722 seconds   Data Read: 0 bytes   Data Written: 0 bytes

Interval 2 Stats:
 %-latency   Avg-latency    Min-Latency     Max-Latency   No. of calls   Fop
 ---------   -----------    -----------     -----------   ------------   ----
      0.00       0.00 us        0.00 us         0.00 us              1   RELEASEDIR
      0.01     902.00 us      902.00 us       902.00 us              1   OPENDIR
      0.01     487.33 us      242.00 us       730.00 us              3   ACCESS
      0.02     599.75 us      353.00 us      1016.00 us              4   STAT
      0.26    7328.75 us     1421.00 us     23480.00 us              4   LOOKUP
     17.49     527.57 us      102.00 us      2887.00 us           3678   FSTAT
     82.20    2479.31 us      595.00 us      4856.00 us           3678   READDIRP

Duration: 439 seconds   Data Read: 0 bytes   Data Written: 0 bytes

/mnt/m1/test# time ls | wc -l
74999
real    0m47.507s
user    0m0.596s
sys     0m5.384s

~# gluster volume profile vol info nfs
NFS Server : localhost
----------------------
Cumulative Stats:
 %-latency   Avg-latency    Min-Latency     Max-Latency   No. of calls   Fop
 ---------   -----------    -----------     -----------   ------------   ----
      0.00       0.00 us        0.00 us         0.00 us              4   RELEASEDIR
      0.00     631.45 us      242.00 us      2420.00 us             11   ACCESS
      0.91  390568.50 us      902.00 us   1255235.00 us              4   OPENDIR
      3.63     562.50 us      102.00 us     65877.00 us          11036   FSTAT
     19.32     440.36 us       96.00 us    162240.00 us          75017   STAT
     21.18    3281.44 us      455.00 us    207464.00 us          11036   READDIRP
     54.95    1252.58 us      208.00 us    273030.00 us          75009   LOOKUP

Duration: 835 seconds   Data Read: 0 bytes   Data Written: 0 bytes

Interval 3 Stats:
 %-latency   Avg-latency    Min-Latency     Max-Latency   No. of calls   Fop
 ---------   -----------    -----------     -----------   ------------   ----
      0.00     529.00 us      529.00 us       529.00 us              1   ACCESS
    100.00     436.99 us       96.00 us     31673.00 us          75002   STAT

Duration: 113 seconds   Data Read: 0 bytes   Data Written: 0 bytes

--- Additional comment from Rajesh on 2013-05-24 07:19:15 EDT ---

The same tests run with glusterfs version rhs-2.0, latest HEAD:

/mnt/m1/test# time ls | wc -l
75000
real    0m10.161s
user    0m0.256s
sys     0m0.644s

/mnt/m1/test# gluster volume profile vol info nfs
NFS Server : localhost
----------------------
Cumulative Stats:
 %-latency   Avg-latency    Min-Latency     Max-Latency   No. of calls   Fop
 ---------   -----------    -----------     -----------   ------------   ----
      0.05    1530.67 us     1206.00 us      2127.00 us              3   LOOKUP
     18.28     421.51 us      106.00 us      3376.00 us           3678   FSTAT
     81.67    1883.42 us      496.00 us      4035.00 us           3678   READDIRP

Duration: 33 seconds   Data Read: 0 bytes   Data Written: 0 bytes

Interval 0 Stats: (identical to the Cumulative Stats above; Duration: 33 seconds)

Second run, without dropping caches:

/mnt/m1/test# time ls | wc -l
75000
real    0m48.514s
user    0m0.448s
sys     0m2.880s

/mnt/m1/test# gluster volume profile vol info nfs
NFS Server : localhost
----------------------
Cumulative Stats:
 %-latency   Avg-latency    Min-Latency     Max-Latency   No. of calls   Fop
 ---------   -----------    -----------     -----------   ------------   ----
      0.00     591.00 us      591.00 us       591.00 us              1   ACCESS
      0.00     630.00 us      630.00 us       630.00 us              1   STAT
      3.24     421.51 us      106.00 us      3376.00 us           3678   FSTAT
     14.47    1883.42 us      496.00 us      4035.00 us           3678   READDIRP
     82.29     525.43 us      196.00 us      6560.00 us          75005   LOOKUP

Duration: 155 seconds   Data Read: 0 bytes   Data Written: 0 bytes

Interval 1 Stats:
 %-latency   Avg-latency    Min-Latency     Max-Latency   No. of calls   Fop
 ---------   -----------    -----------     -----------   ------------   ----
      0.00     591.00 us      591.00 us       591.00 us              1   ACCESS
      0.00     630.00 us      630.00 us       630.00 us              1   STAT
    100.00     525.39 us      196.00 us      6560.00 us          75002   LOOKUP

Duration: 122 seconds   Data Read: 0 bytes   Data Written: 0 bytes

--- Additional comment from Rajesh on 2013-05-24 08:20:58 EDT ---

Single-brick volume, rhs 2.1, noacl:

/mnt/m1# time ls | wc -l
75001
real    0m15.119s
user    0m0.228s
sys     0m0.804s

/mnt/m1# gluster volume profile vol info nfs
NFS Server : localhost
----------------------
Cumulative Stats:
 %-latency   Avg-latency    Min-Latency     Max-Latency   No. of calls   Fop
 ---------   -----------    -----------     -----------   ------------   ----
      0.00       0.00 us        0.00 us         0.00 us              1   RELEASEDIR
      0.12    7981.00 us      434.00 us     15528.00 us              2   ACCESS
      0.34    9089.00 us      600.00 us     27439.00 us              5   LOOKUP
      0.38   50874.00 us    50874.00 us     50874.00 us              1   OPENDIR
      1.98   52955.00 us      144.00 us    263421.00 us              5   STAT
     11.05     401.76 us       72.00 us     25597.00 us           3677   FSTAT
     86.13    3132.19 us      617.00 us    289829.00 us           3677   READDIRP

Duration: 38 seconds   Data Read: 0 bytes   Data Written: 0 bytes

Interval 0 Stats: (identical to the Cumulative Stats above; Duration: 38 seconds)

Second run, without dropping caches:

/mnt/m1# time ls | wc -l
75001
real    1m16.110s
user    0m0.592s
sys     0m5.824s

/mnt/m1# gluster volume profile vol info nfs
NFS Server : localhost
----------------------
Cumulative Stats:
 %-latency   Avg-latency    Min-Latency     Max-Latency   No. of calls   Fop
 ---------   -----------    -----------     -----------   ------------   ----
      0.00       0.00 us        0.00 us         0.00 us              1   RELEASEDIR
      0.02    4189.25 us      332.00 us     15528.00 us              4   ACCESS
      0.07   50874.00 us    50874.00 us     50874.00 us              1   OPENDIR
      0.37   29594.89 us      144.00 us    263421.00 us              9   STAT
      2.06     401.76 us       72.00 us     25597.00 us           3677   FSTAT
     16.03    3132.19 us      617.00 us    289829.00 us           3677   READDIRP
     81.45     780.05 us      116.00 us    110062.00 us          75006   LOOKUP

Duration: 364 seconds   Data Read: 0 bytes   Data Written: 0 bytes

Interval 1 Stats:
 %-latency   Avg-latency    Min-Latency     Max-Latency   No. of calls   Fop
 ---------   -----------    -----------     -----------   ------------   ----
      0.00     397.50 us      332.00 us       463.00 us              2   ACCESS
      0.00     394.75 us      352.00 us       439.00 us              4   STAT
    100.00     779.50 us      116.00 us    110062.00 us          75001   LOOKUP

Duration: 326 seconds   Data Read: 0 bytes   Data Written: 0 bytes

--- Additional comment from Ben England on 2013-05-29 10:34:10 EDT ---

cc'ing Steve Dickson; he would know better than I how to tune NFS client caching for this case.

re: comment 9 -- 15 seconds for 75000 files isn't bad; that's 5K files/sec! If you got this result consistently we would declare victory and close the bug. The problem is that we don't get this consistently.

Why do the LOOKUP storms happen? Could the problem be that NFS is timing out the cached metadata and so it's asking Gluster for the metadata again, and Gluster uses LOOKUP the 2nd time instead of READDIRP? If so, you might be able to solve this by timing out the directory metadata before the file metadata -- this would force Gluster to do READDIRP again and avoid the need for 75000 LOOKUPs. There are separate timeout parameters in the NFS client mount options for directories and for attribute caching; perhaps the directory cache timeout needs to be lower than the attribute cache timeout? I would try acdirmax < acregmin.

Also, why doesn't Gluster use the md-cache translator and hang onto the metadata that it got the first time, so LOOKUP calls aren't needed?

Also, is there a way for the Gluster client to express in READDIRP to the brick server the thought "tell me the list of files in this directory with metadata that changed since time T", where T is the last time that client did READDIRP? This would be a very short list and would avoid unnecessary LOOKUP calls.

--- Additional comment from Scott Haines on 2013-07-24 03:10:20 EDT ---

Per discussion w/ Vivek, works without ACL being on. Not for Big Bend, as ACL is off by default. Targeting for Corbett.

--- Additional comment from Simon Green on 2013-07-28 19:18:42 EDT ---

User pcuzner's account has been closed

--- Additional comment from Scott Haines on 2013-09-23 19:18:56 EDT ---

Retargeting for 2.1.z U2 (Corbett) release.

--- Additional comment from on 2013-12-12 04:22:10 EST ---

--- Additional comment from on 2013-12-12 04:23:21 EST ---

--- Additional comment from on 2013-12-12 04:27:29 EST ---

version: glusterfs 3.4.0.49rhs built on Dec 12 2013 00:06:18
Volume type: Dist-Rep 2x2
No of nodes for bricks: 2
No of nodes for client: 1
No of files: 75000

1. With ACL
real    4m24.429s
user    0m2.520s
sys     0m9.591s

2. Without flushing cache and with ACL
real    1m29.440s
user    0m2.100s
sys     0m7.167s

3. Flushed the cache and ran it again
real    4m21.228s
user    0m2.369s
sys     0m8.966s

4. Ran again with cache
real    1m27.342s
user    0m2.198s
sys     0m7.477s

With NO-ACL
-----------
1. Fresh mount:
real    1m26.744s
user    0m1.380s
sys     0m2.077s

2. With cache (noacl):
real    2m7.984s
user    0m1.953s
sys     0m7.645s

3. Flushed cache (noacl):
real    1m25.701s
user    0m1.346s
sys     0m1.960s

Profiling:
----------
Profile output attached.
1. With ACL
real    4m15.650s
user    0m2.312s
sys     0m8.970s

Profile output attached.
2. NOACL
real    1m6.859s
user    0m1.390s
sys     0m2.015s

--- Additional comment from santosh pradhan on 2013-12-14 08:27:49 EST ---

Findings from code review and discussion with Amar: each getacl call imposes a lookup call (as with every NFS call), followed by a stat call (returning the same attributes as a getattr call).
In addition, each getacl() call stack-winds twice to the bricks for xattrs (once for the user ACL and once for the default ACL), which brings in some performance penalty. That happens for each FILE.

The ACL performance issue is very generic; that's why it is suggested to mount with noacl for better performance when ACLs are not absolutely necessary.

--- Additional comment from Amar Tumballi on 2013-12-15 23:46:38 EST ---

To confirm the theory, one would need to run the tests with the 'md-cache' translator in the nfs-server process. But having 'md-cache' in the nfs graph is not really recommended at the moment, as we don't run our tests with it.
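For completeness, the two client-side knobs discussed in the comments above can be compared directly. This is only an illustrative sketch - the server and export names are reused from the earlier transcripts, and the acdirmax/acregmin values are arbitrary, not recommendations:

# (a) documented workaround: disable the NFS ACL side protocol entirely
mount -t nfs -o vers=3,proto=tcp,noatime,nodiratime,noacl rhs21-1-1:/smallfiles /mnt/nfs

# (b) Ben England's suggestion: let directory attributes expire before file
#     attributes (acdirmax < acregmin), so a stale directory triggers a fresh
#     READDIRPLUS instead of tens of thousands of per-file LOOKUPs
mount -t nfs -o vers=3,proto=tcp,rdirplus,acdirmax=15,acregmin=30 rhs21-1-1:/smallfiles /mnt/nfs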
REVIEW: http://review.gluster.org/6523 (gNFS: Client cache invalidation with bad fsid) posted (#1) for review on master by Santosh Pradhan (spradhan)
REVIEW: http://review.gluster.org/6523 (gNFS: Client cache invalidation with bad fsid) posted (#2) for review on master by Santosh Pradhan (spradhan)
Results with the FIX (with 50k files):

1. 1st run:
===========
[root@santosh-new-4 glnfs]# time ls | wc -l
50000
real    0m5.634s
user    0m0.445s
sys     0m0.108s

[root@spradhan glusterfs]# gluster v profile nfsperf info nfs
NFS Server : localhost
----------------------
Cumulative Stats:
 %-latency   Avg-latency    Min-Latency     Max-Latency   No. of calls   Fop
 ---------   -----------    -----------     -----------   ------------   ----
      0.00       0.00 us        0.00 us         0.00 us          50001   RELEASE
      0.00       0.00 us        0.00 us         0.00 us             11   RELEASEDIR
      0.00     598.00 us      598.00 us       598.00 us              1   STATFS
      0.00     664.50 us      615.00 us       714.00 us              2   LOOKUP
      0.00     600.33 us      458.00 us      1147.00 us              6   OPENDIR
      0.00     519.67 us      237.00 us       666.00 us              9   ACCESS
      0.00     515.18 us      304.00 us       913.00 us             28   STAT
      0.53     496.30 us      125.00 us      1357.00 us           3210   FSTAT
     99.46   92418.16 us     1002.00 us   1728777.00 us           3210   READDIRP

Duration: 3600 seconds   Data Read: 0 bytes   Data Written: 0 bytes

Interval 11 Stats:
 %-latency   Avg-latency    Min-Latency     Max-Latency   No. of calls   Fop
 ---------   -----------    -----------     -----------   ------------   ----
      0.00       0.00 us        0.00 us         0.00 us              1   RELEASEDIR
      0.01     458.00 us      458.00 us       458.00 us              1   OPENDIR
      0.02     543.50 us      469.00 us       618.00 us              2   STAT
      5.28     450.75 us      125.00 us      1357.00 us            535   FSTAT
     94.68    8077.19 us     1478.00 us     18099.00 us            535   READDIRP

Duration: 19 seconds   Data Read: 0 bytes   Data Written: 0 bytes

2. 2nd run, with attr cache:
============================
[root@santosh-new-4 glnfs]# time ls | wc -l
50000
real    0m0.466s
user    0m0.429s
sys     0m0.036s

[root@spradhan glusterfs]# gluster v profile nfsperf info nfs
NFS Server : localhost
----------------------
Cumulative Stats:
 %-latency   Avg-latency    Min-Latency     Max-Latency   No. of calls   Fop
 ---------   -----------    -----------     -----------   ------------   ----
      0.00       0.00 us        0.00 us         0.00 us          50001   RELEASE
      0.00       0.00 us        0.00 us         0.00 us             11   RELEASEDIR
      0.00     598.00 us      598.00 us       598.00 us              1   STATFS
      0.00     664.50 us      615.00 us       714.00 us              2   LOOKUP
      0.00     600.33 us      458.00 us      1147.00 us              6   OPENDIR
      0.00     528.40 us      237.00 us       666.00 us             10   ACCESS
      0.01     516.90 us      304.00 us       913.00 us             30   STAT
      0.53     496.30 us      125.00 us      1357.00 us           3210   FSTAT
     99.46   92418.16 us     1002.00 us   1728777.00 us           3210   READDIRP

Duration: 3612 seconds   Data Read: 0 bytes   Data Written: 0 bytes

Interval 12 Stats:
 %-latency   Avg-latency    Min-Latency     Max-Latency   No. of calls   Fop
 ---------   -----------    -----------     -----------   ------------   ----
     35.94     607.00 us      607.00 us       607.00 us              1   ACCESS
     64.06     541.00 us      523.00 us       559.00 us              2   STAT

Duration: 12 seconds   Data Read: 0 bytes   Data Written: 0 bytes

3. Dropped the attr cache:
==========================
[root@santosh-new-4 glnfs]# echo 3 > /proc/sys/vm/drop_caches
[root@santosh-new-4 glnfs]# time ls | wc -l
50000
real    0m5.643s
user    0m0.441s
sys     0m0.129s

[root@spradhan glusterfs]# gluster v profile nfsperf info nfs
NFS Server : localhost
----------------------
Cumulative Stats:
 %-latency   Avg-latency    Min-Latency     Max-Latency   No. of calls   Fop
 ---------   -----------    -----------     -----------   ------------   ----
      0.00       0.00 us        0.00 us         0.00 us          50001   RELEASE
      0.00       0.00 us        0.00 us         0.00 us             12   RELEASEDIR
      0.00     598.00 us      598.00 us       598.00 us              1   STATFS
      0.00     664.50 us      615.00 us       714.00 us              2   LOOKUP
      0.00     543.43 us      202.00 us      1147.00 us              7   OPENDIR
      0.00     497.18 us      185.00 us       666.00 us             11   ACCESS
      0.01     498.33 us      165.00 us       913.00 us             33   STAT
      0.61     489.29 us      125.00 us      1357.00 us           3745   FSTAT
     99.39   80366.99 us     1002.00 us   1728777.00 us           3745   READDIRP

Duration: 3753 seconds   Data Read: 0 bytes   Data Written: 0 bytes

Interval 13 Stats:
 %-latency   Avg-latency    Min-Latency     Max-Latency   No. of calls   Fop
 ---------   -----------    -----------     -----------   ------------   ----
      0.00       0.00 us        0.00 us         0.00 us              1   RELEASEDIR
      0.00     185.00 us      185.00 us       185.00 us              1   ACCESS
      0.00     202.00 us      202.00 us       202.00 us              1   OPENDIR
      0.02     312.67 us      165.00 us       573.00 us              3   STAT
      5.26     447.23 us      152.00 us       692.00 us            535   FSTAT
     94.72    8059.91 us     1348.00 us     18544.00 us            535   READDIRP

Duration: 141 seconds   Data Read: 0 bytes   Data Written: 0 bytes

There is no LOOKUP storm anymore :)
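A rough way to confirm the absence of the LOOKUP storm from both ends of the connection (illustrative only; "nfsperf" is the volume name used in the transcript above, and nfsstat ships with nfs-utils):

nfsstat -c -3                              # snapshot of client-side NFSv3 op counters
time ls | wc -l                            # warm-cache listing of the 50k-file directory
nfsstat -c -3                              # the "lookup" counter should barely move between snapshots
gluster volume profile nfsperf info nfs    # interval stats: no multi-thousand LOOKUP burst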
COMMIT: http://review.gluster.org/6523 committed in master by Vijay Bellur (vbellur)
------
commit 329e38d4ab5af1a675b4d5651eda983f8a924418
Author: Santosh Kumar Pradhan <spradhan>
Date:   Tue Dec 17 08:43:50 2013 +0530

gNFS: Client cache invalidation with bad fsid

1. Problem: A couple of issues are seen when NFS-ACL is turned ON, i.e.
   i)  NFS directory access is too slow, impacting customer workflows with ACL
   ii) dbench fails with 100 directories.

2. Root cause: Frequent cache invalidation on the client side when ACL is turned ON with NFS, because the NFS server getacl() code returns the wrong fsid to the client.

3. This attr-cache invalidation triggers frequent LOOKUP ops for each file instead of relying on the readdir or readdirp data. As a result, performance gets impacted.

4. In the case of the dbench workload, the problem is more severe.

e.g. client-side rpcdebug output:
=================================
Dec 16 10:16:53 santosh-3 kernel: NFS: nfs_update_inode(0:1b/12061953567282551806 ct=2 info=0x7e7f)
Dec 16 10:16:53 santosh-3 kernel: NFS: nfs_fhget(0:1b/12061953567282551806 ct=2)
Dec 16 10:16:53 santosh-3 kernel: <-- nfs_xdev_get_sb() = -116 [splat]
Dec 16 10:16:53 santosh-3 kernel: nfs_do_submount: done
Dec 16 10:16:53 santosh-3 kernel: <-- nfs_do_submount() = ffffffffffffff8c
Dec 16 10:16:53 santosh-3 kernel: <-- nfs_follow_mountpoint() = ffffffffffffff8c
Dec 16 10:16:53 santosh-3 kernel: NFS: dentry_delete(clients/client77, 20008)

As per Jeff Layton: this occurs when the client detects that the fsid on a filehandle is different from its parent. At that point, it tries to do a new submount of the new filesystem onto the correct point. It means the client got a superblock reference for the new fs and is now looking to set up the root of the mount. It calls nfs_get_root to do that, which basically takes the superblock and a filehandle and returns a dentry. The problem here is that the dentry->d_inode you're getting back looks wrong. It's not a directory as expected -- it's something else. So the client gives up and tosses back an ESTALE.

Which clearly says that the getacl() code, while it does the stat() call to get the attrs, forgets to populate the deviceid or fsid before going ahead with the getxattr().

FIX:
1. Fill the deviceid in the iatt.
2. Do a bit more cleanup of the confusing part of the code.

NB: Many many thanks to Niels de Vos and Jeff Layton for their help to debug the issue.

Change-Id: I8d3c2a844c9d1761051a883b5ebaeb84062a11c8
BUG: 1043737
Signed-off-by: Santosh Kumar Pradhan <spradhan>
Reviewed-on: http://review.gluster.org/6523
Reviewed-by: Rajesh Joseph <rjoseph>
Reviewed-by: Niels de Vos <ndevos>
Tested-by: Gluster Build System <jenkins.com>
Reviewed-by: Vijay Bellur <vbellur>
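For anyone reproducing this, a client-side trace like the one quoted in the commit message can be captured with the stock rpcdebug tool from nfs-utils (the mount path below is illustrative):

rpcdebug -m nfs -s all           # enable NFS client debug logging (messages go to the kernel log)
ls -l /mnt/nfs/test > /dev/null  # exercise the directory listing on the affected mount
dmesg | grep -E 'nfs_update_inode|nfs_do_submount|dentry_delete'
rpcdebug -m nfs -c all           # clear the debug flags again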
REVIEW: http://review.gluster.org/6526 (gNFS: Client cache invalidation with bad fsid) posted (#1) for review on release-3.5 by Santosh Pradhan (spradhan)
REVIEW: http://review.gluster.org/6526 (gNFS: Client cache invalidation with bad fsid) posted (#2) for review on release-3.5 by Santosh Pradhan (spradhan)
COMMIT: http://review.gluster.org/6526 committed in release-3.5 by Vijay Bellur (vbellur)
------
commit 54201dd0495e52e0722898ab7fdad007d28d1676
Author: Santosh Kumar Pradhan <spradhan>
Date:   Tue Dec 17 17:32:14 2013 +0530

gNFS: Client cache invalidation with bad fsid

1. Problem: A couple of issues are seen when NFS-ACL is turned ON, i.e.
   i)  NFS directory access is too slow, impacting customer workflows with ACL
   ii) dbench fails with 100 directories.

2. Root cause: Frequent cache invalidation on the client side when ACL is turned ON with NFS, because the NFS server getacl() code returns the wrong fsid to the client.

3. This attr-cache invalidation triggers frequent LOOKUP ops for each file instead of relying on the readdir or readdirp data. As a result, performance gets impacted.

4. In the case of the dbench workload, the problem is more severe.

e.g. client-side rpcdebug output:
=================================
Dec 16 10:16:53 santosh-3 kernel: NFS: nfs_update_inode(0:1b/12061953567282551806 ct=2 info=0x7e7f)
Dec 16 10:16:53 santosh-3 kernel: NFS: nfs_fhget(0:1b/12061953567282551806 ct=2)
Dec 16 10:16:53 santosh-3 kernel: <-- nfs_xdev_get_sb() = -116 [splat]
Dec 16 10:16:53 santosh-3 kernel: nfs_do_submount: done
Dec 16 10:16:53 santosh-3 kernel: <-- nfs_do_submount() = ffffffffffffff8c
Dec 16 10:16:53 santosh-3 kernel: <-- nfs_follow_mountpoint() = ffffffffffffff8c
Dec 16 10:16:53 santosh-3 kernel: NFS: dentry_delete(clients/client77, 20008)

As per Jeff Layton: this occurs when the client detects that the fsid on a filehandle is different from its parent. At that point, it tries to do a new submount of the new filesystem onto the correct point. It means the client got a superblock reference for the new fs and is now looking to set up the root of the mount. It calls nfs_get_root to do that, which basically takes the superblock and a filehandle and returns a dentry. The problem here is that the dentry->d_inode you're getting back looks wrong. It's not a directory as expected -- it's something else. So the client gives up and tosses back an ESTALE.

Which clearly says that the getacl() code, while it does the stat() call to get the attrs, forgets to populate the deviceid or fsid before going ahead with the getxattr().

FIX:
1. Fill the deviceid in the iatt.
2. Do a bit more cleanup of the confusing part of the code.

NB: Many many thanks to Niels de Vos and Jeff Layton for their help to debug the issue.

Change-Id: I44d8d2fa3ec7fb33a67dfdd4bbe2c45cdf67db8c
BUG: 1043737
Signed-off-by: Santosh Kumar Pradhan <spradhan>
Reviewed-on: http://review.gluster.org/6526
Reviewed-by: Niels de Vos <ndevos>
Tested-by: Gluster Build System <jenkins.com>
Reviewed-by: Vijay Bellur <vbellur>
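Finally, a rough client-side spot check for the bad-fsid symptom described above (paths are illustrative). While the bug is present and the export is mounted with ACL support, entries may report a device number that differs from their parent directory, or accesses may fail with ESTALE, once getacl has replied with a bogus fsid; with the fix applied, parent and entries report the same device number:

stat -c '%d  %n' /mnt/nfs/test /mnt/nfs/test/file_00001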