Bug 1043737 - NFS directory access is too slow with ACL
Summary: NFS directory access is too slow with ACL
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: GlusterFS
Classification: Community
Component: nfs
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: santosh pradhan
QA Contact:
URL:
Whiteboard:
Depends On: 965400
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-12-17 03:25 UTC by santosh pradhan
Modified: 2014-01-30 11:38 UTC
CC: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 965400
Environment:
Last Closed: 2014-01-30 11:38:08 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments

Description santosh pradhan 2013-12-17 03:25:08 UTC
+++ This bug was initially created as a clone of Bug #965400 +++

Description of problem:
Testing between 2.0 and 2.1 shows that reading large directories over NFS is very slow, especially when compared with the readdirplus improvements added to fuse/gluster in 2.1.


Version-Release number of selected component (if applicable):
2.1 is the real concern, but the issue also seems apparent in 2.0u4.

How reproducible:
I've attached my findings that illustrate the issue, including a breakdown of the test environment.

Steps to Reproduce:
1. Create a large directory (>=75k files).
2. Mount the volume over NFS and run ls -l (long listing).
3. Compare these times with native glusterfs (and SMB) access.
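For reference, a minimal shell sketch of the steps above; the volume name, server name and mount points here are assumptions, not details from the original report:

# Populate a large directory on an existing gluster volume (paths are assumptions)
mkdir -p /mnt/glusterfs/bigdir
for i in $(seq 1 75000); do touch /mnt/glusterfs/bigdir/file_$i; done

# Mount the same volume over NFSv3 and time the long listing
mount -t nfs -o vers=3,proto=tcp server1:/testvol /mnt/nfs
time ls -l /mnt/nfs/bigdir | wc -l

# Compare against the native client mount of the same directory
time ls -l /mnt/glusterfs/bigdir | wc -l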

Actual results:
Glusterfs and SMB access return directory listings significantly faster than NFS.

Expected results:
Given that the NFS protocol is served from within the translator stack, I would have expected NFS access to be at least comparable to SMB. Directory listings are fundamental to a lot of organisations' workflows, so access times of 5 minutes are not really acceptable.

Additional info:

--- Additional comment from RHEL Product and Program Management on 2013-05-21 04:13:04 EDT ---

Since this issue was entered in bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from Vijay Bellur on 2013-05-21 06:58:15 EDT ---

Is there any performance difference if nfs client is mounted with -o noacl?

--- Additional comment from Paul Cuzner on 2013-05-22 10:32:21 EDT ---

(In reply to Vijay Bellur from comment #2)
> Is there any performance difference if nfs client is mounted with -o noacl?

Yes, for the first run (my test scenario is an initial run, followed by 3 successive runs to see if there is any caching effect).

Mounted with: mount -t nfs -o proto=tcp,vers=3,noatime,nodiratime,noacl rhs21-1-1:/smallfiles /mnt/nfs
NFS test - time /bin/ls -l /mnt/nfs | wc -l

#1 0m33.336s
#2 1m36.452s
#3 1m6.302s 
#4 1m7.647s

NB: The timings for runs 2, 3 and 4 are the same with or without noacl. I also tried forcing rdirplus in the mount options:

drop cache, reboot client
Mounted with: mount -t nfs -o vers=3,proto=tcp,noatime,nodiratime,rdirplus rhs21-1-1:/smallfiles /mnt/nfs

NFS test - time /bin/ls -l /mnt/nfs | wc -l

#1 2m41.269s
#2 1m34.010s
#3 1m8.909s
#4 1m6.816s

The same environment/test with the SMB and glusterfs clients yields:
SMB - 43s, 23s, 24s, 24s
glusterfs - 36s, 18s, 18s, 19s
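
For reference, the cache-drop step mentioned above is typically done on the client as root with:

sync; echo 3 > /proc/sys/vm/drop_caches

(the same command used later in comment 3 of this bug).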

--- Additional comment from Ben England on 2013-05-23 15:24:33 EDT ---

In glusterfs 3.3.0.8 on RHEL 6.4 I saw readdirplus actually running (verified with the wireshark Gluster plugin). I tested it using the native client and the smallfile benchmark, and with an NFS mountpoint from the same host (not a RHS server).

workload:

# time ls -l     /mnt/glnfs/bigdir2.d/file_srcdir/gprfc088/d00 > /tmp/foo
# time ls -l /mnt/glusterfs/bigdir2.d/file_srcdir/gprfc088/d00 > /tmp/foo

results for 30000-file directory:

NFS:    7 seconds for 30000 files (> 4000 files/sec), far improved.
native: 28 seconds for 30000 files (> 1000 files/sec), basically the same as RHS 2.0.

I don't understand why I'm not getting the performance boost for the native client, but with NFS it was definitely working. Perhaps the NFS translator is doing something more intelligent with it than the native FUSE client.

NFS protocol trace (from NFS server end) at

http://perf1.perf.lab.eng.bos.redhat.com/bengland/public/rhs/bigbend/s045-nfs-ls-l.tcpdump

You can see READDIRP calls going from the NFS server to the brick servers, and readdirplus replies going from the NFS server to the NFS client.

native client trace at:

http://perf1.perf.lab.eng.bos.redhat.com/bengland/public/rhs/bigbend/native-ls-l-try2.tcpdump

The native client protocol trace is mostly LOOKUP calls in addition to the READDIRP calls. I don't understand why it does those LOOKUPs.
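
For anyone wanting to compare the two traces, a hedged sketch using tshark (assuming a build with the Gluster dissector; display filter names may differ across versions):

# Roughly count NFS READDIRPLUS replies in the server-side trace
tshark -r s045-nfs-ls-l.tcpdump -Y nfs | grep -c READDIRPLUS

# Roughly count Gluster LOOKUP calls in the native-client trace
tshark -r native-ls-l-try2.tcpdump -Y glusterfs | grep -c LOOKUP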

--- Additional comment from Rajesh on 2013-05-24 05:47:49 EDT ---

This is apparently caused by the ACL code that was recently introduced.
Mounting with the noacl option solves the issue, the times being:

/mnt/m1/test# time ls > /n

real	0m14.790s
user	0m0.200s
sys	0m0.872s

which is comparable to the fuse and smb mounts.
Looking into which changes in the ACL code affect ls performance.

--- Additional comment from Rajesh on 2013-05-24 06:40:07 EDT ---

Interesting observations:
On a fresh mount with acl, this is the second run with caches dropped:
/mnt/m1/test# time ls | wc -l 
74999

real	0m17.669s
user	0m0.264s
sys	0m0.828s

~# gluster volume profile vol info nfs 
NFS Server : localhost
----------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      0.01     844.00 us     309.00 us    2420.00 us              5      ACCESS
      0.33   18334.67 us    1081.00 us   89234.00 us              6      LOOKUP
      0.77   28412.33 us     447.00 us  162240.00 us              9        STAT
      4.69  520457.33 us  137367.00 us 1255235.00 us              3     OPENDIR
     12.82     579.96 us     105.00 us   65877.00 us           7358       FSTAT
     81.38    3682.40 us     455.00 us  207464.00 us           7358    READDIRP
 
    Duration: 118 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 Interval 0 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      0.01     844.00 us     309.00 us    2420.00 us              5      ACCESS
      0.33   18334.67 us    1081.00 us   89234.00 us              6      LOOKUP
      0.77   28412.33 us     447.00 us  162240.00 us              9        STAT
      4.69  520457.33 us  137367.00 us 1255235.00 us              3     OPENDIR
     12.82     579.96 us     105.00 us   65877.00 us           7358       FSTAT
     81.38    3682.40 us     455.00 us  207464.00 us           7358    READDIRP
 
    Duration: 118 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
After the second run, without clearing the caches:

/mnt/m1/test# time ls | wc -l 
74999

real	1m50.138s
user	0m0.644s
sys	0m5.292s

~# gluster volume profile vol info nfs 
NFS Server : localhost
----------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              3  RELEASEDIR
      0.00     707.86 us     278.00 us    2420.00 us              7      ACCESS
      0.20   23381.09 us     447.00 us  162240.00 us             11        STAT
      1.23  520457.33 us  137367.00 us 1255235.00 us              3     OPENDIR
      3.36     579.96 us     105.00 us   65877.00 us           7358       FSTAT
     21.32    3682.40 us     455.00 us  207464.00 us           7358    READDIRP
     73.89    1252.25 us     208.00 us  273030.00 us          75005      LOOKUP
 
    Duration: 283 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Interval 1 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00     367.50 us     278.00 us     457.00 us              2      ACCESS
      0.00     740.50 us     638.00 us     843.00 us              2        STAT
    100.00    1250.88 us     208.00 us  273030.00 us          74999      LOOKUP
 
    Duration: 165 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
We see 75k LOOKUPs, which result in a run time of about 110 seconds.
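
For reference, the profile output above comes from gluster's built-in profiler; a minimal sketch with the volume name used in this bug:

gluster volume profile vol start     # enable profiling for the volume
# ... run the ls test on the NFS mount ...
gluster volume profile vol info nfs  # per-fop stats for the NFS server, as shown above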

--- Additional comment from Rajesh on 2013-05-24 06:47:09 EDT ---

Observations without acl: chronology of commands is preserved:

~# mount -t nfs -o vers=3,nolock,soft,intr,noacl localhost:/vol /mnt/m1
~# cd /mnt/m1
/mnt/m1# cd test
/mnt/m1/test# time ls | wc -l 
74999

real	0m13.040s
user	0m0.276s
sys	0m0.768s

~# gluster volume profile vol info nfs 
NFS Server : localhost
----------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              4  RELEASEDIR
      0.00     641.70 us     242.00 us    2420.00 us             10      ACCESS
      0.19   17306.07 us     353.00 us  162240.00 us             15        STAT
      1.13  390568.50 us     902.00 us 1255235.00 us              4     OPENDIR
      4.49     562.50 us     102.00 us   65877.00 us          11036       FSTAT
     26.20    3281.44 us     455.00 us  207464.00 us          11036    READDIRP
     67.98    1252.58 us     208.00 us  273030.00 us          75009      LOOKUP
 
    Duration: 722 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Interval 2 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              1  RELEASEDIR
      0.01     902.00 us     902.00 us     902.00 us              1     OPENDIR
      0.01     487.33 us     242.00 us     730.00 us              3      ACCESS
      0.02     599.75 us     353.00 us    1016.00 us              4        STAT
      0.26    7328.75 us    1421.00 us   23480.00 us              4      LOOKUP
     17.49     527.57 us     102.00 us    2887.00 us           3678       FSTAT
     82.20    2479.31 us     595.00 us    4856.00 us           3678    READDIRP
 
    Duration: 439 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
/mnt/m1/test# time ls | wc -l 
74999

real	0m47.507s
user	0m0.596s
sys	0m5.384s

~# gluster volume profile vol info nfs 
NFS Server : localhost
----------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              4  RELEASEDIR
      0.00     631.45 us     242.00 us    2420.00 us             11      ACCESS
      0.91  390568.50 us     902.00 us 1255235.00 us              4     OPENDIR
      3.63     562.50 us     102.00 us   65877.00 us          11036       FSTAT
     19.32     440.36 us      96.00 us  162240.00 us          75017        STAT
     21.18    3281.44 us     455.00 us  207464.00 us          11036    READDIRP
     54.95    1252.58 us     208.00 us  273030.00 us          75009      LOOKUP
 
    Duration: 835 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Interval 3 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00     529.00 us     529.00 us     529.00 us              1      ACCESS
    100.00     436.99 us      96.00 us   31673.00 us          75002        STAT
 
    Duration: 113 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

--- Additional comment from Rajesh on 2013-05-24 07:19:15 EDT ---

The same tests run with glusterfs version: rhs-2.0 latest HEAD:

/mnt/m1/test# time ls | wc -l 
75000

real	0m10.161s
user	0m0.256s
sys	0m0.644s
/mnt/m1/test# gluster volume profile vol info nfs
NFS Server : localhost
----------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.05    1530.67 us    1206.00 us    2127.00 us              3      LOOKUP
     18.28     421.51 us     106.00 us    3376.00 us           3678       FSTAT
     81.67    1883.42 us     496.00 us    4035.00 us           3678    READDIRP
 
    Duration: 33 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Interval 0 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.05    1530.67 us    1206.00 us    2127.00 us              3      LOOKUP
     18.28     421.51 us     106.00 us    3376.00 us           3678       FSTAT
     81.67    1883.42 us     496.00 us    4035.00 us           3678    READDIRP
 
    Duration: 33 seconds
   Data Read: 0 bytes
Data Written: 0 bytes


Second run without dropping caches:
/mnt/m1/test# time ls | wc -l 
75000

real	0m48.514s
user	0m0.448s
sys	0m2.880s
/mnt/m1/test# gluster volume profile vol info nfs
NFS Server : localhost
----------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00     591.00 us     591.00 us     591.00 us              1      ACCESS
      0.00     630.00 us     630.00 us     630.00 us              1        STAT
      3.24     421.51 us     106.00 us    3376.00 us           3678       FSTAT
     14.47    1883.42 us     496.00 us    4035.00 us           3678    READDIRP
     82.29     525.43 us     196.00 us    6560.00 us          75005      LOOKUP
 
    Duration: 155 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Interval 1 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00     591.00 us     591.00 us     591.00 us              1      ACCESS
      0.00     630.00 us     630.00 us     630.00 us              1        STAT
    100.00     525.39 us     196.00 us    6560.00 us          75002      LOOKUP
 
    Duration: 122 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

--- Additional comment from Rajesh on 2013-05-24 08:20:58 EDT ---

Single brick volume, rhs 2.1, noacl:

/mnt/m1# time ls | wc -l
75001

real	0m15.119s
user	0m0.228s
sys	0m0.804s
/mnt/m1# gluster volume profile vol info nfs
NFS Server : localhost
----------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              1  RELEASEDIR
      0.12    7981.00 us     434.00 us   15528.00 us              2      ACCESS
      0.34    9089.00 us     600.00 us   27439.00 us              5      LOOKUP
      0.38   50874.00 us   50874.00 us   50874.00 us              1     OPENDIR
      1.98   52955.00 us     144.00 us  263421.00 us              5        STAT
     11.05     401.76 us      72.00 us   25597.00 us           3677       FSTAT
     86.13    3132.19 us     617.00 us  289829.00 us           3677    READDIRP
 
    Duration: 38 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Interval 0 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              1  RELEASEDIR
      0.12    7981.00 us     434.00 us   15528.00 us              2      ACCESS
      0.34    9089.00 us     600.00 us   27439.00 us              5      LOOKUP
      0.38   50874.00 us   50874.00 us   50874.00 us              1     OPENDIR
      1.98   52955.00 us     144.00 us  263421.00 us              5        STAT
     11.05     401.76 us      72.00 us   25597.00 us           3677       FSTAT
     86.13    3132.19 us     617.00 us  289829.00 us           3677    READDIRP
 
    Duration: 38 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 

Second run: without dropping caches
 
/mnt/m1# time ls | wc -l
75001

real	1m16.110s
user	0m0.592s
sys	0m5.824s
/mnt/m1# gluster volume profile vol info nfs
NFS Server : localhost
----------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              1  RELEASEDIR
      0.02    4189.25 us     332.00 us   15528.00 us              4      ACCESS
      0.07   50874.00 us   50874.00 us   50874.00 us              1     OPENDIR
      0.37   29594.89 us     144.00 us  263421.00 us              9        STAT
      2.06     401.76 us      72.00 us   25597.00 us           3677       FSTAT
     16.03    3132.19 us     617.00 us  289829.00 us           3677    READDIRP
     81.45     780.05 us     116.00 us  110062.00 us          75006      LOOKUP
 
    Duration: 364 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Interval 1 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00     397.50 us     332.00 us     463.00 us              2      ACCESS
      0.00     394.75 us     352.00 us     439.00 us              4        STAT
    100.00     779.50 us     116.00 us  110062.00 us          75001      LOOKUP
 
    Duration: 326 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

--- Additional comment from Ben England on 2013-05-29 10:34:10 EDT ---

cc'ing Steve Dickson, he would know better than I how to tune NFS client caching for this case.

Re: comment 9 -- 15 seconds for 75000 files isn't bad, that's 5K files/sec! If you got this result consistently we would declare victory and close the bug.

The problem is that we don't get this consistently.  Why do the LOOKUP storms happen?  Could the problem be that NFS is timing out the cached metadata and so it's asking Gluster for the metadata again, and Gluster uses LOOKUP the 2nd time instead of READDIRP?  

If so, you might be able to solve this by timing out the directory metadata before the file metadata -- this would force Gluster to do READDIRP again and avoid the need for 75000 LOOKUPs. There are separate timeout parameters in the NFS client mount options for directory and file attribute caching; perhaps the directory cache timeout needs to be lower than the attribute cache timeout. I would try acdirmax < acregmin.
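
A hedged example of that experiment on the client side; the timeout values below are illustrative assumptions, chosen only so that acdirmax < acregmin:

mount -t nfs -o vers=3,proto=tcp,noatime,nodiratime,acdirmax=5,acregmin=30 rhs21-1-1:/smallfiles /mnt/nfs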

Also, why doesn't Gluster use the md-cache translator and hang onto the metadata that it got the first time so LOOKUP calls aren't needed?  

Also, is there a way for Gluster client to express in READDIRP to the brick server the thought "tell me the list of files in this directory with  metadata that changed since time T" where T is last time that client did READDIRP?   This would be a very short list and would avoid unnecessary LOOKUP calls.

--- Additional comment from Scott Haines on 2013-07-24 03:10:20 EDT ---

Per discussion with Vivek, this works without ACL turned on. Not targeted for Big Bend, as ACL is off by default. Targeting for Corbett.

--- Additional comment from Simon Green on 2013-07-28 19:18:42 EDT ---

User pcuzner's account has been closed

--- Additional comment from Scott Haines on 2013-09-23 19:18:56 EDT ---

Retargeting for 2.1.z U2 (Corbett) release.

--- Additional comment from  on 2013-12-12 04:22:10 EST ---



--- Additional comment from  on 2013-12-12 04:23:21 EST ---



--- Additional comment from  on 2013-12-12 04:27:29 EST ---

version: glusterfs 3.4.0.49rhs built on Dec 12 2013 00:06:18
Volume type: Dist-Rep 2x2
No of nodes for bricks: 2
No of nodes for client: 1
No of files: 75000

1. With ACL 
real        4m24.429s
user        0m2.520s
sys         0m9.591s


2. Without Flushing Cache and With ACL
real        1m29.440s
user        0m2.100s
sys        0m7.167s

3. Flushed the cache and ran it again
real        4m21.228s
user        0m2.369s
sys        0m8.966s

4. Ran again with cache
real        1m27.342s
user        0m2.198s
sys        0m7.477s


With NO-ACL
-----------

1. Fresh Mount:
---------------
real        1m26.744s
user        0m1.380s
sys        0m2.077s

2. With Cache (noacl):
----------------------
real        2m7.984s
user        0m1.953s
sys        0m7.645s

3. Flushed Cache (noacl):
-------------------------
real        1m25.701s
user        0m1.346s
sys        0m1.960s





Profiling:
-----------

Profile output attached

1. With ACL
real        4m15.650s
user        0m2.312s
sys         0m8.970s


Profile output attached
2. NOACL
real        1m6.859s
user        0m1.390s
sys         0m2.015s

--- Additional comment from santosh pradhan on 2013-12-14 08:27:49 EST ---


Findings from code review and discussion with Amar:

Each getacl() call imposes a lookup call (as with every NFS call), followed by a stat call (so that the attributes match what a getattr call would return). In addition, each getacl() call stack-winds twice to the bricks for xattrs (once for the user ACL and once for the default ACL). This brings in a performance penalty, and it happens for every FILE.

The ACL performance issue is quite generic. That's why mounting with noacl is suggested for better performance if ACL is not absolutely necessary.
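
If it is unclear whether a data set actually carries ACLs (and therefore whether noacl is safe to use), a hedged check from a brick or FUSE mount (the path is an assumption):

# --skip-base lists only entries that have ACLs beyond the base owner/group/other set;
# empty output suggests noacl would not change behaviour for this data.
getfacl -R --skip-base /bricks/brick1/smallfiles 2>/dev/null | head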

--- Additional comment from Amar Tumballi on 2013-12-15 23:46:38 EST ---

To confirm the theory, one would need to run the tests with the 'md-cache' translator in the nfs-server process.

But having 'md-cache' in the nfs graph is not really recommended at the moment, as we don't run our tests with it.
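
A hedged sketch of what such a test could look like, assuming the performance.nfs.stat-prefetch volume option (which would load md-cache into the NFS server graph) exists in this build; the option name should be verified with 'gluster volume set help' before relying on it:

gluster volume set vol performance.nfs.stat-prefetch on   # assumed option name
# remount the NFS client, repeat the ls test, then compare LOOKUP counts with:
gluster volume profile vol info nfs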

Comment 1 Anand Avati 2013-12-17 03:47:48 UTC
REVIEW: http://review.gluster.org/6523 (gNFS: Client cache invalidation with bad fsid) posted (#1) for review on master by Santosh Pradhan (spradhan)

Comment 2 Anand Avati 2013-12-17 03:58:17 UTC
REVIEW: http://review.gluster.org/6523 (gNFS: Client cache invalidation with bad fsid) posted (#2) for review on master by Santosh Pradhan (spradhan)

Comment 3 santosh pradhan 2013-12-17 07:20:20 UTC
Results with the FIX: (with 50k files)

1. 1st run:
======

[root@santosh-new-4 glnfs]# time ls |wc -l 
50000

real	0m5.634s
user	0m0.445s
sys	0m0.108s

[root@spradhan glusterfs]# gluster v profile nfsperf info nfs
NFS Server : localhost
----------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us          50001     RELEASE
      0.00       0.00 us       0.00 us       0.00 us             11  RELEASEDIR
      0.00     598.00 us     598.00 us     598.00 us              1      STATFS
      0.00     664.50 us     615.00 us     714.00 us              2      LOOKUP
      0.00     600.33 us     458.00 us    1147.00 us              6     OPENDIR
      0.00     519.67 us     237.00 us     666.00 us              9      ACCESS
      0.00     515.18 us     304.00 us     913.00 us             28        STAT
      0.53     496.30 us     125.00 us    1357.00 us           3210       FSTAT
     99.46   92418.16 us    1002.00 us 1728777.00 us           3210    READDIRP
 
    Duration: 3600 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Interval 11 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              1  RELEASEDIR
      0.01     458.00 us     458.00 us     458.00 us              1     OPENDIR
      0.02     543.50 us     469.00 us     618.00 us              2        STAT
      5.28     450.75 us     125.00 us    1357.00 us            535       FSTAT
     94.68    8077.19 us    1478.00 us   18099.00 us            535    READDIRP
 
    Duration: 19 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 

2. 2nd run with attr cache:
============================

[root@santosh-new-4 glnfs]# time ls |wc -l 
50000

real	0m0.466s
user	0m0.429s
sys	0m0.036s


[root@spradhan glusterfs]# gluster v profile nfsperf info nfs
NFS Server : localhost
----------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us          50001     RELEASE
      0.00       0.00 us       0.00 us       0.00 us             11  RELEASEDIR
      0.00     598.00 us     598.00 us     598.00 us              1      STATFS
      0.00     664.50 us     615.00 us     714.00 us              2      LOOKUP
      0.00     600.33 us     458.00 us    1147.00 us              6     OPENDIR
      0.00     528.40 us     237.00 us     666.00 us             10      ACCESS
      0.01     516.90 us     304.00 us     913.00 us             30        STAT
      0.53     496.30 us     125.00 us    1357.00 us           3210       FSTAT
     99.46   92418.16 us    1002.00 us 1728777.00 us           3210    READDIRP
 
    Duration: 3612 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Interval 12 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
     35.94     607.00 us     607.00 us     607.00 us              1      ACCESS
     64.06     541.00 us     523.00 us     559.00 us              2        STAT
 
    Duration: 12 seconds
   Data Read: 0 bytes
Data Written: 0 bytes


3. dropped the attr cache:
[root@santosh-new-4 glnfs]# echo 3 > /proc/sys/vm/drop_caches
[root@santosh-new-4 glnfs]# time ls |wc -l 
50000

real	0m5.643s
user	0m0.441s
sys	0m0.129s

[root@spradhan glusterfs]# gluster v profile nfsperf info nfs
NFS Server : localhost
----------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us          50001     RELEASE
      0.00       0.00 us       0.00 us       0.00 us             12  RELEASEDIR
      0.00     598.00 us     598.00 us     598.00 us              1      STATFS
      0.00     664.50 us     615.00 us     714.00 us              2      LOOKUP
      0.00     543.43 us     202.00 us    1147.00 us              7     OPENDIR
      0.00     497.18 us     185.00 us     666.00 us             11      ACCESS
      0.01     498.33 us     165.00 us     913.00 us             33        STAT
      0.61     489.29 us     125.00 us    1357.00 us           3745       FSTAT
     99.39   80366.99 us    1002.00 us 1728777.00 us           3745    READDIRP
 
    Duration: 3753 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Interval 13 Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              1  RELEASEDIR
      0.00     185.00 us     185.00 us     185.00 us              1      ACCESS
      0.00     202.00 us     202.00 us     202.00 us              1     OPENDIR
      0.02     312.67 us     165.00 us     573.00 us              3        STAT
      5.26     447.23 us     152.00 us     692.00 us            535       FSTAT
     94.72    8059.91 us    1348.00 us   18544.00 us            535    READDIRP
 
    Duration: 141 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

There is no LOOKUP storm anymore :)

Comment 4 Anand Avati 2013-12-17 11:24:31 UTC
COMMIT: http://review.gluster.org/6523 committed in master by Vijay Bellur (vbellur) 
------
commit 329e38d4ab5af1a675b4d5651eda983f8a924418
Author: Santosh Kumar Pradhan <spradhan>
Date:   Tue Dec 17 08:43:50 2013 +0530

    gNFS: Client cache invalidation with bad fsid
    
    1. Problem:
    Couple of issues are seen when NFS-ACL is turned ON. i.e.
    i) NFS directory access is too slow, impacting customer workflows
       with ACL
    ii)dbench fails with 100 directories.
    
    2. Root cause: Frequent cache invalidation in the client side when ACL
    is turned ON with NFS because NFS server getacl() code returns the
    wrong fsid to the client.
    
    3. This attr-cache invalidation triggers the frequent LOOKUP ops for
    each file instead of relying on the readdir or readdirp data. As
    a result performance gets impacted.
    
    4. In case of dbench workload, the problem is more severe. e.g.
    
    Client side rpcdebug output:
    ===========================
    
    Dec 16 10:16:53 santosh-3 kernel: NFS:
             nfs_update_inode(0:1b/12061953567282551806 ct=2 info=0x7e7f)
    Dec 16 10:16:53 santosh-3 kernel: NFS:
             nfs_fhget(0:1b/12061953567282551806 ct=2)
    Dec 16 10:16:53 santosh-3 kernel: <-- nfs_xdev_get_sb() = -116 [splat]
    Dec 16 10:16:53 santosh-3 kernel: nfs_do_submount: done
    Dec 16 10:16:53 santosh-3 kernel: <-- nfs_do_submount() = ffffffffffffff8c
    Dec 16 10:16:53 santosh-3 kernel: <-- nfs_follow_mountpoint() = ffffffffffffff8c
    Dec 16 10:16:53 santosh-3 kernel: NFS: dentry_delete(clients/client77, 20008)
    
    As per Jeff Layton, This occurs when the client detects that the fsid on
    a filehandle is different from its parent. At that point, it tries to
    do a new submount of the new filesystem onto the correct point. It means
    client got a superblock reference for the new fs and is now looking to set
    up the root of the mount. It calls nfs_get_root to do that, which basically
    takes the superblock and a filehandle and returns a dentry.  The problem
    here is that the dentry->d_inode you're getting back looks wrong. It's not
    a directory as expected -- it's something else. So the client gives up and
    tosses back an ESTALE.
    
    Which clearly says that, In getacl() code while it does the stat() call
    to get the attrs, it forgets to populate the deviceid or fsid before
    going ahead and does getxattr().
    
    FIX:
    1. Fill the deviceid in iatt.
    2. Do bit more clean up for the confusing part of the code.
    
    NB: Many many thanks to Niels de Vos and Jeff Layton for their
    help to debug the issue.
    
    Change-Id: I8d3c2a844c9d1761051a883b5ebaeb84062a11c8
    BUG: 1043737
    Signed-off-by: Santosh Kumar Pradhan <spradhan>
    Reviewed-on: http://review.gluster.org/6523
    Reviewed-by: Rajesh Joseph <rjoseph>
    Reviewed-by: Niels de Vos <ndevos>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 5 Anand Avati 2013-12-17 12:03:42 UTC
REVIEW: http://review.gluster.org/6526 (gNFS: Client cache invalidation with bad fsid) posted (#1) for review on release-3.5 by Santosh Pradhan (spradhan)

Comment 6 Anand Avati 2013-12-17 12:05:25 UTC
REVIEW: http://review.gluster.org/6526 (gNFS: Client cache invalidation with bad fsid) posted (#2) for review on release-3.5 by Santosh Pradhan (spradhan)

Comment 7 Anand Avati 2013-12-17 18:26:24 UTC
COMMIT: http://review.gluster.org/6526 committed in release-3.5 by Vijay Bellur (vbellur) 
------
commit 54201dd0495e52e0722898ab7fdad007d28d1676
Author: Santosh Kumar Pradhan <spradhan>
Date:   Tue Dec 17 17:32:14 2013 +0530

    gNFS: Client cache invalidation with bad fsid
    
    1. Problem:
    Couple of issues are seen when NFS-ACL is turned ON. i.e.
    i) NFS directory access is too slow, impacting customer workflows
       with ACL
    ii)dbench fails with 100 directories.
    
    2. Root cause: Frequent cache invalidation in the client side when ACL
    is turned ON with NFS because NFS server getacl() code returns the
    wrong fsid to the client.
    
    3. This attr-cache invalidation triggers the frequent LOOKUP ops for
    each file instead of relying on the readdir or readdirp data. As
    a result performance gets impacted.
    
    4. In case of dbench workload, the problem is more severe. e.g.
    
    Client side rpcdebug output:
    ===========================
    
    Dec 16 10:16:53 santosh-3 kernel: NFS:
             nfs_update_inode(0:1b/12061953567282551806 ct=2 info=0x7e7f)
    Dec 16 10:16:53 santosh-3 kernel: NFS:
             nfs_fhget(0:1b/12061953567282551806 ct=2)
    Dec 16 10:16:53 santosh-3 kernel: <-- nfs_xdev_get_sb() = -116 [splat]
    Dec 16 10:16:53 santosh-3 kernel: nfs_do_submount: done
    Dec 16 10:16:53 santosh-3 kernel: <-- nfs_do_submount() =
    ffffffffffffff8c
    Dec 16 10:16:53 santosh-3 kernel: <-- nfs_follow_mountpoint() =
    ffffffffffffff8c
    Dec 16 10:16:53 santosh-3 kernel: NFS: dentry_delete(clients/client77,
    20008)
    
    As per Jeff Layton, This occurs when the client detects that the fsid on
    a filehandle is different from its parent. At that point, it tries to
    do a new submount of the new filesystem onto the correct point. It means
    client got a superblock reference for the new fs and is now looking to
    set up the root of the mount. It calls nfs_get_root to do that, which
    basically takes the superblock and a filehandle and returns a dentry.
    The problem here is that the dentry->d_inode you're getting back looks
    wrong. It's not a directory as expected -- it's something else. So the
    client gives up and tosses back an ESTALE.
    
    Which clearly says that, In getacl() code while it does the stat() call
    to get the attrs, it forgets to populate the deviceid or fsid before
    going ahead and does getxattr().
    
    FIX:
    1. Fill the deviceid in iatt.
    2. Do bit more clean up for the confusing part of the code.
    
    NB: Many many thanks to Niels de Vos and Jeff Layton for their
    help to debug the issue.
    
    Change-Id: I44d8d2fa3ec7fb33a67dfdd4bbe2c45cdf67db8c
    BUG: 1043737
    Signed-off-by: Santosh Kumar Pradhan <spradhan>
    Reviewed-on: http://review.gluster.org/6526
    Reviewed-by: Niels de Vos <ndevos>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

