Bug 1048111 - RHS: directory long listing performance over NFS very slow until remount
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: santosh pradhan
QA Contact: Sudhir D
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-01-03 08:14 UTC by Manoj Pillai
Modified: 2014-09-21 22:54 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-01-23 19:25:20 UTC
Embargoed:


Attachments
Comparison of "ls -l" performance between gNFS and kernel NFS (18.68 KB, application/vnd.oasis.opendocument.spreadsheet)
2014-01-03 08:14 UTC, Manoj Pillai

Description Manoj Pillai 2014-01-03 08:14:42 UTC
Created attachment 844864 [details]
Comparison of "ls -l" performance between gNFS and kernel NFS

Description of problem:
After creating a large number of files on a gNFS mount point, the time taken by "ls -l" on the mount point is extremely high. After the volume is unmounted and mounted again, the performance of "ls -l" improves; in that case, the NFS client seems to be able to benefit from caching.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.52rhs-1.el6rhs

How reproducible:
Every time

Steps to Reproduce:
1. create and start replicated RHS volume
2. NFS mount on client with mount options "nfsvers=3,tcp"
3. create 50K files in the mount directory
4. time ls -l <gNFS-mnt-pt>
5. repeat step 4. multiple times
6. umount <gNFS-mnt-pt> on NFS client; mount again
7. time ls -l <gNFS-mnt-pt>
8. repeat multiple times
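The steps above can be sketched as a small harness. This is a hypothetical helper, not part of the bug report; the server, volume, and mount-point names in the comments are placeholders, and any local directory works for exercising the harness itself.

```shell
#!/bin/sh
# Hypothetical reproduction harness for the steps above. On a real setup
# MNT is a gNFS mount, e.g.:
#   mount -t nfs -o nfsvers=3,tcp rhs-server:/testvol /mnt/gnfs
# (server and volume names are placeholders).
repro_ls_l() {
    mnt="$1"; nfiles="$2"
    mkdir -p "$mnt"
    i=0
    while [ "$i" -lt "$nfiles" ]; do   # create phase (50000 in the report)
        : > "$mnt/file.$i"
        i=$((i + 1))
    done
    for run in 1 2 3; do               # steps 4/5: repeated timed listings
        start=$(date +%s)
        ls -l "$mnt" > /dev/null
        echo "ls -l run $run: $(( $(date +%s) - start ))s"
    done
}
# Example: repro_ls_l /mnt/gnfs 50000
```

On the buggy setup, the per-run times printed before a umount/remount are in minutes; after a remount they drop as shown in the actual results below.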

Actual results:

Before the umount in step 6, "time ls -l" in steps 4 and 5 reports numbers like the following:

first instance:
real    3m34.550s
user    0m0.741s
sys     0m9.170s

2nd:
real    3m1.958s
user    0m0.603s
sys     0m7.851s

3rd:
real    3m1.194s
user    0m0.595s
sys     0m7.830s

After the umount, "time ls -l" in steps 7 and 8 reports numbers like the following:

1st instance:
real    0m39.963s
user    0m0.596s
sys     0m2.378s

2nd:
real    0m22.741s
user    0m0.506s
sys     0m2.474s

later:
real    0m0.967s
user    0m0.269s
sys     0m0.697s


Expected results:

There should be no difference between performance of "ls -l" before and after umount. The performance that we are seeing after umount should be seen in both cases.


Additional info:
bz 965400 specifies glusterfs-3.4.0.52rhs-1.el6rhs as the build in which some acl-related defects were fixed. This problem was seen while trying to verify performance with build 52. Since it is not specifically related to acl, opening a new bz. 

The attached spreadsheet also shows "ls -l" performance with the "noacl" client NFS mount option added. It also has results for the same client mounting an XFS filesystem exported by a kernel NFS server. gNFS may not be able to match kNFS performance, but I expect to see the same trend in both cases.

Comment 2 Manoj Pillai 2014-01-03 11:16:39 UTC
Results in the spreadsheet are copied here for easy reference. Numbers are the "real" time reported by the time command, in seconds.

                                RHS     RHS-noacl  kNFS   kNFS-noacl
SEQUENCE OF COMMANDS
create 50K files in <mnt-pt>
time ls -l <mnt-pt>             214.5   183.42     6.46   1.55
sleep 60; time ls -l <mnt-pt>   181.95  181.65     5.68   5.54
sleep 60; time ls -l <mnt-pt>   181.19  180.77     5.69   5.55
sleep 1; time ls -l <mnt-pt>    178.61  178.24     0.79   0.73
sleep 1; time ls -l <mnt-pt>    180.79  179.75     0.9    0.86
sleep 1; time ls -l <mnt-pt>    178.94  180.77     0.94   0.93
sleep 60; time ls -l <mnt-pt>   179.39  181.58     5.67   5.56
umount <mnt-pt>; mount again
time ls -l <mnt-pt>             39.96   6.8        6.3    1.52
sleep 60; time ls -l <mnt-pt>   22.74   22.93      5.55   5.4
sleep 60; time ls -l <mnt-pt>   22.01   23.21      5.5    5.45
sleep 1; time ls -l <mnt-pt>    21.59   22.81      0.84   0.87
sleep 1; time ls -l <mnt-pt>    0.96    0.91       0.91   0.87
sleep 1; time ls -l <mnt-pt>    21.59   23.02      0.95   0.74
sleep 60; time ls -l <mnt-pt>   22.27   23.23      5.53   5.43

Some points to note:

* kNFS behavior for the sequence of commands before umount/mount and after umount/mount is the same. This is what we would expect for RHS/gNFS also. However, until the umount, performance with gNFS is terrible.

* acl/noacl: Time taken by first instance of ls -l after fresh mount is significantly more when acl is enabled compared to when it is not (39.96 v. 6.8). kNFS also shows penalty for acl first time after fresh mount (6.3 v. 1.52). Subsequent instances of "ls -l" do not pay a penalty in the acl case.

* caching effects: after fresh mount, "ls -l" times are either around 22 secs with gNFS or around 0.9 secs. This has to do with attribute caching timeouts in NFS client. We have verified that by setting actimeo=120 as a client NFS mount option, we can get most of these numbers below 1 sec. kNFS also shows similar behavior: in the case of kNFS, "ls -l" times are either around 5.5 sec or around 0.9 sec.
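The actimeo workaround mentioned above, written as an /etc/fstab line (the server and volume names are placeholders):

```
rhs-server:/testvol  /mnt/gnfs  nfs  nfsvers=3,tcp,actimeo=120  0 0
```

actimeo sets all four attribute-cache timeouts (acregmin/acregmax/acdirmin/acdirmax) to the same value, here 120 seconds.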

Comment 3 Manoj Pillai 2014-01-06 09:11:15 UTC
Here are the columns for gNFS and kNFS for the same test as above, but here the time for each instance of "ls -l" is followed in the same column by the NFS calls that the client makes. This data was obtained on the NFS client by doing an "nfsstat -c" before and after each "ls -l" command. Some rows that don't look very different have not been annotated with the NFS call information, to avoid clutter.
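The before/after nfsstat measurement described here can be sketched as a small helper. The nfsstat parsing in the trailing comment is an assumption (field positions vary across nfsstat versions); the helper itself works with any command that prints an integer counter.

```shell
#!/bin/sh
# Sketch of the measurement method: snapshot a counter, run the operation,
# snapshot again, print the delta. The real counter came from "nfsstat -c".
calls_during() {
    counter_cmd="$1"; shift
    before=$($counter_cmd)
    "$@" > /dev/null          # the operation being measured, e.g. ls -l
    after=$($counter_cmd)
    echo $((after - before))
}
# Assumed real usage (nfsstat output layout may differ by version):
#   count_lookups() { nfsstat -c -3 | awk '/lookup/ {getline; print $1}'; }
#   calls_during count_lookups ls -l /mnt/gnfs
```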
  
                               RHS                           kNFS
                               Time                          Time
                               [NFS calls]                   [NFS calls]
SEQUENCE OF COMMANDS

create 50K files in <mnt-pt>

time ls -l <mnt-pt>             214.5                         6.46
                            [2 getattr, 50K lookup,      [1 getattr, 
                             98 access,                   1 access,
                             64642 readdirplus]           2441 readdirplus]

sleep 60; time ls -l <mnt-pt>   181.95                        5.68
                            [2 getattr, 50K lookup,      [50002 getattr, 
                             98 access,                   1 access]
                             64642 readdirplus]

sleep 60; time ls -l <mnt-pt>   181.19                        5.69
                            [2 getattr, 50K lookup,      [50002 getattr, 
                             98 access,                   1 access]
                             64642 readdirplus]

sleep 1; time ls -l <mnt-pt>    178.61                        0.79
                            [1 getattr, 50K lookup,       [1 getattr]
                             98 access,
                             64642 readdirplus]

sleep 1; time ls -l <mnt-pt>    180.79                        0.9
sleep 1; time ls -l <mnt-pt>    178.94                        0.94
sleep 60; time ls -l <mnt-pt>   179.39                        5.67

umount <mnt-pt>; mount again

time ls -l <mnt-pt>             39.96                         6.3
                            [1 getattr, 1 access,         [1 getattr,
                             1 access,                     1 access,
                             2451 readdirplus]             2441 readdirplus]

sleep 60; time ls -l <mnt-pt>   22.74                         5.55
                            [50002 getattr,               [50002 getattr,
                             1 access]                     1 access]

sleep 60; time ls -l <mnt-pt>   22.01                         5.5
sleep 1; time ls -l <mnt-pt>    21.59                         0.84

sleep 1; time ls -l <mnt-pt>    0.96                          0.91
                            [1 getattr]                      [1 getattr]

sleep 1; time ls -l <mnt-pt>    21.59                         0.95

sleep 60; time ls -l <mnt-pt>   22.27                         5.53
                            [50002 getattr,               [50002 getattr,
                             1 access]                       1 access]


Some points to note:

* after umount, calls made by the client for gNFS and kNFS are very similar. There seems to be a small difference in the number of readdirplus calls in the first instance after fresh mount (2451 for gNFS v. 2441 for kNFS). But mostly, gNFS and kNFS are processing the same number of calls; kNFS is just faster.

* after the create phase, before umount, there is clearly some bug in the gNFS case that is causing the NFS client to send many additional requests including many lookups.

* in the case of the sub-second times, when "ls -l" is issued in quick succession, almost everything is served from the client cache.

* I'm not seeing any difference in the number of client NFS calls with the noacl mount option. So, the difference in times between acl and noacl (seen when "ls -l" is run after a fresh mount) should be coming from the auxiliary calls needed to implement acl.

Comment 4 santosh pradhan 2014-01-08 14:20:56 UTC
This sounds like another cache inconsistency issue, which is generating LOOKUP operations and taking longer.

That's why unmounting (which drops the cache) and remounting gives better numbers.

I guess there is another way to verify:

Mount the vol.
Create the files.
Drop the cache (e.g. echo 3 > /proc/sys/vm/drop_caches).
ls -l <mount pt>

(drop cache followed by ls -l is exactly the same as directory_crawl() in perf-test.sh).
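The suggested check, sketched as a guarded helper. Dropping kernel caches requires root, so this sketch degrades to a plain timed listing without it; the mount-point path is a placeholder.

```shell
#!/bin/sh
# Guarded sketch of the cache-drop check suggested above.
drop_and_list() {
    mnt="$1"
    sync                                      # flush dirty pages first
    if [ -w /proc/sys/vm/drop_caches ]; then
        echo 3 > /proc/sys/vm/drop_caches     # drop pagecache + dentries + inodes
    else
        echo "not root: skipping cache drop" >&2
    fi
    start=$(date +%s)
    ls -l "$mnt" > /dev/null
    echo "ls -l took $(( $(date +%s) - start ))s"
}
# Example: drop_and_list /mnt/gnfs
```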

Comment 5 santosh pradhan 2014-01-08 20:36:55 UTC
I could reproduce the issue with a simple test case:

Create a single brick volume and NFS mount it.
In mount pt, create a file e.g. testfile
ls -l <mnt pt>
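A sketch of how a client-side rpcdebug trace like the one below can be captured. rpcdebug needs root and the NFS client module loaded, so the helper falls back to running the command untraced; flags are per rpcdebug(8).

```shell
#!/bin/sh
# Sketch of capturing a client-side NFS debug trace.
capture_nfs_trace() {
    if command -v rpcdebug > /dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
        rpcdebug -m nfs -s all      # enable all NFS client debug flags
        "$@" > /dev/null
        rpcdebug -m nfs -c all      # clear the flags again
        dmesg | tail -n 50          # the trace lands in the kernel log
    else
        echo "rpcdebug unavailable: running untraced" >&2
        "$@" > /dev/null
    fi
}
# Example: capture_nfs_trace ls -l /mnt/gnfs
```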

This is what I get in client side rpcdebug:
==========================================
Jan  9 00:54:45 santosh-new-4 kernel: NFS: revalidating (0:1d/1)
Jan  9 00:54:45 santosh-new-4 kernel: NFS call  getattr
Jan  9 00:54:45 santosh-new-4 kernel: NFS reply getattr: 0
Jan  9 00:54:45 santosh-new-4 kernel: NFS: nfs_update_inode(0:1d/1 ct=2 info=0x7e7f)
Jan  9 00:54:45 santosh-new-4 kernel: NFS: nfs3_forget_cached_acls(0:1d/1)
Jan  9 00:54:45 santosh-new-4 kernel: NFS: (0:1d/1) revalidation complete
Jan  9 00:54:45 santosh-new-4 kernel: NFS: nfs3_get_cached_acl(0:1d/1, 32768) = fffffffffffffff5
Jan  9 00:54:45 santosh-new-4 kernel: NFS call getacl
Jan  9 00:54:45 santosh-new-4 kernel: NFS reply getacl: 0
Jan  9 00:54:45 santosh-new-4 kernel: NFS: nfs_update_inode(0:1d/1 ct=2 info=0x7e7f)
Jan  9 00:54:45 santosh-new-4 kernel: nfs3_cache_acls(0:1d/1, (null), (null))
Jan  9 00:54:45 santosh-new-4 kernel: NFS: nfs3_get_cached_acl(0:1d/1, 16384) = (null)
Jan  9 00:54:45 santosh-new-4 kernel: NFS call  access
Jan  9 00:54:45 santosh-new-4 kernel: NFS reply access: 0
Jan  9 00:54:45 santosh-new-4 kernel: NFS: permission(0:1d/1), mask=0x24, res=0
Jan  9 00:54:45 santosh-new-4 kernel: NFS: open dir(/)
Jan  9 00:54:45 santosh-new-4 kernel: NFS: revalidating (0:1d/1)
Jan  9 00:54:45 santosh-new-4 kernel: NFS call  getattr
Jan  9 00:54:45 santosh-new-4 kernel: NFS reply getattr: 0
Jan  9 00:54:45 santosh-new-4 kernel: NFS: nfs_update_inode(0:1d/1 ct=2 info=0x7e7f)
Jan  9 00:54:45 santosh-new-4 kernel: NFS: (0:1d/1) revalidation complete
Jan  9 00:54:45 santosh-new-4 kernel: NFS: readdir(/) starting at cookie 0
Jan  9 00:54:45 santosh-new-4 kernel: NFS: (0:1d/1) data cache invalidated


====>>> data cache gets invalidated HERE <<<=======
 
Jan  9 00:54:45 santosh-new-4 kernel: NFS call  readdirplus 0
Jan  9 00:54:45 santosh-new-4 kernel: NFS: nfs_update_inode(0:1d/1 ct=2 info=0x7e7f)
Jan  9 00:54:45 santosh-new-4 kernel: NFS reply readdirplus: 492
Jan  9 00:54:45 santosh-new-4 kernel: NFS: nfs_update_inode(0:1d/-5753586402791623407 ct=1 info=0x7e7f)
Jan  9 00:54:45 santosh-new-4 kernel: NFS: dentry_delete(/skdot, 8)
Jan  9 00:54:45 santosh-new-4 kernel: NFS: nfs_do_filldir() filling ended @ cookie 512; returning = 0
Jan  9 00:54:45 santosh-new-4 kernel: NFS: readdir(/) returns 0
Jan  9 00:54:45 santosh-new-4 kernel: NFS: permission(0:1d/1), mask=0x1, res=0
Jan  9 00:54:45 santosh-new-4 kernel: NFS call  lookup skdot
Jan  9 00:54:45 santosh-new-4 kernel: NFS: nfs_update_inode(0:1d/1 ct=2 info=0x7e7f)
Jan  9 00:54:45 santosh-new-4 kernel: NFS: mtime change on server for file 0:1d/1
Jan  9 00:54:45 santosh-new-4 kernel: NFS reply lookup: 0
Jan  9 00:54:45 santosh-new-4 kernel: NFS: nfs_update_inode(0:1d/-5753586402791623407 ct=1 info=0x7e7f)
Jan  9 00:54:45 santosh-new-4 kernel: NFS: nfs_lookup_revalidate(/skdot) is valid
Jan  9 00:54:45 santosh-new-4 kernel: NFS: dentry_delete(/skdot, 8)
Jan  9 00:54:45 santosh-new-4 kernel: NFS call  access
Jan  9 00:54:45 santosh-new-4 kernel: NFS reply access: 0
Jan  9 00:54:45 santosh-new-4 kernel: NFS: permission(0:1d/1), mask=0x1, res=0
Jan  9 00:54:45 santosh-new-4 kernel: NFS: nfs_lookup_revalidate(/skdot) is valid
Jan  9 00:54:45 santosh-new-4 kernel: NFS: dentry_delete(/skdot, 8)
Jan  9 00:54:45 santosh-new-4 kernel: NFS: permission(0:1d/1), mask=0x1, res=0
Jan  9 00:54:45 santosh-new-4 kernel: NFS: nfs_lookup_revalidate(/skdot) is valid
Jan  9 00:54:45 santosh-new-4 kernel: NFS: dentry_delete(/skdot, 8)
Jan  9 00:54:45 santosh-new-4 kernel: NFS: permission(0:1d/1), mask=0x1, res=0
Jan  9 00:54:45 santosh-new-4 kernel: NFS: nfs_lookup_revalidate(/skdot) is valid
Jan  9 00:54:45 santosh-new-4 kernel: NFS: dentry_delete(/skdot, 8)
Jan  9 00:54:45 santosh-new-4 kernel: NFS: permission(0:1d/1), mask=0x1, res=0
Jan  9 00:54:45 santosh-new-4 kernel: NFS: nfs_lookup_revalidate(/skdot) is valid
Jan  9 00:54:45 santosh-new-4 kernel: NFS: nfs3_get_cached_acl(0:1d/-5753586402791623407, 32768) = (null)
Jan  9 00:54:45 santosh-new-4 kernel: NFS: dentry_delete(/skdot, 8)
Jan  9 00:54:45 santosh-new-4 kernel: NFS: permission(0:1d/1), mask=0x1, res=0
Jan  9 00:54:45 santosh-new-4 kernel: NFS: nfs_lookup_revalidate(/skdot) is valid
Jan  9 00:54:45 santosh-new-4 kernel: NFS: nfs3_get_cached_acl(0:1d/-5753586402791623407, 16384) = (null)
Jan  9 00:54:45 santosh-new-4 kernel: NFS: dentry_delete(/skdot, 8)
Jan  9 00:54:45 santosh-new-4 kernel: NFS: readdir(/) starting at cookie 3
Jan  9 00:54:45 santosh-new-4 kernel: NFS: (0:1d/1) data cache invalidated
Jan  9 00:54:45 santosh-new-4 kernel: NFS call  readdirplus 0
Jan  9 00:54:45 santosh-new-4 kernel: NFS: nfs_update_inode(0:1d/1 ct=2 info=0x7e7f)
Jan  9 00:54:45 santosh-new-4 kernel: NFS: mtime change on server for file 0:1d/1


==========>>>> This seems to be the culprit <<<<========


Jan  9 00:54:45 santosh-new-4 kernel: NFS reply readdirplus: 492
Jan  9 00:54:45 santosh-new-4 kernel: NFS: nfs_update_inode(0:1d/-5753586402791623407 ct=1 info=0x7e7f)
Jan  9 00:54:45 santosh-new-4 kernel: NFS: dentry_delete(/skdot, 8)
Jan  9 00:54:45 santosh-new-4 kernel: NFS: readdir(/) returns 0


The client just does an "ls -l" on the mount point; STAT/GETXATTR/ACCESS calls won't change the mtime on the mount point. This seems to be a client issue.

Jeff Layton pointed out an NFS client issue, i.e. BZ 976879.

This does not need any code change on the Gluster NFS server side. We can close this bug :)

Thanks,
Santosh

Comment 6 santosh pradhan 2014-01-08 20:38:46 UTC

Forgot to mention that when I tested with a Fedora 18 client box (3.11 kernel), the issue is not seen. The bug I mentioned in comment #5 might have been fixed in the RHEL 6.5 client.

Comment 7 santosh pradhan 2014-01-09 03:07:50 UTC
>   
>                                RHS                           kNFS
>                                Time                          Time
>                                [NFS calls]                   [NFS calls]
> SEQUENCE OF COMMANDS
> 
> create 50K files in <mnt-pt>
> 
> time ls -l <mnt-pt>             214.5                         6.46
>                             [2 getattr, 50K lookup,      [1 getattr, 
>                              98 access,                   1 access,
>                              64642 readdirplus]           2441 readdirplus]
> 
> sleep 60; time ls -l <mnt-pt>   181.95                        5.68
>                             [2 getattr, 50K lookup,      [50002 getattr, 
>                              98 access,                   1 access]
>                              64642 readdirplus]
> 
> sleep 60; time ls -l <mnt-pt>   181.19                        5.69
>                             [2 getattr, 50K lookup,      [50002 getattr, 
>                              98 access,                   1 access]
>                              64642 readdirplus]



[Santosh] You correctly pointed out that the number of readdirplus calls is the culprit. This is in fact an NFS client readdir() performance issue: the client was invalidating the data cache frequently, generating more READDIRPLUS calls and impacting the performance of "ls -l".

> 
> sleep 1; time ls -l <mnt-pt>    178.61                        0.79
>                             [1 getattr, 50K lookup,       [1 getattr]
>                              98 access,
>                              64642 readdirplus]
> 
> sleep 1; time ls -l <mnt-pt>    180.79                        0.9
> sleep 1; time ls -l <mnt-pt>    178.94                        0.94
> sleep 60; time ls -l <mnt-pt>   179.39                        5.67
> 
> umount <mnt-pt>; mount again
> 
> time ls -l <mnt-pt>             39.96                         6.3
>                             [1 getattr, 1 access,         [1 getattr,
>                              1 access,                     1 access,
>                              2451 readdirplus]             2441 readdirplus]

After umount/mount, it shows the correct behavior.

Comment 8 santosh pradhan 2014-01-23 19:25:20 UTC
Let's close the bug as it no longer reproduces. Please feel free to reopen if you don't agree with the analysis.

Comment 9 Manoj Pillai 2014-06-09 11:32:47 UTC
clearing needinfo.

