Bug 1325766
| Summary: | RHEL6.7: NFSv3 client performance regression where ls -l takes too long with "aggressive readdirplus" commit | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Rinku <rkothiya> |
| Component: | kernel | Assignee: | Scott Mayhew <smayhew> |
| kernel sub component: | NFS | QA Contact: | Yongcheng Yang <yoyang> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | bcodding, bfields, chunwang, dmoessne, dwysocha, eguan, fhirtz, fsorenso, kcleveng, mthacker, rick.beldin, sauchter, smayhew, swhiteho, yoyang |
| Version: | 6.7 | Keywords: | Regression, Reproducer |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | kernel-2.6.32-682.el6 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1442068 (view as bug list) | Environment: | |
| Last Closed: | 2017-03-21 12:43:33 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1269194, 1324930, 1374441, 1461138, 1507140 | | |

Comment 3
Dave Wysochanski
2016-04-11 11:09:21 UTC
(In reply to Dave Wysochanski from comment #3)
> I reviewed the case and agree with this bug; this is an easily reproducible
> problem. Commit a3d203226a4f2d90a3f00803180681570a2d3536 has introduced a
> fairly significant performance regression if an NFS client tries to list a
> directory that is being modified (files added and removed), which is a very
> common scenario in customers' environments.
>
> Next steps are to determine if this regression still exists in later kernels
> (latest RHEL6.8, RHEL7, upstream) and if there's a patch we need backported
> or if we need some new fixup patch even for upstream.

What server were you using to reproduce?

(In reply to J. Bruce Fields from comment #6)
> What server were you using to reproduce?

I used RHEL7.0 - not sure if that matters. The customer originally reported the problem against an AIX 6.1 server, but I was able to replicate it on a RHEL7.0 server. The tcpdump I captured matched what we saw in the customer's environment, namely:

1) READDIR ops vastly outweighed the number of entries in the directory
2) the 'cookie' value now gets reset on the RHEL6.7 kernel if the directory is changing

I'll attach my update from the case with the details.

Reproducible upstream following Dave's attachment above:
mount -overs=3 f21-1:/exports/xfs /mnt/
mkdir /mnt/TESTDIR
for ((i=0; i<50000; i++)); do touch /mnt/TESTDIR/file.$i; done
for i in $(seq 100000 200000);
do touch /mnt/TESTDIR/file-$i; sleep 1;
rm -f /mnt/TESTDIR/file-$i; done
Then time an "ls -l", capture with tcpdump, and count the READDIRPLUS calls with a 0 cookie in the argument using:
tshark -ntad -r tmp.pcap -Y 'nfs.procedure_v3 == 17 && rpc.msgtyp == 0' -T fields -e nfs.cookie3
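A minimal end-to-end sketch of that capture-and-count step, assuming the file-churn loop above is already running in the background or in another terminal; the interface name eth0 is an assumption, while tmp.pcap and /mnt/TESTDIR come from the commands above:

# start a capture of NFS traffic (eth0 is an assumed interface name)
tcpdump -i eth0 -s 0 -w tmp.pcap port 2049 &
CAPTURE_PID=$!

# time the directory listing while the capture runs
time ls -l /mnt/TESTDIR > /dev/null

# stop the capture and count READDIRPLUS calls that restarted at cookie 0
kill $CAPTURE_PID
tshark -ntad -r tmp.pcap -Y 'nfs.procedure_v3 == 17 && rpc.msgtyp == 0 && nfs.cookie3 == 0' -T fields -e nfs.cookie3 | wc -l

A count of 1 is the expected healthy behaviour; the regression shows up as several cookie-0 calls.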
With recent upstream (4.6-rc1 plus some unrelated patches in my case), I see five READDIRPLUS calls with cookie 0. Then I reverted 311324ad1713666a6e803aecf0d4e1a136a5b34a "NFS: Be more aggressive in using readdirplus for 'ls -l' situations" (only two minor fixes were required to account for the introduction of file_inode() upstream). After that I only see one such READDIRPLUS.

The "ls -l" also appears to be a little faster (about 3 seconds as opposed to about 4) in my setup (two VMs running on the same host), though the results aren't completely consistent. In each case I'm doing multiple "ls -l" runs and ignoring the first one (which is significantly slower).
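A small sketch of that timing method, assuming the /mnt/TESTDIR mount point from the reproducer and an arbitrary run count of three:

# the first listing warms the client caches and is much slower, so discard it
ls -l /mnt/TESTDIR > /dev/null

# time the subsequent listings and compare them across kernels
for run in 1 2 3; do
    time ls -l /mnt/TESTDIR > /dev/null
done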
I also tried mounting with nordirplus and acregmin=60 but found neither made a difference.
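For reference, the two mount variants being tested would look roughly like this, reusing the f21-1:/exports/xfs export and /mnt mount point from the reproducer above:

mount -o vers=3,nordirplus f21-1:/exports/xfs /mnt/
mount -o vers=3,acregmin=60 f21-1:/exports/xfs /mnt/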
(In reply to J. Bruce Fields from comment #9)
> I also tried mounting with nordirplus and acregmin=60 but found neither made
> a difference.

Which was unexpected given my hypothesis, which was that the 0 cookie results from a readdir following nfs_getattr()->nfs_request_parent_use_readdirplus()->nfs_force_use_readdirplus()->nfs_zap_mapping(), which on first glance looked to me like it shouldn't happen in the nordirplus case, or when the directory's children's attributes haven't expired. So, I need to look harder....

(In reply to J. Bruce Fields from comment #10)

Well, but NFS_INO_INVALID_DATA is set in lots of other places. And in practice, experiments with modifying the sleep time confirm that what we're getting (as expected, I guess) is that the readdir restarts with cookie 0 each time the directory is modified. So probably the important change was the extra NFS_INO_INVALID_DATA check in each nfs_readdir.

Upstream post here: http://lkml.kernel.org/r/20160415142500.GB23176@fieldses.org but that didn't get any discussion, so I need to dig some more.

Recommending for priority in 6.9 based on:
- reproducer exists
- regression from previous kernels
- case still open

I can try to help out with this if needed.

(In reply to Dave Wysochanski from comment #14)
> I can try to help out with this if needed.

Sure, thanks. I haven't gotten back to it since my last comment, and likely won't for at least another week.

In case it was not clear, this only affects NFSv3 - the default NFS version in RHEL6 (NFSv4) is not affected. Also, the 'nordirplus' mount option does not help as a workaround.

Here are a few tshark lines that are helpful to check whether this problem occurs. In all cases we should get just one line of output, indicating that only one READDIR or READDIRPLUS operation has been sent with 'cookie=0'.

NFSv3 (-overs=3): checks READDIRPLUS operation cookies == 0
$ tshark -r /tmp/tcpdump.pcap -R 'nfs.procedure_v3 == 17 && rpc.msgtyp == 0 && nfs.cookie3 == 0' -T fields -e frame.number -e nfs.cookie3
Result: FAIL

NFSv3 (-overs=3,nordirplus): checks READDIR operation cookies == 0
$ tshark -r /tmp/tcpdump.pcap -R 'nfs.procedure_v3 == 16 && rpc.msgtyp == 0 && nfs.cookie3 == 0' -T fields -e frame.number -e nfs.cookie3
Result: FAIL

NFSv4: checks READDIR operation cookies == 0
$ tshark -r /tmp/tcpdump.pcap -R 'nfs.opcode == 26 && rpc.msgtyp == 0 && nfs.cookie4 == 0' -T fields -e frame.number -e nfs.cookie4
Result: PASS
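Those checks can be wrapped in a small helper; this is a sketch under the assumptions that the capture lives at /tmp/tcpdump.pcap (as above), that at most one cookie-0 call counts as healthy, and that the tshark in use accepts the -Y display-filter option (older builds use -R as shown above):

#!/bin/bash
# report PASS/FAIL for each directory-read check against a capture file
PCAP=/tmp/tcpdump.pcap

check() {
    local label="$1" filter="$2"
    local count
    count=$(tshark -r "$PCAP" -Y "$filter" -T fields -e frame.number | wc -l)
    if [ "$count" -le 1 ]; then
        echo "$label: PASS ($count cookie-0 call(s))"
    else
        echo "$label: FAIL ($count cookie-0 calls)"
    fi
}

check "NFSv3 READDIRPLUS" 'nfs.procedure_v3 == 17 && rpc.msgtyp == 0 && nfs.cookie3 == 0'
check "NFSv3 READDIR"     'nfs.procedure_v3 == 16 && rpc.msgtyp == 0 && nfs.cookie3 == 0'
check "NFSv4 READDIR"     'nfs.opcode == 26 && rpc.msgtyp == 0 && nfs.cookie4 == 0'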
This is still reproducible upstream (testing on a 4.8-rc1 based kernel). 311324ad1713666a6e803aecf0d4e1a136a5b34a "NFS: Be more aggressive in using readdirplus for 'ls -l' situations" changed when nfs_readdir() decides to revalidate the directory's mapping, which contains all the entries. In addition to just checking whether the attribute cache has expired, it now also checks whether NFS_INO_INVALID_DATA is set on the directory.

Whenever we do a CREATE, the response sends us through post_op_update_inode() for the directory, which will set NFS_INO_INVALID_DATA, and it carries post-op directory attributes that update the mtime and ctime of the directory, which causes nfs_update_inode() to set NFS_INO_INVALID_DATA as well. The result is that the next pass through nfs_readdir() starts over reading the directory entries.

It might be argued upstream that a readdir should ignore NFS_INO_INVALID_DATA and instead go back to simply checking the attribute cache for expiration. I'll write to linux-nfs. Possibly any optimizations done here would also need to consider the cases in bug 1248140.

Scott Mayhew already fixed this problem once in RHEL6, see bug 976879.

Upstream discussion:
http://marc.info/?t=147196504500004&r=1&w=2

Sent a patch:
http://marc.info/?l=linux-nfs&m=147205965132022&w=2

(In reply to Benjamin Coddington from comment #27)
> Upstream discussion:
> http://marc.info/?t=147196504500004&r=1&w=2
>
> Sent a patch:
> http://marc.info/?l=linux-nfs&m=147205965132022&w=2

Thanks for getting this going. Is this still moving forward in the background, or has it petered out? I just got another ping about another customer potentially hitting either this issue or https://bugzilla.redhat.com/show_bug.cgi?id=1371611

I haven't worked on it in a couple of weeks. The problems here and in bug 1248140 and bug 1371611 aren't trivially fixed.

(In reply to Dave Wysochanski from comment #30)
> (In reply to Benjamin Coddington from comment #27)
> > Sent a patch:
> > http://marc.info/?l=linux-nfs&m=147205965132022&w=2

That patch was NACKed. There's a v2 that might make 4.9:
http://marc.info/?l=linux-nfs&m=147222590709214&w=2

> Thanks for getting this going. Is this still moving forward in the
> background, or has it petered out?

I haven't worked on it in a couple of weeks. The problems here and in bug 1248140 and bug 1371611 aren't trivially fixed. I have been tossing ideas around about how to optimize all these cases, but so far I haven't found an ideal solution. If that v2 goes, we can at least help out the customers in this bug.

Patch(es) committed on kernel repository and kernel is undergoing testing.

Patch(es) available on kernel-2.6.32-682.el6.

According to Comment #42 and Comment #54, the performance has been improved. Moving to VERIFIED as SanityOnly.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0817.html

*** Bug 1371611 has been marked as a duplicate of this bug. ***
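For anyone checking whether a client already carries the fix, a rough sketch of a version check against the Fixed In Version above (comparing kernel NVR strings by hand is an approximation; rpm's own version comparison is authoritative):

# show the running kernel and the newest installed kernel package;
# the fix ships in kernel-2.6.32-682.el6 and later
uname -r
rpm -q kernel | sort -V | tail -1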