Bug 1325766 - RHEL6.7: NFSv3 client performance regression where ls -l takes too long with "aggressive readdirplus" commit
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.7
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Scott Mayhew
QA Contact: Yongcheng Yang
URL:
Whiteboard:
Duplicates: 1371611
Depends On:
Blocks: 1269194 1324930 1374441 1461138 1507140
TreeView+ depends on / blocked
 
Reported: 2016-04-11 07:06 UTC by Rinku
Modified: 2022-03-13 14:01 UTC
CC List: 15 users

Fixed In Version: kernel-2.6.32-682.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1442068
Environment:
Last Closed: 2017-03-21 12:43:33 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1043771 1 None None None 2021-08-30 13:37:09 UTC
Red Hat Bugzilla 1248140 0 high CLOSED RHEL6: NFS files inside larger directories return d_type = 0 = DT_UNKNOWN for getdents calls, leading to poor performanc... 2021-03-11 14:22:07 UTC
Red Hat Bugzilla 1371611 0 medium CLOSED Multiple processes listing a directory over nfs concurrently takes a very long time 2022-03-13 14:05:55 UTC
Red Hat Bugzilla 1413485 1 None None None 2021-01-20 06:05:38 UTC
Red Hat Knowledge Base (Solution) 2249321 0 None None None 2016-08-30 16:49:43 UTC
Red Hat Knowledge Base (Solution) 2592711 0 None None None 2017-10-30 11:25:29 UTC
Red Hat Product Errata RHSA-2017:0817 0 normal SHIPPED_LIVE Moderate: kernel security, bug fix, and enhancement update 2017-03-21 13:06:51 UTC

Internal Links: 1043771 1248140 1371611 1413485

Comment 3 Dave Wysochanski 2016-04-11 11:09:21 UTC
I reviewed the case and agree with this bug; it is an easily reproducible problem.  Commit a3d203226a4f2d90a3f00803180681570a2d3536 introduced a fairly significant performance regression when an NFS client lists a directory that is being modified (files added and removed), which is a very common scenario in customers' environments.

Next steps are to determine whether this regression still exists in later kernels (latest RHEL6.8, RHEL7, upstream) and whether there's a patch we need to backport or we need a new fixup patch even for upstream.

Comment 6 J. Bruce Fields 2016-04-11 15:50:01 UTC
(In reply to Dave Wysochanski from comment #3)
> I reviewed the case and agree with this bug; it is an easily reproducible
> problem.  Commit a3d203226a4f2d90a3f00803180681570a2d3536 introduced a
> fairly significant performance regression when an NFS client lists a
> directory that is being modified (files added and removed), which is a
> very common scenario in customers' environments.
> 
> Next steps are to determine whether this regression still exists in later
> kernels (latest RHEL6.8, RHEL7, upstream) and whether there's a patch we
> need to backport or we need a new fixup patch even for upstream.

What server were you using to reproduce?

Comment 7 Dave Wysochanski 2016-04-11 15:54:15 UTC
(In reply to J. Bruce Fields from comment #6)
> (In reply to Dave Wysochanski from comment #3)
> > I reviewed the case and agree with this bug; it is an easily
> > reproducible problem.  Commit a3d203226a4f2d90a3f00803180681570a2d3536
> > introduced a fairly significant performance regression when an NFS
> > client lists a directory that is being modified (files added and
> > removed), which is a very common scenario in customers' environments.
> > 
> > Next steps are to determine whether this regression still exists in
> > later kernels (latest RHEL6.8, RHEL7, upstream) and whether there's a
> > patch we need to backport or we need a new fixup patch even for upstream.
> 
> What server were you using to reproduce?

I used a RHEL7.0 server - not sure if that matters.  The customer originally reported against an AIX 6.1 server, but I was able to replicate the problem on a RHEL7.0 server.  The tcpdump I got matched what we saw in the customer's environment, namely:
1) the number of READDIR ops vastly outweighed the number of entries in the directory
2) the 'cookie' value now gets reset on the RHEL6.7 kernel while the directory is changing

I'll attach my update from the case with the details.

Comment 9 J. Bruce Fields 2016-04-14 18:48:07 UTC
Reproducible upstream following Dave's attachment above:

        mount -overs=3 f21-1:/exports/xfs /mnt/
        mkdir /mnt/TESTDIR
        # populate a large directory
        for ((i=0; i<50000; i++)); do touch /mnt/TESTDIR/file.$i; done
        # keep the directory changing while it is being listed
        for i in $(seq 100000 200000); do
                touch /mnt/TESTDIR/file-$i; sleep 1
                rm -f /mnt/TESTDIR/file-$i
        done

Then time an "ls -l", capture with tcpdump, and count the READDIRPLUS calls with a 0 cookie in the argument using

        tshark -ntad -r tmp.pcap -Y 'nfs.procedure_v3 == 17
                && rpc.msgtyp == 0' -T fields -e nfs.cookie3
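
For reference, the capture-and-timing step might look something like the following sketch (the capture interface is an assumption; the hostname and paths follow the reproducer above):

        tcpdump -i eth0 -s 0 -w tmp.pcap host f21-1 &
        time ls -l /mnt/TESTDIR > /dev/null
        kill %1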

With recent upstream (4.6-rc1 plus some unrelated patches in my case), I see five READDIRPLUS calls with cookie 0.  Then I reverted 311324ad1713666a6e803aecf0d4e1a136a5b34a "NFS: Be more aggressive in using readdirplus for 'ls -l' situations"--only two minor fixes were required to account for the introduction of file_inode() upstream.  After that I see only one such READDIRPLUS.  The "ls -l" also appears to be a little faster (about 3 seconds as opposed to about 4) in my setup (two VMs running on the same host), though the results aren't completely consistent.  In each case I do multiple "ls -l" runs and ignore the first one (which is significantly slower).

I also tried mounting with nordirplus and acregmin=60 but found neither made a difference.
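
(For concreteness, those mount variants would look something like this, reusing the export from the reproducer above:)

        mount -overs=3,nordirplus f21-1:/exports/xfs /mnt/
        mount -overs=3,acregmin=60 f21-1:/exports/xfs /mnt/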

Comment 10 J. Bruce Fields 2016-04-14 19:17:06 UTC
(In reply to J. Bruce Fields from comment #9)
> I also tried mounting with nordirplus and acregmin=60 but found neither made
> a difference.

That was unexpected given my hypothesis: that the 0 cookie results from a readdir following nfs_getattr()->nfs_request_parent_use_readdirplus()->nfs_force_use_readdirplus()->nfs_zap_mapping(), which at first glance looked to me like it shouldn't happen in the nordirplus case, or when the directory's children's attributes haven't expired.  So I need to look harder....

Comment 11 J. Bruce Fields 2016-04-14 22:01:51 UTC
(In reply to J. Bruce Fields from comment #10)
> That was unexpected given my hypothesis: that the 0 cookie results from
> a readdir following nfs_getattr()->nfs_request_parent_use_readdirplus()->
> nfs_force_use_readdirplus()->nfs_zap_mapping(), which at first glance
> looked to me like it shouldn't happen in the nordirplus case, or when
> the directory's children's attributes haven't expired.  So I need to
> look harder....

Well, but NFS_INO_INVALID_DATA is set in lots of other places.  And in practice, experiments with modifying the sleep time confirm that what we're getting (as expected, I guess) is that the readdir restarts with cookie 0 each time the directory is modified.

So probably the important change was the extra NFS_INO_INVALID_DATA check in each nfs_readdir().
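
(As an illustration of the kind of experiment meant here: a variant of the comment 9 reproducer with a longer sleep - the value is arbitrary - so the directory changes less often while it is being listed:)

        # fewer modifications during "ls -l" should mean fewer cookie-0 restarts
        for i in $(seq 100000 200000); do
                touch /mnt/TESTDIR/file-$i; sleep 10
                rm -f /mnt/TESTDIR/file-$i
        done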

Comment 12 J. Bruce Fields 2016-05-03 20:33:13 UTC
Upstream post here: http://lkml.kernel.org/r/20160415142500.GB23176@fieldses.org
but that didn't get any discussion, so I need to dig some more.

Comment 14 Dave Wysochanski 2016-06-20 16:10:53 UTC
Recommending for priority in 6.9 based on:
- reproducer exists
- regression from previous kernels
- case still open

I can try to help out with this if needed.

Comment 15 J. Bruce Fields 2016-06-20 18:49:46 UTC
(In reply to Dave Wysochanski from comment #14)
> Recommending for priority in 6.9 based on:
> - reproducer exists
> - regression from previous kernels
> - case still open
> 
> I can try to help out with this if needed.

Sure, thanks, I haven't gotten back to it since my last comment, and likely won't for at least another week.

Comment 16 Dave Wysochanski 2016-07-01 21:28:51 UTC
In case it was not clear, this only affects NFSv3 - NFSv4, the default NFS version in RHEL6, is not affected.  Also, the 'nordirplus' mount option does not help as a workaround.

Here are a few tshark invocations that are helpful for checking whether this problem occurs.  In all cases we should get just one line of output, indicating that only one READDIR or READDIRPLUS operation has been sent with 'cookie=0'.

NFSv3 (-overs=3): checks READDIRPLUS operation cookies == 0
$ tshark -r /tmp/tcpdump.pcap -R 'nfs.procedure_v3 == 17 && rpc.msgtyp == 0 && nfs.cookie3 == 0' -T fields -e frame.number -e nfs.cookie3
Result: FAIL

NFSv3 (-overs=3,nordirplus): checks READDIR operation cookies == 0
$ tshark -r /tmp/tcpdump.pcap -R 'nfs.procedure_v3 == 16 && rpc.msgtyp == 0 && nfs.cookie3 == 0' -T fields -e frame.number -e nfs.cookie3
Result: FAIL

NFSv4: checks READDIR operation cookies == 0
$ tshark -r /tmp/tcpdump.pcap -R 'nfs.opcode == 26 && rpc.msgtyp == 0 && nfs.cookie4 == 0' -T fields -e frame.number -e nfs.cookie4
Result: PASS
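
(Note: on newer Wireshark/tshark releases the -R read filter requires two-pass mode; the single-pass equivalent uses the -Y display filter, as in comment 9. For example:)

$ tshark -r /tmp/tcpdump.pcap -Y 'nfs.procedure_v3 == 17 && rpc.msgtyp == 0 && nfs.cookie3 == 0' -T fields -e frame.number -e nfs.cookie3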

Comment 22 J. Bruce Fields 2016-08-17 20:44:52 UTC
This is still reproducible upstream (testing on a 4.8-rc1 based kernel).

Comment 24 Benjamin Coddington 2016-08-23 14:55:16 UTC
311324ad1713666a6e803aecf0d4e1a136a5b34a "NFS: Be more aggressive in using readdirplus for 'ls -l' situations" changed when nfs_readdir() decides to revalidate the directory's mapping, which contains all the entries.  In addition to checking whether the attribute cache has expired, it now also checks whether NFS_INO_INVALID_DATA is set on the directory.

Whenever we do a CREATE, the response sends us through post_op_update_inode() for the directory, which sets NFS_INO_INVALID_DATA; the response also carries post-op directory attributes that update the directory's mtime and ctime, which causes nfs_update_inode() to set NFS_INO_INVALID_DATA as well.

The result is that the next pass through nfs_readdir() will cause us to start over reading the directory entries.

It might be argued upstream that a readdir should ignore NFS_INO_INVALID_DATA, and instead go back to simply checking the attribute cache for expiration.  I'll write to linux-nfs.

Any optimizations done here would possibly also need to consider the cases in bug 1248140.

Comment 26 Benjamin Coddington 2016-08-24 12:46:22 UTC
Scott Mayhew already fixed this problem once in RHEL6, see bug 976879.

Comment 27 Benjamin Coddington 2016-08-24 17:33:32 UTC
Upstream discussion:
http://marc.info/?t=147196504500004&r=1&w=2

Sent a patch:
http://marc.info/?l=linux-nfs&m=147205965132022&w=2

Comment 30 Dave Wysochanski 2016-09-26 17:23:13 UTC
(In reply to Benjamin Coddington from comment #27)
> Upstream discussion:
> http://marc.info/?t=147196504500004&r=1&w=2
> 
> Sent a patch:
> http://marc.info/?l=linux-nfs&m=147205965132022&w=2

Thanks for getting this going.  Is this still moving forward in the background, or has it petered out?

I just got another ping about another customer potentially hitting either this issue or https://bugzilla.redhat.com/show_bug.cgi?id=1371611

Comment 31 Benjamin Coddington 2016-09-26 19:57:02 UTC
(In reply to Dave Wysochanski from comment #30)
> (In reply to Benjamin Coddington from comment #27)
> > Sent a patch:
> > http://marc.info/?l=linux-nfs&m=147205965132022&w=2

That patch was NACKed; there's a v2 that might make 4.9:
http://marc.info/?l=linux-nfs&m=147222590709214&w=2

> Thanks for getting this going.  Is this still moving forward in the
> background, or has it petered out?

I haven't worked on it in a couple weeks.  The problems here and in bug 1248140 and bug 1371611 aren't trivially fixed.  I have been tossing ideas around about how to optimize all these cases, but so far I haven't found an ideal solution.

If that v2 goes in, we can at least help out the customers in this bug.

Comment 48 Phillip Lougher 2017-01-10 06:23:40 UTC
Patch(es) committed to the kernel repository; the kernel is undergoing testing.

Comment 50 Phillip Lougher 2017-01-11 16:13:26 UTC
Patch(es) available in kernel-2.6.32-682.el6.

Comment 58 Yongcheng Yang 2017-01-23 09:41:42 UTC
According to Comment #42 and Comment #54, the performance has been improved.

Moving to VERIFIED as SanityOnly.

Comment 65 errata-xmlrpc 2017-03-21 12:43:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0817.html

Comment 68 Dave Wysochanski 2017-10-30 11:25:30 UTC
*** Bug 1371611 has been marked as a duplicate of this bug. ***

