Bug 1325766
| Summary: | RHEL6.7: NFSv3 client performance regression where ls -l takes too long with "aggressive readdirplus" commit | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Rinku <rkothiya> |
| Component: | kernel | Assignee: | Scott Mayhew <smayhew> |
| kernel sub component: | NFS | QA Contact: | Yongcheng Yang <yoyang> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | bcodding, bfields, chunwang, dmoessne, dwysocha, eguan, fhirtz, fsorenso, kcleveng, mthacker, rick.beldin, sauchter, smayhew, swhiteho, yoyang |
| Version: | 6.7 | Keywords: | Regression, Reproducer |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | kernel-2.6.32-682.el6 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1442068 (view as bug list) | Environment: | |
| Last Closed: | 2017-03-21 12:43:33 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1269194, 1324930, 1374441, 1461138, 1507140 | | |

Comment 3
Dave Wysochanski
2016-04-11 11:09:21 UTC
(In reply to Dave Wysochanski from comment #3)
> I reviewed the case and agree with this bug; this is an easily reproducible
> problem. Commit a3d203226a4f2d90a3f00803180681570a2d3536 has introduced a
> fairly significant performance regression if an NFS client tries to list a
> directory that is being modified (files added and removed), which is a very
> common scenario in customers' environments.
>
> Next steps are to determine if this regression still exists in later kernels
> (latest RHEL6.8, RHEL7, upstream) and if there's a patch we need backported
> or if we need some new fixup patch even for upstream.

What server were you using to reproduce?

(In reply to J. Bruce Fields from comment #6)
> What server were you using to reproduce?

I used RHEL7.0 - not sure if that matters. The customer originally reported the problem against an AIX 6.1 server, but I was able to replicate it on a RHEL7.0 server. The tcpdump I captured matched what we saw in the customer's environment, namely:

1) READDIR ops vastly outweighed the number of entries in the directory
2) the 'cookie' value now gets reset on the RHEL6.7 kernel if the directory is changing

I'll attach my update from the case with the details.

Reproducible upstream following Dave's attachment above:
mount -overs=3 f21-1:/exports/xfs /mnt/
mkdir /mnt/TESTDIR
for ((i=0; i<50000; i++)); do touch /mnt/TESTDIR/file.$i; done
for i in $(seq 100000 200000);
do touch /mnt/TESTDIR/file-$i; sleep 1;
rm -f /mnt/TESTDIR/file-$i; done
Then time an "ls -l", capture with tcpdump, and count the READDIRPLUS calls with a 0 cookie in the argument using:
tshark -ntad -r tmp.pcap -Y 'nfs.procedure_v3 == 17 && rpc.msgtyp == 0' -T fields -e nfs.cookie3
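A minimal end-to-end sketch of that capture-and-count step, assuming the file-churn loop above is already running in the background or in another terminal; the interface name eth0 is an assumption, while tmp.pcap and /mnt/TESTDIR come from the commands above:

# start a capture of NFS traffic (eth0 is an assumed interface name)
tcpdump -i eth0 -s 0 -w tmp.pcap port 2049 &
CAPTURE_PID=$!

# time the directory listing while the capture runs
time ls -l /mnt/TESTDIR > /dev/null

# stop the capture and count READDIRPLUS calls that restarted at cookie 0
kill $CAPTURE_PID
tshark -ntad -r tmp.pcap -Y 'nfs.procedure_v3 == 17 && rpc.msgtyp == 0 && nfs.cookie3 == 0' -T fields -e nfs.cookie3 | wc -l

A count of 1 is the expected healthy behaviour; the regression shows up as several cookie-0 calls.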
With recent upstream (4.6-rc1 plus some unrelated patches in my case), I see five READDIRPLUS calls with cookie 0. Then I reverted 311324ad1713666a6e803aecf0d4e1a136a5b34a "NFS: Be more aggressive in using readdirplus for 'ls -l' situations" (only two minor fixes were required to account for the introduction of file_inode() upstream). After that I only see one such READDIRPLUS.

The "ls -l" also appears to be a little faster (about 3 seconds as opposed to about 4) in my setup (two VMs running on the same host), though the results aren't completely consistent. In each case I'm doing multiple "ls -l" runs and ignoring the first one (which is significantly slower).
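A small sketch of that timing method, assuming the /mnt/TESTDIR mount point from the reproducer and an arbitrary run count of three:

# the first listing warms the client caches and is much slower, so discard it
ls -l /mnt/TESTDIR > /dev/null

# time the subsequent listings and compare them across kernels
for run in 1 2 3; do
    time ls -l /mnt/TESTDIR > /dev/null
done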
I also tried mounting with nordirplus and acregmin=60 but found neither made a difference.
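For reference, the two mount variants being tested would look roughly like this, reusing the f21-1:/exports/xfs export and /mnt mount point from the reproducer above:

mount -o vers=3,nordirplus f21-1:/exports/xfs /mnt/
mount -o vers=3,acregmin=60 f21-1:/exports/xfs /mnt/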
(In reply to J. Bruce Fields from comment #9)
> I also tried mounting with nordirplus and acregmin=60 but found neither made
> a difference.

Which was unexpected given my hypothesis, which was that the 0 cookie results from a readdir following nfs_getattr()->nfs_request_parent_use_readdirplus()->nfs_force_use_readdirplus()->nfs_zap_mapping(), which on first glance looked to me like it shouldn't happen in the nordirplus case, or when the directory's children's attributes haven't expired. So, I need to look harder....

(In reply to J. Bruce Fields from comment #10)

Well, but NFS_INO_INVALID_DATA is set in lots of other places. And in practice, experiments with modifying the sleep time confirm that what we're getting (as expected, I guess) is that the readdir restarts with cookie 0 each time the directory is modified. So probably the important change was the extra NFS_INO_INVALID_DATA check in each nfs_readdir.

Upstream post here: http://lkml.kernel.org/r/20160415142500.GB23176@fieldses.org but that didn't get any discussion, so I need to dig some more.

Recommending for priority in 6.9 based on:
- reproducer exists
- regression from previous kernels
- case still open

I can try to help out with this if needed.

(In reply to Dave Wysochanski from comment #14)
> I can try to help out with this if needed.

Sure, thanks. I haven't gotten back to it since my last comment, and likely won't for at least another week.

In case it was not clear, this only affects NFSv3 - the default NFS version in RHEL6 (NFSv4) is not affected. Also, the 'nordirplus' mount option does not help as a workaround.

Here are a few tshark lines that are helpful to check whether this problem occurs. In all cases we should get just one line of output, indicating that only one READDIR or READDIRPLUS operation has been sent with 'cookie=0'.

NFSv3 (-overs=3): checks READDIRPLUS operation cookies == 0
$ tshark -r /tmp/tcpdump.pcap -R 'nfs.procedure_v3 == 17 && rpc.msgtyp == 0 && nfs.cookie3 == 0' -T fields -e frame.number -e nfs.cookie3
Result: FAIL

NFSv3 (-overs=3,nordirplus): checks READDIR operation cookies == 0
$ tshark -r /tmp/tcpdump.pcap -R 'nfs.procedure_v3 == 16 && rpc.msgtyp == 0 && nfs.cookie3 == 0' -T fields -e frame.number -e nfs.cookie3
Result: FAIL

NFSv4: checks READDIR operation cookies == 0
$ tshark -r /tmp/tcpdump.pcap -R 'nfs.opcode == 26 && rpc.msgtyp == 0 && nfs.cookie4 == 0' -T fields -e frame.number -e nfs.cookie4
Result: PASS
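Those checks can be wrapped in a small helper; this is a sketch under the assumptions that the capture lives at /tmp/tcpdump.pcap (as above), that at most one cookie-0 call counts as healthy, and that the tshark in use accepts the -Y display-filter option (older builds use -R as shown above):

#!/bin/bash
# report PASS/FAIL for each directory-read check against a capture file
PCAP=/tmp/tcpdump.pcap

check() {
    local label="$1" filter="$2"
    local count
    count=$(tshark -r "$PCAP" -Y "$filter" -T fields -e frame.number | wc -l)
    if [ "$count" -le 1 ]; then
        echo "$label: PASS ($count cookie-0 call(s))"
    else
        echo "$label: FAIL ($count cookie-0 calls)"
    fi
}

check "NFSv3 READDIRPLUS" 'nfs.procedure_v3 == 17 && rpc.msgtyp == 0 && nfs.cookie3 == 0'
check "NFSv3 READDIR"     'nfs.procedure_v3 == 16 && rpc.msgtyp == 0 && nfs.cookie3 == 0'
check "NFSv4 READDIR"     'nfs.opcode == 26 && rpc.msgtyp == 0 && nfs.cookie4 == 0'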
This is still reproducible upstream (testing on a 4.8-rc1 based kernel). 311324ad1713666a6e803aecf0d4e1a136a5b34a "NFS: Be more aggressive in using readdirplus for 'ls -l' situations" changed when nfs_readdir() decides to revalidate the directory's mapping, which contains all the entries. In addition to just checking whether the attribute cache has expired, it now also checks whether NFS_INO_INVALID_DATA is set on the directory.

Whenever we do a CREATE, the response sends us through post_op_update_inode() for the directory, which will set NFS_INO_INVALID_DATA, and it carries post-op directory attributes that update the mtime and ctime of the directory, which causes nfs_update_inode() to set NFS_INO_INVALID_DATA as well. The result is that the next pass through nfs_readdir() starts over reading the directory entries.

It might be argued upstream that a readdir should ignore NFS_INO_INVALID_DATA and instead go back to simply checking the attribute cache for expiration. I'll write to linux-nfs. Possibly any optimizations done here would also need to consider the cases in bug 1248140.

Scott Mayhew already fixed this problem once in RHEL6, see bug 976879.

Upstream discussion:
http://marc.info/?t=147196504500004&r=1&w=2

Sent a patch:
http://marc.info/?l=linux-nfs&m=147205965132022&w=2

(In reply to Benjamin Coddington from comment #27)
> Upstream discussion:
> http://marc.info/?t=147196504500004&r=1&w=2
>
> Sent a patch:
> http://marc.info/?l=linux-nfs&m=147205965132022&w=2

Thanks for getting this going. Is this still moving forward in the background, or has it petered out? I just got another ping about another customer potentially hitting either this issue or https://bugzilla.redhat.com/show_bug.cgi?id=1371611

I haven't worked on it in a couple of weeks. The problems here and in bug 1248140 and bug 1371611 aren't trivially fixed.

(In reply to Dave Wysochanski from comment #30)
> (In reply to Benjamin Coddington from comment #27)
> > Sent a patch:
> > http://marc.info/?l=linux-nfs&m=147205965132022&w=2

That patch was NACKed. There's a v2 that might make 4.9:
http://marc.info/?l=linux-nfs&m=147222590709214&w=2

> Thanks for getting this going. Is this still moving forward in the
> background, or has it petered out?

I haven't worked on it in a couple of weeks. The problems here and in bug 1248140 and bug 1371611 aren't trivially fixed. I have been tossing ideas around about how to optimize all these cases, but so far I haven't found an ideal solution. If that v2 goes, we can at least help out the customers in this bug.

Patch(es) committed on kernel repository and kernel is undergoing testing.

Patch(es) available on kernel-2.6.32-682.el6.

According to Comment #42 and Comment #54, the performance has been improved. Moving to VERIFIED as SanityOnly.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0817.html

*** Bug 1371611 has been marked as a duplicate of this bug. ***
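For anyone checking whether a client already carries the fix, a rough sketch of a version check against the Fixed In Version above (comparing kernel NVR strings by hand is an approximation; rpm's own version comparison is authoritative):

# show the running kernel and the newest installed kernel package;
# the fix ships in kernel-2.6.32-682.el6 and later
uname -r
rpm -q kernel | sort -V | tail -1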