Bug 883285
| Summary: | find command is too slow with "-type f" option due to excessive newfstatat system calls | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Keigo Noha <knoha> | |
| Component: | findutils | Assignee: | Kamil Dudka <kdudka> | |
| Status: | CLOSED ERRATA | QA Contact: | Branislav Náter <bnater> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | medium | |||
| Version: | 6.2 | CC: | bnater, cww, deekej, dkutalek, dwysocha, fholec, kdudka, smayhew, thozza | |
| Target Milestone: | rc | Keywords: | Patch, Reopened, TestCaseProvided | |
| Target Release: | --- | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | findutils-4.4.2-7.el6 | Doc Type: | Bug Fix | |
| Doc Text: |
Cause:
The find utility did not use the file type information provided per directory entry by the kernel interface for listing directories.
Consequence:
Additional stat() system calls needed to be used to obtain the file type, which caused significant slowdown when traversing large NFS directories.
Fix:
An upstream patch has been applied to find so that it uses the file type information provided by the kernel.
Result:
Find uses fewer stat() system calls and traverses large NFS directories much faster.
|
Story Points: | --- | |
| Clone Of: | ||||
| : | 1252549 (view as bug list) | Environment: | ||
| Last Closed: | 2016-01-14 17:24:36 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1172231, 1269889 | |||
| Attachments: | ||||
*** Bug 883287 has been marked as a duplicate of this bug. ***

This is the cost of using gnulib's FTS module for traversing the file system. You can use the 'oldfind' command instead of 'find' to get back the RHEL-5 behavior (and performance).

Created attachment 657584 [details] backport of the related upstream fixes

upstream commits:
http://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=3270695f
http://git.savannah.gnu.org/cgit/findutils.git/commit/?id=acb82fe4
http://git.savannah.gnu.org/cgit/findutils.git/commit/?id=e3bcac43
http://git.savannah.gnu.org/cgit/findutils.git/commit/?id=0b1acd33
http://git.savannah.gnu.org/cgit/findutils.git/commit/?id=b445af98

Dear Dudka,
Thank you for your response. I tried the patch which you provided in this case's attachment. The patch seems to work as I expect. I took an strace, and the new find command did not call newfstatat().
Best regards, Keigo NOHA

This Bugzilla has been reviewed by Red Hat and is not planned on being addressed in Red Hat Enterprise Linux 6, and will be closed. If this bug is critical to production systems, please contact your Red Hat support representative and provide sufficient business justification.

Can we reconsider this for a RHEL6 update? We have a patch and at least one confirmation. The technical details are not clear to me (yet), but it seems like a fairly egregious regression from RHEL5, though it does have a workaround in the 'oldfind' command. The 'find' command is used quite often for various system administrative tasks. It's unclear to me why more people don't care about this bug. However, in my view it is quite possible that this has gone unnoticed for a very long time, since people's expectations of 'find' performance may be very low and they may not be doing a direct "rhel5" to "rhel6" comparison. Provided the patches which fix this are not too intrusive and don't carry a large risk of other regressions, it seems a good idea to reconsider fixing the default 'find' command.

After self-review of the patch, I figured out that we need to backport one more patch from gnulib upstream: http://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=d4b129b8 Note that the following upstream patch was intentionally skipped: http://git.savannah.gnu.org/cgit/findutils.git/commit/?id=214320ca ... because it introduced a significant change in behavior and triggered new bugs as a side effect. Created attachment 1056733 [details]
test case for this bug
Comment on attachment 1056733 [details] test case for this bug

Example output: FAIL

# ./t0_bz883285.sh
Testing https://bugzilla.redhat.com/show_bug.cgi?id=883285
strace -c -o /tmp/test-883285-strace.txt find / -xdev -type f
Found 129588 fstatat calls in /tmp/test-883285-strace.txt
Found 100303 files
Full strace file:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 78.20    0.024021           0    129588           newfstatat
 16.36    0.005026           0     24213           getdents
  1.96    0.000602           0     24222         6 open
  1.86    0.000570           0     24218           close
  0.75    0.000229           0     24209           fchdir
  0.51    0.000157           0     12112           fstat
  0.33    0.000101           0      1471           write
  0.04    0.000013           0        49           brk
  0.00    0.000000           0         8           read
  0.00    0.000000           0         1           stat
  0.00    0.000000           0        24           mmap
  0.00    0.000000           0        13           mprotect
  0.00    0.000000           0         3           munmap
  0.00    0.000000           0         2           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2         1 ioctl
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           uname
  0.00    0.000000           0         1           fcntl
  0.00    0.000000           0         1           gettimeofday
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         2           statfs
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         2         1 futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.030719                240149         9 total
TEST FAIL: on findutils-4.4.2-6.el6.x86_64 - excessive newfstatat system calls found

Example output: PASS

# ./t0_bz883285.sh
Testing https://bugzilla.redhat.com/show_bug.cgi?id=883285
strace -c -o /tmp/test-883285-strace.txt find / -xdev -type f
Found 24216 fstatat calls in /tmp/test-883285-strace.txt
Found 100306 files
Full strace file:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 54.12    0.007604           0     24216           newfstatat
 33.48    0.004703           0     24215           getdents
  4.53    0.000637           0     24224         6 open
  4.03    0.000566           0     24220           close
  2.76    0.000388           0     24211           fchdir
  1.07    0.000151           0     12113           fstat
  0.00    0.000000           0         8           read
  0.00    0.000000           0      1471           write
  0.00    0.000000           0         1           stat
  0.00    0.000000           0        24           mmap
  0.00    0.000000           0        13           mprotect
  0.00    0.000000           0         3           munmap
  0.00    0.000000           0        49           brk
  0.00    0.000000           0         2           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2         1 ioctl
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           uname
  0.00    0.000000           0         1           fcntl
  0.00    0.000000           0         1           gettimeofday
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         2           statfs
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         2         1 futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.014049                134786         9 total
TEST PASS: on findutils-4.4.2-6.2.test.bz883285.el6_7.x86_64 - did not find excessive newfstatat system calls

Created attachment 1056956 [details]
[PATCH] Resolves: #883285 - do not stat() file if only its type is needed and already available
The updated version of the patch is attached.
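The pass/fail decision in the test output above comes down to reading one number out of the `strace -c` summary. The attached script is not reproduced in this report, so the following is only a sketch of how such a count could be extracted. The function name `syscall_counts` and the `SAMPLE` text are illustrative assumptions; the rows in `SAMPLE` are copied from the FAIL output above.

```python
def syscall_counts(summary_text):
    """Map syscall name -> number of calls, given `strace -c` summary text."""
    counts = {}
    for line in summary_text.splitlines():
        fields = line.split()
        # Data rows have 5 fields (errors column empty) or 6 fields
        # (errors column filled in); the header row has 7.
        if len(fields) not in (5, 6):
            continue
        try:
            float(fields[0])        # "% time" column
            calls = int(fields[3])  # "calls" column
        except ValueError:
            continue                # separator row made of dashes
        name = fields[-1]
        if name == "total":
            continue                # summary row, not a syscall
        counts[name] = calls
    return counts


# Hypothetical sample: a shortened copy of the FAIL summary shown above.
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 78.20    0.024021           0    129588           newfstatat
 16.36    0.005026           0     24213           getdents
  1.96    0.000602           0     24222         6 open
------ ----------- ----------- --------- --------- ----------------
100.00    0.030719                240149         9 total
"""

counts = syscall_counts(SAMPLE)
print(counts["newfstatat"], counts["getdents"])
```

Comparing the `newfstatat` count with the `getdents` count or with the number of files found is one plausible way to define "excessive"; the threshold the real test case uses is not stated in this report.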
Created attachment 1057208 [details]
latest test case which takes as input directory and creates files
Comment on attachment 1057208 [details] latest test case which takes as input directory and creates files

This test case fails on NFS but passes on a local file system (ext4). The number of files matters as well, but as of yet I don't know why. I set FILE_COUNT=5000 since that seems to fail 100% of the time for me. A smaller number (500) passes the test.

[root@rhel6u6-node1 01478764]# ./t1_bz883285.sh /mnt/nfs3/test
Setting FIND_DIR to /mnt/nfs3/test
Testing https://bugzilla.redhat.com/show_bug.cgi?id=883285
Creating files in /mnt/nfs3/test/883285
Filesystem                    1K-blocks    Used Available Use% Mounted on
192.168.122.37:/exports/nfs3    8910848 3300736   5610112  38% /mnt/nfs3
strace -c -o /tmp/test-883285-strace.txt find /mnt/nfs3/test -xdev -type f
Found 5000 files
Found 3884 fstatat calls in /tmp/test-883285-strace.txt
Found 8 getdents calls in /tmp/test-883285-strace.txt
Full strace file:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 86.04    0.024651           6      3884           newfstatat
 13.96    0.003999         500         8           getdents
  0.00    0.000000           0         8           read
  0.00    0.000000           0        39           write
  0.00    0.000000           0        20         6 open
  0.00    0.000000           0        16           close
  0.00    0.000000           0         1           stat
  0.00    0.000000           0        11           fstat
  0.00    0.000000           0        24           mmap
  0.00    0.000000           0        13           mprotect
  0.00    0.000000           0         3           munmap
  0.00    0.000000           0        14           brk
  0.00    0.000000           0         2           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2         1 ioctl
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           uname
  0.00    0.000000           0         1           fcntl
  0.00    0.000000           0         7           fchdir
  0.00    0.000000           0         1           gettimeofday
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         2           statfs
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         2         1 futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.028650                  4066         9 total
TEST FAIL: on findutils-4.4.2-6.2.test.bz883285.el6_7.x86_64 - excessive newfstatat system calls found

[root@rhel6u6-node1 01478764]# ./t1_bz883285.sh /tmp/test
Setting FIND_DIR to /tmp/test
Testing https://bugzilla.redhat.com/show_bug.cgi?id=883285
Creating files in /tmp/test/883285
Filesystem                   1K-blocks    Used Available Use% Mounted on
/dev/mapper/VolGroup-lv_root   8649736 6977900   1225784  86% /
strace -c -o /tmp/test-883285-strace.txt find /tmp/test -xdev -type f
Found 5000 files
Found 4 fstatat calls in /tmp/test-883285-strace.txt
Found 8 getdents calls in /tmp/test-883285-strace.txt
Full strace file:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.000024           2        16           close
  0.00    0.000000           0         8           read
  0.00    0.000000           0        33           write
  0.00    0.000000           0        20         6 open
  0.00    0.000000           0         1           stat
  0.00    0.000000           0        11           fstat
  0.00    0.000000           0        24           mmap
  0.00    0.000000           0        13           mprotect
  0.00    0.000000           0         3           munmap
  0.00    0.000000           0        14           brk
  0.00    0.000000           0         2           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2         1 ioctl
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           uname
  0.00    0.000000           0         1           fcntl
  0.00    0.000000           0         8           getdents
  0.00    0.000000           0         7           fchdir
  0.00    0.000000           0         1           gettimeofday
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         2           statfs
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         2         1 futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         4           newfstatat
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.000024                   180         9 total
TEST PASS: on findutils-4.4.2-6.2.test.bz883285.el6_7.x86_64 - did not find excessive newfstatat system calls

Thank you for creating the test-case, Dave! I observed comparable results in my environment, too. The cause is pretty obvious IMO. readdir(3) returns d_type in the resulting 'struct dirent' object. If the field is set, fts and consequently find(1) use it to optimize out the call to stat() if only the type of the file is required. However, the field is optional and each file system implementation may handle it differently.
It is even correct for a file system implementation to always set the field to DT_UNKNOWN: http://marc.info/?l=netbsd-tech-kern&m=106753193617121 In this particular case, I guess, the NFS client implementation in the kernel runs out of cache after a number of files and starts to return DT_UNKNOWN for the rest of them. Consequently, find(1) has to call stat() to get that information, which causes additional round trips over the network, and those round trips cause the delays. I am not sure whether the cache in kernel space is configurable in any way. Created attachment 1057315 [details]
latest test case which takes 2 inputs: 1) directory, 2) number of files
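The d_type mechanism described above is also visible from Python, whose `os.scandir()` exposes the type information obtained while reading a directory. The sketch below is not how find or fts is implemented; it only illustrates the same idea. `count_regular_files` is a hypothetical helper name, and the temporary directory layout is made up for the demonstration. According to the Python documentation, `DirEntry.is_dir()` and `DirEntry.is_file()` normally need no extra system call, and fall back to a stat call when the file system reports `DT_UNKNOWN`.

```python
import os
import tempfile


def count_regular_files(root):
    """Count regular files under root, in the spirit of `find root -type f`."""
    total = 0
    pending = [root]
    while pending:
        directory = pending.pop()
        with os.scandir(directory) as entries:
            for entry in entries:
                # Both checks use the type reported with the directory entry
                # when it is known; a stat call is only needed otherwise.
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    total += 1
    return total


# Small self-contained demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as demo_root:
    os.mkdir(os.path.join(demo_root, "sub"))
    for relative in ("a", "b", os.path.join("sub", "c")):
        open(os.path.join(demo_root, relative), "w").close()
    demo_count = count_regular_files(demo_root)

print(demo_count)
```

Running such a script under `strace -c` on a local file system and on an NFS mount would be one way to see the difference discussed here, although the exact syscall counts depend on the Python version and the kernel.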
(In reply to Kamil Dudka from comment #29) > Thank you for creating the test-case, Dave! I observed comparable results > in my environment, too. The cause is pretty obvious IMO. readdir(3) > returns d_type in the resulting 'struct dirent' object. If the field is > set, fts and consequently find(1) use it to optimize out the call to stat () > if only type of the file is required. > > However, the field is optional and each file system implementation may > handle it differently. It is even correct for a file system implementation > to always set the field to DT_UNKNOWN: > > http://marc.info/?l=netbsd-tech-kern&m=106753193617121 > > In this particular case, I guess, the NFS client implementation in kernel > runs out of cache after a number of files and starts to return DT_UNKNOWN > for the rest of them. Consequently find() has to call stat() to get that > information, which causes additional round trips over the network and that > is the delays. I am not sure whether the cache in kernel space is anyhow > configurable. Thanks for the info! I find this somewhat odd but I'll have to look and see if this is a bug in NFS or not. Keep in mind RHEL5 does not have this problem - the exact same test case passes on RHEL5. I changed the test case so it takes a number of files as input and uses the exact command the customer is using so it should be identical now. I'll have to see why 'getdents' does not return the correct information for NFS, even if the files are already there it should have the correct 'type' to return in getdents. To me this makes no sense since clearly we've done a lookup and have a dentry in the kernel so there's no obvious reason to me that the type should be unknown. I came in this morning having this bad feeling on top of these find issues it may be a kernel bug in the NFS client but we'll have to see. Wait a minute - my latest test case creates the files and at the end removes the. This does not change the outcome of the test. 
So how would getdents not know the type of the directory entry? This makes no sense. To be honest, I do not know how the NFS protocol works. It might not scale for directories with too many entries in them. What surprises me is that running 'oldfind' results in completely different (probably better?) strace statistics on the same kernel. So it most likely is a change in findutils what causes the difference. The issue does not seem to be fixed in Fedora yet either, so I am afraid that this could be some limitation of the 'fts' gnulib module. But it can be also just findutils using 'fts' in a suboptimal way. (In reply to Kamil Dudka from comment #29) > Thank you for creating the test-case, Dave! I observed comparable results > in my environment, too. The cause is pretty obvious IMO. readdir(3) > returns d_type in the resulting 'struct dirent' object. If the field is > set, fts and consequently find(1) use it to optimize out the call to stat () > if only type of the file is required. > > However, the field is optional and each file system implementation may > handle it differently. It is even correct for a file system implementation > to always set the field to DT_UNKNOWN: > > http://marc.info/?l=netbsd-tech-kern&m=106753193617121 > Did you actually observe this by running the test? > In this particular case, I guess, the NFS client implementation in kernel > runs out of cache after a number of files and starts to return DT_UNKNOWN > for the rest of them. Consequently find() has to call stat() to get that > information, which causes additional round trips over the network and that > is the delays. I am not sure whether the cache in kernel space is anyhow > configurable. If it's the NFS client kernel implementation, then why does 'oldfind' work just fine, regardless of the number of files? This must be another bug in find / or some library in userspace. (In reply to Kamil Dudka from comment #33) > To be honest, I do not know how the NFS protocol works. 
It might not scale > for directories with too many entries in them. What surprises me is that > running 'oldfind' results in completely different (probably better?) strace > statistics on the same kernel. So it most likely is a change in findutils > what causes the difference. The issue does not seem to be fixed in Fedora > yet either, so I am afraid that this could be some limitation of the 'fts' > gnulib module. But it can be also just findutils using 'fts' in a > suboptimal way. Yes I agree the data so far to me points at something in findutils or some library causing the problem. However I'm just getting up to speed. For example I did not realize even that 'getdents' does not return the 'type', so it's unclear to me how the 'readdir' library call even knows this at all! Also there is a 'getdents64' system call, but so far I've been unable to probe this via systemtap. (In reply to Dave Wysochanski from comment #34) > If it's the NFS client kernel implementation, then why does 'oldfind' work > just fine, regardless of the number of files? I have been tracing oldfind with gdb and it looks like oldfind treats DT_UNKNOWN as "not a directory", which is not a safe assumption IMO (file systems in general can be implemented to use DT_UNKNOWN for everything): http://git.savannah.gnu.org/cgit/findutils.git/tree/find/find.c?id=FINDUTILS_4_4_2-1#n1164 Not sure whether we are able to find a counterexample for NFS though... It seems to be related to the following (safe?) optimization: http://git.savannah.gnu.org/cgit/findutils.git/tree/find/find.c?id=FINDUTILS_4_4_2-1#n1434 (In reply to Kamil Dudka from comment #35) > (In reply to Dave Wysochanski from comment #34) > > If it's the NFS client kernel implementation, then why does 'oldfind' work > > just fine, regardless of the number of files? 
> > I have been tracing oldfind with gdb and it looks like oldfind treats > DT_UNKNOWN as "not a directory", which is not a safe assumption IMO (file > systems in general can be implemented to use DT_UNKNOWN for everything): > > http://git.savannah.gnu.org/cgit/findutils.git/tree/find/find. > c?id=FINDUTILS_4_4_2-1#n1164 > > Not sure whether we are able to find a counterexample for NFS though... Ok yes I am able to probe the kernel now in the 'filldir' and clearly see that files above 1019 return d_type = 0 with NFS. I'm still tracing the NFS code in the kernel to see if there's anything which can be done for NFS. Your explanation of 'oldfind' making an assumption about DT_UNKNOWN sounds plausible for sure. This would explain why RHEL5 does not have this problem either. # nohup stap -v -e 'probe kernel.function("filldir") { printf("%s name = %s d_type = %d\n", pp(), text_strn(kernel_string($name),$namlen,0), $d_type) }' & # cat nohup.out ... kernel.function("filldir@fs/readdir.c:149") name = file-1015.pdf^A d_type = 8 kernel.function("filldir@fs/readdir.c:149") name = file-1016.pdf d_type = 8 kernel.function("filldir@fs/readdir.c:149") name = file-1017.pdf d_type = 8 kernel.function("filldir@fs/readdir.c:149") name = file-1018.pdf d_type = 8 kernel.function("filldir@fs/readdir.c:149") name = file-1018.pdf d_type = 8 kernel.function("filldir@fs/readdir.c:149") name = file-1019.pdfÿÿÿ d_type = 0 kernel.function("filldir@fs/readdir.c:149") name = file-1020.pdf d_type = 0 kernel.function("filldir@fs/readdir.c:149") name = file-1021.pdf d_type = 0 kernel.function("filldir@fs/readdir.c:149") name = file-1022.pdf^A d_type = 0 kernel.function("filldir@fs/readdir.c:149") name = file-1023.pdfA d_type = 0 kernel.function("filldir@fs/readdir.c:149") name = file-1024.pdf d_type = 0 (In reply to Dave Wysochanski from comment #37) > (In reply to Kamil Dudka from comment #35) > > (In reply to Dave Wysochanski from comment #34) > > > If it's the NFS client kernel implementation, 
then why does 'oldfind' work > > > just fine, regardless of the number of files? > > > > I have been tracing oldfind with gdb and it looks like oldfind treats > > DT_UNKNOWN as "not a directory", which is not a safe assumption IMO (file > > systems in general can be implemented to use DT_UNKNOWN for everything): > > > > http://git.savannah.gnu.org/cgit/findutils.git/tree/find/find. > > c?id=FINDUTILS_4_4_2-1#n1164 > > > > Not sure whether we are able to find a counterexample for NFS though... > > Ok yes I am able to probe the kernel now in the 'filldir' and clearly see > that files above 1019 return d_type = 0 with NFS. I'm still tracing the NFS > code in the kernel to see if there's anything which can be done for NFS. > > Your explanation of 'oldfind' making an assumption about DT_UNKNOWN sounds > plausible for sure. This would explain why RHEL5 does not have this problem > either. > Correction. It looks like we do have a RHEL6 NFS kernel bug. On RHEL5, all files return d_type = 8 which is why RHEL5 does not have the problem. [root@rhel5u10-node1 01478764]# uname -a Linux rhel5u10-node1.dwysocha.net 2.6.18-371.11.1.el5 #1 SMP Mon Jun 30 04:51:39 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux kernel.function("filldir@fs/readdir.c:146") name = file-1016.pdf d_type = 8 kernel.function("filldir@fs/readdir.c:146") name = file-1017.pdf d_type = 8 kernel.function("filldir@fs/readdir.c:146") name = file-1018.pdf d_type = 8 kernel.function("filldir@fs/readdir.c:146") name = file-1018.pdf d_type = 8 kernel.function("filldir@fs/readdir.c:146") name = file-1019.pdf d_type = 8 kernel.function("filldir@fs/readdir.c:146") name = file-1020.pdf d_type = 8 kernel.function("filldir@fs/readdir.c:146") name = file-1021.pdf d_type = 8 kernel.function("filldir@fs/readdir.c:146") name = file-1022.pdf d_type = 8 kernel.function("filldir@fs/readdir.c:146") name = file-1023.pdf d_type = 8 kernel.function("filldir@fs/readdir.c:146") name = file-1024.pdf d_type = 8 ... 
kernel.function("filldir@fs/readdir.c:146") name = file-4991.pdf d_type = 8 kernel.function("filldir@fs/readdir.c:146") name = file-4992.pdf d_type = 8 kernel.function("filldir@fs/readdir.c:146") name = file-4993.pdf d_type = 8 kernel.function("filldir@fs/readdir.c:146") name = file-4994.pdf d_type = 8 kernel.function("filldir@fs/readdir.c:146") name = file-4995.pdf d_type = 8 kernel.function("filldir@fs/readdir.c:146") name = file-4996.pdf d_type = 8 kernel.function("filldir@fs/readdir.c:146") name = file-4997.pdf d_type = 8 kernel.function("filldir@fs/readdir.c:146") name = file-4998.pdf d_type = 8 kernel.function("filldir@fs/readdir.c:146") name = file-4999.pdf d_type = 8 kernel.function("filldir@fs/readdir.c:146") name = file-5000.pdf d_type = 8 (In reply to Kamil Dudka from comment #36) > It seems to be related to the following (safe?) optimization: > > http://git.savannah.gnu.org/cgit/findutils.git/tree/find/find. > c?id=FINDUTILS_4_4_2-1#n1434 I learned myself that the above optimization is named "leaf optimization" and it is documented in find(1) man page. It can be disabled by the -noleaf option (and it indeed switches oldfind to the same "poor performance" mode). The problem is that the optimization is not implemented in RHEL-6 findutils. It is implemented in RHEL-7 and Fedora findutils: http://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=97d5b665 ... however it is not enabled for NFS anyway: https://lists.gnu.org/archive/html/bug-gnulib/2009-01/msg00024.html I believe that *this* bug, as originally reported, is addressed by the proposed patch. With the patch applied, both find and oldfind now provide similar performance for the "-type f" predicate. The performance is still poor for name-only search on NFS with huge directories on RHEL-6+ but I would treat this as a separate issue. I propose to clone this bug and get the other problem fixed in upstream and Fedora first. Comment on attachment 1056956 [details]
[PATCH] Resolves: #883285 - do not stat() file if only its type is needed and already available
I tested the behavior of the "-type f" option myself, and the patched version of find makes significantly fewer newfstatat() calls.
I'm not familiar with the source code of findutils itself, but I double-checked the content of the commits inside the patch and their correspondence to the commit messages, and everything looks good.
(In reply to Kamil Dudka from comment #40) > (In reply to Kamil Dudka from comment #36) > > It seems to be related to the following (safe?) optimization: > > > > http://git.savannah.gnu.org/cgit/findutils.git/tree/find/find. > > c?id=FINDUTILS_4_4_2-1#n1434 > > I learned myself that the above optimization is named "leaf optimization" > and it is documented in find(1) man page. It can be disabled by the -noleaf > option (and it indeed switches oldfind to the same "poor performance" mode). > The problem is that the optimization is not implemented in RHEL-6 findutils. > It is implemented in RHEL-7 and Fedora findutils: > > http://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=97d5b665 > > ... however it is not enabled for NFS anyway: > > https://lists.gnu.org/archive/html/bug-gnulib/2009-01/msg00024.html > > I believe that *this* bug, as originally reported, is addressed by the > proposed patch. With the patch applied, both find and oldfind now provide > similar performance for the "-type f" predicate. > > The performance is still poor for name-only search on NFS with huge > directories on RHEL-6+ but I would treat this as a separate issue. I > propose to clone this bug and get the other problem fixed in upstream and > Fedora first. I'm not sure what you're saying here. From what I can tell the bad 'find' perf is a RHEL6 only problem. What do you mean by "the other problem"? As far as I can tell there's no upstream / Fedora problem with NFS, per https://bugzilla.redhat.com/show_bug.cgi?id=1248140#c5 Do you see something different with upstream / Fedora? There are two issues I would like to evaluate separately: 1) RHEL-6 find currently does not use the d_type field returned by readdir() at all. This is the bug that was originally described in comment #0, is already fixed in RHEL-7 and Fedora, and can be fixed in RHEL-6 using attachment #1056956 [details]. 
2) RHEL-6 find (unlike oldfind or RHEL-5 find) does not use the above mentioned leaf optimization, which is able to skip stat() calls in additional cases. The optimization relies the link count previously returned for the directory being traversed and, if all its subdirectories were already traversed, all other files are assumed to be non-directories. The leaf optimization is useless if the '-type f' predicate needs to be evaluated (as in comment #0) because this requires stat() to be called anyway. Your test case uses name-only search where the leaf optimization applies. The problem is that issue 2) is not yet addressed in upstream findutils, at least not for NFS. That is why I would like to treat it as a separate bug. (In reply to Kamil Dudka from comment #43) > There are two issues I would like to evaluate separately: > > 1) RHEL-6 find currently does not use the d_type field returned by readdir() > at all. This is the bug that was originally described in comment #0, is > already fixed in RHEL-7 and Fedora, and can be fixed in RHEL-6 using > attachment #1056956 [details]. > > 2) RHEL-6 find (unlike oldfind or RHEL-5 find) does not use the above > mentioned leaf optimization, which is able to skip stat() calls in > additional cases. The optimization relies the link count previously > returned for the directory being traversed and, if all its subdirectories > were already traversed, all other files are assumed to be non-directories. > > The leaf optimization is useless if the '-type f' predicate needs to be > evaluated (as in comment #0) because this requires stat() to be called > anyway. Your test case uses name-only search where the leaf optimization > applies. The problem is that issue 2) is not yet addressed in upstream > findutils, at least not for NFS. That is why I would like to treat it as a > separate bug. Ah ok, thanks for the clarification. Have you opened the second bug yet? 
So there's actually 3 bugs here, 2 in 'find' and one in the RHEL6 kernel nfs behavior (1248140), so the whole 'find performance' can be quite confusing and easily misdiagnosed. I still am not sure what, if anything can be done for the RHEL6 kernel behavior change in bug 1248140. (In reply to Dave Wysochanski from comment #44) > Have you opened the second bug yet? Not yet. I propose to open the second bug against RHEL-7 because the optimization, as implemented, requires the FTS_CWDFD mode to be enabled: http://git.savannah.gnu.org/cgit/findutils.git/commit/?id=214320ca ... and enabling would basically require a rebase of findutils, which is unlikely to be approved for RHEL-6 so late in its life time. > So there's actually 3 bugs here, 2 in 'find' and one in the RHEL6 kernel nfs > behavior (1248140), so the whole 'find performance' can be quite confusing > and easily misdiagnosed. I still am not sure what, if anything can be done > for the RHEL6 kernel behavior change in bug 1248140. I am afraid that the kernel change was intentional as the commit message says: NFS: Adapt readdirplus to application usage patterns While the use of READDIRPLUS is significantly more efficient than READDIR followed by many LOOKUP calls, it is still less efficient than just READDIR if the attributes are not required. This patch tracks when lookups are attempted on the directory, and uses that information to selectively disable READDIRPLUS on that directory. The first 'readdir' call is always served using READDIRPLUS. Subsequent calls only use READDIRPLUS if there was a successful lookup or revalidation on a child in the mean time. Credit for the original idea should go to Neil Brown. See: http://www.spinics.net/lists/linux-nfs/msg19996.html However, the implementation in this patch differs from Neil's in that it focuses on tracking lookups rather than calls to stat(). 
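The leaf optimization mentioned in comment #43 rests on a piece of link-count arithmetic: on a traditional Unix file system a directory has two links of its own plus one for each subdirectory, so once that many subdirectories have been seen, the remaining entries cannot be directories. The sketch below shows only that bookkeeping. It is not the oldfind or gnulib code, the names `leaf_scan` and `is_dir` are hypothetical, and the link-count rule is an assumption that does not hold on every file system.

```python
def leaf_scan(names, nlink, is_dir):
    """Return (subdirectories found, number of type checks performed).

    names  -- entry names in the directory, without "." and ".."
    nlink  -- link count of the directory being traversed
    is_dir -- callable standing in for a stat()-based type check
    """
    # Assumption: two links belong to the directory itself (its name in the
    # parent and its own "." entry); each subdirectory adds one via "..".
    subdirs_left = nlink - 2
    subdirs = []
    checks = 0
    for name in names:
        if subdirs_left <= 0:
            break  # all subdirectories seen; the rest are non-directories
        checks += 1
        if is_dir(name):
            subdirs.append(name)
            subdirs_left -= 1
    return subdirs, checks


# Hypothetical directory with two subdirectories and four plain files.
entries = ["d1", "f1", "d2", "f2", "f3", "f4"]
found, checks = leaf_scan(entries, nlink=4, is_dir=lambda n: n.startswith("d"))
print(found, checks)
```

In this made-up listing only three of the six entries need a type check. The sketch also shows why the shortcut is unsafe where the link count does not follow the rule: a file system that always reports a link count of 1 would make the loop stop before checking anything.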
(In reply to Kamil Dudka from comment #45) > (In reply to Dave Wysochanski from comment #44) > > Have you opened the second bug yet? > > Not yet. I propose to open the second bug against RHEL-7 because the > optimization, as implemented, requires the FTS_CWDFD mode to be enabled: > > http://git.savannah.gnu.org/cgit/findutils.git/commit/?id=214320ca > > ... and enabling would basically require a rebase of findutils, which is > unlikely to be approved for RHEL-6 so late in its life time. > Ok that's unfortunate. > > So there's actually 3 bugs here, 2 in 'find' and one in the RHEL6 kernel nfs > > behavior (1248140), so the whole 'find performance' can be quite confusing > > and easily misdiagnosed. I still am not sure what, if anything can be done > > for the RHEL6 kernel behavior change in bug 1248140. > > I am afraid that the kernel change was intentional as the commit message > says: > > NFS: Adapt readdirplus to application usage patterns Right but I'm hoping there's some other patch(es) which may be possible for RHEL6 as a lot has changed in this area but needs further investigation. (In reply to Dave Wysochanski from comment #44) > Have you opened the second bug yet? Created just now: bug #1252549 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0038.html |
Description of problem:
find-4.4.2 in RHEL6.2 is too slow with the "-type f" option. find-4.2.27 in RHEL5.8 takes almost the same time with or without the option.

Version-Release number of selected component (if applicable):
findutils-4.4.2

How reproducible:
Always.

Steps to Reproduce:
1. Execute the following command with the option.
~~~
# time find /ext4/ext4fs -depth -type f | wc -l
880003

real 0m9.900s
user 0m0.920s
sys 0m7.433s
~~~
2. Execute the following command without the option.
~~~
# time find /ext4/ext4fs -depth | wc -l
880017

real 0m0.986s
user 0m0.416s
sys 0m0.565s
~~~
3. When using find-4.2.27, the results are as follows.
With the option:
~~~
# time find_4.2.27 /ext4/ext4fs -depth -type f | wc -l
880003

real 0m0.745s
user 0m0.313s
sys 0m0.428s
~~~
Without the option:
~~~
# time find_4.2.27 /ext4/ext4fs -depth | wc -l
880017

real 0m0.738s
user 0m0.301s
sys 0m0.434s
~~~

Actual results:
# time find /ext4/ext4fs -depth -type f | wc -l
880003
real 0m9.900s
user 0m0.920s
sys 0m7.433s

Expected results:
Almost the same time as without the option.

Additional info:
The strace logs of both find-4.4.2 and find-4.2.27 with the option.

In find-4.4.2
~~~
write(1, "/ext4/ext4fs/sub8/f4-7436\n", 26) = 26
newfstatat(AT_FDCWD, "f6-8094", {st_mode=S_IFREG|0644, st_size=1024, ...}, AT_SYMLINK_NOFOLLOW) = 0
write(1, "/ext4/ext4fs/sub8/f6-8094\n", 26) = 26
newfstatat(AT_FDCWD, "f4-5297", {st_mode=S_IFREG|0644, st_size=1024, ...}, AT_SYMLINK_NOFOLLOW) = 0
write(1, "/ext4/ext4fs/sub8/f4-5297\n", 26) = 26
newfstatat(AT_FDCWD, "f7-6402", {st_mode=S_IFREG|0644, st_size=1024, ...}, AT_SYMLINK_NOFOLLOW) = 0
write(1, "/ext4/ext4fs/sub8/f7-6402\n", 26) = 26
newfstatat(AT_FDCWD, "f5-7785", {st_mode=S_IFREG|0644, st_size=1024, ...}, AT_SYMLINK_NOFOLLOW) = 0
write(1, "/ext4/ext4fs/sub8/f5-7785\n", 26) = 26
newfstatat(AT_FDCWD, "f8-2494", {st_mode=S_IFREG|0644, st_size=1024, ...}, AT_SYMLINK_NOFOLLOW) = 0
write(1, "/ext4/ext4fs/sub8/f8-2494\n", 26) = 26
newfstatat(AT_FDCWD, "f4-6043", {st_mode=S_IFREG|0644, st_size=1024, ...}, AT_SYMLINK_NOFOLLOW) = 0
write(1, "/ext4/ext4fs/sub8/f4-6043\n", 26) = 26
~~~
In find-4.2.27
~~~
write(1, "/ext4/ext4fs/sub8/f4-7436\n", 26) = 26
write(1, "/ext4/ext4fs/sub8/f6-8094\n", 26) = 26
write(1, "/ext4/ext4fs/sub8/f4-5297\n", 26) = 26
write(1, "/ext4/ext4fs/sub8/f7-6402\n", 26) = 26
write(1, "/ext4/ext4fs/sub8/f5-7785\n", 26) = 26
write(1, "/ext4/ext4fs/sub8/f8-2494\n", 26) = 26
write(1, "/ext4/ext4fs/sub8/f4-6043\n", 26) = 26
~~~
In find-4.4.2, newfstatat() is called for each file. I think this is what makes the command so slow.