Description of problem:

When performance.readdir-ahead is enabled on a volume and the "find" command is run against it, large CPU usage is seen for the glusterfsd processes.

$ grep PID -A5 tmp/top.out
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5167 root      20   0 3489m 174m 4276 S  1.9  0.3   5475:33 glusterfsd
19598 root      20   0 47348 6004 2044 S  1.9  0.0   0:04.48 python
23339 root      20   0 15556 1648  812 R  1.9  0.0   0:00.02 top
    1 root      20   0 19360 1496 1224 S  0.0  0.0   0:02.27 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
--
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5118 root      20   0 3560m 201m 4304 S 17.2  0.3   2644:26 glusterfsd
 5103 root      20   0 3320m 211m 4260 S  1.7  0.3   3924:37 glusterfsd
 5095 root      20   0 2899m  85m 4244 S  1.3  0.1  24:39.20 glusterfsd
 5167 root      20   0 3489m 174m 4276 S  1.3  0.3   5475:33 glusterfsd
 5089 root      20   0 2889m 104m 4240 S  1.0  0.2  23:50.49 glusterfsd
--
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 5118 root      20   0 3560m 201m 4304 S 776.5  0.3   2644:50 glusterfsd
 5128 root      20   0 3526m 207m 4308 S  53.2  0.3   2515:44 glusterfsd
23339 root      20   0 15564 1772  904 R   1.3  0.0   0:00.08 top
 5103 root      20   0 3320m 211m 4260 S   1.0  0.3   3924:37 glusterfsd
 5175 root      20   0 3140m 169m 4260 S   0.7  0.3   5129:07 glusterfsd
--
  PID USER      PR  NI  VIRT  RES  SHR S   %CPU %MEM    TIME+  COMMAND
 5118 root      20   0 3560m 201m 4304 S 1605.0  0.3   2645:39 glusterfsd
 5128 root      20   0 3526m 207m 4308 S  287.6  0.3   2515:53 glusterfsd
 5103 root      20   0 3320m 211m 4260 S    2.0  0.3   3924:38 glusterfsd
23339 root      20   0 15564 1772  904 R    1.6  0.0   0:00.13 top
 8181 vdsm       0 -20 1778m  49m 9800 S    1.3  0.1  47:44.60 vdsm
--
  PID USER      PR  NI  VIRT  RES  SHR S   %CPU %MEM    TIME+  COMMAND
 5118 root      20   0 3560m 201m 4304 S 1549.7  0.3   2646:27 glusterfsd
 5128 root      20   0 3526m 207m 4308 S  349.2  0.3   2516:04 glusterfsd
23403 root      20   0 95272  14m 6328 R    3.6  0.0   0:00.11 gdb
23339 root      20   0 15564 1772  904 R    1.3  0.0   0:00.17 top
 5103 root      20   0 3320m 211m 4260 S    1.0  0.3   3924:38 glusterfsd

The perf output shows the following:

 39.18%  [kernel]  [k] _spin_lock
  6.81%  [kernel]  [k] __link_path_walk
  6.65%  [kernel]  [k] _atomic_dec_and_lock
  6.37%  [kernel]  [k] __d_lookup

This looks like it could be spin_lock contention when doing directory lookups, and that makes sense when looking at the bulk of the working threads in the stack traces. Many show the following stack:

Thread 17 (Thread 0x7ff688ad5700 (LWP 4061)):
#0  0x00007ff7465df6fa in lgetxattr () from /lib64/libc.so.6
#1  0x00007ff73a483193 in ?? () from /usr/lib64/glusterfs/3.7.1/xlator/storage/posix.so
#2  0x00007ff73a483da1 in ?? () from /usr/lib64/glusterfs/3.7.1/xlator/storage/posix.so
#3  0x00007ff747b82234 in dict_foreach_match () from /usr/lib64/libglusterfs.so.0
#4  0x00007ff747b822e8 in dict_foreach () from /usr/lib64/libglusterfs.so.0
#5  0x00007ff73a483723 in posix_xattr_fill () from /usr/lib64/glusterfs/3.7.1/xlator/storage/posix.so
#6  0x00007ff73a46948b in posix_entry_xattr_fill () from /usr/lib64/glusterfs/3.7.1/xlator/storage/posix.so
#7  0x00007ff73a47cedf in posix_readdirp_fill () from /usr/lib64/glusterfs/3.7.1/xlator/storage/posix.so
#8  0x00007ff73a47d463 in posix_do_readdir () from /usr/lib64/glusterfs/3.7.1/xlator/storage/posix.so
#9  0x00007ff73a47f03e in posix_readdirp () from /usr/lib64/glusterfs/3.7.1/xlator/storage/posix.so
#10 0x00007ff747b926f3 in default_readdirp () from /usr/lib64/libglusterfs.so.0
#11 0x00007ff747b926f3 in default_readdirp () from /usr/lib64/libglusterfs.so.0
#12 0x00007ff747b926f3 in default_readdirp () from /usr/lib64/libglusterfs.so.0
#13 0x00007ff738f44a44 in br_stub_readdirp () from /usr/lib64/glusterfs/3.7.1/xlator/features/bitrot-stub.so
#14 0x00007ff738d3776d in posix_acl_readdirp () from /usr/lib64/glusterfs/3.7.1/xlator/features/access-control.so
#15 0x00007ff738b23755 in pl_readdirp () from /usr/lib64/glusterfs/3.7.1/xlator/features/locks.so
#16 0x00007ff738909ea7 in up_readdirp () from /usr/lib64/glusterfs/3.7.1/xlator/features/upcall.so
#17 0x00007ff747b95003 in default_readdirp_resume () from /usr/lib64/libglusterfs.so.0
#18 0x00007ff747bb5640 in call_resume () from /usr/lib64/libglusterfs.so.0
#19 0x00007ff738701541 in iot_worker () from /usr/lib64/glusterfs/3.7.1/xlator/performance/io-threads.so
#20 0x00007ff746c75a51 in start_thread () from /lib64/libpthread.so.0
#21 0x00007ff7465df9ad in clone () from /lib64/libc.so.6

Note that these threads are servicing readdirp calls, and this sort of kernel activity is expected for readdir calls on large directories; however, the amount of CPU usage does seem a little excessive.

When performance.readdir-ahead is set to off, the CPU spikes are not seen.

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-11.el6rhs.x86_64
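For reference, the translator can be toggled through the normal volume-set interface while reproducing with "find". A minimal sketch, assuming a placeholder volume name "testvol" mounted at /mnt/testvol (substitute the affected volume and client mount point):

# Placeholder volume and mount point; watch glusterfsd with top/perf on the brick nodes.
$ gluster volume set testvol performance.readdir-ahead on
$ find /mnt/testvol > /dev/null

# Repeat with the translator disabled; the CPU spikes are not seen in this case.
$ gluster volume set testvol performance.readdir-ahead off
$ find /mnt/testvol > /dev/null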
With the introduction of the "rda-cache-limit" option, I do not see this anymore. I think the BZ can be closed as fixed in 3.3/3.2?
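For anyone verifying this on a build that carries the option, a minimal sketch of inspecting and tuning it, assuming it is exposed via volume set as performance.rda-cache-limit and reusing the placeholder volume name from above:

# Placeholder volume "testvol"; the full option name is assumed to be performance.rda-cache-limit.
$ gluster volume get testvol performance.rda-cache-limit
$ gluster volume set testvol performance.rda-cache-limit 10MB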
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:3249