Description of problem:
------------------------
Exported 100 Gluster volumes (Erasure Coded, shouldn't matter though) via Ganesha. None of them is mounted on a client => no I/O. Yet I see Ganesha hogging ~800% CPU on all my nodes.

<snip>
[root@gqas007 ~]# top -p 29828
top - 02:44:43 up 20:37,  1 user,  load average: 11.38, 11.45, 12.80
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  9.5 us, 20.1 sy,  0.0 ni, 70.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 49278508 total, 32934488 free, 14300084 used,  2043936 buff/cache
KiB Swap: 24772604 total, 24772604 free,        0 used. 34189800 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
29828 root      20   0   23.4g   1.3g   5416 S 866.7  2.7   8237:40 ganesha.nfsd
</snip>

Unsure if this is a regression.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
glusterfs-ganesha-3.12.2-5.el7rhgs.x86_64
nfs-ganesha-gluster-2.5.5-3.el7rhgs.x86_64

How reproducible:
------------------
2/2
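To see which threads account for the CPU, a per-thread breakdown of the process is useful (top's summary line only shows the aggregate). A minimal sketch; PID 29828 is taken from the top output above, substitute your own:

```shell
# List the threads of ganesha.nfsd sorted by CPU usage.
# -L expands threads (one row per LWP), -o picks the columns.
pid=29828
ps -L -o tid,pcpu,stat,comm -p "$pid" | sort -k2 -rn | head -20
```

With `top`, the equivalent is `top -H -p 29828`, which toggles the per-thread view for that PID.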
I see > 1000 threads for NFS. Epoll has been bumped up to 4.

(gdb) t a a bt

Thread 1279 (Thread 0x7eff32df0700 (LWP 29830)):
#0  0x00007eff35d73cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000556199af8eb4 in fridgethr_freeze (thr_ctx=0x55619b821700, fr=0x55619b821590) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:416
#2  fridgethr_start_routine (arg=0x55619b821700) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:554
#3  0x00007eff35d6fdd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007eff3543bb3d in clone () from /lib64/libc.so.6

Thread 1278 (Thread 0x7eff325ef700 (LWP 29831)):
#0  0x00007eff35d73cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000556199af8eb4 in fridgethr_freeze (thr_ctx=0x55619b821be0, fr=0x55619b821590) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:416
#2  fridgethr_start_routine (arg=0x55619b821be0) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:554
#3  0x00007eff35d6fdd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007eff3543bb3d in clone () from /lib64/libc.so.6

Thread 1277 (Thread 0x7eff31dae700 (LWP 29832)):
#0  0x00007eff35d73cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000556199af8eb4 in fridgethr_freeze (thr_ctx=0x55619b823000, fr=0x55619b822e90) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:416
#2  fridgethr_start_routine (arg=0x55619b823000) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:554
#3  0x00007eff35d6fdd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007eff3543bb3d in clone () from /lib64/libc.so.6

Thread 1276 (Thread 0x7eff375d8700 (LWP 29836)):
#0  0x00007eff35d73cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007eff359499d9 in work_pool_thread () from /lib64/libntirpc.so.1.5
#2  0x00007eff35d6fdd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007eff3543bb3d in clone () from /lib64/libc.so.6
Isn't there a poll thread per volume for Gluster? That would then be 100 threads polling at high frequency. Under these circumstances, you'll need to turn down the poll rate.
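Turning down the poll rate is done in ganesha.conf. A sketch of what that could look like; the parameter name `up_poll_usec` is an assumption based on the upstream FSAL_GLUSTER sources and may differ between releases, and the value 10000 matches the interval mentioned in the next comment:

```
# /etc/ganesha/ganesha.conf (fragment) -- hypothetical sketch
GLUSTER {
    # Microseconds between upcall polls per exported volume
    # (parameter name assumed; verify against your release).
    # Raising it reduces idle CPU when many volumes are exported.
    up_poll_usec = 10000;
}
```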
OK, after bumping the polling interval up to 10000, I see a drastic drop in CPU usage by the NFS process (~40%). My use case and symptoms are similar to what's described in https://bugzilla.redhat.com/show_bug.cgi?id=1481040, so I think this bug can be closed as a duplicate of that one.
Looks like that's correct.