Bug 1559725

Summary: [Ganesha]: Ganesha hogs ~800% CPU with 100 passive exports.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Ambarish <asoman>
Component: nfs-ganesha
Assignee: Kaleb KEITHLEY <kkeithle>
Status: CLOSED DUPLICATE
QA Contact: Manisha Saini <msaini>
Severity: high
Priority: unspecified
Version: rhgs-3.4
CC: amukherj, asoman, bturner, dang, ffilz, jthottan, kkeithle, mbenjamin, rhinduja, rhs-bugs, skoduri, storage-qa-internal
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2018-04-03 04:54:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---

Description Ambarish 2018-03-23 07:00:51 UTC
Description of problem:
------------------------

Exported 100 Gluster volumes (Erasure Coded, though that shouldn't matter) via Ganesha.

None of them are mounted on any client => no I/O.

I see Ganesha hogging ~800% CPU on all my nodes.

<snip>


[root@gqas007 ~]# top -p 29828

top - 02:44:43 up 20:37,  1 user,  load average: 11.38, 11.45, 12.80
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  9.5 us, 20.1 sy,  0.0 ni, 70.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 49278508 total, 32934488 free, 14300084 used,  2043936 buff/cache
KiB Swap: 24772604 total, 24772604 free,        0 used. 34189800 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                          
29828 root      20   0   23.4g   1.3g   5416 S 866.7  2.7   8237:40 ganesha.nfsd 


</snip>

Unsure if this is a regression.


Version-Release number of selected component (if applicable):
--------------------------------------------------------------

glusterfs-ganesha-3.12.2-5.el7rhgs.x86_64
nfs-ganesha-gluster-2.5.5-3.el7rhgs.x86_64

How reproducible:
------------------

2/2

Comment 2 Ambarish 2018-03-23 07:06:37 UTC
I see > 1000 threads in the NFS (ganesha.nfsd) process.

Epoll has been bumped up to 4.
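For reference, the thread count can be checked from the shell (a quick sketch; 29828 is the ganesha.nfsd PID from the top output in the description):

<snip>

# number of kernel threads (LWPs) in the ganesha.nfsd process
ps -o nlwp= -p 29828

# same thing via procfs
ls /proc/29828/task | wc -l

</snip>

The gdb backtraces below ("t a a bt" is shorthand for "thread apply all bt") show where those threads sit: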



(gdb) t a a bt

Thread 1279 (Thread 0x7eff32df0700 (LWP 29830)):
#0  0x00007eff35d73cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000556199af8eb4 in fridgethr_freeze (thr_ctx=0x55619b821700, fr=0x55619b821590) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:416
#2  fridgethr_start_routine (arg=0x55619b821700) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:554
#3  0x00007eff35d6fdd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007eff3543bb3d in clone () from /lib64/libc.so.6

Thread 1278 (Thread 0x7eff325ef700 (LWP 29831)):
#0  0x00007eff35d73cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000556199af8eb4 in fridgethr_freeze (thr_ctx=0x55619b821be0, fr=0x55619b821590) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:416
#2  fridgethr_start_routine (arg=0x55619b821be0) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:554
#3  0x00007eff35d6fdd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007eff3543bb3d in clone () from /lib64/libc.so.6

Thread 1277 (Thread 0x7eff31dae700 (LWP 29832)):
#0  0x00007eff35d73cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000556199af8eb4 in fridgethr_freeze (thr_ctx=0x55619b823000, fr=0x55619b822e90) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:416
#2  fridgethr_start_routine (arg=0x55619b823000) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:554
#3  0x00007eff35d6fdd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007eff3543bb3d in clone () from /lib64/libc.so.6

Thread 1276 (Thread 0x7eff375d8700 (LWP 29836)):
#0  0x00007eff35d73cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007eff359499d9 in work_pool_thread () from /lib64/libntirpc.so.1.5
#2  0x00007eff35d6fdd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007eff3543bb3d in clone () from /lib64/libc.so.6

Comment 5 Daniel Gryniewicz 2018-03-23 13:08:17 UTC
Isn't there a poll thread per volume for Gluster? That would be 100 threads polling at high frequency. In these circumstances, you'll need to turn down the poll rate.
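To illustrate the pattern Dan describes, here is a minimal sketch (not the actual nfs-ganesha source; the names and the interval are invented for illustration). One poll thread per export, each sleeping only microseconds between polls, means 100 threads that are runnable almost all the time, which matches the ~800% CPU in the top output:

<snip>

/* Minimal illustrative sketch of one upcall poll thread per export.
 * NOT the actual nfs-ganesha source; names and values are invented. */
#include <pthread.h>
#include <unistd.h>

#define NUM_EXPORTS 100        /* one Gluster volume per export */
#define POLL_USEC   10         /* hypothetical tiny poll interval */

static void *upcall_poll_thread(void *arg)
{
        (void)arg;
        for (;;) {
                /* check_for_upcalls(); placeholder for polling the
                 * volume for cache-invalidation events */
                usleep(POLL_USEC);   /* raising this is the mitigation */
        }
        return NULL;
}

int main(void)
{
        pthread_t tid[NUM_EXPORTS];

        for (int i = 0; i < NUM_EXPORTS; i++)
                pthread_create(&tid[i], NULL, upcall_poll_thread, NULL);

        pause();        /* the real daemon runs until shut down */
        return 0;
}

</snip>

With a 10 µs sleep, each thread wakes up to ~100,000 times per second, so the syscall and scheduling overhead alone would account for the high system CPU (20.1 sy in the top header above).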

Comment 10 Ambarish 2018-04-02 11:54:45 UTC
OK, so after bumping the polling interval up to 10000, I see a drastic drop (~40%) in the NFS process's CPU usage.
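For the record, a sketch of where that interval lives, assuming the up_poll_usec option in the GLUSTER (FSAL_GLUSTER) block of ganesha.conf is the knob being tuned here; the option name should be verified against the installed nfs-ganesha version:

<snip>

# /etc/ganesha/ganesha.conf (sketch; assumes the FSAL_GLUSTER
# up_poll_usec option, microseconds between upcall polls per export)
GLUSTER {
        up_poll_usec = 10000;
}

</snip>

A restart of nfs-ganesha is presumably needed for the new interval to take effect.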

My use case and symptoms are similar to what's mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1481040

I think this bug can be closed as a DUPE of https://bugzilla.redhat.com/show_bug.cgi?id=1481040.

Comment 11 Daniel Gryniewicz 2018-04-02 13:01:03 UTC
Looks like that's correct.