Bug 1559725

Summary: [Ganesha]: Ganesha hogs ~800% CPU with 100 passive exports.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Ambarish <asoman>
Component: nfs-ganesha
Assignee: Kaleb KEITHLEY <kkeithle>
Status: CLOSED DUPLICATE
QA Contact: Manisha Saini <msaini>
Severity: high
Priority: unspecified
Version: rhgs-3.4
CC: amukherj, asoman, bturner, dang, ffilz, jthottan, kkeithle, mbenjamin, rhinduja, rhs-bugs, skoduri, storage-qa-internal
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2018-04-03 04:54:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---

Description Ambarish 2018-03-23 07:00:51 UTC
Description of problem:
------------------------

Exported 100 Gluster volumes (Erasure Coded, though that shouldn't matter) via Ganesha.

None of them are mounted on any client => no I/O.

I see Ganesha hogging ~800% CPU on all my nodes.

<snip>


[root@gqas007 ~]# top -p 29828

top - 02:44:43 up 20:37,  1 user,  load average: 11.38, 11.45, 12.80
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  9.5 us, 20.1 sy,  0.0 ni, 70.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 49278508 total, 32934488 free, 14300084 used,  2043936 buff/cache
KiB Swap: 24772604 total, 24772604 free,        0 used. 34189800 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                          
29828 root      20   0   23.4g   1.3g   5416 S 866.7  2.7   8237:40 ganesha.nfsd 


</snip>

Unsure if this is a regression.


Version-Release number of selected component (if applicable):
--------------------------------------------------------------

glusterfs-ganesha-3.12.2-5.el7rhgs.x86_64
nfs-ganesha-gluster-2.5.5-3.el7rhgs.x86_64

How reproducible:
------------------

2/2

Comment 2 Ambarish 2018-03-23 07:06:37 UTC
I see > 1000 threads in the NFS (ganesha.nfsd) process.

Epoll has been bumped up to 4.
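For reference, the thread count can be checked from the shell (a quick sketch; 29828 is the ganesha.nfsd PID from the top output in the description):

<snip>

# number of kernel threads (LWPs) in the ganesha.nfsd process
ps -o nlwp= -p 29828

# same thing via procfs
ls /proc/29828/task | wc -l

</snip>

The gdb backtraces below ("t a a bt" is shorthand for "thread apply all bt") show where those threads sit: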



(gdb) t a a bt

Thread 1279 (Thread 0x7eff32df0700 (LWP 29830)):
#0  0x00007eff35d73cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000556199af8eb4 in fridgethr_freeze (thr_ctx=0x55619b821700, fr=0x55619b821590) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:416
#2  fridgethr_start_routine (arg=0x55619b821700) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:554
#3  0x00007eff35d6fdd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007eff3543bb3d in clone () from /lib64/libc.so.6

Thread 1278 (Thread 0x7eff325ef700 (LWP 29831)):
#0  0x00007eff35d73cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000556199af8eb4 in fridgethr_freeze (thr_ctx=0x55619b821be0, fr=0x55619b821590) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:416
#2  fridgethr_start_routine (arg=0x55619b821be0) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:554
#3  0x00007eff35d6fdd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007eff3543bb3d in clone () from /lib64/libc.so.6

Thread 1277 (Thread 0x7eff31dae700 (LWP 29832)):
#0  0x00007eff35d73cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000556199af8eb4 in fridgethr_freeze (thr_ctx=0x55619b823000, fr=0x55619b822e90) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:416
#2  fridgethr_start_routine (arg=0x55619b823000) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:554
#3  0x00007eff35d6fdd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007eff3543bb3d in clone () from /lib64/libc.so.6

Thread 1276 (Thread 0x7eff375d8700 (LWP 29836)):
#0  0x00007eff35d73cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007eff359499d9 in work_pool_thread () from /lib64/libntirpc.so.1.5
#2  0x00007eff35d6fdd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007eff3543bb3d in clone () from /lib64/libc.so.6

Comment 5 Daniel Gryniewicz 2018-03-23 13:08:17 UTC
Isn't there a poll thread per volume for Gluster? That would be 100 threads polling at high frequency. In these circumstances, you'll need to turn down the poll rate.
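To illustrate the pattern Dan describes, here is a minimal sketch (not the actual nfs-ganesha source; the names and the interval are invented for illustration). One poll thread per export, each sleeping only microseconds between polls, means 100 threads that are runnable almost all the time, which matches the ~800% CPU in the top output:

<snip>

/* Minimal illustrative sketch of one upcall poll thread per export.
 * NOT the actual nfs-ganesha source; names and values are invented. */
#include <pthread.h>
#include <unistd.h>

#define NUM_EXPORTS 100        /* one Gluster volume per export */
#define POLL_USEC   10         /* hypothetical tiny poll interval */

static void *upcall_poll_thread(void *arg)
{
        (void)arg;
        for (;;) {
                /* check_for_upcalls(); placeholder for polling the
                 * volume for cache-invalidation events */
                usleep(POLL_USEC);   /* raising this is the mitigation */
        }
        return NULL;
}

int main(void)
{
        pthread_t tid[NUM_EXPORTS];

        for (int i = 0; i < NUM_EXPORTS; i++)
                pthread_create(&tid[i], NULL, upcall_poll_thread, NULL);

        pause();        /* the real daemon runs until shut down */
        return 0;
}

</snip>

With a 10 µs sleep, each thread wakes up to ~100,000 times per second, so the syscall and scheduling overhead alone would account for the high system CPU (20.1 sy in the top header above).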

Comment 10 Ambarish 2018-04-02 11:54:45 UTC
OK, so after bumping the polling interval up to 10000, I see a drastic drop (~40%) in the NFS process's CPU usage.
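For the record, a sketch of where that interval lives, assuming the up_poll_usec option in the GLUSTER (FSAL_GLUSTER) block of ganesha.conf is the knob being tuned here; the option name should be verified against the installed nfs-ganesha version:

<snip>

# /etc/ganesha/ganesha.conf (sketch; assumes the FSAL_GLUSTER
# up_poll_usec option, microseconds between upcall polls per export)
GLUSTER {
        up_poll_usec = 10000;
}

</snip>

A restart of nfs-ganesha is presumably needed for the new interval to take effect.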

My use case and symptoms are similar to what's mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1481040

I think this bug can be closed as a DUPE of https://bugzilla.redhat.com/show_bug.cgi?id=1481040.

Comment 11 Daniel Gryniewicz 2018-04-02 13:01:03 UTC
Looks like that's correct.