Bug 1559725 - [Ganesha] : Ganesha hogs up ~800% CPU with a 100 passive exports.
Summary: [Ganesha] : Ganesha hogs up ~800% CPU with a 100 passive exports.
Keywords:
Status: CLOSED DUPLICATE of bug 1481040
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: nfs-ganesha
Version: rhgs-3.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Kaleb KEITHLEY
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-03-23 07:00 UTC by Ambarish
Modified: 2018-04-03 04:54 UTC (History)
CC: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-03 04:54:00 UTC
Embargoed:



Description Ambarish 2018-03-23 07:00:51 UTC
Description of problem:
------------------------

Exported 100 Gluster volumes (erasure-coded, though that shouldn't matter) via Ganesha.

None of them is mounted on any client, so there is no I/O.

I see Ganesha hogging ~800% CPU on all my nodes.

<snip>


[root@gqas007 ~]# top -p 29828

top - 02:44:43 up 20:37,  1 user,  load average: 11.38, 11.45, 12.80
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  9.5 us, 20.1 sy,  0.0 ni, 70.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 49278508 total, 32934488 free, 14300084 used,  2043936 buff/cache
KiB Swap: 24772604 total, 24772604 free,        0 used. 34189800 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                          
29828 root      20   0   23.4g   1.3g   5416 S 866.7  2.7   8237:40 ganesha.nfsd 


</snip>

Unsure if this is a regression.


Version-Release number of selected component (if applicable):
--------------------------------------------------------------

glusterfs-ganesha-3.12.2-5.el7rhgs.x86_64
nfs-ganesha-gluster-2.5.5-3.el7rhgs.x86_64

How reproducible:
------------------

2/2

Comment 2 Ambarish 2018-03-23 07:06:37 UTC
I see > 1000 threads in the ganesha.nfsd process.

Epoll has been bumped up to 4.
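For reference, the thread count can be confirmed from /proc without attaching gdb; a minimal sketch (using ganesha.nfsd as the target process is the only assumption):

```shell
# Count the light-weight processes (threads) a PID owns, via /proc/<pid>/task.
count_threads() {
    ls "/proc/$1/task" | wc -l
}

# For the running Ganesha daemon:
#   count_threads "$(pidof ganesha.nfsd)"
# Every live process has at least one thread, e.g. the current shell:
count_threads "$$"
```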



(gdb) t a a bt

Thread 1279 (Thread 0x7eff32df0700 (LWP 29830)):
#0  0x00007eff35d73cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000556199af8eb4 in fridgethr_freeze (thr_ctx=0x55619b821700, fr=0x55619b821590) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:416
#2  fridgethr_start_routine (arg=0x55619b821700) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:554
#3  0x00007eff35d6fdd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007eff3543bb3d in clone () from /lib64/libc.so.6

Thread 1278 (Thread 0x7eff325ef700 (LWP 29831)):
#0  0x00007eff35d73cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000556199af8eb4 in fridgethr_freeze (thr_ctx=0x55619b821be0, fr=0x55619b821590) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:416
#2  fridgethr_start_routine (arg=0x55619b821be0) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:554
#3  0x00007eff35d6fdd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007eff3543bb3d in clone () from /lib64/libc.so.6

Thread 1277 (Thread 0x7eff31dae700 (LWP 29832)):
#0  0x00007eff35d73cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000556199af8eb4 in fridgethr_freeze (thr_ctx=0x55619b823000, fr=0x55619b822e90) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:416
#2  fridgethr_start_routine (arg=0x55619b823000) at /usr/src/debug/nfs-ganesha-2.5.5/src/support/fridgethr.c:554
#3  0x00007eff35d6fdd5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007eff3543bb3d in clone () from /lib64/libc.so.6

Thread 1276 (Thread 0x7eff375d8700 (LWP 29836)):
#0  0x00007eff35d73cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007eff359499d9 in work_pool_thread () from /lib64/libntirpc.so.1.5
#2  0x00007eff35d6fdd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007eff3543bb3d in clone () from /lib64/libc.so.6

Comment 5 Daniel Gryniewicz 2018-03-23 13:08:17 UTC
Isn't there a poll thread per volume for Gluster?  That would then be 100 threads polling at high frequency.  In these circumstances, you'll need to turn down the poll rate.

Comment 10 Ambarish 2018-04-02 11:54:45 UTC
OK, so after bumping the polling interval up to 10000, I see a drastic drop in the NFS process's CPU usage (~40%).
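
For the record, a hedged sketch of where such an interval is tuned; the parameter name and placement vary across nfs-ganesha/FSAL_GLUSTER versions, so `up_poll_usec` in the GLUSTER block is an assumption here (value in microseconds):

```
# /etc/ganesha/ganesha.conf -- hypothetical excerpt
GLUSTER {
    # Upcall poll interval in microseconds; raising it from the default
    # reduces the idle CPU burned by the per-volume poll threads.
    up_poll_usec = 10000;
}
```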

My use case and symptoms are similar to what's mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1481040

I think this bug can be closed as a DUPE of https://bugzilla.redhat.com/show_bug.cgi?id=1481040.

Comment 11 Daniel Gryniewicz 2018-04-02 13:01:03 UTC
That looks correct to me.

