Bug 1481040 - ganesha.nfsd processes steadily using high CPU on 2 nodes.
Status: POST
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: nfs-ganesha
Version: 3.2
Hardware: x86_64 Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.4.0
Assigned To: Soumya Koduri
QA Contact: Manisha Saini
Depends On: 1485836 1497139
Blocks: 1503134
 
Reported: 2017-08-13 15:15 EDT by Cal Calhoun
Modified: 2017-12-08 08:31 EST
CC: 25 users

Type: Bug

Attachments
ganesha_gstack_output_qe_setup (301.26 KB, text/plain), 2017-09-18 11:26 EDT, Soumya Koduri
ganesha_top_output_qe_setup (2.66 KB, text/plain), 2017-09-18 11:27 EDT, Soumya Koduri

Comment 49 Soumya Koduri 2017-09-18 11:26 EDT
Created attachment 1327513 [details]
ganesha_gstack_output_qe_setup
Comment 50 Soumya Koduri 2017-09-18 11:27 EDT
Created attachment 1327514 [details]
ganesha_top_output_qe_setup
Comment 52 Soumya Koduri 2017-09-19 06:11:04 EDT
At least on the QE setup, since the high CPU usage appears to come from upcall polling, this particular issue will be addressed as part of bug 1485836.
Comment 57 Kaleb KEITHLEY 2017-09-25 15:20:03 EDT
Yes, please install the -debuginfo rpm(s) so that the next core and backtrace will be more helpful.

(You don't need me on-line with you to do that, right?)
Comment 59 Daniel Gryniewicz 2017-09-26 08:47:46 EDT
Ganesha does not need to be restarted after debuginfo is installed.  debuginfo is just for gdb, not for ganesha itself.
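For anyone reproducing this, a possible sequence on RHEL 7 (a sketch; the exact package set and core path depend on the setup):

# Install matching debug symbols; no ganesha restart needed, gdb reads them directly.
debuginfo-install -y nfs-ganesha glusterfs

# Then, against a core produced by a crash (wherever abrt or core_pattern puts it):
gdb /usr/bin/ganesha.nfsd /path/to/core -batch -ex 'thread apply all bt full' > /tmp/ganesha_bt.txt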
Comment 62 Soumya Koduri 2017-09-26 10:22:37 EDT
That's right, Raghavendra. If we match the thread IDs (consuming CPU) in http://collab-shell.usersys.redhat.com/01909671/x-image/sh_remote_session_26092017.png against the ones in the gdb stack (http://collab-shell.usersys.redhat.com/01909671/x-text/ganeshagdb.log), they are all upcall threads. That confirms our theory that upcall polling is adding to the high CPU usage in the customer environment as well.
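For reference, one way to do that correlation on a live node (a sketch; it assumes a single ganesha.nfsd process and the gstack wrapper shipped with the gdb package):

# List the busiest ganesha threads; in -H mode the PID column is the LWP/thread ID.
top -b -H -n 1 -p "$(pidof ganesha.nfsd)" | head -n 40

# Dump all thread stacks, then look up a hot LWP from the top output.
gstack "$(pidof ganesha.nfsd)" > /tmp/ganesha_gstack.txt
grep -B1 -A15 "LWP <tid>" /tmp/ganesha_gstack.txt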

This is being addressed as part of bug 1485836. We need support in glusterfs for APIs that let an application register for asynchronous upcalls; after that, those APIs need to be consumed by ganesha.

The changes needed in gluster have been sent upstream for review:
https://review.gluster.org/#/c/18349/
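To make the polling cost concrete, below is a small self-contained C sketch (the poll_for_upcall() stand-in is hypothetical, not ganesha or gfapi code) that counts how often an idle poll loop wakes up at the current 10 us default. Even with nothing to deliver, each polling thread can wake up tens of thousands of times per second, and that multiplies across upcall threads; the asynchronous registration API removes the loop entirely.

/* Hypothetical sketch: count wakeups of an idle polling loop for one second. */
#include <stdbool.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Stand-in for fetching a pending upcall; an idle export has none. */
static bool poll_for_upcall(void) { return false; }

int main(void)
{
    const useconds_t poll_interval_us = 10;   /* current default interval */
    unsigned long wakeups = 0;
    time_t end = time(NULL) + 1;              /* run for roughly one second */

    while (time(NULL) < end) {
        if (poll_for_upcall()) {
            /* a real implementation would hand the event to the cache layer */
        }
        usleep(poll_interval_us);
        wakeups++;
    }

    printf("idle wakeups in ~1s at %uus: %lu\n",
           (unsigned)poll_interval_us, wakeups);
    return 0;
}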
Comment 82 Bipin Kunal 2017-11-02 03:14:10 EDT
Based on comment #80, we (GSS) will make sure that the customer doesn't upgrade to 3.3.1.

@Damian: Please make sure this information is conveyed well to the customer.
Comment 83 Bipin Kunal 2017-11-02 03:18:14 EDT
Accidentally cleared the needinfo on Rahul. Setting it back.
Comment 109 Manisha Saini 2017-11-28 01:53:15 EST
Tested with the UP_CALL thread poll interval set to 100 ms, i.e. 100000 microseconds.
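(For scale, my arithmetic, assuming one wakeup per poll interval per upcall thread: a 10 us interval permits up to ~100,000 wakeups per second per thread, 100 us up to ~10,000, and 100 ms only ~10, which is consistent with the near-idle CPU seen in Attempt 2 below.)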

--------------------
Attempt 1:
--------------------
Ganesha crashed on one of the nodes from which the volumes were mounted. I was unable to collect the cores, but the following message was observed in /var/log/ganesha.log:

27/11/2017 21:12:32 : epoch c4210000 : rhs-gp-srv3.lab.eng.blr.redhat.com : ganesha.nfsd-27589[work-197] nfs_dupreq_free_dupreq :RW LOCK :CRIT :Error 16, Destroy mutex 0x7fbaaa528f20 (&dv->mtx) at /builddir/build/BUILD/nfs-ganesha-2.4.1/src/RPCAL/nfs_dupreq.c:788


/var/log/messages

Nov 27 21:29:13 rhs-gp-srv3 abrt-hook-ccpp: Failed to create core_backtrace: waitpid failed: No child processes
Nov 27 21:29:13 rhs-gp-srv3 abrt-server: Package 'nfs-ganesha' isn't signed with proper key
Nov 27 21:29:13 rhs-gp-srv3 abrt-server: 'post-create' on '/var/spool/abrt/ccpp-2017-11-27-21:12:32-27589' exited with 1
Nov 27 21:29:13 rhs-gp-srv3 abrt-server: Deleting problem directory '/var/spool/abrt/ccpp-2017-11-27-21:12:32-27589'
Nov 27 21:29:15 rhs-gp-srv3 systemd: nfs-ganesha.service: main process exited, code=killed, status=6/ABRT
Nov 27 21:29:15 rhs-gp-srv3 systemd: Unit nfs-ganesha.service entered failed state.
Nov 27 21:29:15 rhs-gp-srv3 systemd: nfs-ganesha.service failed.

------------------
Attempt 2:
------------------

The test completed successfully. Post-workload CPU utilization on the (now idle) setup was about 0-2%.

Memory utilization was ~33 GB after 12 hours of the test, measured on the idle setup.

One node from which the volumes were mounted was in failover state. The ganesha service is up and running.
The pacemaker service has the following failure: "error: Result of start operation for rhs-gp-srv3.lab.eng.blr.redhat.com-cluster_ip-1 on rhs-gp-srv3.lab.eng.blr.redhat.com: Timed Out"

------

[root@rhs-gp-srv3 tmp]# service pacemaker status
Redirecting to /bin/systemctl status  pacemaker.service
● pacemaker.service - Pacemaker High Availability Cluster Manager
   Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2017-11-27 15:39:10 IST; 20h ago
     Docs: man:pacemakerd
           http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/index.html
 Main PID: 16261 (pacemakerd)
   CGroup: /system.slice/pacemaker.service
           ├─16261 /usr/sbin/pacemakerd -f
           ├─16262 /usr/libexec/pacemaker/cib
           ├─16263 /usr/libexec/pacemaker/stonithd
           ├─16264 /usr/libexec/pacemaker/lrmd
           ├─16265 /usr/libexec/pacemaker/attrd
           ├─16266 /usr/libexec/pacemaker/pengine
           └─16267 /usr/libexec/pacemaker/crmd

Nov 28 03:44:27 rhs-gp-srv3.lab.eng.blr.redhat.com lrmd[16264]:   notice: rhs-gp-srv3.lab.eng.blr.redhat.com-nfs_unblock_stop_0:4286:stderr [ 0+1 records out ]
Nov 28 03:44:27 rhs-gp-srv3.lab.eng.blr.redhat.com lrmd[16264]:   notice: rhs-gp-srv3.lab.eng.blr.redhat.com-nfs_unblock_stop_0:4286:stderr [ 199 bytes (199 B) copied, 1.42317 s, 0.1 kB/s ]
Nov 28 03:44:27 rhs-gp-srv3.lab.eng.blr.redhat.com crmd[16267]:   notice: Result of stop operation for rhs-gp-srv3.lab.eng.blr.redhat.com-nfs_unblock on rhs-gp-srv3.lab.eng.blr.redhat.com: 0 (ok)
Nov 28 03:44:30 rhs-gp-srv3.lab.eng.blr.redhat.com crmd[16267]:   notice: Result of start operation for nfs-mon on rhs-gp-srv3.lab.eng.blr.redhat.com: 0 (ok)
Nov 28 03:44:33 rhs-gp-srv3.lab.eng.blr.redhat.com crmd[16267]:   notice: Result of stop operation for rhs-gp-srv3.lab.eng.blr.redhat.com-cluster_ip-1 on rhs-gp-srv3.lab.eng.blr.redhat.com: 0 (ok)
Nov 28 03:45:12 rhs-gp-srv3.lab.eng.blr.redhat.com lrmd[16264]:  warning: rhs-gp-srv3.lab.eng.blr.redhat.com-cluster_ip-1_start_0 process (PID 4557) timed out
Nov 28 03:45:12 rhs-gp-srv3.lab.eng.blr.redhat.com lrmd[16264]:  warning: rhs-gp-srv3.lab.eng.blr.redhat.com-cluster_ip-1_start_0:4557 - timed out after 20000ms
Nov 28 03:45:12 rhs-gp-srv3.lab.eng.blr.redhat.com crmd[16267]:    error: Result of start operation for rhs-gp-srv3.lab.eng.blr.redhat.com-cluster_ip-1 on rhs-gp-srv3.lab.eng.blr.redhat.com: Timed Out
Nov 28 03:45:13 rhs-gp-srv3.lab.eng.blr.redhat.com crmd[16267]:   notice: Result of stop operation for rhs-gp-srv3.lab.eng.blr.redhat.com-cluster_ip-1 on rhs-gp-srv3.lab.eng.blr.redhat.com: 0 (ok)
Nov 28 03:45:13 rhs-gp-srv3.lab.eng.blr.redhat.com crmd[16267]:   notice: Result of stop operation for rhs-gp-srv3.lab.eng.blr.redhat.com-nfs_block on rhs-gp-srv3.lab.eng.blr.redhat.com: 0 (ok)


-------
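One possible way to dig into that timed-out cluster_ip start (a sketch, not something prescribed in this thread; the resource name is taken from the log above and the syntax is the RHEL 7 / pcs 0.9 one):

# Current cluster state and the failed action
pcs status

# Inspect the resource and its configured start timeout (20000 ms per the log)
pcs resource show rhs-gp-srv3.lab.eng.blr.redhat.com-cluster_ip-1

# If the 20 s start timeout is simply too tight for this environment, raise it
# and clear the failure so pacemaker retries the start:
pcs resource update rhs-gp-srv3.lab.eng.blr.redhat.com-cluster_ip-1 op start timeout=60s
pcs resource cleanup rhs-gp-srv3.lab.eng.blr.redhat.com-cluster_ip-1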


I will trigger the same run with default threads to compare CPU and memory utilization.
Comment 110 Manisha Saini 2017-11-28 01:55:36 EST
Regarding the ganesha log message from the first attempt's failure:

27/11/2017 21:12:32 : epoch c4210000 : rhs-gp-srv3.lab.eng.blr.redhat.com : ganesha.nfsd-27589[work-197] nfs_dupreq_free_dupreq :RW LOCK :CRIT :Error 16, Destroy mutex 0x7fbaaa528f20 (&dv->mtx) at /builddir/build/BUILD/nfs-ganesha-2.4.1/src/RPCAL/nfs_dupreq.c:788

I can see the following bug already raised:
https://bugzilla.redhat.com/show_bug.cgi?id=1394717#c4
I am not sure whether it is exactly the same issue, as I was unable to collect the cores.
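As an aside on what that log line means: error 16 is EBUSY, and with glibc pthread_mutex_destroy() returns EBUSY when the mutex is still held or in use, which fits a dupreq entry being torn down while another thread still references it. A minimal standalone illustration (not ganesha code):

#include <pthread.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    pthread_mutex_t mtx;

    pthread_mutex_init(&mtx, NULL);
    pthread_mutex_lock(&mtx);

    /* Destroying a mutex that is still locked: glibc reports EBUSY (16),
     * the same "Error 16, Destroy mutex" that ganesha logs above. */
    int rc = pthread_mutex_destroy(&mtx);
    printf("destroy while locked -> %d (%s)\n", rc, strerror(rc));

    pthread_mutex_unlock(&mtx);
    rc = pthread_mutex_destroy(&mtx);   /* succeeds once released */
    printf("destroy after unlock -> %d\n", rc);
    return 0;
}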
Comment 111 Daniel Gryniewicz 2017-11-28 10:36:14 EST
The dupreq crash seems likely to be this on upstream next:

commit 9cf5429fda1bfd07d98c8807d51754aadb26d8a0
Author:     Malahal Naineni <malahal@us.ibm.com>
AuthorDate: Wed Jan 25 09:52:31 2017 +0530
Commit:     Malahal Naineni <malahal@us.ibm.com>
CommitDate: Thu Jan 26 08:41:00 2017 +0530

    Fix use after free of a dupreq
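A quick way to check whether a given tree or tag already carries that fix (assuming a clone of the upstream nfs-ganesha repository, e.g. https://github.com/nfs-ganesha/nfs-ganesha):

git fetch --all --tags
git branch -r --contains 9cf5429fda1bfd07d98c8807d51754aadb26d8a0   # remote branches with the fix
git tag --contains 9cf5429fda1bfd07d98c8807d51754aadb26d8a0          # tags/releases with the fix
git show --stat 9cf5429fda1bfd07d98c8807d51754aadb26d8a0             # what the fix touches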
Comment 112 Manisha Saini 2017-11-29 01:01:16 EST
Tested with the default thread settings:

---------------
Attempt 1:
---------------

Ganesha crashed with the default thread settings as well, on one of the nodes from which the volumes were mounted. This time I was able to collect the cores.
The same message was observed in /var/log/ganesha.log:

28/11/2017 20:55:51 : epoch c4210000 : rhs-gp-srv3.lab.eng.blr.redhat.com : ganesha.nfsd-30259[work-221] nfs_dupreq_free_dupreq :RW LOCK :CRIT :Error 16, Destroy mutex 0x7f94d6419f50 (&dv->mtx) at /builddir/build/BUILD/nfs-ganesha-2.4.1/src/RPCAL/nfs_dupreq.c:788



(gdb) bt
#0  0x00007f97143d01d7 in raise () from /lib64/libc.so.6
#1  0x00007f97143d18c8 in abort () from /lib64/libc.so.6
#2  0x00007f97168d27f6 in nfs_dupreq_finish (req=req@entry=0x7f95e63f4168, res_nfs=res_nfs@entry=0x7f8fa11abf80)
    at /usr/src/debug/nfs-ganesha-2.4.1/src/RPCAL/nfs_dupreq.c:1092
#3  0x00007f9716858067 in nfs_rpc_execute (reqdata=reqdata@entry=0x7f95e63f4140) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1358
#4  0x00007f971685955a in worker_run (ctx=0x7f9716e99430) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1548
#5  0x00007f97168e32f9 in fridgethr_start_routine (arg=0x7f9716e99430) at /usr/src/debug/nfs-ganesha-2.4.1/src/support/fridgethr.c:550
#6  0x00007f9714dc3dc5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f971449273d in clone () from /lib64/libc.so.6


I will attach the full core dump file.

---------------------
Attempt 2:
---------------------

The test completed successfully. CPU utilization on the idle setup was ~292.3%.
Memory utilization after 12 hours of workload was ~30 GB.
Comment 113 Soumya Koduri 2017-11-29 01:37:42 EST
I think you mean the above results are with the default up_poll_interval (10 us). From the above results, we can summarize as follows:

For this particular test case (create a ganesha cluster, create 40 volumes, export all the volumes via NFS-Ganesha; 6 clients (physical machines) in total, 3 clients with 20 volumes mounted and the other 3 clients with the remaining 20 volumes mounted; the workload is Linux untars from all 6 clients; keep monitoring CPU and memory utilization):

- with default up_poll_interval (10us) - CPU (~292.3%), Memory usage (~30GB)
- with up_poll_interval (100us) - CPU (~159%), Memory usage (~34GB)
- with up_poll_interval (100000us/100ms) - CPU (~2%), Memory usage (~33GB)

Hence we can confirm that as we increase the poll interval, CPU usage decreases. The memory consumption observed is not related to upcalls or this particular fix; please file a separate bug if that usage is not normal compared to other access protocols.

Regarding the crash observed while testing: as Dan commented above, a fix is available as part of the rebase being done for the upcoming 3.4 release. Please test the same use case with the latest bits available and file a separate bug if the issue still persists. Thanks!
