Bug 1481040 - ganesha.nfsd processes steadily using high CPU on 2 nodes.
Summary: ganesha.nfsd processes steadily using high CPU on 2 nodes.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: nfs-ganesha
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
unspecified
urgent
Target Milestone: ---
: RHGS 3.4.0
Assignee: Soumya Koduri
QA Contact: Manisha Saini
URL:
Whiteboard:
: 1559725 (view as bug list)
Depends On: 1485836 1497139 1695072
Blocks: 1503134
TreeView+ depends on / blocked
 
Reported: 2017-08-13 19:15 UTC by Cal Calhoun
Modified: 2019-04-02 12:42 UTC (History)
23 users (show)

Fixed In Version: nfs-ganesha-2.5.5-1
Doc Type: Bug Fix
Doc Text:
Currently, the default interval (in microseconds) between upcall polls is 10 microseconds. For large numbers of threads, this results in lot of CPU consumption. With this fix, the default polling interval has been increased to 100 milliseconds (100000 microseconds) and helps reduce CPU consumption.
Clone Of:
Environment:
Last Closed: 2018-09-04 06:53:36 UTC
Target Upstream Version:


Attachments (Terms of Use)
ganesha_gstack_output_qe_setup (301.26 KB, text/plain)
2017-09-18 15:26 UTC, Soumya Koduri
no flags Details
ganesha_top_output_qe_setup (2.66 KB, text/plain)
2017-09-18 15:27 UTC, Soumya Koduri
no flags Details


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2018:2610 None None None 2018-09-04 06:54:41 UTC

Comment 49 Soumya Koduri 2017-09-18 15:26:39 UTC
Created attachment 1327513 [details]
ganesha_gstack_output_qe_setup

Comment 50 Soumya Koduri 2017-09-18 15:27:17 UTC
Created attachment 1327514 [details]
ganesha_top_output_qe_setup

Comment 52 Soumya Koduri 2017-09-19 10:11:04 UTC
At-least from QE setup, since the high CPU usage seems to be from upcall polling, this particular issue shall be addressed as part of bug1485836

Comment 57 Kaleb KEITHLEY 2017-09-25 19:20:03 UTC
Yes, please install the -debuginfo rpm(s) so that the next core and backtrace will be more helpful.

(You don't need me on-line with you to do that, right?)

Comment 59 Daniel Gryniewicz 2017-09-26 12:47:46 UTC
Ganesha does not need to be restarted after debuginfo is installed.  debuginfo is just for gdb, not for ganesha itself.

Comment 62 Soumya Koduri 2017-09-26 14:22:37 UTC
Thats right Raghavendra. If we match the thread IDs (consuming CPU) in http://collab-shell.usersys.redhat.com/01909671/x-image/sh_remote_session_26092017.png against the ones in gdb stack (http://collab-shell.usersys.redhat.com/01909671/x-text/ganeshagdb.log), they are all upcall threads. That confirms our theory that upcall polling is adding up to high CPU usage even on customer environment. 

This is being addressed as part of bug1485836. We need to support on glusterfs to provide APIs to be able to register for asynchronous upcalls to be sent to application. Post that those APIs need to be consumed by ganesha.

Changes needed in gluster are sent upstream for review - 
https://review.gluster.org/#/c/18349/

Comment 82 Bipin Kunal 2017-11-02 07:14:10 UTC
Based on comment #80. We(GSS) will make sure that customer doesn't upgrade to 3.3.1.

@Damian: Please make sure this information is conveyed well to the customer.

Comment 83 Bipin Kunal 2017-11-02 07:18:14 UTC
Accidently cleared needinfo on Rahul. Setting it back.

Comment 109 Manisha Saini 2017-11-28 06:53:15 UTC
Tested setting UP_CALL thread to 100ms i.e 100000 micro seconds.

--------------------
Attempt 1st:
--------------------
Ganesha got crashed on one of the node from which the volumes were mounted.Unable to collect the cores.But following message were observed in /var/log/ganesha.log-

27/11/2017 21:12:32 : epoch c4210000 : rhs-gp-srv3.lab.eng.blr.redhat.com : ganesha.nfsd-27589[work-197] nfs_dupreq_free_dupreq :RW LOCK :CRIT :Error 16, Destroy mutex 0x7fbaaa528f20 (&dv->mtx) at /builddir/build/BUILD/nfs-ganesha-2.4.1/src/RPCAL/nfs_dupreq.c:788


/var/log/messages

Nov 27 21:29:13 rhs-gp-srv3 abrt-hook-ccpp: Failed to create core_backtrace: waitpid failed: No child processes
Nov 27 21:29:13 rhs-gp-srv3 abrt-server: Package 'nfs-ganesha' isn't signed with proper key
Nov 27 21:29:13 rhs-gp-srv3 abrt-server: 'post-create' on '/var/spool/abrt/ccpp-2017-11-27-21:12:32-27589' exited with 1
Nov 27 21:29:13 rhs-gp-srv3 abrt-server: Deleting problem directory '/var/spool/abrt/ccpp-2017-11-27-21:12:32-27589'
Nov 27 21:29:15 rhs-gp-srv3 systemd: nfs-ganesha.service: main process exited, code=killed, status=6/ABRT
Nov 27 21:29:15 rhs-gp-srv3 systemd: Unit nfs-ganesha.service entered failed state.
Nov 27 21:29:15 rhs-gp-srv3 systemd: nfs-ganesha.service failed.

------------------
Attempt 2nd
------------------

Test got completed successfully.Post workload CPU utilization in idle setup was almost 0-2%

Memory utilization ~33GB after 12 hrs of test on idle setup

One node from which the Volumes were mounted was in fail over state.Ganesha service is up and running.
Pacemaker service has the following failures "rror: Result of start operation for rhs-gp-srv3.lab.eng.blr.redhat.com-cluster_ip-1 on rhs-gp-srv3.lab.eng.blr.redhat.com: Timed Out"

------

[root@rhs-gp-srv3 tmp]# service pacemaker status
Redirecting to /bin/systemctl status  pacemaker.service
● pacemaker.service - Pacemaker High Availability Cluster Manager
   Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2017-11-27 15:39:10 IST; 20h ago
     Docs: man:pacemakerd
           http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/index.html
 Main PID: 16261 (pacemakerd)
   CGroup: /system.slice/pacemaker.service
           ├─16261 /usr/sbin/pacemakerd -f
           ├─16262 /usr/libexec/pacemaker/cib
           ├─16263 /usr/libexec/pacemaker/stonithd
           ├─16264 /usr/libexec/pacemaker/lrmd
           ├─16265 /usr/libexec/pacemaker/attrd
           ├─16266 /usr/libexec/pacemaker/pengine
           └─16267 /usr/libexec/pacemaker/crmd

Nov 28 03:44:27 rhs-gp-srv3.lab.eng.blr.redhat.com lrmd[16264]:   notice: rhs-gp-srv3.lab.eng.blr.redhat.com-nfs_unblock_stop_0:4286:stderr [ 0+1 records out ]
Nov 28 03:44:27 rhs-gp-srv3.lab.eng.blr.redhat.com lrmd[16264]:   notice: rhs-gp-srv3.lab.eng.blr.redhat.com-nfs_unblock_stop_0:4286:stderr [ 199 bytes (199 B) copied, 1.42317 s, 0.1 kB/s ]
Nov 28 03:44:27 rhs-gp-srv3.lab.eng.blr.redhat.com crmd[16267]:   notice: Result of stop operation for rhs-gp-srv3.lab.eng.blr.redhat.com-nfs_unblock on rhs-gp-srv3.lab.eng.blr.redhat.com: 0 (ok)
Nov 28 03:44:30 rhs-gp-srv3.lab.eng.blr.redhat.com crmd[16267]:   notice: Result of start operation for nfs-mon on rhs-gp-srv3.lab.eng.blr.redhat.com: 0 (ok)
Nov 28 03:44:33 rhs-gp-srv3.lab.eng.blr.redhat.com crmd[16267]:   notice: Result of stop operation for rhs-gp-srv3.lab.eng.blr.redhat.com-cluster_ip-1 on rhs-gp-srv3.lab.eng.blr.redhat.com: 0 (ok)
Nov 28 03:45:12 rhs-gp-srv3.lab.eng.blr.redhat.com lrmd[16264]:  warning: rhs-gp-srv3.lab.eng.blr.redhat.com-cluster_ip-1_start_0 process (PID 4557) timed out
Nov 28 03:45:12 rhs-gp-srv3.lab.eng.blr.redhat.com lrmd[16264]:  warning: rhs-gp-srv3.lab.eng.blr.redhat.com-cluster_ip-1_start_0:4557 - timed out after 20000ms
Nov 28 03:45:12 rhs-gp-srv3.lab.eng.blr.redhat.com crmd[16267]:    error: Result of start operation for rhs-gp-srv3.lab.eng.blr.redhat.com-cluster_ip-1 on rhs-gp-srv3.lab.eng.blr.redhat.com: Timed Out
Nov 28 03:45:13 rhs-gp-srv3.lab.eng.blr.redhat.com crmd[16267]:   notice: Result of stop operation for rhs-gp-srv3.lab.eng.blr.redhat.com-cluster_ip-1 on rhs-gp-srv3.lab.eng.blr.redhat.com: 0 (ok)
Nov 28 03:45:13 rhs-gp-srv3.lab.eng.blr.redhat.com crmd[16267]:   notice: Result of stop operation for rhs-gp-srv3.lab.eng.blr.redhat.com-nfs_block on rhs-gp-srv3.lab.eng.blr.redhat.com: 0 (ok)


-------


I will trigger the same run with default threads to compare CPU and memory utilization.

Comment 110 Manisha Saini 2017-11-28 06:55:36 UTC
For the 1st attempt failure ganesha messages,

27/11/2017 21:12:32 : epoch c4210000 : rhs-gp-srv3.lab.eng.blr.redhat.com : ganesha.nfsd-27589[work-197] nfs_dupreq_free_dupreq :RW LOCK :CRIT :Error 16, Destroy mutex 0x7fbaaa528f20 (&dv->mtx) at /builddir/build/BUILD/nfs-ganesha-2.4.1/src/RPCAL/nfs_dupreq.c:788

I can see following Bug raised-
https://bugzilla.redhat.com/show_bug.cgi?id=1394717#c4.
Not sure if its completely same as I was unable to correct cores.

Comment 111 Daniel Gryniewicz 2017-11-28 15:36:14 UTC
The dupreq crash seems likely to be this on upstream next:

commit 9cf5429fda1bfd07d98c8807d51754aadb26d8a0
Author:     Malahal Naineni <malahal@us.ibm.com>
AuthorDate: Wed Jan 25 09:52:31 2017 +0530
Commit:     Malahal Naineni <malahal@us.ibm.com>
CommitDate: Thu Jan 26 08:41:00 2017 +0530

    Fix use after free of a dupreq

Comment 112 Manisha Saini 2017-11-29 06:01:16 UTC
Tested with default threads-

---------------
Attempt 1st
---------------

Ganesha got crashed with defaults threads as well on one of the node from which the volumes were mounted. This time I was able to collect the cores.
Following same message were observed in /var/log/ganesha.log-

28/11/2017 20:55:51 : epoch c4210000 : rhs-gp-srv3.lab.eng.blr.redhat.com : ganesha.nfsd-30259[work-221] nfs_dupreq_free_dupreq :RW LOCK :CRIT :Error 16, Destroy mutex 0x7f94d6419f50 (&dv->mtx) at /builddir/build/BUILD/nfs-ganesha-2.4.1/src/RPCAL/nfs_dupreq.c:788



(gdb) bt
#0  0x00007f97143d01d7 in raise () from /lib64/libc.so.6
#1  0x00007f97143d18c8 in abort () from /lib64/libc.so.6
#2  0x00007f97168d27f6 in nfs_dupreq_finish (req=req@entry=0x7f95e63f4168, res_nfs=res_nfs@entry=0x7f8fa11abf80)
    at /usr/src/debug/nfs-ganesha-2.4.1/src/RPCAL/nfs_dupreq.c:1092
#3  0x00007f9716858067 in nfs_rpc_execute (reqdata=reqdata@entry=0x7f95e63f4140) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1358
#4  0x00007f971685955a in worker_run (ctx=0x7f9716e99430) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1548
#5  0x00007f97168e32f9 in fridgethr_start_routine (arg=0x7f9716e99430) at /usr/src/debug/nfs-ganesha-2.4.1/src/support/fridgethr.c:550
#6  0x00007f9714dc3dc5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f971449273d in clone () from /lib64/libc.so.6


I will attach the full core dump file.

---------------------
2nd Attempt
---------------------

Test got completed successfully.CPU Utilization on idle setup ~292.3% 
Memory utilization on post 12 hrs of workload ~30GB.

Comment 113 Soumya Koduri 2017-11-29 06:37:42 UTC
I think you mean the above results are with default up_poll_interval (10us). From the above results, we can summarize as below -

For this particular testcase (i.e,  Create Ganesha cluster, Create 40 volumes , Export all the volumes via NFS-Ganesha ,  6 clients (physical machines) in total, 3 client having 20 volumes mounted and 3 client having another 20 volumes mounted, The workload is linux untars from 6 clients, Keep monitoring the CPU and Memory Utilization)

- with default up_poll_interval (10us) - CPU (~292.3%), Memory usage (~30GB)
- with up_poll_interval (100us) - CPU (~159%), Memory usage (~34GB)
- with up_poll_interval (100000us/100ms) - CPU (~2%), Memory usage (~33GB)

Hence we can confirm that as we increase poll_interval, the CPU usage has decreased. The memory consumption observed is not related to upcalls or this particular fix. I request to file a separate bug if this usage is not normal compared to other access protocols. 

wrt crash observed while testing, as commented by Dan above, we have a fix available as part of rebase done in the upcoming 3.4 release. Please test the same use-case with the latest bits available and file a separate bug if the issue still persists. Thanks!

Comment 126 Jiffin 2018-04-03 04:54:00 UTC
*** Bug 1559725 has been marked as a duplicate of this bug. ***

Comment 128 Manisha Saini 2018-08-22 15:15:41 UTC
Verified this with

# rpm -qa | grep ganesha
nfs-ganesha-gluster-2.5.5-10.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.5.5-10.el7rhgs.x86_64
nfs-ganesha-2.5.5-10.el7rhgs.x86_64
glusterfs-ganesha-3.12.2-16.el7rhgs.x86_64


Steps performed for verification
1.Create 6 node ganesha cluster
2.Created 40 Distributed-Replicate 2*3 Volumes
3.Exported the volume via ganesha in loop
4.Edited the volume export file for all the volumes and set upcall thread to 100000 micro seconds

---
EXPORT{
      Export_Id= 11 ;
      Path = "/mani10";
      FSAL {
           name = GLUSTER;
           hostname="localhost";
          volume="mani10";
          up_poll_usec = 100000;
           }
      Access_type = RW;
      Disable_ACL = true;
      Squash="No_root_squash";
      Pseudo="/mani10";
      Protocols = "3", "4" ;
      Transports = "UDP","TCP";
      SecType = "sys";
     }
---

5.Performed refresh-config in loop on all the volumes
6.Check CPU utilization

Default up_call value : ~ 256 %
After changing upcall value in export file : ~ 0.7%

7. Mounted 20 volumes to 3 clients and another 20 volumes to other 3 clients via same VIP (v4)

8.Create 3 directories in each volume mount on all 40 volumes
9. Trigger linux untars from all 6 clients in different directories of volumes

Vol1:Client1:Dir1 -> Linux untar
vol1:Client2:Dir2 -> Linux untar
vol1:Client3:Dir3 -> Linux untar

Perform the above IO pattern for all 40 volumes.

While IO's were in process CPU utilization reached to 740%.

Post IO's are stopped on mount point ,cpu utilization was ~  1.0 % 

Moving this BZ to verified state.

Comment 130 Kaleb KEITHLEY 2018-08-31 13:52:23 UTC
doc text looks good to me

Comment 132 errata-xmlrpc 2018-09-04 06:53:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2610


Note You need to log in before you can comment on or make changes to this bug.