Created attachment 1327513 [details] ganesha_gstack_output_qe_setup
Created attachment 1327514 [details] ganesha_top_output_qe_setup
At least from the QE setup, since the high CPU usage seems to come from upcall polling, this particular issue will be addressed as part of bug 1485836.
Yes, please install the -debuginfo rpm(s) so that the next core and backtrace will be more helpful. (You don't need me on-line with you to do that, right?)
Ganesha does not need to be restarted after debuginfo is installed. debuginfo is just for gdb, not for ganesha itself.
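For reference, a quick sketch of that workflow (the core path is illustrative; debuginfo-install comes from yum-utils):

# Install debuginfo and pull a full backtrace from an already-collected core;
# no ganesha restart is needed since gdb is the only consumer of the debuginfo.
debuginfo-install -y nfs-ganesha glusterfs
gdb /usr/bin/ganesha.nfsd /path/to/core -batch -ex 'thread apply all bt' > /tmp/ganesha-bt.txt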
That's right, Raghavendra. If we match the thread IDs consuming CPU in http://collab-shell.usersys.redhat.com/01909671/x-image/sh_remote_session_26092017.png against the ones in the gdb stack (http://collab-shell.usersys.redhat.com/01909671/x-text/ganeshagdb.log), they are all upcall threads. That confirms our theory that upcall polling is adding to the high CPU usage on the customer environment as well. This is being addressed as part of bug 1485836: we need support in glusterfs for APIs that let an application register for asynchronous upcalls, and those APIs then need to be consumed by Ganesha. The changes needed in gluster have been sent upstream for review: https://review.gluster.org/#/c/18349/
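For anyone repeating this check, a minimal sketch of one way to do the thread matching described above (standard commands; the output paths are illustrative):

# Identify the ganesha.nfsd threads burning CPU and capture their stacks.
pid=$(pgrep -x ganesha.nfsd)
top -bH -n1 -p "$pid" | head -40        # per-thread CPU; the PID column holds the thread (LWP) IDs
gstack "$pid" > /tmp/ganesha.gstack     # one stack per LWP; match these LWP numbers against top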
Based on comment #80, we (GSS) will make sure that the customer doesn't upgrade to 3.3.1. @Damian: please make sure this information is conveyed clearly to the customer.
Accidentally cleared needinfo on Rahul. Setting it back.
Tested with the up-call poll interval set to 100 ms, i.e. 100000 microseconds.

--------------------
Attempt 1
--------------------
Ganesha crashed on one of the nodes from which the volumes were mounted. I was unable to collect the cores, but the following message was observed in /var/log/ganesha.log:

27/11/2017 21:12:32 : epoch c4210000 : rhs-gp-srv3.lab.eng.blr.redhat.com : ganesha.nfsd-27589[work-197] nfs_dupreq_free_dupreq :RW LOCK :CRIT :Error 16, Destroy mutex 0x7fbaaa528f20 (&dv->mtx) at /builddir/build/BUILD/nfs-ganesha-2.4.1/src/RPCAL/nfs_dupreq.c:788

/var/log/messages:

Nov 27 21:29:13 rhs-gp-srv3 abrt-hook-ccpp: Failed to create core_backtrace: waitpid failed: No child processes
Nov 27 21:29:13 rhs-gp-srv3 abrt-server: Package 'nfs-ganesha' isn't signed with proper key
Nov 27 21:29:13 rhs-gp-srv3 abrt-server: 'post-create' on '/var/spool/abrt/ccpp-2017-11-27-21:12:32-27589' exited with 1
Nov 27 21:29:13 rhs-gp-srv3 abrt-server: Deleting problem directory '/var/spool/abrt/ccpp-2017-11-27-21:12:32-27589'
Nov 27 21:29:15 rhs-gp-srv3 systemd: nfs-ganesha.service: main process exited, code=killed, status=6/ABRT
Nov 27 21:29:15 rhs-gp-srv3 systemd: Unit nfs-ganesha.service entered failed state.
Nov 27 21:29:15 rhs-gp-srv3 systemd: nfs-ganesha.service failed.

------------------
Attempt 2
------------------
The test completed successfully. Post-workload CPU utilization on the idle setup was almost 0-2%; memory utilization was ~33 GB after 12 hours of the test.

One node from which the volumes were mounted was in failover state; the Ganesha service is up and running. The pacemaker service has the following failure: "error: Result of start operation for rhs-gp-srv3.lab.eng.blr.redhat.com-cluster_ip-1 on rhs-gp-srv3.lab.eng.blr.redhat.com: Timed Out"

------
[root@rhs-gp-srv3 tmp]# service pacemaker status
Redirecting to /bin/systemctl status pacemaker.service
● pacemaker.service - Pacemaker High Availability Cluster Manager
   Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2017-11-27 15:39:10 IST; 20h ago
     Docs: man:pacemakerd
           http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/index.html
 Main PID: 16261 (pacemakerd)
   CGroup: /system.slice/pacemaker.service
           ├─16261 /usr/sbin/pacemakerd -f
           ├─16262 /usr/libexec/pacemaker/cib
           ├─16263 /usr/libexec/pacemaker/stonithd
           ├─16264 /usr/libexec/pacemaker/lrmd
           ├─16265 /usr/libexec/pacemaker/attrd
           ├─16266 /usr/libexec/pacemaker/pengine
           └─16267 /usr/libexec/pacemaker/crmd

Nov 28 03:44:27 rhs-gp-srv3.lab.eng.blr.redhat.com lrmd[16264]: notice: rhs-gp-srv3.lab.eng.blr.redhat.com-nfs_unblock_stop_0:4286:stderr [ 0+1 records out ]
Nov 28 03:44:27 rhs-gp-srv3.lab.eng.blr.redhat.com lrmd[16264]: notice: rhs-gp-srv3.lab.eng.blr.redhat.com-nfs_unblock_stop_0:4286:stderr [ 199 bytes (199 B) copied, 1.42317 s, 0.1 kB/s ]
Nov 28 03:44:27 rhs-gp-srv3.lab.eng.blr.redhat.com crmd[16267]: notice: Result of stop operation for rhs-gp-srv3.lab.eng.blr.redhat.com-nfs_unblock on rhs-gp-srv3.lab.eng.blr.redhat.com: 0 (ok)
Nov 28 03:44:30 rhs-gp-srv3.lab.eng.blr.redhat.com crmd[16267]: notice: Result of start operation for nfs-mon on rhs-gp-srv3.lab.eng.blr.redhat.com: 0 (ok)
Nov 28 03:44:33 rhs-gp-srv3.lab.eng.blr.redhat.com crmd[16267]: notice: Result of stop operation for rhs-gp-srv3.lab.eng.blr.redhat.com-cluster_ip-1 on rhs-gp-srv3.lab.eng.blr.redhat.com: 0 (ok)
Nov 28 03:45:12 rhs-gp-srv3.lab.eng.blr.redhat.com lrmd[16264]: warning: rhs-gp-srv3.lab.eng.blr.redhat.com-cluster_ip-1_start_0 process (PID 4557) timed out
Nov 28 03:45:12 rhs-gp-srv3.lab.eng.blr.redhat.com lrmd[16264]: warning: rhs-gp-srv3.lab.eng.blr.redhat.com-cluster_ip-1_start_0:4557 - timed out after 20000ms
Nov 28 03:45:12 rhs-gp-srv3.lab.eng.blr.redhat.com crmd[16267]: error: Result of start operation for rhs-gp-srv3.lab.eng.blr.redhat.com-cluster_ip-1 on rhs-gp-srv3.lab.eng.blr.redhat.com: Timed Out
Nov 28 03:45:13 rhs-gp-srv3.lab.eng.blr.redhat.com crmd[16267]: notice: Result of stop operation for rhs-gp-srv3.lab.eng.blr.redhat.com-cluster_ip-1 on rhs-gp-srv3.lab.eng.blr.redhat.com: 0 (ok)
Nov 28 03:45:13 rhs-gp-srv3.lab.eng.blr.redhat.com crmd[16267]: notice: Result of stop operation for rhs-gp-srv3.lab.eng.blr.redhat.com-nfs_block on rhs-gp-srv3.lab.eng.blr.redhat.com: 0 (ok)
-------

I will trigger the same run with default threads to compare CPU and memory utilization.
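For reference, a minimal sketch of the kind of sampling loop that can collect the CPU/memory numbers quoted in these runs; this is an assumption about the method, not the exact script used:

# Sample ganesha.nfsd elapsed time, CPU% and RSS once a minute for the duration of the run.
pid=$(pgrep -x ganesha.nfsd)
while sleep 60; do
    ps -o etime=,pcpu=,rss= -p "$pid" >> /tmp/ganesha-usage.log
done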
For the ganesha message from the failed first attempt,

27/11/2017 21:12:32 : epoch c4210000 : rhs-gp-srv3.lab.eng.blr.redhat.com : ganesha.nfsd-27589[work-197] nfs_dupreq_free_dupreq :RW LOCK :CRIT :Error 16, Destroy mutex 0x7fbaaa528f20 (&dv->mtx) at /builddir/build/BUILD/nfs-ganesha-2.4.1/src/RPCAL/nfs_dupreq.c:788

I can see the following bug already raised: https://bugzilla.redhat.com/show_bug.cgi?id=1394717#c4. I'm not sure whether it is exactly the same issue, since I was unable to collect the cores.
The dupreq crash seems likely to be this one on upstream next:

commit 9cf5429fda1bfd07d98c8807d51754aadb26d8a0
Author:     Malahal Naineni <malahal.com>
AuthorDate: Wed Jan 25 09:52:31 2017 +0530
Commit:     Malahal Naineni <malahal.com>
CommitDate: Thu Jan 26 08:41:00 2017 +0530

    Fix use after free of a dupreq
Tested with the default thread settings.

---------------
Attempt 1
---------------
Ganesha crashed with the default settings as well, on one of the nodes from which the volumes were mounted. This time I was able to collect the cores. The same message was observed in /var/log/ganesha.log:

28/11/2017 20:55:51 : epoch c4210000 : rhs-gp-srv3.lab.eng.blr.redhat.com : ganesha.nfsd-30259[work-221] nfs_dupreq_free_dupreq :RW LOCK :CRIT :Error 16, Destroy mutex 0x7f94d6419f50 (&dv->mtx) at /builddir/build/BUILD/nfs-ganesha-2.4.1/src/RPCAL/nfs_dupreq.c:788

(gdb) bt
#0  0x00007f97143d01d7 in raise () from /lib64/libc.so.6
#1  0x00007f97143d18c8 in abort () from /lib64/libc.so.6
#2  0x00007f97168d27f6 in nfs_dupreq_finish (req=req@entry=0x7f95e63f4168, res_nfs=res_nfs@entry=0x7f8fa11abf80) at /usr/src/debug/nfs-ganesha-2.4.1/src/RPCAL/nfs_dupreq.c:1092
#3  0x00007f9716858067 in nfs_rpc_execute (reqdata=reqdata@entry=0x7f95e63f4140) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1358
#4  0x00007f971685955a in worker_run (ctx=0x7f9716e99430) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1548
#5  0x00007f97168e32f9 in fridgethr_start_routine (arg=0x7f9716e99430) at /usr/src/debug/nfs-ganesha-2.4.1/src/support/fridgethr.c:550
#6  0x00007f9714dc3dc5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f971449273d in clone () from /lib64/libc.so.6

I will attach the full core dump file.

---------------------
Attempt 2
---------------------
The test completed successfully. CPU utilization on the idle setup was ~292.3%; memory utilization after 12 hours of workload was ~30 GB.
I think you mean the above results are with the default up_poll_interval (10us). From the above results, we can summarize as follows.

For this particular test case (create a Ganesha cluster, create 40 volumes, export all the volumes via NFS-Ganesha, 6 clients (physical machines) in total with 3 clients mounting 20 volumes and 3 clients mounting the other 20 volumes, run Linux untars as the workload from all 6 clients, and keep monitoring the CPU and memory utilization):

- with the default up_poll_interval (10us): CPU ~292.3%, memory usage ~30GB
- with up_poll_interval (100us): CPU ~159%, memory usage ~34GB
- with up_poll_interval (100000us/100ms): CPU ~2%, memory usage ~33GB

Hence we can confirm that as the poll interval increases, the CPU usage decreases. The memory consumption observed is not related to upcalls or this particular fix; I request that a separate bug be filed if this usage is not normal compared to other access protocols.

Regarding the crash observed while testing, as Dan commented above, a fix is available as part of the rebase being done for the upcoming 3.4 release. Please test the same use case with the latest bits available and file a separate bug if the issue still persists. Thanks!
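To put the tested intervals in perspective, a rough back-of-the-envelope illustration (arithmetic only; it assumes one upcall polling thread per export and the 40 exports used in this test):

# Wake-ups per second implied by each tested poll interval.
for usec in 10 100 100000; do
    awk -v u="$usec" 'BEGIN { printf "up_poll_interval=%-6d us -> ~%d wake-ups/s per export, ~%d/s across 40 exports\n", u, 1e6/u, 40*1e6/u }'
done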
*** Bug 1559725 has been marked as a duplicate of this bug. ***
Verified this with:

# rpm -qa | grep ganesha
nfs-ganesha-gluster-2.5.5-10.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.5.5-10.el7rhgs.x86_64
nfs-ganesha-2.5.5-10.el7rhgs.x86_64
glusterfs-ganesha-3.12.2-16.el7rhgs.x86_64

Steps performed for verification:

1. Created a 6-node Ganesha cluster.
2. Created 40 Distributed-Replicate 2*3 volumes.
3. Exported the volumes via Ganesha in a loop.
4. Edited the volume export file for all the volumes and set the upcall poll interval to 100000 microseconds:
---
EXPORT{
      Export_Id = 11;
      Path = "/mani10";
      FSAL {
           name = GLUSTER;
           hostname = "localhost";
           volume = "mani10";
           up_poll_usec = 100000;
      }
      Access_type = RW;
      Disable_ACL = true;
      Squash = "No_root_squash";
      Pseudo = "/mani10";
      Protocols = "3", "4";
      Transports = "UDP", "TCP";
      SecType = "sys";
}
---
5. Performed refresh-config in a loop on all the volumes (a sketch of the loop for steps 3-5 follows this comment).
6. Checked CPU utilization:
   with the default up_poll_usec value: ~256%
   after changing the upcall value in the export file: ~0.7%
7. Mounted 20 volumes on 3 clients and the other 20 volumes on the other 3 clients via the same VIP (v4).
8. Created 3 directories in each volume mount on all 40 volumes.
9. Triggered Linux untars from all 6 clients in different directories of the volumes:
   Vol1:Client1:Dir1 -> Linux untar
   Vol1:Client2:Dir2 -> Linux untar
   Vol1:Client3:Dir3 -> Linux untar
   Performed the above IO pattern for all 40 volumes.

While IOs were in progress, CPU utilization reached 740%. After IOs were stopped on the mount points, CPU utilization was ~1.0%.

Moving this BZ to verified state.
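For reference, a minimal shell sketch of how steps 3-5 can be scripted. The loop, the volume names (mani1..mani40), the sed edit, and the shared-storage path are assumptions for illustration, not the exact script used for this verification; only "gluster volume set ... ganesha.enable on" and ganesha-ha.sh --refresh-config are the standard commands.

# Export each volume, add the poll interval to its export file, then reload it cluster-wide.
HA_CONF_DIR=/run/gluster/shared_storage/nfs-ganesha      # assumed shared-storage location
for i in $(seq 1 40); do
    vol="mani$i"
    gluster volume set "$vol" ganesha.enable on          # step 3: export via NFS-Ganesha
    # step 4: append up_poll_usec after the FSAL volume line of the generated export file (assumed layout)
    sed -i '/volume=/a up_poll_usec = 100000;' "$HA_CONF_DIR/exports/export.$vol.conf"
    /usr/libexec/ganesha/ganesha-ha.sh --refresh-config "$HA_CONF_DIR" "$vol"   # step 5
done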
doc text looks good to me
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2610